Re: Solr 4.0 - disappointing results sharding on 1 machine
Depends on where the bottlenecks are I guess. On a single system, increasing shards decreases throughput (this isn't specific to Solr). The increased parallelism *can* decrease latency to the degree that the parts that were parallelized outweigh the overhead. Going from one shard to two shards is also the most extreme case, since the unsharded case has no distributed overhead whatsoever.

What's the average CPU load during your tests? How are you testing (i.e. how many requests are in progress at the same time)? In your unsharded case, what's taking up the bulk of the time?

-Yonik
http://lucidworks.com

On Thu, Sep 20, 2012 at 9:39 AM, Tom Mortimer tom.m.f...@gmail.com wrote:
Hi all,
After reading http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ , I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed in Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded and 2-shard configuration (the latter set up with SolrCloud following the http://wiki.apache.org/solr/SolrCloud example). I wrote a simple python script to randomly throw queries from a hand-compiled list at Solr. The only extra I had turned on was facets (on document category). To my surprise, the performance of the 2-shard configuration is almost exactly half that of the unsharded index -

unsharded
4983912891 results in 24920 searches; 0 errors
70.02 mean qps
0.35s mean query time, 2.25s max, 0.00s min
90% of qtimes = 0.83s
99% of qtimes = 1.42s
99.9% of qtimes = 1.68s

2-shard
4990351660 results in 24501 searches; 0 errors
34.07 mean qps
0.66s mean query time, 694.20s max, 0.01s min
90% of qtimes = 1.19s
99% of qtimes = 2.12s
99.9% of qtimes = 2.95s

All caches were set to 4096 items, and performance looks ok in both cases (hit ratios close to 1.0, 0 evictions). I gave the single VM -Xmx1G and each shard VM -Xmx500M. I must be doing something stupid - surely this result is unexpected? Does anybody have any thoughts where it might be going wrong?
cheers,
Tom
Re: Understanding fieldCache SUBREADER insanity
The other thing to realize is that it's only insanity if it's unexpected or not-by-design (so the term is rather mis-named). It's more for core developers - if you are just using Solr without custom plugins, don't worry about it.

-Yonik
http://lucidworks.com

On Wed, Sep 19, 2012 at 3:27 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:
Hi Aaron, here is some information about the insanity count: http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache
As for the SUBREADER type, the javadocs say: "Indicates an overlap in cache usage on a given field in sub/super readers." This probably means that you are using the same field for faceting and for sorting (tf_normalizedTotalHotttnesss); sorting uses the segment level cache and faceting uses by default the global field cache. This can be a problem because the field is duplicated in cache, and then it uses twice the memory. One way to solve this would be to change the faceting method on that field to 'fcs', which uses the segment level cache (but may be a little bit slower).
Tomás

On Wed, Sep 19, 2012 at 3:16 PM, Aaron Daubman daub...@gmail.com wrote:
Hi all,
In reviewing a solr instance with somewhat variable performance, I noticed that its fieldCache stats show an insanity_count of 1 with the insanity type SUBREADER:
---snip---
insanity_count : 1
insanity#0 : SUBREADER: Found caches for descendants of ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1965982057
'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,null=[F#1965982057
'MMapIndexInput(path=/io01/p/solr/playlist/a/playlist/index/_6h9.frq)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1308116426
---snip---
How can I decipher what this means and what, if anything, I should do to fix/improve the insanity?
Thanks,
Aaron
Re: Nodes cannot recover and become unavailable
On Wed, Sep 19, 2012 at 4:25 PM, Mark Miller markrmil...@gmail.com wrote: bq. I believe there were some changes made to the clusterstate.json recently that are not backwards compatible. Indeed - I think yonik committed something the other day - we prob should send an email out about this. Yeah, I was just in the process of committing another change, updating CHANGES and sending a message. -Yonik http://lucidworks.com
Re: Understanding fieldCache SUBREADER insanity
"already-optimized, single-segment index"

That part is interesting... if true, then the type of insanity you saw should be impossible, and either the insanity detection or something else is broken.

-Yonik
http://lucidworks.com
SolrCloud clusterstate.json layout changes
Folks,

Some changes have been committed in the past few days related to SOLR-3815 as part of the groundwork for SOLR-3755 (shard splitting). The resulting clusterstate.json now looks like the following:

{"collection1":{
    "shard1":{
      "range":"80000000-ffffffff",
      "replicas":{"Rogue:8983_solr_collection1":{
          "shard":"shard1",
          "roles":null,
          "state":"active",
          "core":"collection1",
          "collection":"collection1",
          "node_name":"Rogue:8983_solr",
          "base_url":"http://Rogue:8983/solr",
          "leader":"true"}}},
    "shard2":{
      "range":"0-7fffffff",
      "replicas":{

Note the addition of the "replicas" level to make room for other properties at the shard level, such as "range" (which defines what hash range belongs in what shard). Although "range" now exists, it is ignored by the current code (i.e. indexing still uses hash MOD nShards to place documents).

-Yonik
http://lucidworks.com
Re: SolrCloud clusterstate.json layout changes
On Wed, Sep 19, 2012 at 5:27 PM, Yonik Seeley yo...@lucidworks.com wrote:
Folks,
Some changes have been committed in the past few days related to SOLR-3815 as part of the groundwork for SOLR-3755 (shard splitting). The resulting clusterstate.json now looks like the following:

{"collection1":{
    "shard1":{
      "range":"80000000-ffffffff",
      "replicas":{"Rogue:8983_solr_collection1":{
          "shard":"shard1",
          "roles":null,
          "state":"active",
          "core":"collection1",
          "collection":"collection1",
          "node_name":"Rogue:8983_solr",
          "base_url":"http://Rogue:8983/solr",
          "leader":"true"}}},
    "shard2":{
      "range":"0-7fffffff",
      "replicas":{

Note the addition of the "replicas" level to make room for other properties at the shard level, such as "range" (which defines what hash range belongs in what shard). Although "range" now exists, it is ignored by the current code (i.e. indexing still uses hash MOD nShards to place documents).

Correction - MOD was just one of the earliest methods, not the previous method. The previous method split the hash range up equally between all shards, and document placement should still be the same when we switch to paying attention to the ranges.

-Yonik
http://lucidworks.com
Re: SOLR memory usage jump in JVM
On Tue, Sep 18, 2012 at 7:45 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:
I used GC in different situations and tried back and forth. Yes, it reduces the used heap memory, but not by 5GB. Even though the GC from jconsole (or jvisualvm) is a Full GC. Whatever Full GC means ;-)

In the past at least, I've found that I had to hit Full GC from jconsole many times in a row until heap usage stabilizes at its lowest point.

You could check fieldCache and fieldValueCache to see how many entries there are before and after the memory bump. If that doesn't show anything different, I guess you may need to resort to a heap dump before and after.

But while you bring GC into this, there is another interesting thing.
- I have one slave running for a week which ends up around 18 to 20GB of heap memory.
- the slave goes offline for replication (no user queries on this slave)
- the slave gets replicated and starts a new searcher
- the heap memory of the slave is still around 11 to 12GB
- then I initiate a Full GC from jconsole which brings it down to about 8GB
- then I call optimize (on an already optimized index) and it then drops to 6.5GB like a fresh started system

I have already looked through Uwe's blog but he says "...As a rule of thumb: Don't use more than 1/4 of your physical memory as heap space for Java running Lucene/Solr,..." That would be 8GB for the JVM heap on my server; I can't believe that the system would run for longer than 10 minutes with an 8GB heap.

As you probably know, it depends hugely on the usecases/queries: some configurations would be fine with a small amount of heap, other configurations that facet and sort on tons of different fields would not be.

-Yonik
http://lucidworks.com
Re: FilterCache Memory consumption high
On Mon, Sep 17, 2012 at 3:44 PM, Mike Schultz mike.schu...@gmail.com wrote: So I'm figuring 3MB per entry. With CacheSize=512 I expect something like 1.5GB of RAM, but with the server in steady state after 1/2 hour, it is 7GB larger than without the cache. Heap size and memory use aren't quite the same thing. Try running jconsole (it comes with every JDK), attaching to the process, and then make it run multiple garbage collections to see what the heap shrinks down to. -Yonik http://lucidworks.com
Re: [Solr4 beta] error 503 on commit
On Tue, Sep 11, 2012 at 10:52 AM, Radim Kolar h...@filez.com wrote:
After investigating more, here is the tomcat log below. It is indeed the same problem: "exceeded limit of maxWarmingSearchers=2". Couldn't Solr close the oldest warming searcher and replace it with the new one?

That approach can easily lead to starvation (i.e. you never get a new searcher usable for queries).

-Yonik
http://lucidworks.com
Re: solr.StrField with stored=true useless or bad?
On Tue, Sep 11, 2012 at 7:03 PM, sy...@web.de wrote:
The purpose of stored=true is to store the raw string data besides the analyzed/transformed data for displaying purposes. This is fine for an analyzed solr.TextField, but for a StrField both values are the same. So is there any reason to apply stored=true on a StrField as well?

You're over-thinking things a bit ;-)
if you want to search on it: index it
If you want to return it in search results: store it

Those are two orthogonal things (even for StrField). Why? Indexed means full-text inverted index: words (terms) point to documents. It's not easy/fast for a given document to find out what terms point to it. Stored fields are all stored together and can be retrieved together given a document id. Hence search finds lists of document ids (via indexed fields), and can then return any of the stored fields for those document ids.

-Yonik
http://lucidworks.com
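[Editor's note: to make the orthogonality concrete, a minimal schema.xml sketch - field names here are invented for illustration:

<!-- searchable and returnable -->
<field name="title" type="string" indexed="true" stored="true"/>
<!-- searchable only: can match queries, but is never returned in results -->
<field name="title_exact" type="string" indexed="true" stored="false"/>
<!-- returnable only: never matches a query, but is retrievable by doc id -->
<field name="raw_payload" type="string" indexed="false" stored="true"/>
]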
Re: Unexpected results in Solr 4 Pivot Faceting
On Fri, Sep 7, 2012 at 9:39 AM, Erik Hatcher erik.hatc...@gmail.com wrote: A trie field probably doesn't work properly, as it indexes multiple terms per value and you'd get odd values. I don't know about pivot faceting, but all of the other types of faceting take this into account (hence faceting works fine on trie fields). -Yonik http://lucidworks.com
Re: Solr 4.0alpha: edismax complaints on certain characters
I believe this is caused by the regex support in https://issues.apache.org/jira/browse/LUCENE-2039
It certainly seems wrong to interpret a slash in the middle of a word as the start of a regex, so I've reopened the issue.

-Yonik
http://lucidworks.com

On Thu, Sep 6, 2012 at 9:34 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
Hello,
I was under the impression that edismax was supposed to be crash proof and just ignore bad syntax. But I am either misconfiguring it or hit a weird bug. I basically searched for text containing '/' and got this:

{
  'responseHeader'=>{
    'status'=>400,
    'QTime'=>9,
    'params'=>{
      'qf'=>'TitleEN DescEN',
      'indent'=>'true',
      'wt'=>'ruby',
      'q'=>'foo/bar',
      'defType'=>'edismax'}},
  'error'=>{
    'msg'=>'org.apache.lucene.queryparser.classic.ParseException: Cannot parse \'foo/bar \': Lexical error at line 1, column 9. Encountered: <EOF> after : "/bar "',
    'code'=>400}}

Is that normal? If it is, is there a known list of characters I need to escape, or do I just have to catch the exception and tell the user not to do this again?

Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: UnInvertedField limitations
It's actually limited to 24 bits to point to the term list in a byte[], but there are 256 different arrays, so the maximum capacity is 4B bytes of un-inverted terms. Each bucket is limited to 4B/256 (i.e. 16M bytes), so the real limit can come in at a little less due to luck. From the comments:

 * There is a single int[maxDoc()] which either contains a pointer into a byte[] for
 * the termNumber lists, or directly contains the termNumber list if it fits in the 4
 * bytes of an integer. If the first byte in the integer is 1, the next 3 bytes
 * are a pointer into a byte[] where the termNumber list starts.
 *
 * There are actually 256 byte arrays, to compensate for the fact that the pointers
 * into the byte arrays are only 3 bytes long. The correct byte array for a document
 * is a function of its id.

-Yonik
http://lucidworks.com

On Thu, Sep 6, 2012 at 6:33 PM, Fuad Efendi f...@efendi.ca wrote:
Hi Jack,
24 bits = 16M possibilities, it's clear; just to confirm... the rest is unclear: why can 4 bytes have a cardinality of only 4 million? I thought it was 4 billion... And, just to confirm: UnInvertedField allows 16M cardinality, correct?

On 12-08-20 6:51 PM, Jack Krupansky j...@basetechnology.com wrote:
It appears that there is a hard limit of 24 bits, or 16M, for the number of bytes to reference the terms in a single field of a single document. It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes, that would allow 16M/4 or 4 million unique terms - per document. Do you have such large documents? This appears to be a hard limit based on 24 bits in a Java int. You can try facet.method=enum, but that may be too slow. What release of Solr are you running?
-- Jack Krupansky

-----Original Message-----
From: Fuad Efendi
Sent: Monday, August 20, 2012 4:34 PM
To: Solr-User@lucene.apache.org
Subject: UnInvertedField limitations

Hi All,
I have a problem… (Yonik, please!) help me, what is the term count limit? I possibly have 256,000,000 different terms in a field… or 16,000,000?
Thanks!

2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - : org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field enrich_keywords_string_mv
at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:668)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:326)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:206)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:85)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)

--
Fuad Efendi
http://www.tokenizer.ca
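[Editor's note: a rough Java sketch of the decode step those comments describe. The names and the bucket-selection function below are assumptions for illustration, not the actual Lucene code:

class UnInvertedSketch {
    static int bucketFor(int doc) {
        // assumption for this sketch: the real code derives the bucket
        // "as a function of the doc id" - exact function not shown here
        return doc & 0xff;
    }

    static void locateTermNumbers(int[] index, byte[][] tnums, int doc) {
        int code = index[doc];              // the single int[maxDoc()] entry
        if ((code & 0xff) == 1) {
            // first byte is 1: the remaining 3 bytes are an offset into a
            // byte[], hence the 2^24 (16M) addressing limit per array
            int offset = code >>> 8;
            byte[] bucket = tnums[bucketFor(doc)];  // one of 256 byte arrays
            // the termNumber list starts at bucket[offset]
        } else {
            // the termNumber list is packed directly into the int itself
        }
    }
}
]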
Re: Injest pauses
On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate) brad.v...@ge.com wrote: Anyone know the actual status of SOLR-2565, it looks to be marked as resolved in 4.* but I am still seeing long pauses during commits using 4.* SOLR-2565 is definitely committed - adds are no longer blocked by commits (at least at the Solr level). -Yonik http://lucidworks.com
Re: Ordering of fields
In 4.0 you can use the def function with pseudo-fields (returning function results as doc field values) http://wiki.apache.org/solr/FunctionQuery#def fl=a,b,c:def(myfield,10) -Yonik http://lucidworks.com On Wed, Aug 29, 2012 at 2:39 PM, Rohit Harchandani rhar...@gmail.com wrote: Hi all, Is there a way to specify the order in which fields are returned by solr? Also, is it possible to make solr return a blank/default value for a field not present for a particular document, apart from giving a default value in the schema and having it indexed? Thanks, Rohit Harchandani
Re: Sort on dynamic field
On Thu, Aug 16, 2012 at 8:00 AM, Peter Kirk p...@alpha-solutions.dk wrote:
Hi, a question about sorting and dynamic fields in Solr Specification Version: 3.6.0.2012.04.06.11.34.07.
I have a field defined like
<dynamicField name="*_int" type="int" indexed="true" stored="true" multiValued="false"/>
Where type int is
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

Try adding sortMissingLast="true" to this type.

-Yonik
http://lucidworks.com
Re: Tlog vs. buffer + softcommit.
On Fri, Aug 10, 2012 at 11:19 AM, Bing Hua bh...@cornell.edu wrote:
Thanks for the information. It definitely helps a lot. There are numDeletesToKeep = 1000; numRecordsToKeep = 100; in UpdateLog, so this should probably be what you're referring to. However, when I was doing indexing, the total size of tlogs kept on increasing. It doesn't sound like there's a cap on the number of documents?

No, there is no cap. That's why the following is in solrconfig.xml:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

That causes a hard commit every 15 seconds w/o opening a new searcher (i.e. you still retain control over exactly when the searcher view changes if you want).

Also for peersync, can I find some intro online?

Nothing yet - but the idea is pretty simple... sync up with peers by getting recent updates if possible. If that fails, we get in sync by copying over a full index.

-Yonik
http://lucidworks.com
Re: Tuning caching of geofilt queries
On Fri, Aug 10, 2012 at 1:47 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:
Information I've read varies on exactly what the accuracy of float vs double is, but at a kilometer there's no question a double is overkill.

Back of the envelope: 23 mantissa bits + 1 implied bit == 24 effective mantissa bits in a 32 bit float.
40,000 km circumference / (2^24) = .0024 km
(i.e. our resolution at the equator is 2.4m at best - there will be some lost unused space at the beginning and end of the +-180 number-line).
Is that in line with what you've read?

-Yonik
http://lucidworks.com
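[Editor's note: a throwaway Java snippet to sanity-check that arithmetic:

public class FloatResolution {
    public static void main(String[] args) {
        double circumferenceKm = 40000.0;   // earth circumference, roughly
        // 24 effective mantissa bits => smallest distinguishable fraction of the range
        double resolutionKm = circumferenceKm / (1 << 24);
        System.out.println(resolutionKm);   // ~0.00238 km, i.e. about 2.4 m
    }
}
]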
Re: Documentation on the new updateLog transaction log feature?
On Fri, Aug 10, 2012 at 2:31 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:
Is there any documentation on the updateLog transaction log feature in Solr 4?

Not much beyond what's in solrconfig.xml

I started a quick prototype using Solr 4 alpha with a fairly structured schema; no big text. I disabled auto-commit, which came pre-enabled, and there's no soft-commit either. With CURL I posted a 1.8GB CSV file. After some time, I find this huge ~2.6GB transaction log file that didn't want to go away. FWIW a small number of records had errors, and maybe half of the records were duplicates of existing records in the file because of duplicated IDs. When I restarted Solr, Solr spent a long time reading from the transaction log before it was ready. But the file is still there; I manually deleted it. This isn't a great user experience for a feature I have no intention of using.

Simply comment out the following in solrconfig.xml

<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>

(no Solr Cloud for this project, and no so-called realtime get which has always struck me as an odd feature).

It's often pretty important for anyone using Solr as a NoSQL store.

-Yonik
http://lucidworks.com
Re: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null
On Thu, Aug 9, 2012 at 10:11 AM, Markus Jelsma markus.jel...@openindex.io wrote:
I've increased the connection timeout on all 10 Tomcats from 1000ms to 5000ms. Indexing a larger amount of batches seems to run fine now. This, however, does not really answer the issue. What exactly is timing out here, and why?

It can be any communication with tomcat for any reason. For example, a commit needs to flush and fsync all segments, apply buffered deletes, etc, then open a new searcher and run any configured warming queries or autowarming. That can take some time. It's even longer if you want to optimize. Or a long GC pause could cause a socket timeout.

For the stock jetty server, we set it to 50,000ms, which still may be too short for some things frankly. Here's the jetty documentation for the parameter (see the jetty.xml sketch after this message):

maxIdleTime: Set the maximum Idle time for a connection, which roughly translates to the Socket.setSoTimeout(int) call, although with NIO implementations other mechanisms may be used to implement the timeout. The max idle time is applied: when waiting for a new request to be received on a connection; when reading the headers and content of a request; when writing the headers and content of a response. Jetty interprets this value as the maximum time between some progress being made on the connection. So if a single byte is read or written, then the timeout (if implemented by jetty) is reset. However, in many instances, the reading/writing is delegated to the JVM, and the semantic is more strictly enforced as the maximum time a single read/write operation can take. Note, that as Jetty supports writes of memory mapped file buffers, then a write may take many 10s of seconds for large content written to a slow device.

-Yonik
http://lucidimagination.com

I assume it's the forwarding of documents from the `indexing node` to the correct shard leader, but with 512 maxThreads it should be fine. Any hints?
Thanks

-----Original message-----
From: Markus Jelsma markus.jel...@openindex.io
Sent: Wed 08-Aug-2012 00:10
To: solr-user@lucene.apache.org
Subject: RE: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

Jack,
There are no peculiarities in the JVM graphs. Only an increase in used threads and GC time. Heap space is collected quickly and doesn't suddenly increase. There's only 256MB available for the heap but it's fine.
Yonik,
I'll increase the timeout to five seconds tomorrow and try to reproduce it with a low batch size of 32. Judging from what I've seen, it should throw an error quickly with such a low batch size. However, what is timing out here? My client connection to the indexing node or something else that I don't see? Unfortunately no Jetty here (yet).
Thanks
Markus

-----Original message-----
From: Yonik Seeley yo...@lucidimagination.com
Sent: Tue 07-Aug-2012 23:54
To: solr-user@lucene.apache.org
Subject: Re: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

Could this be just a simple case of a socket timeout? Can you raise the timeout on request threads in Tomcat? It's a lot easier to reproduce/diagnose stuff like this when people use the stock jetty server shipped with Solr.
-Yonik
http://lucidimagination.com

On Tue, Aug 7, 2012 at 5:39 PM, Markus Jelsma markus.jel...@openindex.io wrote:
A significant detail is the batch size, which we set to 64 documents due to earlier memory limitations. We index segments of roughly 300-500k records each time. Lowering the batch size to 32 led to an early internal server error and the stack trace below.
Increasing it to 128 allowed us to index some more records but it still throws the error after 200k+ indexed records. Increasing it even more, to 256 records per batch, allowed us to index an entire segment without errors. Another detail is that we do not restart the cluster between indexing attempts, so it seems that something only builds up during indexing (nothing seems to leak afterwards) and throws an error. Any hints?
Thanks,
Markus

-----Original message-----
From: Markus Jelsma markus.jel...@openindex.io
Sent: Tue 07-Aug-2012 20:08
To: solr-user@lucene.apache.org
Subject: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

Hello,
We sometimes see the error below in our `master` when indexing. Our master is currently the node we send documents to - we've not yet implemented CloudSolrServer in Apache Nutch. This causes the indexer to crash when using Nutch locally; the task is retried when running on Hadoop. We're running it locally in this test set up so there's only one indexing thread. Anyway, for me it's quite a cryptic error because I don't know what connection has timed out; I assume a connection from the indexing node to some other node in the cluster
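[Editor's note: for reference, a sketch of where the knob discussed above lives in the stock jetty.xml shipped with Solr - only the relevant line is shown, the surrounding connector configuration is elided; the 50,000ms value matches the reply above:

<Set name="maxIdleTime">50000</Set>
]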
Re: Tlog vs. buffer + softcommit.
On Thu, Aug 9, 2012 at 5:39 PM, Bing Hua bh...@cornell.edu wrote:
I'm a bit confused by the purpose of Transaction Logs (Update Logs) in Solr. My understanding is: an update request comes in, and first the new item is put in the RAM buffer as well as the tlog. After a soft commit happens, the new item becomes searchable but is not hard committed to stable storage. Configuring the soft commit interval to 1 sec achieves NRT. Then what exactly is the tlog doing in this scenario?

It serves realtime-get... when even 1 second isn't acceptable (i.e. you need to be guaranteed of getting the latest version of a document): http://searchhub.org/dev/2011/09/07/realtime-get/
It also allows a peer to ask "give me the list of the last update events you know about". And you can kill -9 the server and Solr will automatically recover from the log.

Under what circumstances is it being cleared?

A new log file is created every time a hard commit is done, and old log files are removed if newer log files contain enough entries to satisfy the lookback needs of what I call "peersync" in SolrCloud (currently ~100 updates IIRC).

-Yonik
http://lucidimagination.com
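[Editor's note: a quick way to see realtime-get in action against the stock server - the document id here is made up:

curl "http://localhost:8983/solr/get?id=mydoc1"

This returns the latest version of the document even if it has not yet been made searchable by a commit.]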
Re: Recovery problem in solrcloud
Stack trace looks normal - it's just a multi-term query instantiating a bitset. The memory is being taken up somewhere else. How many documents are in your index? Can you get a heap dump or use some other memory profiler to see what's taking up the space?

"if I stop querying for more than ten minutes, the solr instance will start normally."

Maybe queries are piling up in threads before the server is ready to handle them, and then trying to handle them all at once gives an OOM? Is this live traffic or a test? How many concurrent requests get sent?

-Yonik
http://lucidimagination.com

On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo cooljam2...@gmail.com wrote:
Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
Re: Syntax for parameter substitution in function queries?
On Tue, Aug 7, 2012 at 3:01 PM, Timothy Hill timothy.d.h...@gmail.com wrote:
Hello, all ...
According to http://wiki.apache.org/solr/FunctionQuery/#What_is_a_Function.3F, it is possible under Solr 4.0 to perform parameter substitutions within function queries. However, I can't get the syntax provided in the documentation there to work *at all* with Solr 4.0 out of the box: the only location at which function queries can be specified, it seems, is in the 'fl' parameter. And attempts at parameter substitutions here fail. Using (haphazardly guessed) syntax like

select?q=*:*&fl=*, test_id:if(exists(employee), employee_id, socialsecurity_id), boost_id:sum($test_id, 10)&wt=xml

results in the following error

Error parsing fieldname: Missing param test_id while parsing function 'sum($test_id, 10)'

test_id needs to be an actual request parameter. This worked for me on the example data:

http://localhost:8983/solr/query?q=*:*&fl=*,%20test_id:if(exists(price),id,name),%20boost_id:sum($param,10)&param=price

-Yonik
http://lucidimagination.com
Re: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null
Could this be just a simple case of a socket timeout? Can you raise the timeout on request threads in Tomcat? It's a lot easier to reproduce/diagnose stuff like this when people use the stock jetty server shipped with Solr.

-Yonik
http://lucidimagination.com

On Tue, Aug 7, 2012 at 5:39 PM, Markus Jelsma markus.jel...@openindex.io wrote:
A significant detail is the batch size, which we set to 64 documents due to earlier memory limitations. We index segments of roughly 300-500k records each time. Lowering the batch size to 32 led to an early internal server error and the stack trace below. Increasing it to 128 allowed us to index some more records but it still throws the error after 200k+ indexed records. Increasing it even more, to 256 records per batch, allowed us to index an entire segment without errors. Another detail is that we do not restart the cluster between indexing attempts, so it seems that something only builds up during indexing (nothing seems to leak afterwards) and throws an error.
Any hints?
Thanks,
Markus

-----Original message-----
From: Markus Jelsma markus.jel...@openindex.io
Sent: Tue 07-Aug-2012 20:08
To: solr-user@lucene.apache.org
Subject: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

Hello,
We sometimes see the error below in our `master` when indexing. Our master is currently the node we send documents to - we've not yet implemented CloudSolrServer in Apache Nutch. This causes the indexer to crash when using Nutch locally; the task is retried when running on Hadoop. We're running it locally in this test set up so there's only one indexing thread. Anyway, for me it's quite a cryptic error because I don't know what connection has timed out; I assume a connection from the indexing node to some other node in the cluster when it passes a document to the correct leader? Each node of the 10 node cluster has the same configuration; Tomcat is configured with maxThreads=512 and a timeout of one second. We're using today's trunk in this test set up and we cannot reliably reproduce the error. We've seen the error before, so it's not a very recent issue. No errors are found in the other nodes' logs.

2012-08-07 17:52:05,260 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-6] - : null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null
at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:376)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
at io.openindex.solr.servlet.HttpResponseSolrDispatchFilter.doFilter(HttpResponseSolrDispatchFilter.java:219)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException
Re: Urgent: Facetable but not Searchable Field
On Wed, Aug 1, 2012 at 7:58 AM, jayakeerthi s mail2keer...@gmail.com wrote:
We have a requirement where we need to implement 2 fields as facetable, but the values of the fields should not be searchable.

The "user fields" (uf) feature of the edismax parser may work for you: http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29

-Yonik
http://lucidimagination.com
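[Editor's note: a hedged sketch of how that could look - the field names below are invented. uf controls which fields a user may explicitly reference in the query, so this allows everything except the two facet-only fields:

q=foo&defType=edismax&qf=text&uf=* -facetOnlyField1 -facetOnlyField2
]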
Re: [Announce] Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 with Realtime NRT available for download
On Tue, Jul 24, 2012 at 8:24 AM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: SolrIndexSearcher is a heavy object with caches, etc. As I've said, the caches are configurable, and it's trivial to disable all caching (to the point where the cache objects are not even created). The reader member is not replaced in the existing SolrIndexSearcher object. The IndexSearcher.getIndexReader() method has been overriden in SolrIndexSearcher and all direct reader member access has been replaced with a getIndexReader() method call allowing a NRT reader to be supplied when realtime is enabled. In a single Solr request (that runs through multiple components like query, highlight, facet, and response writing), does IndexSearcher.getIndexReader() always return the same reader? If not, this breaks pretty much every standard solr component - but it will only be apparent under load, and if you are carefully sanity checking the results. -Yonik http://lucidimagination.com
Re: [Announce] Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 with Realtime NRT available for download
On Mon, Jul 23, 2012 at 11:37 AM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Realtime NRT algorithm enables NRT functionality in Solr by not closing the Searcher object and so is very fast. I am in the process of contributing the algorithm back to Apache Solr as a patch. Since you're in the process of contributing this back, perhaps you could explain your approach - it never made sense to me. Replacing the reader in an existing SolrIndexSearcher as you do means that all the related caches will be invalid (meaning you can't use solr's caches). You could just ensure that there is no auto-warming set up for Solr's caches (which is now the default), or you could disable caching altogether. It's not clear what you're comparing against when you claim it's faster. There are also consistency and concurrency issues with replacing the reader in an existing SolrIndexSearcher, which is supposed to have a static view of the index. If a reader replacement happens in the middle of a request, it's bound to cause trouble, including returning the wrong documents! -Yonik http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
I think what makes the most sense is to limit the number of connections to another host. A host only has so many CPU resources, and beyond a certain point throughput would start to suffer anyway (and then only make the problem worse). It also makes sense in that a client could generate documents faster than we can index them (either for a short period of time, or on average) and having flow control to prevent unlimited buffering (which is essentially what this is) makes sense. Nick - when you switched to HttpSolrServer, things worked because this added an explicit flow control mechanism. A single request (i.e. an add with one or more documents) is fully indexed to all endpoints before the response is returned. Hence if you have 10 indexing threads and are adding documents in batches of 100, there can be only 1000 documents buffered in the system at any one time. -Yonik http://lucidimagination.com
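[Editor's note: a minimal SolrJ sketch of that pattern - the URL and batch size are illustrative assumptions; each add() blocks until the whole batch is indexed, which is the flow control described above:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void index(Iterable<SolrInputDocument> docs) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (SolrInputDocument doc : docs) {
            batch.add(doc);
            if (batch.size() == 100) {
                server.add(batch);  // returns only after the batch is fully indexed
                batch.clear();
            }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.commit();
    }
}

With 10 such threads, at most 1000 documents are buffered in the system at once.]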
Re: Error 404 on every request
On Tue, Jul 17, 2012 at 6:01 AM, Nils Abegg nils.ab...@ffuf.de wrote:
I have installed the 4.0 Alpha with the built-in Jetty Server on Ubuntu Server 12.04… I followed this tutorial to set it up: http://kingstonlabs.blogspot.de/2012/06/installing-solr-36-on-ubuntu-1204.html

Instead of trying to install Solr, I'd suggest just starting with the stock server included with the binary distribution. If you have Java in your path, you just do:

cd example
java -jar start.jar

-Yonik
http://lucidimagination.com
Re: Computed fields - can I put a function in fl?
On Mon, Jul 16, 2012 at 4:43 AM, maurizio1976 maurizio.picc...@gmail.com wrote:
Yes, sorry. Just a typo. I meant
q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true, product, )
thanks

Functions normally derive their values from the fieldCache... there isn't currently a function to load stored fields (e.g. your product field), but it's not a bad idea (given this usecase).

Here's an example with the exampledocs that shows IN_STOCK_PRICE only if the item is in stock, and otherwise shows 0. This works because price is a single-valued indexed field that the fieldCache works on.

http://localhost:8983/solr/query?q=*:*&fl=id,inStock,IN_STOCK_PRICE:if(inStock,price,0)

-Yonik
http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
Do you have the following hard autoCommit in your config (as the stock server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

This is now fairly important since Solr now tracks information on every uncommitted document added. At some point we should probably hardcode some mechanism based on number of documents or time.

-Yonik
http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton nick.ko...@gmail.com wrote:
Do you have the following hard autoCommit in your config (as the stock server does)?
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
I have tried with and without that setting. When I described running with auto commit, that setting is what I mean.

OK cool. You should be able to run the stock server (i.e. with this autocommit) and blast in updates all day long - it looks like you have more than enough memory. If you can't, we need to fix something. You shouldn't need explicit commits unless you want the docs to be searchable at that point.

Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that? How many threads at once are sending docs to solr? Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky j...@basetechnology.com wrote: Maybe your rate of update is so high that the commit never gets a chance to run. I don't believe that is possible. If it is, it should be fixed. -Yonik http://lucidimagination.com
Re: Is it possible to alias a facet field?
On Sat, Jul 14, 2012 at 10:12 AM, Jamie Johnson jej2...@gmail.com wrote:
So this got me close
facet.field=testfield&facet.field=%7B!key=mylabel%7Dtestfield&f.mylabel.limit=1
but the limit on the alias didn't seem to work. Is this expected?

Per-field params don't currently look under the alias. I believe there's a JIRA open for this.

-Yonik
http://lucidimagination.com
Re: Updating documents
On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote: On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote: Is there a flag for: if document does not exist, create it for me? Not currently, but it certainly makes sense. The implementation should be easy. The most difficult part is figuring out the best syntax to specify this. Another idea: we could possibly switch to create-if-not-exist by default, and use the existing optimistic concurrency mechanism to specify that the document should exist. So specify _version_=1 if the document should exist and _version_=0 (the default) if you don't care. Yes that would be neat! I've just committed this change. One more question related to partial document update. So far I'm able to append to multivalue fields, set new value to regular/multivalue fields. One thing I didn't find is the remove command, what is its JSON syntax? Set it to the JSON value of null. -Yonik http://lucidimagination.com
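[Editor's note: as a concrete illustration of that last point, a hedged sketch of a partial update that removes a field - the id is invented and mv_f is the multivalued field from the surrounding thread:

curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[{"id":"doc1", "mv_f":{"set":null}}]'
]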
Re: Updating documents
On Fri, Jul 13, 2012 at 3:50 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote:
But later on when I want to append cat3 to the field by doing this: "mv_f":{"add":"cat3"}, ... I end up with something like this in the index: mv_f:[{add=cat3}],
Obviously something is wrong with my syntax ;)

Are you using a custom update processor chain? The DistributedUpdateProcessor currently contains the logic for optimistic concurrency and updates. If you're not already, try some test commands with the stock server. If you are already using the stock server, then perhaps you're not sending what you think you are to Solr?

-Yonik
http://lucidimagination.com
Re: Updating documents
On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote: On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson The partial documents update that Jonatan references also requires that all the fields be stored. If my only fields with stored=false are copyField (e.g. I don't need their content to rebuild the document), are they gonna be re-copied with the partial document update? Correct - your setup should be fine. Only original source fields (non copyField targets) should have stored=true -Yonik http://lucidimagination.com
Re: Updating documents
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier jonatan.fourn...@gmail.com wrote: Is there a flag for: if document does not exist, create it for me? Not currently, but it certainly makes sense. The implementation should be easy. The most difficult part is figuring out the best syntax to specify this. Another idea: we could possibly switch to create-if-not-exist by default, and use the existing optimistic concurrency mechanism to specify that the document should exist. So specify _version_=1 if the document should exist and _version_=0 (the default) if you don't care. -Yonik http://lucidimagination.com
Re: Solr 4.0 Alpha taking lot of CPU
On Wed, Jul 11, 2012 at 8:11 PM, Pavitar Singh psi...@sprinklr.com wrote:
We upgraded to Solr 4.0 Alpha and our CPU usage shot off to 400%. In profiling we are getting the following trace.

That could either be good or bad. Higher CPU can mean higher concurrency. Have you benchmarked your indexing performance? Example: going from 60 minutes for indexing and 200% average CPU usage to 30 minutes at 400% CPU would generally be considered a good thing.

-Yonik
http://lucidimagination.com

100.0% java.lang.Thread.run
  (42 collapsed methods)
  98.0% org.apache.lucene.index.DocumentsWriter.updateDocument
    77.0% org.apache.lucene.index.DocumentsWriterPerThread.updateDocument
      76.0% org.apache.lucene.index.DocFieldProcessor.processDocument
        76.0% org.apache.lucene.index.DocInverterPerField.processFields
          36.0% org.apache.lucene.analysis.miscellaneous.TrimFilter.incrementToken
          35.0% org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken
          17.0% org.apache.lucene.analysis.ngram.NGramTokenFilter.incrementToken
          9.2% org.apache.lucene.util.AttributeSource.clearAttributes
          34.0% org.apache.lucene.index.TermsHashPerField.add
            12.0% org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm
            11.0% org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx
            6.4% org.apache.lucene.index.TermsHashPerField.writeVInt
            11.0% org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.fillBytesRef
    15.0% org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock
Re: Nrt and caching
On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Currently the caches are stored per-multiple-segments, meaning after each 'soft' commit, the cache(s) will be purged. Depends which caches. Some caches are per-segment, and some caches are top level. It's also a trade-off... for some things, per-segment data structures would indeed turn around quicker on a reopen, but every query would be slower for it. -Yonik http://lucidimagination.com
Re: deleteById commitWithin question
On Thu, Jul 5, 2012 at 4:29 PM, Jamie Johnson jej2...@gmail.com wrote:
I am running off of a snapshot taken 5/3/2012 of solr 4.0 and am noticing some issues around deleteById when a commitWithin parameter is included using SolrJ, specifically commit isn't executed. If I later just call commit on the solr instance I see the item is deleted though. Is anyone aware if this should work in that snapshot?

I thought I remembered something like this... but looking at the commit log for DUH2, I don't see it.

/opt/code/lusolr4$ svn log ./solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java | less

r1357332 | yonik | 2012-07-04 12:23:09 -0400 (Wed, 04 Jul 2012) | 1 line
log DBQ reordering events

r1356858 | markrmiller | 2012-07-03 14:18:48 -0400 (Tue, 03 Jul 2012) | 1 line
SOLR-3587: After reloading a SolrCore, the original Analyzer is still used rather than a new one

r1356845 | yonik | 2012-07-03 13:47:56 -0400 (Tue, 03 Jul 2012) | 1 line
SOLR-3559: DBQ reorder support

r1355088 | sarowe | 2012-06-28 13:51:38 -0400 (Thu, 28 Jun 2012) | 1 line
LUCENE-4172: clean up redundant throws clauses (merge from trunk)

r1348984 | hossman | 2012-06-11 15:46:14 -0400 (Mon, 11 Jun 2012) | 1 line
LUCENE-3949: fix license headers to not be javadoc style comments

r1343813 | rmuir | 2012-05-29 12:16:38 -0400 (Tue, 29 May 2012) | 1 line
create stable branch for 4.x releases

r1328890 | yonik | 2012-04-22 11:01:55 -0400 (Sun, 22 Apr 2012) | 1 line
SOLR-3392: fix search leak when openSearcher=false

r1328883 | yonik | 2012-04-22 09:58:00 -0400 (Sun, 22 Apr 2012) | 1 line
SOLR-3391: Make explicit commits cancel pending autocommits.

I'll try out trunk quick and see if it currently works.

-Yonik
http://lucidimagination.com
Re: SolrCloud cache warming issues
On Tue, Jun 26, 2012 at 6:53 AM, Markus Jelsma markus.jel...@openindex.io wrote: Why would the documentCache not be populated via firstSearcher warming queries with a non-zero value for rows? Solr streams documents (the stored fields) returned to the user (so very large result sets can be supported w/o having the whole thing in memory). A warming query finds the document ids matching a query, but does not send them anywhere (and the stored fields aren't needed for anything else), hence the stored fields are never loaded. -Yonik http://lucidimagination.com
Re: SolrCloud cache warming issues
On Wed, Jun 27, 2012 at 12:23 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 27, 2012, at 12:01 , Yonik Seeley wrote: On Tue, Jun 26, 2012 at 6:53 AM, Markus Jelsma markus.jel...@openindex.io wrote: Why would the documentCache not be populated via firstSearcher warming queries with a non-zero value for rows? Solr streams documents (the stored fields) returned to the user (so very large result sets can be supported w/o having the whole thing in memory). A warming query finds the document ids matching a query, but does not send them anywhere (and the stored fields aren't needed for anything else), hence the stored fields are never loaded. But if highlighting were enabled on those warming queries, it'd fill in the document cache, right? Correct. -Yonik http://lucidimagination.com
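[Editor's note: so a hedged solrconfig.xml sketch of a firstSearcher warming query that would populate the documentCache by enabling highlighting - the query string and highlighted field name are placeholders:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some warming query</str>
      <str name="rows">10</str>
      <str name="hl">true</str>
      <str name="hl.fl">text</str>
    </lst>
  </arr>
</listener>
]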
Re: Trying to avoid filtering on score, as I'm told that's bad
On Wed, Jun 27, 2012 at 6:50 PM, mcb thestreet...@gmail.com wrote:
I have a function query that returns miles as a score along two points:
q={!func}sub(sum(geodist(OriginCoordinates,39,-105),geodist(DestinationCoordinates,36,-97),Mileage),1000)
The issue that I'm having now is that my results give me a list of scores:
score: 10.1 (mi)
score: 20 (mi)
score: 75 (mi)
But I would like to also add a clause that cuts off the results after X miles (say 50) so that 75 above would not be included in the results. Unfortunately I can't say fq=score:[0 TO 50], but perhaps there is another way? I'm on solr 4.0

If you want to cut off the whole function at 75, then frange can do that:
q={!frange u=75}sub(sum(...
http://lucene.apache.org/solr/api/org/apache/solr/search/FunctionRangeQParserPlugin.html

-Yonik
http://lucidimagination.com
Re: How to update one field without losing the others?
Atomic update is a very new feature coming in 4.0 (i.e. grab a recent nightly build to try it out). It's not documented yet, but here's the JIRA issue: https://issues.apache.org/jira/browse/SOLR-139?focusedCommentId=13269007page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269007 -Yonik http://lucidimagination.com
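[Editor's note: a short example of the update syntax described in that issue, as it works against a recent nightly - the id and field names here are invented:

curl "http://localhost:8983/solr/update?commit=true" -H 'Content-type:application/json' -d '
[{"id":"doc1",
  "price":{"set":99},
  "tags":{"add":"clearance"}}]'

This sets price to 99 and appends "clearance" to the multivalued tags field, leaving the document's other stored fields intact.]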
Re: [Announce] Solr 3.6 with RankingAlgorithm 1.4.2 - NRT support
On Sun, May 27, 2012 at 11:57 AM, Radim Kolar h...@filez.com wrote: but i see RankingAlgorithm has fantastic results too and looking at its reference page it even powers sites like oracle.com and ebay.com. What reference page are you referring to? -Yonik http://lucidimagination.com
Re: [Announce] Solr 3.6 with RankingAlgorithm 1.4.2 - NRT support
On Sun, May 27, 2012 at 12:42 PM, Radim Kolar h...@filez.com wrote:
What reference page are you referring to?
http://tgels.com/wiki/en/Sites_using/downloaded_RankingAlgorithm_or_Solr-RA

Ah, ok - "sites using/downloaded". So someone with a .oracle email / domain checked it out - that certainly doesn't mean they are in production with it, or even plan to be.

-Yonik
http://lucidimagination.com
Re: What is the docs number in Solr explain query results for fieldnorm?
On Fri, May 25, 2012 at 2:13 PM, Tom Burton-West tburt...@umich.edu wrote: The explain (debugQuery) shows the following for fieldnorm: 0.625 = fieldNorm(field=ocr, doc=16624) What does the doc=16624 mean? It's the internal document id (i.e. it's debugging info and doesn't affect scoring) -Yonik http://lucidimagination.com
Re: How many <doc></doc> in the XML source file before indexing?
On Thu, May 24, 2012 at 7:29 AM, Michael Kuhlmann k...@solarier.de wrote: However, I doubt it. I've not been too deeply into the UpdateHandler yet, but I think it first needs to parse the complete XML file before it starts to index. Solr's update handlers all stream (XML, JSON, CSV), reading and indexing a document at a time from the input. -Yonik http://lucidimagination.com
Re: Update JSON not working for me
On Wed, May 16, 2012 at 1:43 PM, rjain15 rjai...@gmail.com wrote:
http://localhost:8983/solr/select?q=title:monsters&wt=json&indent=true

Try switching title:monsters to name:monsters
https://issues.apache.org/jira/browse/SOLR-2598
Looks like the data was changed to use the name field instead and the docs were never updated (big downside to our non-versioned docs).

-Yonik
http://lucidimagination.com
Re: Update JSON not working for me
On Wed, May 16, 2012 at 2:36 PM, rjain15 rjai...@gmail.com wrote:
No. Changing to name:monsters didn't work

OK, but you'll have to do that if you get the other part working.

Here is my guess: the UpdateJSON is not adding any new documents to the existing index.

If that's true, the most likely culprit is your curl on windows (or the windows shell). You mentioned removing the single quotes in the curl command? Perhaps try replacing all those with double quotes.

C:\Tools\Solr\apache-solr-4.0-2012-05-15_08-20-37\example\exampledocs> C:\tools\curl\curl http://localhost:8983/solr/update?commit=true --data-binary @books.json -H Content-type:application/json

I'd really recommend installing cygwin if you know any unix at all... not required, but it will make your life much easier.

-Yonik
http://lucidimagination.com
Re: Update JSON not working for me
On Wed, May 16, 2012 at 4:10 PM, rjain15 rjai...@gmail.com wrote: Hi Firstly, apologies for the long post, I changed the quote to double quote (and sometimes it is messy copying from DOS windows) Here is the command and the output on the Jetty Server Window. I am highlighting some important pieces, I have enabled the LOG LEVEL to DEBUG on the JETTY window. C:\Tools\Solr\apache-solr-4.0-2012-05-15_08-20-37\example\exampledocs> C:\tools\curl\curl "http://localhost:8983/solr/update?commit=true" --data-binary @books.json -H 'Content-type:application/json' May 16, 2012 4:05:49 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [collection1] webapp=/solr path=/update params={commit=true[ { id : 978-0641723445, There ya go - what should be the body of the post is in fact used as a very large parameter name. I get this behavior when I leave off the -H 'Content-type:application/json' when trying this on UNIX. This means that your content-type is not being set correctly by your curl command. Did you try changing those single quotes to double quotes at the end? C:\Tools\Solr\apache-solr-4.0-2012-05-15_08-20-37\example\exampledocs> C:\tools\curl\curl "http://localhost:8983/solr/update?commit=true" --data-binary @books.json -H "Content-type:application/json" -Yonik http://lucidimagination.com
Re: Problems with field names in solr functions
In trunk, see: * SOLR-2335: New 'field(...)' function syntax for referring to complex field names (containing whitespace or special characters) in functions. The schema in trunk also specifies: <!-- field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. --> -Yonik http://lucidimagination.com On Thu, May 10, 2012 at 11:28 AM, Iker Huerga iker.hue...@gmail.com wrote: Hi all, I am having problems when sorting solr documents using solr functions due to the field names. Imagine we want to sort the solr documents based on the sum of the scores of the matching fields. These fields are created as follows: <dynamicField name="foo/bar-*" type="float" indexed="true" stored="true"/> The idea is that these fields store float values, as in this example: <field name="foo/bar-1234">50.45</field> The examples below illustrate the issue. This query: http://URL/solr/select/?q=(foo/bar-1234:*)+AND+(foo/bar-2345:*)&version=2.2&start=0&rows=10&indent=on&sort=sum(foo/bar-1234,foo/bar-2345)+desc&wt=json gives me the following exception: The request sent by the client was syntactically incorrect (sort param could not be parsed as a query, and is not a field that exists in the index: sum(foo/bar-1234,foo/bar-2345)). Whereas if I rename the fields, removing the / and -, the following query will work: http://URL/solr/select/?q=(bar1234:*)+AND+(bar2345:*)&version=2.2&start=0&rows=10&indent=on&sort=sum(bar1234,bar2345)+desc&wt=json
"response":{"numFound":2,"start":0,"docs":[ {"primaryDescRes":"DescRes2","bar1234":45.54,"bar2345":100.0}, {"primaryDescRes":"DescRes1","bar1234":100.5,"bar2345":25.22}]}
I tried escaping the character as indicated in solr documentation [1], i.e. foo%2Fbar-12345 instead of foo/bar-12345, without success. Could this be caused by the query parser? I would be extremely grateful if you could let me know any workaround for this. Best Iker [1] http://wiki.apache.org/solr/SolrQuerySyntax#NOTE:_URL_Escaping_Special_Characters -- Iker Huerga http://www.ikerhuerga.com/
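Assuming the SOLR-2335 syntax from trunk (exact quoting rules and URL escaping may vary by version), the sort could then name the awkward fields explicitly, something like:
sort=sum(field("foo/bar-1234"),field("foo/bar-2345"))+desc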
Re: Update JSON not working for me
I think this may be due to https://issues.apache.org/jira/browse/SOLR-2857 JIRA is down right now so I can't check, but I thought the intent was to have some back compat. Try changing the URL from /update/json to just /update in the meantime -Yonik http://lucidimagination.com On Mon, May 14, 2012 at 2:42 PM, Rajesh Jain rjai...@gmail.com wrote: Hi Jack I am following the http://wiki.apache.org/solr/UpdateJSON tutorials. The first example is of books.json, which I executed, but I don't see any books http://localhost:8983/solr/collection1/browse?q=cat%3Dbooks 0 results found in 26 ms Page 0 of 0 I modified the books.json to add my own book, but still no result. The money.xml works, so I converted the money.xml to money.json and added an extra currency. I don't see the new currency. My question is, how do I know if the UpdateJSON action was valid, if I don't see them in the http://localhost:8983/solr/collection1/browse?q=cat%3Dbooks Is there a way to find what is happening - maybe through log files? I am new to Solr, please help Thanks Rajesh On Mon, May 14, 2012 at 2:33 PM, Jack Krupansky j...@basetechnology.com wrote: Check the examples of update/json here: http://wiki.apache.org/solr/UpdateJSON In your case, either leave out the add level or add a doc level below it. For example: curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d ' { "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} }, "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} } }' -- Jack Krupansky -Original Message- From: Rajesh Jain Sent: Monday, May 14, 2012 1:27 PM To: solr-user@lucene.apache.org Cc: Rajesh Jain Subject: Update JSON not working for me Hi, I am using the 4.x version of Solr, and following the UpdateJSON Solr Wiki 1. When I try to update using : curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json' I don't see any Category as Books in the Velocity based Solr Browser http://localhost:8983/solr/collection1/browse/ ? I see the following message on the startup window when I run this command C:\Tools\Solr\apache-solr-4.0-2012-05-04_08-23-31\example\exampledocs> C:\tools\curl\curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json' { "responseHeader":{ "status":0, "QTime":47}} 2. I wrote my own JSON file where I added an extra add directive My JSON File: [ { "add": { "id" : "MXN", "cat" : ["currency"], "name" : "One Peso", "inStock" : true, "price_c" : "1,MXN", "manu" : "384", "manu_id_s" : "Bank Mexico", "features" : "Coins and notes" } } ] I still don't see the addition in the existing Currency Categories. Please let me know if the UpdateJSON works in 4.x or is this only for 3.6? Thanks Rajesh
Re: Update JSON not working for me
On Mon, May 14, 2012 at 3:11 PM, Rajesh Jain rjai...@gmail.com wrote: Hi Yonik I tried without the json in the URL, the result was the same but in XML format Interesting... the XML response is fine (just not ideal). When I tried it, I did get a JSON response (perhaps I'm running a later version of trunk... the unified update handler is very new) $ curl 'http://localhost:8983/solr/update?commit=true' --data-binary @books.json -H 'Content-type:application/json' {"responseHeader":{"status":0,"QTime":133}} -Yonik http://lucidimagination.com C:\Tools\Solr\apache-solr-4.0-2012-05-04_08-23-31\example\exampledocs> C:\tools\curl\curl http://localhost:8983/solr/update?commit=true --data-binary @money.json -H 'Content-type:application/json' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">45</int></lst> </response> On Mon, May 14, 2012 at 2:58 PM, Yonik Seeley yo...@lucidimagination.com wrote: I think this may be due to https://issues.apache.org/jira/browse/SOLR-2857 JIRA is down right now so I can't check, but I thought the intent was to have some back compat. Try changing the URL from /update/json to just /update in the meantime -Yonik http://lucidimagination.com
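For reference, the UpdateJSON format accepts either a top-level array of plain documents or top-level "add"/"doc" wrappers, but not an "add" object nested inside an array - which is what keeps the docs in the file above from being indexed. A minimal file that should work:
[
  { "id" : "MXN", "cat" : ["currency"], "name" : "One Peso", "inStock" : true }
]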
Re: 1MB file to Zookeeper
On Sat, May 5, 2012 at 8:39 AM, Jan Høydahl jan@cominvent.com wrote: support for CouchDb, Voldemort or whatever. Hmmm... Or Solr! -Yonik
Re: 1MB file to Zookeeper
On Fri, May 4, 2012 at 12:50 PM, Mark Miller markrmil...@gmail.com wrote: And how should we detect if data is compressed when reading from ZooKeeper? I was thinking we could somehow use file extensions? eg synonyms.txt.gzip - then you can use different compression algs depending on the ext, etc. We would want to try and make it as transparent as possible though... At first I thought about adding a marker to the beginning of a file, but file extensions could work too, as long as the resource loader made it transparent (i.e. code would just need to ask for synonyms.txt, but the resource loader would search for synonyms.txt.gzip, etc, if the original name was not found) Hmmm, but this breaks down for things like watches - I guess that's where putting the encoding inside the file would be a better option. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
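A minimal sketch of the transparent-fallback idea (hypothetical code, not the actual Solr resource loader):
import java.io.*;
import java.util.zip.GZIPInputStream;

// Caller asks for "synonyms.txt"; if only "synonyms.txt.gzip" exists,
// return a decompressing stream so the difference stays invisible.
static InputStream openResource(File dir, String name) throws IOException {
  File plain = new File(dir, name);
  if (plain.exists()) return new FileInputStream(plain);
  File gz = new File(dir, name + ".gzip");
  if (gz.exists()) return new GZIPInputStream(new FileInputStream(gz));
  throw new FileNotFoundException(name);
}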
Re: solr: how to change display name of a facet?
On Thu, May 3, 2012 at 2:26 PM, okayndc bodymo...@gmail.com wrote: [...] I've experimented with this: str name=facet.field{!ex=dt key=Categories and Stuff}category/str I'm not really sure what 'ex=dt' does but it's obvious that 'key' is the desired display name? If there are spaces in the 'key' value, the display name gets cut off. What am I doing wrong? http://wiki.apache.org/solr/LocalParams For a non-simple parameter value, enclose it in single quotes ex excludes filters tagged with a value. See http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
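So the fix here is just quoting - a local param value containing spaces needs single quotes, e.g.:
facet.field={!ex=dt key='Categories and Stuff'}category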
Re: access document by primary key
On Thu, May 3, 2012 at 3:01 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Is this still true? Assuming that I know that there haven't been updates, or that I don't care to see a different version of the document, are the term QP or the raw QP faster than the real-time get handler? Sort of different things... query parsers only parse queries, not execute them. If you're looking for documents by ID though, the realtime-get handler should be the fastest, esp in a distributed setup. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: NPE when faceting
Darn... looks likely that it's another bug from when part of UnInvertedField was refactored into Lucene. We really need some random tests that can catch bugs like these though - I'll see if I can reproduce. Can you open a JIRA issue for this? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote: I had reported this issue a while back, hoping that it was something with my environment, but that doesn't seem to be the case. I am getting the following stack trace on certain facet queries. Previously when I did an optimize the error went away, does anyone have any insight into why specifically this could be happening? May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807) at org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636) at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:662)
Re: commit fail
On Sat, Apr 28, 2012 at 7:02 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Hi, This is what the thread dump looks like. Any ideas? Looks like the thread taking up CPU is in LukeRequestHandler '1062730578@qtp-1535043768-5' Id=16, RUNNABLE on lock=, total cpu time=16156160.ms user time=16153110.ms at org.apache.solr.handler.admin.LukeRequestHandler.getIndexedFieldsInfo(LukeRequestHandler.java:320) That probably accounts for the 1 CPU doing things... but it's not clear at all why commits are failing. Perhaps the commit is succeeding, but the client is just not waiting long enough for it to complete? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Recovery - too many updates received since start
On Tue, Apr 24, 2012 at 9:31 AM, Trym R. Møller t...@sigmat.dk wrote: Hi I experience that a Solr loses its connection with Zookeeper and re-establishes it. After Solr reconnects to Zookeeper it begins to recover. It has been missing the connection for approximately 10 seconds and meanwhile the leader slice has received some documents (maybe about 1000 documents). Solr fails to update peer sync with the log message: Apr 21, 2012 10:13:40 AM org.apache.solr.update.PeerSync sync WARNING: PeerSync: core=mycollection_slice21_shard1 url=zk-1:2181,zk-2:2181,zk-3:2181 too many updates received since start - startingUpdates no longer overlaps with our currentUpdates Looking into PeerSync and UpdateLog I can see that 100 updates is the maximum allowed updates that a shard can be behind. Is it correct that this is not configurable, and what are the reasons for choosing 100? I suspect that one must compare the work needed to replicate the full index with the performance loss/resource usage when enhancing the size of the UpdateLog? The peersync messages don't stream, so we need to limit how many docs will be in memory at once. If someone makes that streamable, I'd be more comfortable making the limit configurable. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: commit stops
On Fri, Apr 27, 2012 at 9:18 AM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: We have an index of about 3.5gb which seems to work fine until it suddenly stops accepting new commits. Users can still search on the front end but nothing new can be committed and it always times out on commit. Any ideas? Perhaps the commit happens to cause a major merge which may take a long time (and solr isn't going to allow overlapping commits). How long does a commit request take to time out? What Solr version is this? Do you have any kind of auto-commit set up? How often are you manually committing? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: commit fail
On Fri, Apr 27, 2012 at 8:23 PM, mav.p...@holidaylettings.co.uk mav.p...@holidaylettings.co.uk wrote: Hi again, This is the only log entry I can find, regarding the failed commits… Still timing out as far as the client is concerned and there is actually nothing happening on the server in terms of load (staging environment). 1 CPU core seems busy constantly with solr but unsure what is happening. You can get a thread dump to see what the various threads are doing (use the solr admin, or kill -3). Sounds like it could just be either merging in progress or a commit in progress. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: embedded solr populating field of type LatLonType
On Tue, Apr 24, 2012 at 4:05 PM, Jason Cunning jcunn...@ucar.edu wrote: My question is, what is the AppropriateJavaType for populating a solr field of type LatLonType? A String with both the lat and lon separated by a comma. Example: 12.34,56.78 -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
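In SolrJ terms (field names here are illustrative), that's simply:
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "station-1");
doc.addField("location", "12.34,56.78");  // LatLonType value: "lat,lon" in one String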
Re: Title Boosting and IDF
On Wed, Apr 25, 2012 at 9:24 PM, Walter Underwood wun...@wunderwood.org wrote: Interestingly, I worked at two different web search companies with two different completely different search engines, and one arrived at an 8X title boost and the other at a 7.5X title boost. So I consider 8X a universal physical constant. Great info! Do you know if that 8x was after (i.e. already included) length normalization? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
searcher leak on trunk after 2/1/2012
Folks, If you're using a trunk version after 2/1/2012 in conjunction with the shipped solrconfig.xml (which uses openSearcher=false in an autoCommit by default), then you should upgrade to a new version. There's a searcher leak when openSearcher=false is used with a commit that leads to files not being closed. This was just fixed in https://issues.apache.org/jira/browse/SOLR-3392 so if you're looking to use nightly builds, you will need one from Apr 23 or later. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: # open files with SolrCloud
I can reproduce some kind of searcher leak issue here, even w/o SolrCloud, and I've opened https://issues.apache.org/jira/browse/SOLR-3392 -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Solr Hanging
On Thu, Apr 19, 2012 at 4:25 AM, Trym R. Møller t...@sigmat.dk wrote: Hi I am using Solr trunk and have 7 Solr instances running with 28 leaders and 28 replicas for a single collection. After indexing a while (a couple of days) the solrs start hanging and doing a thread dump on the jvm I see blocked threads like the following: Thread 2369: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=158 (Compiled frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Compiled frame) - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=399 (Compiled frame) - java.util.concurrent.ExecutorCompletionService.take() @bci=4, line=164 (Compiled frame) - org.apache.solr.update.SolrCmdDistributor.checkResponses(boolean) @bci=27, line=350 (Compiled frame) - org.apache.solr.update.SolrCmdDistributor.finish() @bci=18, line=98 (Compiled frame) - org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish() @bci=4, line=299 (Compiled frame) - org.apache.solr.update.processor.DistributedUpdateProcessor.finish() @bci=1, line=817 (Compiled frame) ... - org.mortbay.thread.QueuedThreadPool$PoolThread.run() @bci=25, line=582 (Interpreted frame) I read the stack trace as my indexing client has indexed a document and this Solr is now waiting for the replica? to respond before returning an answer to the client. Correct. What's the full stack trace like on both a leader and replica? We need to know what the replica is blocking on. What version of trunk are you using? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Distributed FacetComponent NullPointer Exception
facet.field={!terms=$organization__terms}organization This is referring to another request parameter that Solr should have added (organization__terms). Did you cut-n-paste all of the parameters below? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, Apr 17, 2012 at 10:13 AM, Jamie Johnson jej2...@gmail.com wrote: I'm noticing that this issue seems to be occurring with facet fields which have some unexpected characters. For instance the query that I see going across the wire is as follows facet=true&tie=0.1&ids=3F2504E0-4F89-11D3-9A0C-0305E82C3301&qf=%0a++author^0.5+type^0.5+content_mvtxt^10++subject_phonetic^1+subject_txt^20%0a+++&q.alt=*:*&distrib=falseTest+%0a%0a%0a%0a?+%0a%0a%0a%0aDaily+News,Test+Association,Toyota,U.S.,Washington+Post&rows=10&rows=10&NOW=1334670761188&shard.url=JamiesMac.local:8502/solr/shard5-core1/&fl=*,score&q=bob&facet.field={!terms%3D$organization__terms}organization&isShard=true Now there is an obvious issue here with our data having these \n characters in it which I will be fixing shortly (plan to use a set of Character replace filters to remove extra white space). I am assuming that this is causing our issue, but would be nice if someone could confirm. On Tue, Apr 17, 2012 at 12:08 AM, Jamie Johnson jej2...@gmail.com wrote: I created https://issues.apache.org/jira/browse/SOLR-3362 to track this. On Mon, Apr 16, 2012 at 11:18 PM, Jamie Johnson jej2...@gmail.com wrote: doing some debugging this is the relevant block in FacetComponent String name = shardCounts.getName(j); long count = ((Number)shardCounts.getVal(j)).longValue(); ShardFacetCount sfc = dff.counts.get(name); sfc.count += count; the issue is sfc is null. I don't know if that should or should not occur, but if I add a check (if (sfc == null) continue;) then I think it would work. Is this appropriate? On Mon, Apr 16, 2012 at 10:45 PM, Jamie Johnson jej2...@gmail.com wrote: worth noting the error goes away at times depending on the number of facets asked for. On Mon, Apr 16, 2012 at 10:38 PM, Jamie Johnson jej2...@gmail.com wrote: I found (what appears to be) the issue I am experiencing here http://lucene.472066.n3.nabble.com/NullPointerException-with-distributed-facets-td3528165.html but there were no responses to it. I've included the stack trace I am seeing, any ideas why this would happen?
SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:489) at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:278) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634) at
Re: Problem with faceting on a boolean field
On Tue, Apr 17, 2012 at 2:22 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am faceting on a boolean field called usedItem. There are a total of 607601 items in the index and they all have value for usedItem set to false. However when I do a search for *:* and facet on usedItem, the num found is set correctly to 607601 but I get the facet result below: <lst name="usedItem"><int name="false">17971</int></lst> You can verify by changing the query from *:* to usedItem:false (or adding an additional fq to that effect). -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Changing precisionStep without a re-index
On Mon, Apr 16, 2012 at 12:12 PM, Michael Ryan mr...@moreover.com wrote: Is it safe to change the precisionStep for a TrieField without doing a re-index? Not really - it changes what tokens are indexed for the numbers, and range queries won't work correctly. Sorting (FieldCache), function queries, etc, would still work, and exact match queries would still work. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 Specifically, I want to change a field from this: <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> to this: <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> By safe, I mean that searches will return the correct results, a FieldCache on the field will still work, clowns won't eat me... -Michael
Re: DeleteByQuery using xml commands in SolrCloud
On Mon, Apr 16, 2012 at 4:13 PM, Jamie Johnson jej2...@gmail.com wrote: I tried to execute the following on my cluster, but it had no results. Should this work? curl http://host:port/solr/collection1/update/?commit=true -H "Contenet-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>' Is this a cut-n-paste of what you actually sent? If so, Content-Type is misspelled (but I'm not sure if that's the issue) -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
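With the header spelled correctly, the same delete would look like:
curl http://host:port/solr/collection1/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'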
Re: Can Solr solve this simple problem?
2012/4/16 Tomás Fernández Löbbe tomasflo...@gmail.com: I'm wondering if Solr is the best tool for this kind of usage. Solr is a text search engine Well, Lucene is a full-text search library, but Solr has always been far more. Dating back to its first use in CNET, it was used as a browse engine (faceted search), sometimes without much of a full-text aspect at all. And we're moving more and more into the NoSQL realm (durability, realtime-get, and coming real soon - optimistic locking). -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: solr 3.5 taking long to index
On Thu, Apr 12, 2012 at 10:42 PM, Rohit ro...@in-rev.com wrote: The machine has a total RAM of around 46GB. My biggest concern is Solr index time gradually increasing and then the commit stops because of timeouts; our commit rate is very high, but I am not able to find the root cause of the issue. The difference you're seeing between 3.1 and 3.5 may be due to a bug in the former where fsync was not being called: https://issues.apache.org/jira/browse/LUCENE-3418 We commit every 5000 documents If you are doing bulk indexing, wait until the end to commit. Upcoming Solr4 has near realtime (soft commit) support to make doing frequent commits (for the purposes of visibility) less expensive. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: It's hard to google on _val_
On Sun, Apr 15, 2012 at 11:34 AM, Benson Margulies bimargul...@gmail.com wrote: So, I've been experimenting to learn how the _val_ participates in scores. It seems to me that http://wiki.apache.org/solr/FunctionQuery should explain the *effect* of including an _val_ term in an ordinary query, starting with a constant. It's simply added to the score as any other clause in a boolean query would be. Positive values of _val_ did lead to positive increments in the score, but clearly not by simple addition. That's just because Lucene normalizes scores. By default, this is really just multiplying scores by a magic constant (that by default is the inverse of the sum of squared weights) and doesn't change relative orderings of docs. If you add debugQuery=true and look at the scoring explanations, you'll see that queryNorm component. If you want to go down the rabbit hole on trunk, see IndexSearcher.createNormalizedWeight() -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: It's hard to google on _val_
On Sun, Apr 15, 2012 at 12:14 PM, Yonik Seeley yo...@lucidimagination.com wrote: That's just because Lucene normalizes scores. By default, this is really just multiplying scores by a magic constant (that by default is the inverse of the sum of squared weights) Sorry... I missed the square root. Should be inverse of the square root of the sum of squared weights. See DefaultSimilarity.queryNorm: public float queryNorm(float sumOfSquaredWeights) { return (float)(1.0 / Math.sqrt(sumOfSquaredWeights)); } -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: performance impact using string or float when querying ranges
On Fri, Apr 13, 2012 at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote: Well, I guess my first question is whether using strings is fast enough, in which case there's little reason to make your life more complex. But yes, range queries will be significantly faster with any of the Trie types than with strings. To elaborate on this point a bit... range queries on strings will be the same speed as a numeric field with precisionStep=0. You need a precisionStep > 0 (so the number will be indexed in multiple parts) to speed up range queries on numeric fields. (See int vs tint in the solr schema). -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 Trie types are all numeric types. Best Erick On Fri, Apr 13, 2012 at 3:49 AM, crive marco.cr...@gmail.com wrote: Hi All, is there a big difference in terms of performance when querying a range like [50.0 TO *] on a string field compared to a float field? At the moment I am using a dynamic field of type string to map some values coming from our database and their type can vary depending on the context (float/integer/string); it's easier to use a dynamic field than having to create a bespoke field for each type of value. Marco
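From the example schema, the two variants look like this (tint indexes each number at multiple precisions, trading index size for faster range queries):
<fieldType name="int"  class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>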
Re: solr 3.4 with nTiers = 2: usage of ids param causes NullPointerException (NPE)
On Wed, Apr 11, 2012 at 8:16 AM, Dmitry Kan dmitry@gmail.com wrote: We have a system with nTiers, that is: Solr front base --- Solr front -- shards Although the architecture had this in mind (multi-tier), all of the pieces are not yet in place to allow it. The errors you see are a direct result of that. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: I've broken delete in SolrCloud and I'm a bit clueless as to how
On Thu, Apr 12, 2012 at 2:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config : : schema.xml : : You must have a _version_ field defined: : : <field name="_version_" type="long" indexed="true" stored="true"/> Seems like this is the kind of thing that should make Solr fail hard and fast on SolrCore init if it sees you are running in cloud mode and yet it doesn't find this -- similar to how some other features fail hard and fast if you don't have uniqueKey. Off the top of my head: _version_ is needed for solr cloud where a leader forwards updates to replicas, unless you're handling update distribution yourself or providing pre-built shards. _version_ is needed for realtime-get and optimistic locking We should document for sure... but at this point it's not clear what we should enforce. (not saying we shouldn't enforce anything... just that I haven't really thought about it) -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SOLR 4 autocommit - is it working as I think it should?
On Wed, Apr 11, 2012 at 12:58 PM, vybe3142 vybe3...@gmail.com wrote: This morning, I've been looking at the autocommit functionality as defined in solrconfig.xml. By default, it appears that it should kick in 15 seconds after a new document has been added. I do see this event triggered via the SOLR/tomcat logs, but can't see the docs/terms in the index or query them. I haven't bothered with the softcommit yet as I'd like to first understand what the issue is wrt the autocommit. The 15 second hard autocommit is not for the purpose of update visibility, but for durability (hence the hard autocommit uses openSearcher=false). It simply makes sure that recent changes are flushed to disk. If you want to automatically see changes after some period of time, use an additional soft autocommit for that (and leave the hard autocommit exactly as configured), or use commitWithin when you do an update... that's more flexible and allows you to specify latency on a per-update basis. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
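A sketch of that combination in solrconfig.xml (the intervals are only illustrative):
<autoCommit>
  <maxTime>15000</maxTime>          <!-- durability: flush recent changes to disk -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>           <!-- visibility: open a new searcher -->
</autoSoftCommit>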
Re: SOLR issue - too many search queries
On Tue, Apr 10, 2012 at 8:51 AM, arunssasidhar arunssasid...@gmail.com wrote: We have a PHP web application which is using SOLR for searching. The APP is using CURL to connect to the SOLR server, which runs in a loop with thousands of predefined keywords. That will create thousands of different search queries to SOLR at a given time. Thousands of concurrent queries? That's normally not a useful metric unless you have a very strange application. You normally want to look at the following: - throughput (queries per second) - latency (how long the queries take - average, 90%, 95%, etc) -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SolrCloud replica and leader out of Sync somehow
On Thu, Apr 5, 2012 at 12:19 AM, Jamie Johnson jej2...@gmail.com wrote: Not sure if this got lost in the shuffle, were there any thoughts on this? Sorting by id could be pretty expensive (memory-wise), so I don't think it should be the default or anything. We also need a way for a client to hit the same set of servers again anyway (to handle other possible variations like commit time). To handle the tiebreak stuff, you could also sort by _version_ - that should be unique in an index and is already used under the covers, and hence shouldn't add any extra memory overhead. versions increase over time, so _version_ desc should give you newer documents first. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Wed, Mar 21, 2012 at 11:02 AM, Jamie Johnson jej2...@gmail.com wrote: Given that in a distributed environment the docids are not guaranteed to be the same across shards, should the sorting use the uniqueId field as the tie breaker by default? On Tue, Mar 20, 2012 at 2:10 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Mar 20, 2012 at 2:02 PM, Jamie Johnson jej2...@gmail.com wrote: I'll try to dig for the JIRA. Also I'm assuming this could happen on any sort, not just score, correct? Meaning if we sorted by a date field and there were duplicates in that date field, order wouldn't be guaranteed for the same reasons, right? Correct - internal docid is the tiebreaker for all sorts. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
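A concrete example of that tiebreaker on a request would be: sort=score desc,_version_ desc (URL-encoded as sort=score+desc,_version_+desc), or the uniqueKey field in place of _version_ at some memory cost.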
Re: Evaluating Solr
On Wed, Apr 4, 2012 at 12:46 PM, Joseph Werner telco...@gmail.com wrote: For more routine changes, are record updates supported without the necessity to rebuild an index? For example if a description field for an item needs to be changed, am I correct in reading that the record need only be resubmitted? Correct. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: solrcloud is deleteByQuery stored in transactions and forwarded like other operations?
On Wed, Apr 4, 2012 at 3:04 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mark. The delete by query is a very rare operation for us and I really don't have the liberty to update to current trunk right now. Do you happen to know about when the fix was made so I can see if we are before or after that time? Not definitive, but a grep of svn log in solr/core shows: r1295665 | yonik | 2012-03-01 11:41:54 -0500 (Thu, 01 Mar 2012) | 1 line cloud: fix distributed deadlock w/ deleteByQuery r1243773 | yonik | 2012-02-13 22:00:22 -0500 (Mon, 13 Feb 2012) | 1 line dbq: fix param rename r1243768 | yonik | 2012-02-13 21:45:41 -0500 (Mon, 13 Feb 2012) | 1 line solrcloud: send deleteByQuery to all shard leaders to version and forward to replicas -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Incremantally updating a VERY LARGE field - Is this possibe ?
On Wed, Apr 4, 2012 at 3:14 PM, vybe3142 vybe3...@gmail.com wrote: Updating a single field is not possible in solr. The whole record has to be rewritten. Unfortunate. Lucene allows it. I think you're mistaken - the same limitations apply to Lucene. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: How do I use localparams/joins using SolrJ and/or the Admin GUI
On Sat, Mar 31, 2012 at 11:50 AM, Erick Erickson erickerick...@gmail.com wrote: Try escaping the '+' with %2B (as I remember). Shouldn't that be the other way? The admin UI should do any necessary escaping, so those + chars should instead be spaces? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SOLR hangs - update timeout - please help
On Thu, Mar 29, 2012 at 4:24 AM, Lance Norskog goks...@gmail.com wrote: 5-7 seconds- there's the problem. If you want to have documents visible for search within that time, you want to use the trunk and near-real-time search. A hard commit does several hard writes to the disk (with the fsync() system call). It does not run smoothly at that rate. It is no surprise that eventually you hit a thread-locking bug. Are you speaking of a JVM bug, or something else? A Lucene bug? A Solr bug? Rafal, do you have a thread dump of when the update hangs (as opposed to at shutdown?) -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SOLR hangs - update timeout - please help
On Thu, Mar 29, 2012 at 1:50 PM, Rafal Gwizdala rafal.gwizd...@gmail.com wrote: Below I'm pasting the thread dump taken when the update was hung (it's also attached to the first message of this topic) Interesting... It looks like there's only one thread in solr code (the one generating the thread dump). The stack trace looks like you switched Jetty to use the NIO connector, perhaps? Could you try with the Jetty shipped with Solr (exactly as configured)? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SOLR hangs - update timeout - please help
Oops... my previous replies accidentally went off-list. I'll cut-n-paste below. OK, so it looks like there is probably no bug here - it's simply that commits can sometimes take a long time and updates were blocked during that time (and would have succeeded eventually except the jetty timeout was not set long enough). Things are better in trunk (4.0) with soft commits and updates that can proceed concurrently with commits. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Thu, Mar 29, 2012 at 3:11 PM, Rafal Gwizdala rafal.gwizd...@gmail.com wrote: You're right, this is not default Jetty from Solr - I configured it from scratch and then added Solr. Previously I had autocommit enabled and also did commit on every update so this might also contribute to the problem. Now I disabled it and made the updates less frequent. If the autocommit is allowed to happen together with 'manual' commit on update then there could be simultaneous commits, which now shouldn't happen - there will be at most one update/commit active at a time. Request timeout is default for jetty, but I don't know what that value is. Best regards RG I wrote: On Thu, Mar 29, 2012 at 2:25 PM, Rafal Gwizdala rafal.gwizd...@gmail.com wrote: Yonik, I didn't say there was an update request active at the moment the thread dump was made, only that previous update requests failed with a timeout. So maybe this is the missing piece. I didn't enable nio with Jetty, probably it's there by default. Not with the jetty that comes with Solr. bq. If solr hangs next time I'll try to make a thread dump when the update request is waiting for completion. Great! We need to see where it's hanging! Also, how long did the request take to time out? Do you have auto-commit enabled? In the 3x series, updates will block while commits are in progress, so timeouts can happen if they are set too short (and it seems like maybe you aren't using the Jetty from Solr, so the configuration may not be ideal).
Re: bbox query and range queries
On Thu, Mar 29, 2012 at 6:20 PM, Alexandre Rocco alel...@gmail.com wrote: http://localhost:8984/solr/select?q=*:*fq=local:[-23.6677,-46.7315 TO -23.6709,-46.7261] Range queries always need to be [lower_bound TO upper_bound] Try http://localhost:8984/solr/select?q=*:*fq=local:[-23.6709,-46.7315 TO -23.6677,-46.7261] -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: bbox query and range queries
On Thu, Mar 29, 2012 at 6:44 PM, Alexandre Rocco alel...@gmail.com wrote: Yonik, Thanks for the heads-up. That one worked. Just trying to wrap my head around how it would work in a real case. To test this one I just got the coordinates from Google Maps and searched within the pair of coordinates as I got them. Should I always check which is the lower and upper to assemble the query? Yep... range query on LatLonField is currently pretty low level, and you need to ensure yourself that lat1 <= lat2 and lon1 <= lon2 in [lat1,lon1 TO lat2,lon2] -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
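A trivial client-side normalization (names here are just illustrative) before building the filter:
// Build a LatLonField range filter from two arbitrary corners,
// sorting each axis so the range reads [min TO max] as Solr expects.
static String bboxFilter(String field, double lat1, double lon1, double lat2, double lon2) {
  double latLo = Math.min(lat1, lat2), latHi = Math.max(lat1, lat2);
  double lonLo = Math.min(lon1, lon2), lonHi = Math.max(lon1, lon2);
  return field + ":[" + latLo + "," + lonLo + " TO " + latHi + "," + lonHi + "]";
}
With the coordinates from this thread, bboxFilter("local", -23.6677, -46.7315, -23.6709, -46.7261) produces the corrected query regardless of which corner Google Maps hands back first.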
Re: Optimizing in SolrCloud
On Thu, Mar 29, 2012 at 7:15 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks, does it matter that we are also updates to documents at various times? Do the deleted documents get removed when doing a merge or does that only get done on an optimize? Yes, any merge removes documents that have been marked as deleted (from the segments involved in the merge). Optimize can still make sense, but more often in scenarios where documents are updated infrequently. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: NullPointException when Faceting
On Thu, Mar 29, 2012 at 6:33 PM, Jamie Johnson jej2...@gmail.com wrote: I recently got this stack trace when trying to execute a facet based query on my index. The error went away when I did an optimize but I was surprised to see it at all. Can anyone shed some light on why this may have happened? I don't see how that could happen (and I've never seen it happen). I recently fixed one NPE: https://issues.apache.org/jira/browse/SOLR-3150 Hopefully this isn't another! -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SolrCloud replica and leader out of Sync somehow
On Tue, Mar 20, 2012 at 11:17 AM, Jamie Johnson jej2...@gmail.com wrote: ok, with my custom component out of the picture I still have the same issue. Specifically, when sorting by score on a leader and replica I am getting different doc orderings. Is this something anyone has seen? This is certainly possible and expected - the sort tiebreaker is the internal lucene docid, which can change (even on a single node!) If you need lists that don't shift around due to unrelated changes, make sure you don't have any ties! -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: SolrCloud replica and leader out of Sync somehow
On Tue, Mar 20, 2012 at 11:39 AM, Jamie Johnson jej2...@gmail.com wrote: Hmmm... ok, I don't see how it's possible for me to ensure that there are no ties. If a query were for *:* everything has a constant score; if the user requested 1 page then requested the next, the results on the second page could be duplicates from what was on the first page. I don't remember ever seeing this issue on older versions of SolrCloud, although from what you're saying I should have. What could explain why I never saw this before? If you use replication only to duplicate an index (and avoid any merges), then you will have identical docids. Another possible fix to ensure proper ordering: couldn't we always specify a sort order which contained the key? So for instance if the user asks for score asc, we'd make this score asc,key asc so that results would be ordered by score and then by key, and the results across pages would be consistent? Yep. And like I said, this is also an issue even on a single node. docid A can be before docid B, then a segment merge can cause these to be shuffled. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Multi-valued polyfields - Do they exist in the wild ?
On Tue, Mar 20, 2012 at 2:17 PM, ramdev.wud...@thomsonreuters.com wrote: Hi: We have been keen on using polyfields for a while. But we have been restricted from using them because they do not seem to support multi-values (yet). Poly-fields should support multi-values; it's more that what uses them may not. For example LatLon isn't multiValued because it doesn't have a mechanism to correlate multiple values per document. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?
On Mon, Mar 19, 2012 at 4:38 PM, vybe3142 vybe3...@gmail.com wrote: Okay, I added the javabin handler snippet to the solrconfig.xml file (actually shared across all cores). I got further (the request made it past tomcat and into SOLR) but haven't quite succeeded yet. Server trace: Mar 19, 2012 3:31:35 PM org.apache.solr.core.SolrCore execute INFO: [testcore1] webapp=/solr path=/update/javabin params={waitSearcher=truecommit=trueliteral.id=testid1waitFlush=truewt=javabinstream.file=C:\work\SolrC lient\data\justin2.txtversion=2} status=500 QTime=82 Is this justin2.txt file in the javabin format? That's what you're telling Solr by hitting the /update/javabin URL. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?
On Mon, Mar 19, 2012 at 5:48 PM, vybe3142 vybe3...@gmail.com wrote: Thanks for the response No, the file is plain text. All I'm trying to do is index plain ASCII text files via a remote reference to their file paths. The XML update handler expects a specific format of XML. The json, CSV, javabin update handlers likewise expect a specific document format. If you have Word, PDF, HTML, or plain text files, one way to index them is http://wiki.apache.org/solr/ExtractingRequestHandler (aka Solr Cell) -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
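For a plain text (or PDF, Word, HTML) file, the extracting handler upload would look roughly like this, reusing the file and literal.id from the earlier trace (the handler path assumes the example solrconfig.xml, and "myfile" is an arbitrary form field name):
curl "http://localhost:8983/solr/update/extract?literal.id=testid1&commit=true" -F "myfile=@justin2.txt"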
Re: 400 Error adding field 'tags'='[a,b,c]'
Hmmm, this looks like it's generated by DocumentBuilder with the code catch( Exception ex ) { throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, "ERROR: " + getID(doc, schema) + "Error adding field '" + field.getName() + "'='" + field.getValue() + "'", ex ); } Unfortunately, you're not getting the message from the underlying exception. Is there a full stack trace in the logs? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10 On Tue, Mar 13, 2012 at 7:05 PM, jlark alpti...@hotmail.com wrote: Hey Folks, I'm new to lucene/solr so pardon my lack of knowledge. I'm trying to feed some json to my solr instance through wget. I'm using the command wget 'http://localhost:8983/solr/update/json?commit=true' --post-file=itemsExported.json --header='Content-type:application/json' however the response I get is: 2012-03-13 14:44:44 ERROR 400: ERROR: [doc=http://www.mysite.com] Error adding field 'tags'='[car,house,farm]' where the tags field in my schema looks like: <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/> Not sure if I'm missing something. I'm not too sure how to debug this further either, so any help on both would be great. I was able to feed and test with some dummy docs so I'm pretty sure my method of submission works. Are there any further logs I can look at or turn on? Thanks so much, Alp -- View this message in context: http://lucene.472066.n3.nabble.com/400-Error-adding-field-tags-a-b-c-tp3823853p3823853.html Sent from the Solr - User mailing list archive at Nabble.com.