Re: Best practice for rotating solr logs
I don't like the idea of restarting Solr, since restarting it will empty the query cache. Bill, you mentioned using log4j. What are the steps involved to get log4j to do the log rotation?

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-rotating-solr-logs-tp2759205p3987505.html
Sent from the Solr - User mailing list archive at Nabble.com.
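For reference, log4j can rotate on its own without a restart; a minimal log4j.properties sketch (the file path, sizes, and backup count here are assumptions to adapt to your installation):

```properties
# Rotate solr.log when it reaches 10 MB, keeping 9 rolled-over files
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n
```

Note this assumes Solr's logging is actually routed through log4j (via SLF4J); if it is still on java.util.logging, the log4j jar and an slf4j-log4j binding need to be on the classpath first.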
Re: Round Robin concept in distributed Solr
Hey Erick,

It looks like the thread you mentioned talks about how to configure the shards parameter in the Solr query. I am more interested in the 'main' shard you query against when you make Solr queries (the main shard being the shard you direct the query at: mainshard/select?q=*:*&shards=shard1,shard2,shard3).

I think Suneel's original question is still unanswered: is it better to use scenario A or scenario B? I suppose the 'main' shard is going to create a sub-query to the rest of the shards defined in the shards parameter, but I am still wondering whether querying the same main shard every time is going to have a load/performance impact.

Suneel wrote:
>> So scenario A (round-robin):
>>
>> query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard2
>> query 2: /solr-shard-2/select?q=dog... shards=shard-1,shard2
>> query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard2
>> etc.
>>
>> or scenario B (fixed):
>>
>> query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard2
>> query 2: /solr-shard-1/select?q=dog... shards=shard-1,shard2
>> query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard2

Thank you for any help.

Regards,
Ryan Tabora
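For what it's worth, the round-robin of scenario A can be done entirely client-side, so each node takes a turn doing the merge work. A minimal sketch (the shard URLs are made up):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Picks the "main" shard to direct each distributed query at, cycling
// through the available shards so no single node does all the merging.
public class ShardPicker {
    private final List<String> shardUrls;
    private final AtomicInteger next = new AtomicInteger(0);

    public ShardPicker(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    public String nextShard() {
        // floorMod keeps the index non-negative even after int overflow
        int i = Math.floorMod(next.getAndIncrement(), shardUrls.size());
        return shardUrls.get(i);
    }

    public static void main(String[] args) {
        ShardPicker p = new ShardPicker(List.of(
                "http://host1:8983/solr/shard1",
                "http://host2:8983/solr/shard2"));
        System.out.println(p.nextShard()); // shard1's URL on the first call
        System.out.println(p.nextShard()); // then shard2's
        System.out.println(p.nextShard()); // then back to shard1's
    }
}
```

The chosen URL becomes the host for the `/select?...&shards=...` request; a load balancer in front of the shards achieves the same effect without code.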
How to use SolrJ index a file or a FileInputStream with some other attributes?
How do I use SolrJ to index a file or a FileInputStream along with some other attributes? Please give me an example. Thanks a lot.
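One common approach is to post the file through the ExtractingRequestHandler (solr-cell) and pass the extra attributes as `literal.*` parameters. A sketch against the SolrJ 3.x API, assuming `/update/extract` is enabled in solrconfig.xml; the URL, file path, and field names are made up:

```java
import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexFileExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        ContentStreamUpdateRequest req =
            new ContentStreamUpdateRequest("/update/extract");
        // The file itself; for an arbitrary InputStream you can add a
        // custom ContentStream implementation via addContentStream(...)
        req.addFile(new File("docs/manual.pdf"));
        // Extra attributes go in as literal.* parameters
        req.setParam("literal.id", "doc-1");
        req.setParam("literal.category", "manuals");
        req.setParam("uprefix", "attr_");      // catch-all prefix for unmapped metadata
        req.setParam("fmap.content", "text");  // map the extracted body to "text"

        server.request(req);
        server.commit();
    }
}
```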
seeing errors during replication process on slave boxes - read past EOF
Hello all,

Environment: Solr 3.5, 1 master, 2 slaves. The slaves are set to poll the master every 10 minutes.

I have had replication running on one master and two slaves for a few weeks now. These boxes are not production boxes - just QA/test boxes. Right after I started a re-index on the master, I started to see the following errors on both of the slave boxes. In previous test runs I had not noticed any errors. Can someone help me understand what is causing these errors?

Thank you,

2012-06-03 19:30:23,104 INFO [org.apache.solr.update.UpdateHandler] (pool-16-thread-1) start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
2012-06-03 19:30:23,164 SEVERE [org.apache.solr.handler.ReplicationHandler] (pool-16-thread-1) SnapPull failed org.apache.solr.common.SolrException: Index fetch failed :
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
 at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:268)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: java.io.IOException: read past EOF: MMapIndexInput(path="/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdx")
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103)
 at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
 at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:470)
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:321)
 ... 11 more
Caused by: java.io.IOException: read past EOF: MMapIndexInput(path="/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdx")
 at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(MMapDirectory.java:279)
 at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
 at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readInt(MMapDirectory.java:315)
 at org.apache.lucene.index.FieldsReader.(FieldsReader.java:138)
 at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:212)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:93)
 at org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:235)
 at org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:34)
 at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:506)
 at org.apache.lucene.index.DirectoryReader.access$000(DirectoryReader.java:45)
 at org.apache.lucene.index.DirectoryReader$2.doBody(DirectoryReader.java:498)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
 at org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:493)
 at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
 at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
 at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
 at org.apache.lucene.index.IndexReader.reopen(IndexReader.java:697)
 at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:414)
 at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:425)
 at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:35)
 at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:501)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1083)
 ... 14 more
2012-06-03 19:30:23,197 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Skipping download for /appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kiq.tis
2012-06-03 19:30:23,198 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Skipping download for /appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kit.tis
2012-06-03 19:30:23,198 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Skipping download for /appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdt
2012-06-03 19:30:23,198 INFO [org.apa
Re: I got ERROR, Unable to execute query
I read your answer. Thank you. But I don't get that error from the same table. This time I got the error from test_5, but when I try the dataimport again, I can index test_5 and instead get that error from test_7. I don't know the reason. Could you help me?

--
Is test_5 created by a stored procedure? If so, is there a possibility that the stored procedure may have done an update and not returned data - but just sometimes?

-- Jack Krupansky

2012/6/2 Jihyun Suh
> I use many tables for indexing.
>
> During dataimport, I get errors for some tables like "Unable to execute
> query". But next time, when I try to dataimport for that table, I can do it
> successfully without any error.
>
> [Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in entity :
> test_5:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query:
> SELECT Title, url, synonym, description FROM test_5 WHERE status in
> ('1','s') Processing Document # 11046
>
> at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253)
> at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
> at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
> at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
> at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
> at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
> at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
> at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Re: Sorting with customized function of score
I'm using the Solr 4.0 nightly build. In fact I intend to sort with a more complicated function including score, geodist(), and other factors, so this example just simplifies the issue: I cannot sort with a customized function of score. More concretely, how can I make a sort like sort=product(div(1,geodist()),score) desc ?

Thanks,
Toan.

On Sun, Jun 3, 2012 at 2:56 PM, Erick Erickson wrote:
> What version of Solr are you using? When I try this with 3.6
> I get an error that "score is not defined".
>
> But what are you trying to accomplish with this particular sort? It
> won't actually do anything at all to your sort order. Or are you just
> experimenting?
>
> Best
> Erick
>
> On Fri, Jun 1, 2012 at 11:30 AM, Toan V Luu wrote:
> > Hi,
> > When I use "sort=score asc" it works, but when I use a customized
> > function like "sort=sum(score,2) asc" I get an error "can not sort on
> > multivalued field: sum(score,2)". Do you know why and how to solve it?
> > Thanks
> > Toan.
Re: Wildcard-Search Solr 3.5.0
And I closed the JIRA, see the comments. But the short form is that it's not worth the effort because of the edge cases. Jack writes up some of them; the short form is "what does stemming do with terms like organiz*?". Sure, it would produce one token (which is the main restriction on a MultiTermAware filter), but the output might not be anything equivalent to the stem of "organization", maybe not even "organize". Better to avoid that rat-hole; it seems like one of those problems that could suck up enormous amounts of time and _still_ not do what's expected.

If you _really_ want to try this, you could always define your own "multiterm" analysis component that includes the stemmer, see:
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

But don't say I didn't warn you ...

Best
Erick

On Sun, Jun 3, 2012 at 8:25 AM, Erick Erickson wrote:
> Chiming in late here, just back from vacation. But off the top of my
> head, I don't see any reason SnowballPorterFilterFactory shouldn't
> be MultiTermAware.
>
> I've created https://issues.apache.org/jira/browse/SOLR-3503 as
> a placeholder.
>
> Erick
>
> On Fri, May 25, 2012 at 1:31 PM, wrote:
>>> I don't know the specific rules in these specific stemmers,
>>> but generally a "less aggressive" stemming (e.g., "plural-only") of
>>> "paintings" would be "painting", while a "more aggressive" stemming
>>> would be "paint". For some "aggressive" stemmers the stemmed word
>>> is not even a word.
>>
>> Sounds logically :)
>>
>>> It would be nice to have doc with some example words for each stemmer.
>>
>> Absolutely!
>>
>> Thx alot!
Re: Distance Range Filtering
This seems to work with the example data; perhaps there's a more elegant way of doing it, though:

http://localhost:8983/solr/select?q=*:*&sfield=store&pt=45.15,-93.85&fq={!frange l=6 u=400}geodist()

returns all the stores between 6 and 400 km.

Best
Erick

On Sat, Jun 2, 2012 at 8:43 PM, reeuv wrote:
> Hi everyone
>
> I am trying to do a distance range search using Solr.
>
> I know it's very easy to do a search filtering within a 5 km range:
> /&q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}/
>
> What I am after is how to do the same thing if I am looking in a range of
> say *5 to 10 km*?
>
> Thanks
Re: OS memory Pollution on rsync?
Sujatha,

A few thoughts:

* If you are still using rsync replication, you may be using an old version of Solr. Consider upgrading (it's also more efficient with memory).
* Yes, rsync will "pollute" the OS cache, but is it really pollution if it makes the OS cache the index that is about to be made searchable?
* You typically want to replicate only changed portions of the index, not whole indices, so maybe there is no need to worry about this.
* If you've properly warmed up the index after replication and searches are still taking a long time, then it's likely that query latency is unrelated to replication and "pollution".
* Monitor your disk IO, CPU usage, and Solr cache usage - this will likely lead you in the right direction.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

> From: Sujatha Arun
> To: solr-user@lucene.apache.org
> Sent: Sunday, June 3, 2012 4:32 AM
> Subject: OS memory Pollution on rsync?
>
> Hello,
>
> When we migrated the solr webapps to a new server in production, we did an
> rsync of all the indexes in our old server. On checking with JConsole,
> the OS free RAM was already filled to twice the index size. There is no
> other service on this server. The rsync was done twice.
>
> Even though we have more RAM in the new server, some of the simple queries
> are taking more than 2s to execute. If the index was cached, then why is it
> taking so long to execute?
>
> Any ideas?
>
> Regards
> Sujatha
Re: A few random questions about solr queries.
See below:

On Tue, May 29, 2012 at 6:18 AM, santamaria2 wrote:
> *1)* With faceting, how does facet.query perform in comparison to
> facet.field? I'm just wondering this as in my use case, I need to facet over
> a field -- which would get me the top n facets for that field, but I also
> need to show the count for a "selected filter" which might have a relatively
> low count so it doesn't appear in the top n returned facets. So the solution
> would be to 'ensure' its presence by adding a 'facet.query=cat:val' in
> addition to my facet.field=cat.

You have two choices here. Either specify that the return should contain the "top", say, 1,000,000 responses (which would be a disaster in some cases) and facet by field, or facet by query. You really don't have any other choice than to add the facet.query here, so performance is moot.

> I want to do this to quite a few fields.
>
> Related/example-based question:
> When I facet over a field, and something gets returned, e.g. John Smith (83),
> and I also 'ensure' this facet's presence by having it in
> facet.query=author:"John Smith", are two different calculations performed?
> Or is the facet returned by facet.field also used by facet.query to obtain
> the count?

I'm pretty sure that two different calculations are performed, but don't know for certain. But again, it seems like your use-case requires the addition of the query, so why does it matter?

> *2)* Is there a performance issue if I have around, say, 20 facet.query
> conditions along with 10 facet.fields? 3/10 of those fields have around
> 100,000 possible values. The remaining have a few hundred each.

It Depends (tm). You don't say, for instance, how big your index is, or how much memory you have. Really, the only good way to answer this question is to try it and _then_ worry about it. So far, you've really only described your requirements, so asking about low-level implementation details seems premature unless and until you see a performance problem.
> *3)* I've rummaged around a bit, looking for info on when to use q vs fq. I
> want to clear my doubts for a certain use case.
>
> Where should my date range queries go? In q or fq? The default settings in
> my site show results from the past 90 days with buttons to show stuff from
> the last month and week as well. But the user is allowed to use a slider to
> apply any date range... this is allowed, but it's not /that/ common.
> I definitely use fq for filtering various tags. Choosing a tag is a common
> activity.

In addition to Shawn's answer, using &fq clauses enables use of the filterCache, which can substantially increase performance, but see this blog post for some interesting considerations when using NOW:
http://www.lucidimagination.com/blog/2012/02/23/date-math-now-and-filter-queries/

Best
Erick

> Should the date range query go in fq? As I mentioned, the default view shows
> stuff from the past 90 days. So on each new day, does this invalidate
> stuff in the cache? Or is stuff stored in the filter cache in some way
> that makes it easy to fetch stuff from the past 89 days when a query is
> performed the next day?
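Concretely, the NOW pitfall in that post comes down to rounding: an unrounded NOW produces a different filter string every millisecond, so the filterCache entry is never reused. Rounding to day granularity keeps the string identical all day; a sketch (the field name pub_date is hypothetical):

```
# Cache-unfriendly: NOW changes on every request, so every fq is a cache miss
fq=pub_date:[NOW-90DAYS TO NOW]

# Cache-friendly: identical string all day, one filterCache entry per day
fq=pub_date:[NOW/DAY-90DAYS TO NOW/DAY+1DAY]
```

The rounded form also answers the "does each new day invalidate the cache?" question: yes, the first query after midnight computes a fresh filter, and every query for the rest of that day reuses it.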
Re: Sorting with customized function of score
What version of Solr are you using? When I try this with 3.6 I get an error that "score is not defined".

But what are you trying to accomplish with this particular sort? It won't actually do anything at all to your sort order. Or are you just experimenting?

Best
Erick

On Fri, Jun 1, 2012 at 11:30 AM, Toan V Luu wrote:
> Hi,
> When I use "sort=score asc" it works, but when I use a customized
> function like "sort=sum(score,2) asc" I get an error "can not sort on
> multivalued field: sum(score,2)". Do you know why and how to solve it?
> Thanks
> Toan.
Re: Efficiently mining or parsing data out of XML source files
This seems really odd. How big are these XML files? Where are you parsing them? You could consider using a SolrJ program with a SAX-style parser. But the first question I'd answer is "what is slow?". The implication of your post is that parsing the XML is the slow part; it really shouldn't be taking anywhere near this long IMO...

Best
Erick

On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian wrote:
> I'm just wondering what the general consensus is on indexing XML data to Solr
> in terms of parsing and mining the relevant data out of the file and putting
> it into Solr fields. Assume that this is the XML file and resulting Solr
> fields:
>
> XML data:
>
> foo
>
> garbage data
>
> Solr Fields:
> Id=1234
> Title=foo
> Bar=val1
>
> I'd previously set this process up using XSLT and have since tested using
> XMLBeans, JAXB, etc. to get the relevant data. The speed at which this
> occurs, however, is not acceptable: 2800 objects take 11 minutes to parse and
> index into Solr.
>
> The big slowdown appears to be that I'm parsing the data with an XML parser.
>
> So, now I'm testing mining the data by opening the file as just a text file
> (using Groovy) and picking out relevant data using regular expression
> matching. I'm now able to parse (mine) the data and index the 2800 files in
> 72 seconds.
>
> So I'm wondering if the typical solution people use is to go with a non-XML
> solution. It seems to make sense considering the search index would only want
> to store (as much data) as possible and not rely on the incoming documents
> being XML compliant.
>
> Thanks in advance for any thoughts on this!
> -Kristian
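The SAX-style parsing mentioned above avoids building a full DOM and only touches the elements you care about, which is usually far faster than XSLT or a binding framework. A minimal sketch using the JDK's built-in SAX parser; the element names (doc, title, bar) are hypothetical, since the original XML markup didn't survive the list archive:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streams through the document once, collecting only the wanted fields.
public class FieldExtractor extends DefaultHandler {
    private final Map<String, String> fields = new HashMap<>();
    private StringBuilder current;   // buffer for the element being captured
    private String currentName;      // name of the element being captured

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("title") || qName.equals("bar")) {
            currentName = qName;
            current = new StringBuilder();
        } else if (qName.equals("doc")) {
            fields.put("id", atts.getValue("id"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called multiple times per element, so append
        if (current != null) current.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals(currentName)) {
            fields.put(currentName, current.toString());
            current = null;
            currentName = null;
        }
    }

    public static Map<String, String> parse(String xml) throws Exception {
        FieldExtractor handler = new FieldExtractor();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc id=\"1234\"><title>foo</title><bar>val1</bar></doc>";
        System.out.println(parse(xml));
    }
}
```

The resulting map values would then populate a SolrInputDocument. Unlike the regex approach, this still validates well-formedness, so malformed input fails loudly instead of silently mis-extracting.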
Re: Rolling partitions with solr shards
Not really, that I know of. One of the things on the wish list (and I think being worked on) is a pluggable hashing function for SolrCloud.

Best
Erick

On Sun, May 27, 2012 at 2:18 PM, avenka wrote:
> Is there a simple way to get Solr to maintain shards as rolling partitions by
> date, e.g., the last day's documents in one shard, the week before yesterday
> in the next shard, the month before that in the next shard, and so on? I
> really don't need querying to be fast on the entire index, but it is
> critical that it be blazing fast on recent documents.
>
> A related but different question: in which config file can I change the
> default hash function to assign documents to shards? This outdated post
> http://wiki.apache.org/solr/NewSolrCloudDesign
> seems to suggest that you can define your own hash functions as well as
> assign hash ranges to partitions, but I am not sure whether or how Solr 3.6
> supports this. For that matter, I don't know whether or how SolrCloud (which
> I understand is available only in Solr 4) supports this.
Re: Queries to solr being blocked
What's probably happening is that you are re-opening searchers every minute and never getting any benefit from caching. The queries simply _look_ like they're blocked, when they're just taking a really long time. Try increasing your delta-import interval to something like 10 minutes to test.

Best
Erick

On Fri, May 25, 2012 at 3:38 PM, KPK wrote:
> Hello
>
> I just wanted to ask if queries to the Solr index are blocked during a delta
> import. I read on the wiki page that queries to Solr are not blocked during
> full imports, but the page doesn't mention anything about delta imports. What
> happens then?
>
> I am currently facing a problem: my query takes a very long time to respond.
> Currently I am scheduling a delta import every 1 min, as my DB size keeps
> increasing every minute. But I suspect this is causing a performance issue.
> I suspect the query is being made to the Solr index while the CRON job is
> running in the background for the delta import. I am using
> DataImportHandlerDeltaQueryViaFullImport for this purpose.
> Is this causing a delay in responding to the query, or is it something else?
>
> Any help would be appreciated.
>
> Thanks,
> Kushal
Re: Wildcard-Search Solr 3.5.0
Chiming in late here, just back from vacation. But off the top of my head, I don't see any reason SnowballPorterFilterFactory shouldn't be MultiTermAware.

I've created https://issues.apache.org/jira/browse/SOLR-3503 as a placeholder.

Erick

On Fri, May 25, 2012 at 1:31 PM, wrote:
>> I don't know the specific rules in these specific stemmers,
>> but generally a "less aggressive" stemming (e.g., "plural-only") of
>> "paintings" would be "painting", while a "more aggressive" stemming
>> would be "paint". For some "aggressive" stemmers the stemmed word
>> is not even a word.
>
> Sounds logically :)
>
>> It would be nice to have doc with some example words for each stemmer.
>
> Absolutely!
>
> Thx alot!
OS memory Pollution on rsync?
Hello,

When we migrated the Solr webapps to a new server in production, we did an rsync of all the indexes from our old server. On checking with JConsole, the OS free RAM was already filled to twice the index size. There is no other service on this server. The rsync was done twice.

Even though we have more RAM in the new server, some of the simple queries are taking more than 2s to execute. If the index was cached, then why is it taking so long to execute?

Any ideas?

Regards
Sujatha