RE: Out of memory
Thanks Jaeger. Actually I am storing Twitter streaming data into the core, so the indexing rate is about 12 tweets (docs)/second. The same Solr contains 3 other cores, but those cores are not very heavy. The Twitter core has now become very large (77516851 docs) and it's taking a long time to query (mostly facet queries based on date and string fields). After about 18-20 hours Solr goes out of memory, and the thread dump doesn't show anything. How can I improve this besides adding more RAM to the system?

Regards, Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

-----Original Message-----
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: 13 September 2011 21:06
To: solr-user@lucene.apache.org
Subject: RE: Out of memory

numDocs is not the number of documents in memory. It is the number of documents currently in the index (which is kept on disk). Same goes for maxDocs, except that it is a count of all of the documents that have ever been in the index since it was created or optimized (including deleted documents).

Your subject indicates that something is giving you some kind of Out of memory error. We might better be able to help you if you provide more information about your exact problem.

JRJ

-----Original Message-----
From: Rohit [mailto:ro...@in-rev.com]
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have Solr running on a machine with 18GB RAM, with 4 cores. One of the cores is very big, containing 77516851 docs. The stats for its searcher are given below:

searcherName : Searcher@5a578998 main
caching : true
numDocs : 77516851
maxDoc : 77518729
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
indexVersion : 1308817281798
openedAt : Tue Sep 13 18:59:52 GMT 2011
registeredAt : Tue Sep 13 19:00:55 GMT 2011
warmupTime : 63139

. Is there a way to reduce the number of docs loaded into memory for this core?
. At any given time I don't need data older than the past 15 days, unless someone queries for it explicitly. How can this be achieved?
. Will it be better to go for Solr replication or distribution if there is little option left?

Regards, Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg
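One way to serve "only the last 15 days unless asked otherwise" is to attach a date-range filter query to every default search. A minimal sketch of building such an fq string (the field name `tweet_date` is a hypothetical placeholder, not from the thread):

```python
from datetime import datetime, timedelta, timezone

def last_n_days_fq(field, days, now=None):
    """Build a Solr range filter query (fq) restricting hits to the last `days` days."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # Solr's canonical date format
    return "%s:[%s TO %s]" % (field, start.strftime(fmt), now.strftime(fmt))

fq = last_n_days_fq("tweet_date", 15, now=datetime(2011, 9, 14, tzinfo=timezone.utc))
print(fq)  # tweet_date:[2011-08-30T00:00:00Z TO 2011-09-14T00:00:00Z]
```

Solr can also express this server-side with date math, e.g. `fq=tweet_date:[NOW/DAY-15DAYS TO NOW]`, which has the advantage of being cacheable when rounded to day granularity.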
EofException with Solr in Jetty
Hi all, sometimes we have this error in our system. We are running Solr 3.1.0 on Jetty 7.2.2. Does anyone have an idea how to tune this?

14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | apache.solr.common.SolrException 151 | 154 - mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 0 |
org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96)
    at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
    at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
    at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
    at org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473)
    at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929)
    at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:864)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:72)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
    at org.eclipse.jetty.server.Server.handle(Server.java:352)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1051)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:590)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:212)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:426)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:508)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:451)
    at java.lang.Thread.run(Thread.java:662)

--
Michael Szalay
Senior Software Engineer
basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business
solr 1.4 highlighting issue
Hello list, not sure how many of you are still using Solr 1.4 in production, but here is an issue with highlighting that we've noticed.

The query is: (drill AND ships) OR rigs

Excerpt from the highlighting list:

<arr name="Contents">
  <str>Within the fleet of 27 floating <em>rigs</em> (semisubmersibles and drillships) are 21 deepwater <em>drilling</em></str>
</arr>

Why did Solr highlight "drilling" even though there is no "ships" in the text?

--
Regards,
Dmitry Kan
RE: Weird behaviors with not operators.
Thank you a lot for your answers! They helped me understand better how the query parser works. -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-behaviors-with-not-operators-tp3323065p3335087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 1.4 facet.limit behaviour in merging from several shards
Hi Chris,

Thanks for taking this. Sorry for my confusing explanation. Since you requested a bigger picture, I'll give some more detail. In short: we don't do date facets, and sorting by date in reverse order happens naturally by design.

All the data is split into shards. We use logical sharding, not hash based. Each shard contains a piece of data that corresponds to a specific date range; we know in advance which date range is represented by which shard. Each document in a shard has a field which contains a date in milliseconds, which is the result of subtracting the original document's date from a very big date in the future. In this way, if you issue a facet query against a shard and use facet.method=index, you get hits from the shard ordered lexicographically in reverse chronological order. Here is an example of two values:

9223370739060532807_docid1
9223370741484545807_docid2

The second value is larger than the first, which means that the document itself is older.

Here is a typical facet query:

wt=xml&start=0&hl.alternateField=Contents&version=1&df=Contents&q=aerospace+engineer&hl.alternateFieldLength=10&facet=true&f.OppositeDateLongNumber_docid.facet.limit=1000&facet.field=OppositeDateLongNumber_docid&rows=1&facet.sort=index&facet.zeros=false&isShard=true

The output XML is (skipping the header):

<lst name="facet_fields">
  <lst name="OppositeDateLongNumber_docid">
    <int name="9223370722475651807_1">2</int>
    <int name="9223370722825037807_4">1</int>
    <int name="9223370723175759807_2">2</int>
    <int name="9223370723372652807_10">1</int>
    <int name="9223370723949606807_7">1</int>
  </lst>
</lst>

Excerpt from the schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <!-- the order matters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    <!-- here we have two more proprietary filters, one of which does stemming -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- our proprietary stemming filter -->
  </analyzer>
</fieldType>

<field name="OppositeDateLongNumber_docid" type="string" indexed="true" stored="true" required="false" omitNorms="true"/>
<field name="Contents" type="text" indexed="true" stored="true" omitNorms="true"/>

Back to the problem: it has been reproducible that if a query run from the Solr router reaches two or more shards, each of which generates around 1000 hits, then upon merging some portion of the hits (on the time border between two shards) gets dropped. The result hit list is uniform otherwise, except for the missing portion of hits in the middle.

So the question is: if the facet search reaches two or more shards and each shard generates 1000 results, which entries will go into the final list of resulting entries, given the facet.limit=1000 set on the original distributed query? What is the algorithm in this case?

Please let me know if something is not clear or more detail is needed from the schema / execution / design.

Regards, Dmitry

On Fri, Sep 9, 2011 at 12:22 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
: When shooting a distributed query, we use facet.limit=1000. Then the merging
: SOLR combines the results. We also use facet.zeros=false to ensure returning
: only non-zero facet entries.
: The issue that we found is that there was a gap in time in the final results
: list (reverse sorted by date attached to each entry in all the shards),
: whereby entries stamped with certain date disappeared. If we use different
: query criteria, that produces less than 1000 results both in each of the
: shards and combined, we see those missing entries. So the problem is not
: in missing data, but in the combination algorithm.

I don't understand what you mean by "entries stamped with certain date" ...
are you saying the actual results of the search seem to be missing documents, or that the facet counts returned seemed to be missing constraints that should be in the list? It seems like you are referring to documents missing from the actual results ("reverse sorted by date"), but facet.limit can't affect anything about the results of the actual query.

facet.limit also only applies to facet.field (not facet.date or facet.range), but you're talking about a date field.

Can you please be specific about the requests you are executing (ie: what params), the schema you have (ie: what are the fields/types in use in all the params/query strings), the results you are getting, and the results you are expecting? Actually providing the response XML is very helpful (change the fl to hide any fields you consider sensitive).

-Hoss
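The "opposite date" encoding described above can be sketched as follows. The constant appears to be Java's Long.MAX_VALUE (the sample values in the thread differ from it by a plausible epoch-milliseconds amount); that is an inference, not something the thread states:

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE; consistent with the sample values above

def opposite_date_key(epoch_millis, docid):
    # Newer documents get smaller keys, so walking the term index in
    # ascending (facet.sort=index) order returns newest-first.
    return "%d_%s" % (LONG_MAX - epoch_millis, docid)

newer = opposite_date_key(1297794243000, "docid1")  # ~2011-02-15
older = opposite_date_key(1295370230000, "docid2")  # ~2011-01-18
assert newer < older  # lexicographic order == reverse chronological
print(newer)  # 9223370739060532807_docid1
```

Note the lexicographic comparison only works because all keys have the same digit count, which holds for any realistic epoch-milliseconds value subtracted from Long.MAX_VALUE.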
Re: Out of memory
Hi Rohit,

Do you use caching? How big is your index in size on disk? What are the stack trace contents?

The OOM problems that we have seen so far were related to the index's physical size and the usage of caching. I don't think we have ever found the exact cause of these problems, but sharding has helped to keep each index relatively small, and the OOMs have gone away.

You can also attach jconsole to your Solr via JMX and monitor the memory / CPU usage in a graphical interface. I have also sometimes run the garbage collector manually through jconsole, and it was of help.

Regards,
Dmitry
Re: question about Field Collapsing/ grouping
Hi Jayendra,

Thanks a lot for your response. Now I have two questions. The first: to get the count of groups, is it necessary to apply the specified patch? If so, can you help me a little with the steps to apply that patch, as I am new to Solr/Java?

Regards,
Ahsan

----- Original Message -----
From: Jayendra Patil jayendra.patil@gmail.com
To: solr-user@lucene.apache.org; Ahson Iqbal mianah...@yahoo.com
Sent: Tuesday, September 13, 2011 10:55 AM
Subject: Re: question about Field Collapsing/ grouping

At the time we implemented the feature, there was no straightforward solution. What we did was facet on the grouped-by field and count the facets; this gives you the distinct count for the groups.

You may also want to check the patch @ https://issues.apache.org/jira/browse/SOLR-2242, which will return the facet counts, and you need to count them yourself.

Regards,
Jayendra

On Tue, Sep 13, 2011 at 1:27 AM, Ahson Iqbal mianah...@yahoo.com wrote:
Hi,
Is it possible to get the number of groups that matched a specified query? Let's say there are three fields in the index: DocumentID, Content, Industry, and I query as +(Content:is Content:the)&group=true&group.field=industry. Is it possible to get how many industries matched the specified query?
Please help.
Regards,
Ahsan
Re: DIH delta last_index_time
On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez maria.vazq...@dexone.com wrote:
> Hi,
> How do you handle the situation where the time on the server running Solr doesn't match the time in the database?

Firstly, why is that the case? NTP is pretty universal these days.

> I'm using the last_index_time saved by Solr in the delta query, checking it against a lastModifiedDate field in the database, but the times are not in sync so I might lose some changes. Can we use something else other than last_index_time? Maybe something like last_pk or something.

One possible way is to edit dataimport.properties, manually or through a script, to put the last_index_time back to a safe value.

Regards,
Gora
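A minimal sketch of such a script, rolling last_index_time back by a safety margin before the next delta-import. This assumes the usual dataimport.properties layout (a Java properties file where colons in the timestamp are backslash-escaped); the five-minute margin is an arbitrary example:

```python
from datetime import datetime, timedelta

SKEW = timedelta(minutes=5)  # safety margin covering Solr/database clock drift

def roll_back(line):
    # dataimport.properties is a Java properties file, so the colons in
    # the stored timestamp are backslash-escaped.
    key, _, raw = line.partition("=")
    if key.strip() != "last_index_time":
        return line  # leave unrelated properties untouched
    ts = datetime.strptime(raw.strip().replace("\\:", ":"), "%Y-%m-%d %H:%M:%S")
    return "last_index_time=" + (ts - SKEW).strftime("%Y-%m-%d %H\\:%M\\:%S")

print(roll_back("last_index_time=2011-09-14 11\\:23\\:45"))
# last_index_time=2011-09-14 11\:18\:45
```

Running this over the file between delta-imports trades a few re-indexed (overwritten) documents for never missing a late-stamped row.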
RE: Out of memory
Hi Dmitry,

To answer your questions:

- Do you use caching? I do use caching, but will disable it and give it a go.
- How big is your index in size on disk? These are the sizes of the data folder for each of the cores:
Core1 : 64GB
Core2 : 6.1GB
Core3 : 7.9GB
Core4 : 1.9GB

Will try attaching jconsole to my Solr as suggested to get a better picture.

Regards,
Rohit
Shouldn't ReversedWildcardFilterFactory resolve leadingWildcard?
Hi, I use the following fieldType:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
</fieldType>

What I want is to find "autocar" when I'm searching for "auto*", for example, but no leading-wildcard match is returned. When I check the fieldType with Analysis, I get this:

Index Analyzer: autocar / autocar / autocar / (&#1;racotua, autocar)
Query Analyzer: car / car / car / car / (&#1;rac, car)

So shouldn't searching for "car*" become "rac*" and match "racotua"? Even if I search for "rac*", "autocar" is not found. Using "*car*" for the search is very expensive, so I'm trying to generate the reversed string and find it. Is there a working configuration to accomplish this?

-- View this message in context: http://lucene.472066.n3.nabble.com/Shouldn-t-ReversedWildcardFilterFactory-resolve-leadingWildcard-tp3335240p3335240.html Sent from the Solr - User mailing list archive at Nabble.com.
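The idea behind the filter can be sketched as follows: index each token together with its reversal, and rewrite a leading-wildcard query into a prefix query over the reversed tokens. This is a simplified model, not Solr's implementation; the real filter also prefixes reversed tokens with a \u0001 marker character, omitted here for clarity:

```python
def index_tokens(token):
    # ReversedWildcardFilterFactory with withOriginal="true" indexes both
    # the token and its reversal (marker character omitted in this model).
    return {token, token[::-1]}

def rewrite_leading_wildcard(q):
    # "*car" cannot be answered efficiently against a normal term index,
    # but "rac*" (a prefix query over the reversed tokens) can.
    assert q.startswith("*") and not q.endswith("*")
    return q[1:][::-1] + "*"

tokens = index_tokens("autocar")           # {"autocar", "racotua"}
prefix = rewrite_leading_wildcard("*car")  # "rac*"
assert any(t.startswith(prefix[:-1]) for t in tokens)
```

This also shows why the query-time rewrite must happen inside the query parser: searching "rac*" by hand only works if the index-side tokens carry no marker character, which is why the poster's manual attempt fails with ReversedWildcardFilterFactory but works with plain ReverseStringFilterFactory.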
Re: question about Field Collapsing/ grouping
Hi Ahson,

http://wiki.apache.org/solr/FieldCollapsing

group.ngroups seems to have been added as a parameter, so you may not need to apply any patches. Solr 3.3 shipped the grouping feature, so I presume it should already be included in it.

Regards,
Jayendra
Re: How to plug a new ANTLR grammar
Hi Peter,

Yes, with the tree it is pretty straightforward. I'd prefer to do it that way, but what is the purpose of the new qParser then? Is it just that the qParser was built with different paradigms in mind, where the parse tree was not in the equation? Does anybody know if there is any advantage?

I looked a bit more into the contrib:

org.apache.lucene.queryParser.standard.StandardQueryParser.java
org.apache.lucene.queryParser.standard.QueryParserWrapper.java

And some things there (like setting the default fuzzy value) are in my case set directly in the grammar. So the query builder is still somehow involved in parsing (IMHO not good). But if someone knows some reasons to keep using the qParser, please let me know.

Also, a question for Peter: at which stage do you use Lucene analyzers on the query? After it was parsed into the tree, or before we start processing the query string?

Thanks!
Roman

On Tue, Sep 13, 2011 at 10:14 PM, Peter Keegan peterlkee...@gmail.com wrote:

Roman,

I'm not familiar with the contrib, but you can write your own Java code to create Query objects from the tree produced by your lexer and parser, something like this:

StandardLuceneGrammarLexer lexer = new StandardLuceneGrammarLexer(new ANTLRReaderStream(new StringReader(queryString)));
CommonTokenStream tokens = new CommonTokenStream(lexer);
StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
StandardLuceneGrammarParser.query_return ret = parser.mainQ();
CommonTree t = (CommonTree) ret.getTree();
parseTree(t);

parseTree(Tree t) {
    // recursively parse the Tree, visit each node
    visit(node);
}

visit(Tree node) {
    switch (node.getType()) {
        case StandardLuceneGrammarParser.AND:
            // Create BooleanQuery, push onto stack
            ...
    }
}

I use the stack to build up the final Query from the queries produced in the tree parsing.

Hope this helps.
Peter

On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy jason...@gmail.com wrote:
I'd love to see the progress on this.
On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com wrote:

Hi,

The standard lucene/solr parsing is nice but not really flexible. I saw questions and discussions about ANTLR, but unfortunately never a working grammar, so... maybe you will find this useful:

https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene objects, and the parser is not mixed with Java code. At first it produces structures like this:

https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem. I don't know if I should use the query parsing framework in contrib. It seems that the qParser in contrib can use different parser generators (the default JavaCC, but also ANTLR), but I am confused and I don't understand this new queryParser from contrib. It is really very confusing to me. Is there any benefit in trying to plug the ANTLR tree into it? Because looking at the AST pictures, it seems that with a relatively simple tree walker we could build the same queries as the current standard Lucene query parser, and it would be much simpler and more flexible. Does it bring something new? I have a feeling I am missing something...

Many thanks for help,
Roman

--
- sent from my mobile
6176064373
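The stack-based tree walk Peter describes can be modeled compactly. This is a language-neutral sketch, not Lucene/ANTLR code: Node stands in for ANTLR's CommonTree, and strings stand in for Lucene Query objects; children are visited first (post-order), and each boolean node pops its operands off the stack and pushes the combined query:

```python
class Node:
    """Stand-in for an ANTLR parse-tree node (kind: 'TERM', 'AND', or 'OR')."""
    def __init__(self, kind, text=None, children=()):
        self.kind, self.text, self.children = kind, text, list(children)

def build_query(node, stack):
    for child in node.children:      # post-order: build child queries first
        build_query(child, stack)
    if node.kind == "TERM":
        stack.append(node.text)      # leaf -> term query
    elif node.kind in ("AND", "OR"):
        right, left = stack.pop(), stack.pop()
        stack.append("(%s %s %s)" % (left, node.kind, right))

# (drill AND ships) OR rigs
tree = Node("OR", children=[
    Node("AND", children=[Node("TERM", "drill"), Node("TERM", "ships")]),
    Node("TERM", "rigs"),
])
stack = []
build_query(tree, stack)
print(stack[0])  # ((drill AND ships) OR rigs)
```

When the walk finishes, the single entry left on the stack is the fully assembled query, which is exactly the role the stack plays in Peter's Java version.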
Re: index not created
Hi Erick,

I have not done anything different. I downloaded the Solr tar from one of the mirrors, extracted it in the home directory, started Jetty, and it works fine. For Tomcat, I copied the war file into my webapps folder, restarted Tomcat, changed the configuration to point it to my Solr dir, and started it again. The setup is otherwise exactly the same. This time I even tried it with the example solr folder without the multicore setup, and in solrconfig.xml all the lib paths are the same as they were for Jetty. But still nothing is getting indexed: it shows that 1 document is there, but the text field doesn't show anything in it, and nothing comes back when I search for something from the document.

Am I doing something wrong? Please let me know. I have to implement it ASAP. Please help me, or if you can give me a document for implementing the same in Tomcat, I would try that.

Thanks,
-- View this message in context: http://lucene.472066.n3.nabble.com/index-not-created-tp3300744p3335291.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Shouldn't ReversedWildcardFilterFactory resolve leadingWildcard?
I found a partial solution. Using ReverseStringFilterFactory instead of ReversedWildcardFilterFactory, searching for "rac*" will find "autocar", for example. -- View this message in context: http://lucene.472066.n3.nabble.com/Shouldn-t-ReversedWildcardFilterFactory-resolve-leadingWildcard-tp3335240p3335307.html Sent from the Solr - User mailing list archive at Nabble.com.
Schema fieldType y-m-d ?!?!
Is it possible to index a date field in the format y-m-d? I don't need the timestamp, so I can save some space. Which ways exist to search with a complex date filter?

---
System: one server, 12 GB RAM, 2 Solr instances, 7 cores, 1 core with 31 million documents, other cores 100,000
- Solr1 for search requests - commit every minute - 5GB Xmx
- Solr2 for update requests - delta every minute - 4GB Xmx
-- View this message in context: http://lucene.472066.n3.nabble.com/Schema-fieldType-y-m-d-tp3335359p3335359.html Sent from the Solr - User mailing list archive at Nabble.com.
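One common approach, regardless of field type, is to normalize every timestamp to midnight before indexing, so all documents from the same day share one term value; day-granularity range filters then stay simple and cache well. A sketch of the normalization (the claim that this helps is an assumption about term reuse, not a measured saving):

```python
from datetime import datetime

def to_day_precision(ts):
    # Truncate a timestamp to midnight UTC, in Solr's canonical date
    # format; the query-time analogue is date math such as NOW/DAY.
    return ts.strftime("%Y-%m-%dT00:00:00Z")

print(to_day_precision(datetime(2011, 9, 14, 13, 37, 5)))
# 2011-09-14T00:00:00Z
```

A filter like `date:[2011-09-01T00:00:00Z TO 2011-09-14T00:00:00Z]` (or `date:[NOW/DAY-14DAYS TO NOW/DAY]`) then covers whole days exactly.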
Re: Out of memory
Hi,

OK, 64GB fits into one shard quite nicely in our setup, but I have never used a multicore setup. In total you have 79.9 GB; we try to have 70-100GB per shard with caching on.

Do you warm up your index on starting? Also, there is a setting for pre-populating the cache. It could also help if you can show some parts of your solrconfig file. What is the Solr version you use?

Regards,
Dmitry
Re: solr 1.4 highlighting issue
The highlighter gives you snippets of text surrounding words (terms) drawn from the query. The whole document should satisfy the query (i.e. it probably has ship(s) somewhere else in it), but each snippet won't generally have all the terms.

-Mike

On 9/14/2011 2:54 AM, Dmitry Kan wrote:

Hello list,

Not sure how many of you are still using Solr 1.4 in production, but here is an issue with highlighting that we've noticed. The query is:

(drill AND ships) OR rigs

Excerpt from the highlighting list:

<arr name="Contents">
  <str>Within the fleet of 27 floating <em>rigs</em> (semisubmersibles and drillships) are 21 deepwater <em>drilling</em></str>
</arr>
</lst>

Why did Solr highlight "drilling" even though there is no "ships" in the text?

--
Regards,
Dmitry Kan
Re: solr 1.4 highlighting issue
(11/09/14 15:54), Dmitry Kan wrote:

Hello list,

Not sure how many of you are still using Solr 1.4 in production, but here is an issue with highlighting that we've noticed. The query is:

(drill AND ships) OR rigs

Excerpt from the highlighting list:

<arr name="Contents">
  <str>Within the fleet of 27 floating <em>rigs</em> (semisubmersibles and drillships) are 21 deepwater <em>drilling</em></str>
</arr>
</lst>

Why did Solr highlight "drilling" even though there is no "ships" in the text?

Dmitry,

This is expected, even if you use the latest version of Solr. You got the document because "rigs" was a hit in the document, but then the Highlighter tries to search the individual terms of the query in the document again.

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
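Koji's point, that the highlighter marks each query term independently of the query's boolean structure, can be shown with a toy sketch. NaiveHighlighter below is purely illustrative and is not Solr's actual highlighter; matching stems by word prefix is also a simplification of real stemming:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: mimics how a highlighter marks every query term it can
// find in a snippet, ignoring the boolean structure of the query. So even
// when "ships" is absent, "drilling" still gets wrapped for the stem "drill".
public class NaiveHighlighter {

    // Wrap every whole word starting with one of the (stemmed) terms in <em>...</em>.
    static String highlight(String snippet, List<String> termStems) {
        String out = snippet;
        for (String stem : termStems) {
            out = out.replaceAll("(?i)\\b(" + Pattern.quote(stem) + "\\w*)", "<em>$1</em>");
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "27 floating rigs are 21 deepwater drilling units";
        // stems from the query (drill AND ships) OR rigs
        System.out.println(highlight(text, Arrays.asList("drill", "rig")));
    }
}
```

Each term is matched on its own, which is exactly why "drilling" is highlighted without any "ships" nearby.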
RE: NRT and commit behavior
Erick,

Here are the answers to your questions:
- Our index is 267 GB.
- We are not optimizing.
- No, we have not profiled yet to check the bottleneck, but the logs indicate opening the searchers is taking time.
- Nothing except Solr runs on the machine.
- Total memory is 16GB; Tomcat has 8GB allocated.
- Everything is 64-bit: OS, JVM and Tomcat.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, September 11, 2011 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Hmm, OK. You might want to look at the non-cached filter query stuff; it's quite recent. The point here is that it is a filter that is applied only after all of the less expensive filter queries are run. One of its uses is exactly ACL calculations: rather than calculate the ACL for the entire doc set, it only calculates access for docs that have made it past all the other elements of the query. See SOLR-2429, and note that it is 3.4 (currently being released) only.

As to why your commits are taking so long, I have no idea, given that you really haven't given us much to work with. How big is your index? Are you optimizing? Have you profiled the application to see what the bottleneck is (I/O, CPU, etc.)? What else is running on your machine? It's quite surprising that it takes that long. How much memory are you giving the JVM? etc... You might want to review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee tchatter...@commvault.com wrote:

Erick,

What you said is correct: for us the searches are based on some Active Directory permissions which are populated in the filter query parameter. So we don't have any warming query concept, as we cannot fire one for every user ahead of time. What we do here is that when a user logs in, we do an invalid query (which returns no results, instead of '*') with the correct filter query (his permissions based on the login). This way the cache gets warmed up with valid docs. It works then.
Also, can you please let me know why commit is taking 45 minutes to 1 hour on well-resourced hardware (multiple processors, 16GB RAM, 64-bit VM, etc.)? We tried passing waitSearcher as false and found that inside the code it is hard-coded to true. Is there any specific reason? Can we change that value to honor what is being passed?

Thanks,
Tirthankar

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, September 01, 2011 8:38 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very safe, though I suppose it *might* be OK. What does invalid mean? A syntax error? Not safe. A search that returns 0 results? I don't know, but I'd guess that filling your caches, which is the point of warming queries, might be short-circuited if the query returns 0 results, though I don't know for sure. But the fact that invalid queries return quicker does not inspire confidence, since the *point* of warming queries is to spend the time up front so your users don't have to wait.

So here's a test. Comment out your warming queries. Restart your server and fire the warming query from the browser with debugQuery=on and look at the QTime parameter. Now fire the same form of the query (the same sort, facet, grouping, etc., but presumably a valid term). See the QTime. Now fire the same form of the query with a *different* value in the query. That is, it should search on different terms but with the same sort, facet, etc., to avoid getting your data straight from the queryResultCache. My guess is that the last query will return much more quickly than the second query, which would indicate that the first form isn't doing you any good. But a test is worth a thousand opinions.

Best
Erick

On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee tchatter...@commvault.com wrote:

Also noticed that the waitSearcher parameter value is not honored inside commit.
It is always defaulted to true, which makes it slow during indexing. What we are trying to do is use an invalid query (which won't return any results) as a warming query. This way the commit returns faster. Are we doing something wrong here?

Thanks,
Tirthankar

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Monday, July 18, 2011 11:38 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: NRT and commit behavior

In practice, in my experience at least, a very 'expensive' commit can still slow down searches significantly, I think just due to CPU (or I/O?) starvation. Not sure anything can be done about that. That's my experience in Solr 1.4.1, but since searches have always been async with commits, it probably is the same situation even in more recent versions, I'd guess.

On 7/18/2011 11:07 AM, Yonik Seeley wrote:
RE: NRT and commit behavior
Erick,

Also, we had tried increasing the caches in our solrconfig. Setting autowarmCount to 0 in the entries below makes the commit call return within a second, but that slows down searches:

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- Cache used to hold field values that are quickly accessible by document id. The fieldValueCache is created by default even if not configured here.
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128" showItems="32"/>
-->

<!-- queryResultCache caches results of searches - ordered lists of document ids (DocList) based on a query, a sort, and the range of documents requested. -->
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="512"/>

-----Original Message-----
From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
Sent: Wednesday, September 14, 2011 7:31 AM
To: solr-user@lucene.apache.org
Subject: RE: NRT and commit behavior

[...]
Re: How to return a function result instead of doclist in the Solr collapsing/grouping feature?
Well, what is the average of latitude and longitude? If you're asking for the average of all the docs that match, or the average of all the docs in the corpus, no, I don't think you can unless you write a custom plugin. Something like this has been talked about, see: https://issues.apache.org/jira/browse/SOLR-1622 but I don't think any such thing has been implemented. Best Erick On Mon, Sep 12, 2011 at 5:37 PM, Pablo Ricco pri...@gmail.com wrote: I have the following solr fields in schema.xml: - id (string) - name (string) - category(string) - latitude (double) - longitude(double) Is it possible to make a query that groups by category and returns the average of latitude and longitude instead of the doclist? Thanks, Pablo
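Short of the custom plugin Erick mentions, one workaround for Pablo is to fetch the matching documents' category, latitude and longitude fields and compute the per-category averages client-side. A hedged sketch with plain string arrays standing in for Solr documents (the class and its shape are illustrative, not a Solr API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side aggregation: group the returned docs by category
// and average latitude/longitude ourselves, since grouping cannot return an
// aggregate instead of a doclist.
public class GroupAverages {

    // docs: each entry is {category, latitude, longitude} as returned by the query.
    // Returns category -> {avgLatitude, avgLongitude}.
    static Map<String, double[]> average(List<String[]> docs) {
        Map<String, double[]> sums = new HashMap<>(); // {latSum, lonSum, count}
        for (String[] d : docs) {
            double[] s = sums.computeIfAbsent(d[0], k -> new double[3]);
            s[0] += Double.parseDouble(d[1]);
            s[1] += Double.parseDouble(d[2]);
            s[2] += 1;
        }
        Map<String, double[]> avg = new HashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            double[] s = e.getValue();
            avg.put(e.getKey(), new double[] { s[0] / s[2], s[1] / s[2] });
        }
        return avg;
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] { "hotel", "10.0", "20.0" });
        docs.add(new String[] { "hotel", "12.0", "22.0" });
        docs.add(new String[] { "museum", "5.0", "5.0" });
        double[] hotel = average(docs).get("hotel");
        System.out.println(hotel[0] + ", " + hotel[1]); // 11.0, 21.0
    }
}
```

For large result sets this means pulling lat/long for every matching doc, so it only scales if the match counts per query are modest.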
RE: Out of memory
Thanks Dmitry for the offer to help. I am using some caching in one of the cores now; earlier I was using it on the other cores too, but I have commented those out because of the frequent OOMs. I also do some warming up in one of the cores. I have shared the links to my config files for all 4 cores:

http://haklus.com/crssConfig.xml
http://haklus.com/rssConfig.xml
http://haklus.com/twitterConfig.xml
http://haklus.com/facebookConfig.xml

Thanks again
Rohit

-----Original Message-----
From: Dmitry Kan [mailto:dmitry@gmail.com]
Sent: 14 September 2011 10:23
To: solr-user@lucene.apache.org
Subject: Re: Out of memory

[...]
Re: indexing data from rich documents - Tika with solr3.1
FileListEntityProcessor presupposes it's looking at files on disk; it doesn't know anything about the web. So, as the stack trace indicates, it tries to open a directory called http://... and fails. What is it you're really trying to do here? Perhaps if you explain your higher-level problem we can provide some help.

Best
Erick

On Mon, Sep 12, 2011 at 11:53 PM, scorpking lehoank1...@gmail.com wrote:

Hi, can you help me with this problem? I have indexed data from multiple files using the Tika libs, and I have indexed data over HTTP, but only one file at a time (e.g. http://myweb/filename.pdf). Now I have many files of different formats under an HTTP path (e.g. http://myweb/files/). I tried to index data from an HTTP path but it doesn't work. This is my data-config:

<dataConfig>
  <dataSource type="BinURLDataSource" name="bin" encoding="utf-8"/>
  <document>
    <entity name="sd" processor="FileListEntityProcessor"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)"
            baseDir="http://www.lc.unsw.edu.au/onlib/pdf/"
            recursive="true" rootEntity="false"
            transformer="DateFormatTransformer">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>

Error:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory Processing Document # 1
    at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)

Thanks for your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3331651.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to plug a new ANTLR grammar
Also, a question for Peter: at which stage do you use Lucene analyzers on the query? After it was parsed into the tree, or before we start processing the query string?

I do the analysis before creating the tree. I'm pretty sure the Lucene QueryParser does this, too.

Peter

On Wed, Sep 14, 2011 at 5:15 AM, Roman Chyla roman.ch...@gmail.com wrote:

Hi Peter,

Yes, with the tree it is pretty straightforward. I'd prefer to do it that way, but what is the purpose of the new qParser then? Was the qParser just built with a different paradigm in mind, where the parse tree was not in the equation? Does anybody know if there is any advantage? I looked a bit more into the contrib:

org.apache.lucene.queryParser.standard.StandardQueryParser.java
org.apache.lucene.queryParser.standard.QueryParserWrapper.java

And some things there (like setting the default fuzzy value) are in my case set directly in the grammar. So the query builder is still somehow involved in parsing (IMHO not good). But if someone knows some reasons to keep using the qParser, please let me know.

Also, a question for Peter: at which stage do you use Lucene analyzers on the query? After it was parsed into the tree, or before we start processing the query string?

Thanks!
Roman

On Tue, Sep 13, 2011 at 10:14 PM, Peter Keegan peterlkee...@gmail.com wrote:

Roman,

I'm not familiar with the contrib, but you can write your own Java code to create Query objects from the tree produced by your lexer and parser, something like this:

StandardLuceneGrammarLexer lexer =
    new StandardLuceneGrammarLexer(new ANTLRReaderStream(new StringReader(queryString)));
CommonTokenStream tokens = new CommonTokenStream(lexer);
StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
StandardLuceneGrammarParser.query_return ret = parser.mainQ();
CommonTree t = (CommonTree) ret.getTree();
parseTree(t);

parseTree(Tree t) {
    // recursively parse the Tree, visit each node
    visit(node);
}

visit(Tree node) {
    switch (node.getType()) {
        case StandardLuceneGrammarParser.AND:
            // Create BooleanQuery, push onto stack
            ...
    }
}

I use the stack to build up the final Query from the queries produced in the tree parsing. Hope this helps.

Peter

On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy jason...@gmail.com wrote:

I'd love to see the progress on this.

On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com wrote:

Hi,

The standard Lucene/Solr parsing is nice but not really flexible. I saw questions and discussions about ANTLR, but unfortunately never a working grammar, so... maybe you will find this useful:

https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene objects, and the parser is not mixed with Java code. At first it produces structures like this:

https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem. I don't know if I should use the query parsing framework in contrib. It seems that the qParser in contrib can use different parser generators (the default JavaCC, but also ANTLR). But I am confused and I don't understand this new queryParser from contrib. It is really very confusing to me.
Is there any benefit in trying to plug the ANTLR tree into it? Because looking at the AST pictures, it seems that with a relatively simple tree walker we could build the same queries as the current standard lucene query parser. And it would be much simpler and flexible. Does it bring something new? I have a feeling I miss something... Many thanks for help, Roman -- - sent from my mobile 6176064373
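Peter's stack-based tree walk can be sketched end-to-end without ANTLR or Lucene on the classpath. The Node class below is a toy stand-in for ANTLR's CommonTree, and plain strings stand in for Lucene Query objects; only the shape of the recursion is the point:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy version of the stack-based tree walk: visit children first (post-order),
// push leaf terms, and combine the top of the stack at each boolean operator.
public class TreeToQuery {

    static class Node {
        final String type;           // "AND", "OR", or "TERM"
        final String text;           // term text for TERM nodes
        final List<Node> children;
        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    private final Deque<String> stack = new ArrayDeque<>();

    String build(Node root) {
        visit(root);
        return stack.pop();
    }

    private void visit(Node node) {
        for (Node child : node.children) visit(child);   // post-order walk
        switch (node.type) {
            case "TERM":
                stack.push(node.text);                   // would be a TermQuery
                break;
            case "AND":
            case "OR":
                String right = stack.pop();              // would be a BooleanQuery
                String left = stack.pop();
                stack.push("(" + left + " " + node.type + " " + right + ")");
                break;
        }
    }

    public static void main(String[] args) {
        // the tree for: (drill AND ships) OR rigs
        Node tree = new Node("OR", null,
                new Node("AND", null, new Node("TERM", "drill"), new Node("TERM", "ships")),
                new Node("TERM", "rigs"));
        System.out.println(new TreeToQuery().build(tree)); // ((drill AND ships) OR rigs)
    }
}
```

In the real version the stack would hold Query objects and the switch would dispatch on the token types generated by the grammar.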
Re: where is the SOLR_HOME ?
Hi Ahmad,

While Solr is starting, it writes the path to SOLR_HOME to the log. The message looks something like:

Sep 14, 2011 9:14:53 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'

If you're running the example, SOLR_HOME is usually apache-solr-3.3.0/example/solr. Solr also writes a line like the following in the log for every JAR file it loads:

Sep 14, 2011 9:14:53 AM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgrande/apache-solr-3.3.0/contrib/extraction/lib/pdfbox-1.3.1.jar' to classloader

With this information you should be able to determine which JAR files Solr is loading, and I'm pretty sure that it's loading all the files you need. The problem may be that you must also include apache-solr-analysis-extras-3.3.0.jar from the apache-solr-3.3.0/dist directory.

Regards,
*Juan*

On Wed, Sep 14, 2011 at 12:19 AM, ahmad ajiloo ahmad.aji...@gmail.com wrote:

Hi,

This page (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory) says: "Note: to use this filter, see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib". I can't find SOLR_HOME/lib!

1- Is it apache-solr-3.3.0\example\solr? There is no directory named lib there. I created an example/solr/lib directory, copied the jar files into it, and tested these expressions in solrconfig.xml:
<lib dir="../../example/solr/lib" />
<lib dir="./lib" />
<lib dir="../../../example/solr/lib" /> (for more assurance!!!)
but it doesn't work and I still get the following errors!
2- Or apache-solr-3.3.0\? There is no directory named lib there.
3- Or apache-solr-3.3.0\example? There is a lib directory there.
I copied the 4 libraries in solr/contrib/analysis-extras/ to apache-solr-3.3.0\example\lib, but some errors occur when loading the page http://localhost:8983/solr/admin. I use Nutch to crawl the web and fetch web pages, and I send the Nutch data to Solr for indexing. According to the Nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch) I should copy Nutch's schema.xml to the conf directory of Solr, so I added all of my required analyzers, like ICUNormalizer2FilterFactory, to this new schema.xml.

This is schema.xml (the field types I added are marked with a comment below; in the original mail they were in bold):

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="nutch" version="1.3">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- the field types I added start here: -->
    <fieldType name="text_icu" class="solr.TextField" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="icu_sort_en" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUCollationKeyFilterFactory" locale="en" strength="primary"/>
      </analyzer>
    </fieldType>
    <fieldType name="normalized" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf" mode="compose"/>
      </analyzer>
    </fieldType>
    <fieldType name="folded" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="transformed" class="solr.TextField">
      <analyzer>
RE: EofException with Solr in Jetty
Looking at the source for Jetty, line 149 in Jetty's HttpOutput.java looks like this:

if (_closed)
    throw new IOException("Closed");

[http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/eclipse/jetty/server/HttpOutput.java?format=ok -- which may or may not match exactly, but I doubt that this code changes all that often.]

I would read this as Jetty thinking that this HTTP connection is closed. Is this perhaps a case of your HTTP client disconnecting (or crashing) before Jetty can get the entire message (HTTP response) sent? (The other alternative that occurs to me would be that Solr told Jetty the response was all done, but then turned around and tried to send more in the response.)

-----Original Message-----
From: Michael Szalay [mailto:michael.sza...@basis06.ch]
Sent: Wednesday, September 14, 2011 1:47 AM
To: solr-user@lucene.apache.org; JETTY user mailing list
Subject: EofException with Solr in Jetty

Hi all,

Sometimes we have this error in our system. We are running Solr 3.1.0 on Jetty 7.2.2. Does anyone have an idea how to tune this?
14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | apache.solr.common.SolrException 151 | 154 - mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 0 | org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96)
    at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
    at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
    at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
    at org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473)
    at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929)
    at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:864)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:72)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
    at org.eclipse.jetty.server.Server.handle(Server.java:352)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1051)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:590)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:212)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:426)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:508)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:451)
    at java.lang.Thread.run(Thread.java:662)

--
Michael Szalay
Senior Software Engineer
basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business
BigDecimal data type
Hi, Is there a way to use BigDecimal as a data type in solr? I am using solr 3.3. Thanks.
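Solr 3.3 has no arbitrary-precision numeric field type, so BigDecimal values are usually either stored as strings (which loses numeric sorting and range queries) or scaled to a long with a fixed number of decimal places and indexed in a long field such as TrieLongField. A sketch of the client-side conversion only; the scale of 4 is an assumption, pick whatever precision the data needs:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hedged workaround sketch: encode BigDecimal values as scaled longs before
// indexing, so sorting and range queries on a long field behave numerically.
// SCALE is an assumed fixed precision; values needing more decimals are rounded.
public class BigDecimalField {
    static final int SCALE = 4;   // 4 decimal places survive the round trip

    // value to index into the long field, e.g. 123.4567 -> 1234567
    static long encode(BigDecimal value) {
        return value.movePointRight(SCALE)
                    .setScale(0, RoundingMode.HALF_UP)
                    .longValueExact();
    }

    // convert a stored long back for display
    static BigDecimal decode(long stored) {
        return BigDecimal.valueOf(stored).movePointLeft(SCALE);
    }

    public static void main(String[] args) {
        long enc = encode(new BigDecimal("123.4567"));
        System.out.println(enc);          // 1234567
        System.out.println(decode(enc));  // 123.4567
    }
}
```

Note that all values in the field must share the same scale, and the scaled value must fit in a long (longValueExact throws if it doesn't), so this fits currency-like data better than truly unbounded decimals.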
Re: EofException with Solr in Jetty
We are using SolrJ 3.1 as our HTTP client... so it may be a bug in there?

Regards
Michael

--
Michael Szalay
Senior Software Engineer
basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business

----- Original Message -----
From: Jay Jaeger - DOT jay.jae...@dot.wi.gov
To: solr-user@lucene.apache.org, JETTY user mailing list jetty-us...@eclipse.org
Sent: Wednesday, September 14, 2011 15:21:19
Subject: RE: EofException with Solr in Jetty

[...]
Re: Managing solr machines (start/stop/status)
On 9/13/2011 6:05 PM, Jamie Johnson wrote: I know this isn't a solr specific question but I was wondering what folks do in regards to managing the machines in their solr cluster? Are there any recommendations for how to start/stop/manage these machines? Any suggestions would be appreciated. What do you mean by manage? For stopping and starting, I built my own redhat-friendly init script to handle jetty. It uses a file in /etc/sysconfig for commandline options. You can see my init script here: http://pastebin.com/GweJVGk5 Here's what I have in /etc/sysconfig/solr: STARTARGS=-Xms3072M -Xmx3072M -XX:NewSize=2048M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Dsolr.solr.home=/index/solr -Dsolr.clustering.enabled=true -DSTOP.PORT=8079 -DSTOP.KEY=somePassword STOPARGS=-DSTOP.PORT=8079 -DSTOP.KEY=somePassword I'm running CentOS5, but I ran into a problem with the fuser command that I use in the init script. I filed a bug with CentOS, but since the bug comes from upstream, they were not able to fix it. You may need to install a new psmisc package to use the init script: http://bugs.centos.org/view.php?id=4260 The script works fine on CentOS 6. Thanks, Shawn
math with date and modulo
Hello. I am fighting with the FunctionQuery of Solr. I am trying to get the diff between today and a date field; from this diff, I want to do a modulo with another field holding values of 1, 3, 6 or 12, in a function something like this (I know that some functions are not available in Solr): q={!func}$v2=0&v1=(NOW - $var)&v2=modulo($v1,interval) OR (DIFF(Month of Today - Month of Search) MOD interval) = 0 Can anybody give me some tips? - --- System One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 1 Core with 45 Million Documents other Cores 200.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/math-with-date-and-modulo-tp3335800p3335800.html Sent from the Solr - User mailing list archive at Nabble.com.
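As the poster notes, not every function is available in Solr, and as far as I know the 3.x function-query library offers no modulo, so one fallback is to precompute the predicate outside Solr (in the client, or at index time). A hedged, stdlib-only Java sketch of the month-difference-mod-interval check the poster describes; all class and method names here are ours, not Solr's:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class MonthModSketch {
    // whole months between two dates, ignoring the day of month
    static int monthsBetween(Calendar from, Calendar to) {
        return (to.get(Calendar.YEAR) - from.get(Calendar.YEAR)) * 12
             + (to.get(Calendar.MONTH) - from.get(Calendar.MONTH));
    }

    // true when the month difference is a whole multiple of the interval
    static boolean matchesInterval(Calendar docDate, Calendar now, int interval) {
        return monthsBetween(docDate, now) % interval == 0;
    }

    public static void main(String[] args) {
        Calendar doc = new GregorianCalendar(2011, Calendar.MARCH, 1);
        Calendar now = new GregorianCalendar(2011, Calendar.SEPTEMBER, 14);
        System.out.println(monthsBetween(doc, now));      // 6
        System.out.println(matchesInterval(doc, now, 3)); // true
    }
}
```

If the interval field takes only the four values 1, 3, 6, 12, a simpler option is to store a precomputed "matches interval" flag per document at index time and filter on that.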
RE: index not created
changed the configuration to point it to my solr dir and started it again You might look in your logs to see where Solr thinks the Solr home directory is and/or if it complains about not being able to find it. As a guess, it can't find it, perhaps because solr.solr.home does not point to the right place. As a result, the servlet can't actually find the Solr code, and isn't really indexing anything at all. For my tomcat install test, I put the following in startup.bat (Windows, but the Linux startup script startup.sh would be similar). set JAVA_OPTS=-Dsolr.solr.home=C:/pro/apache-solr-3.3.0/example/solr (my JAVA_OPTS has a bunch of other stuff for security, waffle, etc., but this is the one that would matter in your case). JRJ -Original Message- From: kumar8anuj [mailto:kumar.an...@gmail.com] Sent: Wednesday, September 14, 2011 4:21 AM To: solr-user@lucene.apache.org Subject: Re: index not created Hi Erick, I have not done anything different. I downloaded the solr tar from one of the mirror and then extracted it in the home directory started jetty and it works fine. For tomcat I copied the war file in my webapps folder and restarted tomcat changed the configuration to point it to my solr dir and started it again. Same setup everything is same. Even this time i have tried it with the example solr folder without multicore setup and in solrconfig.xml all the lib paths are same which were for jetty. But still nothing is getting indexed it shows that 1 document is there but text field doesn't show anything in it and nothing comes when i search for something from the document. Am i doing something wrong ? Please let me know. I have to implement it ASAP. Please help me or if you can give me document to implement the same in tomcat then i would try that way Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/index-not-created-tp3300744p3335291.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Schema fieldType y-m-d ?!?!
Just add a bogus 0 timestamp after it when you index it. That is what we did. Dates are not stored or indexed as characters anyway, so space would not be any different one way or the other. JRJ -Original Message- From: stockii [mailto:stock.jo...@googlemail.com] Sent: Wednesday, September 14, 2011 4:56 AM To: solr-user@lucene.apache.org Subject: Schema fieldType y-m-d ?!?! Is it possible to index a date field in the format of y-m-d? I don't need the timestamp, so I can save myself some space. Which ways exist to search with a complex date filter? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/Schema-fieldType-y-m-d-tp3335359p3335359.html Sent from the Solr - User mailing list archive at Nabble.com.
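The "bogus 0 timestamp" suggestion can be sketched as follows: pad a bare y-m-d value out to the full ISO-8601 form that Solr's date fields expect. The helper name is illustrative, not a Solr API:

```java
public class SolrDateUtil {
    // Solr date fields expect the canonical form yyyy-MM-dd'T'HH:mm:ss'Z';
    // appending a bogus midnight timestamp turns a bare y-m-d value into a
    // legal one without changing how it sorts or range-filters by day.
    static String toSolrDate(String ymd) {
        return ymd + "T00:00:00Z";
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2011-09-14")); // 2011-09-14T00:00:00Z
    }
}
```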
Re: Shouldn't ReversedWildcardFilterFactory resolve leadingWildcard?
auto* is not a leading wildcard query; a leading wildcard query would be *car. Wildcard queries in general take more time than regular queries: the closer the wildcard is to the first character, the more expensive the query is. With a regular field type, Solr will allow wildcards (not with dismax) like auto*, but not leading wildcard queries like *car. The ReversedWildcardFilter is there to allow leading wildcards in searches. It only needs to be added at index time (not at query time). When using this field type, all the terms at index time will be reversed like you showed in your example, adding an *impossible character* at the beginning of the term to prevent it from matching regular terms. 'autocar' will be indexed as 'autocar' and '#1;racotua' (see '#1;', that's the impossible character). When you search for 'auto*', Solr will resolve the query as always, but if you search for '*car', the query parser (not any analysis filter, which is why you don't need to add the filter at query time) will invert that term and add the 'impossible character' at the beginning, like '#1;rac*'. That's why '#1;racotua' will match the query. From your configuration, if you remove the filter at query time it should work. Regards, Tomás On Wed, Sep 14, 2011 at 6:26 AM, crisfromnova crisfromn...@gmail.comwrote: I found a partial solution. Using ReverseStringFilterFactory instead of ReversedWildcardFilterFactory and searching for rac* will find autocar, for example. -- View this message in context: http://lucene.472066.n3.nabble.com/Shouldn-t-ReversedWildcardFilterFactory-resolve-leadingWildcard-tp3335240p3335307.html Sent from the Solr - User mailing list archive at Nabble.com.
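Tomás's explanation of the index-time reversal and the query-parser rewrite can be illustrated with a small self-contained sketch. The marker character U+0001 matches the '#1;' shown in the example above; the class and method names are ours, not Lucene's internals:

```java
public class ReverseSketch {
    // the "impossible character" (U+0001) prepended to reversed terms so
    // they can never collide with regular indexed terms
    static final char MARKER = '\u0001';

    // what the index-time filter emits alongside the regular term:
    // "autocar" -> marker + "racotua"
    static String reverseForIndex(String term) {
        return MARKER + new StringBuilder(term).reverse().toString();
    }

    // what the query parser does with a leading-wildcard query like "*car":
    // strip the '*', reverse the rest, prepend the marker, re-append '*'
    static String rewriteLeadingWildcard(String query) {
        String body = query.substring(1); // drop the leading '*'
        return MARKER + new StringBuilder(body).reverse().toString() + "*";
    }

    public static void main(String[] args) {
        System.out.println(reverseForIndex("autocar"));     // marker + racotua
        System.out.println(rewriteLeadingWildcard("*car")); // marker + rac*
    }
}
```

The rewritten prefix query (marker + "rac*") then matches the reversed term (marker + "racotua") by an ordinary, cheap prefix scan, which is the whole point of the filter.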
Re: DIH delta last_index_time
Hi Maria/Gora, I see this as more of a problem with the timezones in which the Solr server and the database server are located. Is this true? If yes, one more possibility of handling this scenario would be to customize the DataImportHandler code as follows: 1. Add one more configuration property named dbTimeZone at the entity level in the data-config.xml file. 2. While saving the lastIndexTime in the properties file, save it according to the timezone specified in the config so that it is in sync with the database server time. Basically, customize the code so that all the time-related updates to the dataimport.properties file are timezone specific. On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote: On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez maria.vazq...@dexone.com wrote: Hi, How do you handle the situation where the time on the server running Solr doesn't match the time in the database? Firstly, why is that the case? NTP is pretty universal these days. I'm using the last_index_time saved by Solr in the delta query, checking it against the lastModifiedDate field in the database, but the times are not in sync so I might lose some changes. Can we use something else other than last_index_time? Maybe something like last_pk or something. One possible way is to edit dataimport.properties, manually or through a script, to put the last_index_time back to a safe value. Regards, Gora -- Thanks and Regards Rahul A. Warawdekar
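Gora's script-based workaround (pushing last_index_time back to a safe value) could look something like the sketch below. It assumes the "yyyy-MM-dd HH:mm:ss" timestamp format that DIH uses in dataimport.properties; the class and method names are ours:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ShiftIndexTime {
    // Shift a DIH last_index_time value back by a safety margin so the
    // delta query cannot miss rows when the DB clock lags the Solr clock.
    static String shiftBack(String lastIndexTime, int marginMinutes) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        try {
            Date d = fmt.parse(lastIndexTime);
            return fmt.format(new Date(d.getTime() - marginMinutes * 60000L));
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable timestamp: " + lastIndexTime, e);
        }
    }

    public static void main(String[] args) {
        // e.g. run after each import, rewriting the property with a 30-minute margin
        System.out.println(shiftBack("2011-09-14 10:00:00", 30)); // 2011-09-14 09:30:00
    }
}
```

A wrapper would read dataimport.properties with java.util.Properties, rewrite the last_index_time key with this value, and store it back before the next delta import runs.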
query - part default OR and part default AND
Hi All, I have two fields in my schema: field1, field2. For the sake of the example I'll define two phrases: phrase1 - solr is the best fts ever; phrase2 - let us all contribute to open source for a better world. Now I want to perform the next query: field1:(phrase1) AND field2:(phrase2). My default operator is AND, but I want to search within field1 with the AND operator between the tokens and within field2 with the OR operator. What I already tried is to split phrase1 by whitespace, changing the default search operator to OR in the schema and adding + signs before each word: field1:(+solr +is +the +best +fts +ever) AND field2:(let us all contribute to open source for a better world). This query is not good, because when I split phrase1 it is not how the index-time tokenizer splits it, so I am not getting the results I would like. Any idea, anyone? Thanks, Omri
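One common workaround, with the same caveat Omri raises (client-side whitespace splitting may not match the index-time tokenizer), is to make the operator explicit inside each field clause instead of relying on the schema-wide default. A minimal Java sketch of building such a query string; the helper name is illustrative:

```java
public class QueryBuilderSketch {
    // Join the whitespace-separated words of a phrase with an explicit
    // boolean operator ("AND" or "OR"), scoped to a single field clause.
    static String perFieldQuery(String field, String phrase, String op) {
        return field + ":(" + String.join(" " + op + " ", phrase.trim().split("\\s+")) + ")";
    }

    public static void main(String[] args) {
        String q = perFieldQuery("field1", "solr is the best fts ever", "AND")
                 + " AND "
                 + perFieldQuery("field2", "let us all contribute", "OR");
        System.out.println(q);
        // field1:(solr AND is AND the AND best AND fts AND ever) AND field2:(let OR us OR all OR contribute)
    }
}
```

Each word still passes through the field's query-time analyzer on the Solr side, so stemming and lowercasing apply per token; only the operator between tokens is fixed by the client.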
NewSolrCloudDesign question
Hi, I am very excited to see this direction for Solr. I realize it's early still, but is there any thought as to what the target release date might be (this year? next?). Also, will the new Solr Cloud support all query types, including all forms of faceting, distributed IDF, ranging, sorting, paging etc.? Thanks! Darren
Performance troubles with solr
Hi, I'm having performance troubles with Solr. I don't know if I'm expecting too much from Solr or if I misconfigured it. When I run a single query its QTime is 500-1000~ ms (without any use of caches). When I run my test script (with use of caches) QTime increases exponentially, reaching 8000~ to 6~ ms. CPU usage also increases to 550%~. My solr-start script: java -Duser.timezone=EET -Xmx6000m -jar ./start.jar 2,000,000~ documents; currently there aren't any commits, but in the future there will be 5,000~ updates/additions to documents every 3-5~ min via delta import. Search Query sort=userscore+desc start=0 q=photo_id:* AND gender:true AND country:MALAWI AND online:false fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges ) fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] ) fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] ) rows=150 Schema field name=id type=long indexed=true stored=true required=true/ field name=username type=string indexed=true stored=false required=true/ field name=namesurname type=string indexed=true stored=false/ field name=network type=int indexed=true stored=false/ field name=photo_id type=int indexed=true stored=false/ field name=gender type=boolean indexed=true stored=false/ field name=country type=string indexed=true stored=false/ field name=birth type=tdate indexed=true stored=false/ field name=lastlogin type=tdate indexed=true stored=false/ field name=online type=boolean indexed=true stored=false/ field name=userscore type=int indexed=true stored=false/ Cache Sizes Lazy Load filterCache class=solr.FastLRUCache size=16384 initialSize=4096 autowarmCount=4096/ queryResultCache class=solr.LRUCache size=16384 initialSize=4096 autowarmCount=4096/ documentCache class=solr.LRUCache size=16384 initialSize=4096 autowarmCount=4096/ enableLazyFieldLoadingtrue/enableLazyFieldLoading
Re: Running solr on small amounts of RAM
Just wanted to follow up and say thanks for all the valuable replies. I'm in the process of testing everything. Thanks, Mike On Mon, Sep 12, 2011 at 1:20 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: Beyond the suggestions already made, I would add: a) being really aggressive about stop words can help keep the index size down, which can help reduce the amount of memory needed to scan the term lists b) faceting w/o any caching is likely going to be too slow to be acceptable. c) don't sort on anything except score. -Hoss
RE: EofException with Solr in Jetty
I have not used SolrJ, but it probably is worth considering as a possible suspect. Also, do you have anything in between the client and the Solr server (a firewall, load balancer, etc.?) that might play games with HTTP connections? You might want to start up a network trace on the server or network to see if you can catch one to see what is going on. I looked at our Solr 3.1 prototype log (which has been running continuously without interruption since July 10!), and did not see any of these errors. We do not use SolrJ -- we use a combination of plain old HTTP/javascript/xslt and requests coming from another system as a (plain old XML) web service to get to Solr. However, that is under Jetty 6. JRJ -Original Message- From: Michael Szalay [mailto:michael.sza...@basis06.ch] Sent: Wednesday, September 14, 2011 8:27 AM To: solr-user@lucene.apache.org Subject: Re: EofException with Solr in Jetty We are using SolrJ 3.1 as our http client... So it may be a bug in there? Regards Michael -- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22 http://www.basis06.ch - source of smart business - Ursprüngliche Mail - Von: Jay Jaeger - DOT jay.jae...@dot.wi.gov An: solr-user@lucene.apache.org, JETTY user mailing list jetty-us...@eclipse.org Gesendet: Mittwoch, 14. September 2011 15:21:19 Betreff: RE: EofException with Solr in Jetty Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks like this: if (_closed) throw new IOException(Closed); [http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/eclipse/jetty/server/HttpOutput.java?format=ok -- which may or may not match exactly, but I doubt that this code changes all that often.] I would read this as Jetty thinking that this HTTP connection is closed. It this perhaps a case of your HTTP client disconnecting (or crashing) before Jetty can get the entire message (HTTP response) sent? 
(The other alternative that occurs to me would be that Solr told Jetty the response was all done, but then turned around and tried to send more in the response). -Original Message- From: Michael Szalay [mailto:michael.sza...@basis06.ch] Sent: Wednesday, September 14, 2011 1:47 AM To: solr-user@lucene.apache.org; JETTY user mailing list Subject: EofException with Solr in Jetty Hi all sometimes we have this error in our system. We are running Solr 3.1.0 running on Jetty 7.2.2 Anyone an idea how to tune this? 14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | apache.solr.common.SolrException 151 | 154 - mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 0 | org.eclipse.jetty.io.EofException at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96) at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184) at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89) at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322) at org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473) at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516) at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929) at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:864) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:72) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114) at org.eclipse.jetty.server.Server.handle(Server.java:352) at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596) at
Re: NewSolrCloudDesign question
On Wed, Sep 14, 2011 at 10:17 AM, dar...@ontrenet.com wrote: Hi, I am very excited to see this direction for Solr. I realize its early still, but is there any thought as to what the target release date might be (this year? next?). We've started to work on the new functionality now, but an official release would be whenever Lucene/Solr 4.0 is released ;-) Also, will the new solr cloud support all query types including all forms of faceting, distributed IDF, ranging, sorting, paging etc? Yes, it will build off the current distributed search. We still need to implement distributed IDF, but that shouldn't be too hard. -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference
Re: NewSolrCloudDesign question
Thank you. Should be awesome when its ready! On Wed, 14 Sep 2011 10:25:26 -0400, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Sep 14, 2011 at 10:17 AM, dar...@ontrenet.com wrote: Hi, I am very excited to see this direction for Solr. I realize its early still, but is there any thought as to what the target release date might be (this year? next?). We've started to work on the new functionallity now, but an official release would be whenever Lucene/Solr 4.0 is released ;-) Also, will the new solr cloud support all query types including all forms of faceting, distributed IDF, ranging, sorting, paging etc? Yes, it will build off the current distributed search. We still need to implement distributed IDF, but that shouldn't be too hard. -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference
RE: Performance troubles with solr
I think folks are going to need a *lot* more information. Particularly 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? 2. If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr are you running? 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). 5. Machine characteristics, particularly operating system and physical memory on the machine. Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional guidance in using the mailing list to get help. -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:19 AM To: solr-user@lucene.apache.org Subject: Performance troubles with solr Hi, i'm having performance troubles with solr. I don't know if i'm expection too much from solr or i missconfigured solr. When i run a single query its QTime is 500-1000~ ms (without any use of caches). When i run my test script (with use of caches) QTime increases exponentially, reaching 8000~ to 6~ ms. And Cpu usage also increases to %550~ My solr-start script: java -Duser.timezone=EET -Xmx6000m -jar ./start.jar 2,000,000~ documents , currently there aren't any commits but in future there will be 5,000~ updates/additions to documents every 3-5~ min via delta import. 
Search Query sort=userscore+desc start=0 q=photo_id:* AND gender:true AND country:MALAWI AND online:false fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges ) fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] ) fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] ) rows=150 Schema field name=id type=long indexed=true stored=true required=true/ field name=username type=string indexed=true stored=false required=true/ field name=namesurname type=string indexed=true stored=false/ field name=network type=int indexed=true stored=false/ field name=photo_id type=int indexed=true stored=false/ field name=gender type=boolean indexed=true stored=false/ field name=country type=string indexed=true stored=false/ field name=birth type=tdate indexed=true stored=false/ field name=lastlogin type=tdate indexed=true stored=false/ field name=online type=boolean indexed=true stored=false/ field name=userscore type=int indexed=true stored=false/ Cache Sizes Lazy Load filterCache class=solr.FastLRUCache size=16384 initialSize=4096 autowarmCount=4096/ queryResultCache class=solr.LRUCache size=16384 initialSize=4096 autowarmCount=4096/ documentCache class=solr.LRUCache size=16384 initialSize=4096 autowarmCount=4096/ enableLazyFieldLoadingtrue/enableLazyFieldLoading
Re: EofException with Solr in Jetty
There is nothing between the client app and the Solr server; it's on the same machine and in the same app server, only going through the loopback interface. Unfortunately, I cannot reproduce it, but I see it in the server log. Thanks Michael -- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22 http://www.basis06.ch - source of smart business - Ursprüngliche Mail - Von: Jay Jaeger - DOT jay.jae...@dot.wi.gov An: solr-user@lucene.apache.org Gesendet: Mittwoch, 14. September 2011 16:23:45 Betreff: RE: EofException with Solr in Jetty I have not used SolrJ, but it probably is worth considering as a possible suspect. Also, do you have anything in between the client and the Solr server (a firewall, load balancer, etc.?) that might play games with HTTP connections? You might want to start up a network trace on the server or network to see if you can catch one to see what is going on. I looked at our Solr 3.1 prototype log (which has been running continuously without interruption since July 10!), and did not see any of these errors. We do not use SolrJ -- we use a combination of plain old HTTP/javascript/xslt and requests coming from another system as a (plain old XML) web service to get to Solr. However, that is under Jetty 6. JRJ -Original Message- From: Michael Szalay [mailto:michael.sza...@basis06.ch] Sent: Wednesday, September 14, 2011 8:27 AM To: solr-user@lucene.apache.org Subject: Re: EofException with Solr in Jetty We are using SolrJ 3.1 as our http client... So it may be a bug in there? Regards Michael -- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22 http://www.basis06.ch - source of smart business - Ursprüngliche Mail - Von: Jay Jaeger - DOT jay.jae...@dot.wi.gov An: solr-user@lucene.apache.org, JETTY user mailing list jetty-us...@eclipse.org Gesendet: Mittwoch, 14.
September 2011 15:21:19 Betreff: RE: EofException with Solr in Jetty Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks like this: if (_closed) throw new IOException(Closed); [http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/eclipse/jetty/server/HttpOutput.java?format=ok -- which may or may not match exactly, but I doubt that this code changes all that often.] I would read this as Jetty thinking that this HTTP connection is closed. It this perhaps a case of your HTTP client disconnecting (or crashing) before Jetty can get the entire message (HTTP response) sent? (The other alternative that occurs to me would be that Solr told Jetty the response was all done, but then turned around and tried to send more in the response). -Original Message- From: Michael Szalay [mailto:michael.sza...@basis06.ch] Sent: Wednesday, September 14, 2011 1:47 AM To: solr-user@lucene.apache.org; JETTY user mailing list Subject: EofException with Solr in Jetty Hi all sometimes we have this error in our system. We are running Solr 3.1.0 running on Jetty 7.2.2 Anyone an idea how to tune this? 
14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | apache.solr.common.SolrException 151 | 154 - mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 0 | org.eclipse.jetty.io.EofException at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96) at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184) at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89) at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322) at org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473) at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929) at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403) at
glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
Hi all, I am trying to set up Solr 3.3 with multicores in Glassfish 3.1.1 and Eclipse Indigo. I have the following error: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' However, I have the line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / in solrconfig.xml, and dist is under the solr.solr.home directory. If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext then this error will not appear. But this should not be the right way. Any thoughts? Thanks.
Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
Here's a thought. If dist is under solr.solr.home but your lib dir is set to be ../../dist. Wouldn't the lib dir be relative to solr.solr.home and therefore should just be dist? On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang just4l...@yahoo.com wrote: Hi all, I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and eclipse indigo. I have the following error: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' However, I have a line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / in solrconfig.xml and dist is under the solr.solr.home directory. If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext then this error will not appear. But this should not be right way. Any thought? Thanks.
Re: Performance troubles with solr
Thank you for your reply. I tried to give most of the information I can, but obviously I missed some. 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? The test script only sends random queries. 2. If the test script is doing updates, how are those updates being fed to Solr? There are no updates right now, as I failed on performance. 3. What version of Solr are you running? I'm using Solr 3.3.0 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). I was trying everything before asking here. 5. Machine characteristics, particularly operating system and physical memory on the machine. OS = Debian 6.0, Physical Memory = 32 GB, CPU = 2x Intel Quad Core On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote: I think folks are going to need a *lot* more information. Particularly 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? 2. If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr are you running? 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). 5. Machine characteristics, particularly operating system and physical memory on the machine. Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional guidance in using the mailing list to get help. -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:19 AM To: solr-user@lucene.apache.org Subject: Performance troubles with solr Hi, i'm having performance troubles with solr. I don't know if i'm expection too much from solr or i missconfigured solr. When i run a single query its QTime is 500-1000~ ms (without any use of caches).
When I run my test script (with use of caches) QTime increases exponentially, reaching 8000~ to 60000~ ms, and CPU usage also increases to ~550%. My solr-start script: java -Duser.timezone=EET -Xmx6000m -jar ./start.jar ~2,000,000 documents; currently there aren't any commits, but in future there will be ~5,000 updates/additions to documents every 3-5 min via delta import.
Search Query:
sort=userscore+desc start=0 q=photo_id:* AND gender:true AND country:MALAWI AND online:false
fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges )
fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] )
rows=150
Schema:
<field name="id" type="long" indexed="true" stored="true" required="true"/>
<field name="username" type="string" indexed="true" stored="false" required="true"/>
<field name="namesurname" type="string" indexed="true" stored="false"/>
<field name="network" type="int" indexed="true" stored="false"/>
<field name="photo_id" type="int" indexed="true" stored="false"/>
<field name="gender" type="boolean" indexed="true" stored="false"/>
<field name="country" type="string" indexed="true" stored="false"/>
<field name="birth" type="tdate" indexed="true" stored="false"/>
<field name="lastlogin" type="tdate" indexed="true" stored="false"/>
<field name="online" type="boolean" indexed="true" stored="false"/>
<field name="userscore" type="int" indexed="true" stored="false"/>
Cache Sizes / Lazy Load:
<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
Thanks for the quick reply. I just tested as you suggested. The error is still there. The setup line <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" /> actually comes with the Solr 3.3 release, not from me. From: dar...@ontrenet.com dar...@ontrenet.com To: Xue-Feng Yang just4l...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Wednesday, September 14, 2011 10:52:55 AM Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Here's a thought. If dist is under solr.solr.home but your lib dir is set to be ../../dist. Wouldn't the lib dir be relative to solr.solr.home and therefore should just be dist? On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang just4l...@yahoo.com wrote: Hi all, I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and eclipse indigo. I have the following error: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' However, I have a line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / in solrconfig.xml and dist is under the solr.solr.home directory. If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext then this error will not appear. But this should not be right way. Any thought? Thanks.
RE: Performance troubles with solr
I don't have enough experience with filter queries to advise well on when to use fq vs. putting it in the query itself, but I do know that we are not using filter queries, and with index sizes ranging from 7 Million to 27+ Million we have not seen this kind of issue. Maybe keeping 16,384 filter queries around, particularly caching the ones with random age ranges is eating your memory up -- so perhaps try moving just that particular fq into q instead (since it is random) and just cache the ones where the number of options is limited? What happens if you try your test without the filter queries? What happens if you put the additional criteria that are in your filter query into the query itself? JRJ -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:54 AM To: solr-user@lucene.apache.org Subject: Re: Performance troubles with solr Thank you for your reply. I tried to give most of the information i can but obviously i missed some. 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? Test script only sends random queries. 2. If the test script is doing updates, how are those updates being fed to Solr? There are no updates right now, as i failed on performance. 3. What version of Solr are you running? I'm using Solr 3.3.0 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). I was trying everything before asking here. 5. Machine characteristics, particularly operating system and physical memory on the machine. OS = Debian 6.0, Physcal Memory = 32 gb, CPU = 2x Intel Quad Core On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote: I think folks are going to need a *lot* more information. Particularly 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? 2. 
If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr are you running? 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). 5. Machine characteristics, particularly operating system and physical memory on the machine. Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional guidance in using the mailing list to get help. -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:19 AM To: solr-user@lucene.apache.org Subject: Performance troubles with solr Hi, i'm having performance troubles with solr. I don't know if i'm expection too much from solr or i missconfigured solr. When i run a single query its QTime is 500-1000~ ms (without any use of caches). When i run my test script (with use of caches) QTime increases exponentially, reaching 8000~ to 6~ ms. And Cpu usage also increases to %550~ My solr-start script: java -Duser.timezone=EET -Xmx6000m -jar ./start.jar 2,000,000~ documents , currently there aren't any commits but in future there will be 5,000~ updates/additions to documents every 3-5~ min via delta import. 
Search Query:
sort=userscore+desc start=0 q=photo_id:* AND gender:true AND country:MALAWI AND online:false
fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges )
fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] )
rows=150
Schema:
<field name="id" type="long" indexed="true" stored="true" required="true"/>
<field name="username" type="string" indexed="true" stored="false" required="true"/>
<field name="namesurname" type="string" indexed="true" stored="false"/>
<field name="network" type="int" indexed="true" stored="false"/>
<field name="photo_id" type="int" indexed="true" stored="false"/>
<field name="gender" type="boolean" indexed="true" stored="false"/>
<field name="country" type="string" indexed="true" stored="false"/>
<field name="birth" type="tdate" indexed="true" stored="false"/>
<field name="lastlogin" type="tdate" indexed="true" stored="false"/>
<field name="online" type="boolean" indexed="true" stored="false"/>
<field name="userscore" type="int" indexed="true" stored="false"/>
Cache Sizes / Lazy Load:
<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
RE: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
Some things to think about: When solr starts up, solr should report the location of solr home. Is it what you expect? Is there any security on the dist directory that would prevent solr from accessing it? Is there a classloader policy set on glassfish that could be getting in the way? (Your testing seems to eliminate the possibility of a JRE incompatibility.) JRJ -Original Message- From: Xue-Feng Yang [mailto:just4l...@yahoo.com] Sent: Wednesday, September 14, 2011 10:07 AM To: solr-user@lucene.apache.org Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Thanks for a quick reply. I just tested as you suggested. The error is still there. The setup line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / is actually coming with solr 3.3 release not by me. From: dar...@ontrenet.com dar...@ontrenet.com To: Xue-Feng Yang just4l...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Wednesday, September 14, 2011 10:52:55 AM Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Here's a thought. If dist is under solr.solr.home but your lib dir is set to be ../../dist. Wouldn't the lib dir be relative to solr.solr.home and therefore should just be dist? On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang just4l...@yahoo.com wrote: Hi all, I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and eclipse indigo. I have the following error: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' However, I have a line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / in solrconfig.xml and dist is under the solr.solr.home directory. If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext then this error will not appear. But this should not be right way. Any thought? Thanks.
Re: query - part default OR and part default AND
Keep the default search operator as OR. And for phrase1, on splitting on whitespace, just add AND instead of +. Hopefully this should work. Please do confirm. -- View this message in context: http://lucene.472066.n3.nabble.com/query-part-default-OR-and-part-default-AND-tp3335851p3336194.html Sent from the Solr - User mailing list archive at Nabble.com.
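The suggestion above can be sketched as a small client-side pre-processing step: only the phrase that needs all-terms matching is rewritten with explicit ANDs, while the rest of the query keeps the default OR. The `and_join` helper below is hypothetical, not a Solr API:

```python
def and_join(phrase):
    """Split a phrase on whitespace and require every term by joining
    with an explicit AND (hypothetical client-side helper)."""
    return "(" + " AND ".join(phrase.split()) + ")"

# The rest of the query string is left alone, so those clauses still
# combine with the default OR operator:
q = and_join("quick brown fox") + " title:solr"
print(q)  # (quick AND brown AND fox) title:solr
```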
Re: DIH delta last_index_time
Rahul is right. You may add a script to change the date in dataimport.properties to half an hour before the last modified time before each delta-import. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-delta-last-index-time-tp3334992p3336203.html Sent from the Solr - User mailing list archive at Nabble.com.
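A minimal sketch of such a script, assuming the stock DIH dataimport.properties layout with a `last_index_time` key; the helper name and the half-hour margin are illustrative:

```python
from datetime import datetime, timedelta

def rewind_last_index_time(path, minutes=30):
    """Rewrite last_index_time in dataimport.properties to `minutes`
    earlier, as a safety margin before the next delta-import.
    Sketch only: assumes the stock DIH timestamp format and does not
    re-escape colons when writing the file back."""
    fmt = "%Y-%m-%d %H:%M:%S"
    out = []
    with open(path) as f:
        for line in f:
            if line.startswith("last_index_time="):
                # Java Properties files escape ':' in values with
                # backslashes; strip them before parsing the timestamp.
                raw = line.split("=", 1)[1].strip().replace("\\", "")
                ts = datetime.strptime(raw, fmt) - timedelta(minutes=minutes)
                line = "last_index_time=" + ts.strftime(fmt) + "\n"
            out.append(line)
    with open(path, "w") as f:
        f.writelines(out)
```

Run it just before kicking off the delta-import, so any rows modified while the previous import was in flight are picked up again.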
Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
Thanks for your reply. Actually, some of the cores are working perfectly. So it's not the solr.solr.home problem. From: Jaeger, Jay - DOT jay.jae...@dot.wi.gov To: solr-user@lucene.apache.org solr-user@lucene.apache.org; 'Xue-Feng Yang' just4l...@yahoo.com Sent: Wednesday, September 14, 2011 11:21:18 AM Subject: RE: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Some things to think about: When solr starts up, solr should report for the location of solr home. Is it what you expect? Is there any security on the dist directory that would prevent solr from accessing it? Is there a classloader policy set on glassfish that could be getting in the way? (your testing seems to eliminate the possibility of a JRE incompatibility) JRJ -Original Message- From: Xue-Feng Yang [mailto:just4l...@yahoo.com] Sent: Wednesday, September 14, 2011 10:07 AM To: solr-user@lucene.apache.org Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Thanks for a quick reply. I just tested as you suggested. The error is still there. The setup line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / is actually coming with solr 3.3 release not by me. From: dar...@ontrenet.com dar...@ontrenet.com To: Xue-Feng Yang just4l...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Wednesday, September 14, 2011 10:52:55 AM Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Here's a thought. If dist is under solr.solr.home but your lib dir is set to be ../../dist. Wouldn't the lib dir be relative to solr.solr.home and therefore should just be dist? On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang just4l...@yahoo.com wrote: Hi all, I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and eclipse indigo. 
I have the following error: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' However, I have a line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / in solrconfig.xml and dist is under the solr.solr.home directory. If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext then this error will not appear. But this should not be right way. Any thought? Thanks.
Re: BigDecimal data type
: Is there a way to use BigDecimal as a data type in solr? I am using solr : 3.3. If you just want to *store* BigDecimals in a solr index, then just use StrField with the canonical representation -- but if you want to sort or do range queries on the values, then no. Given that BigDecimal values allow for an arbitrary-precision unscaled value, I don't know if it would even be possible to encode BigDecimal values in a way that would allow arbitrary values to sort properly as Terms. -Hoss
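A quick illustration of the problem: plain decimal strings sort lexicographically as terms, which breaks numeric order, and fixed-width zero-padding only repairs this when both the integer and fractional widths are bounded -- exactly what arbitrary-precision values cannot guarantee. The `pad` helper is illustrative only (it also ignores negative numbers):

```python
# Decimal values stored as plain strings sort lexicographically
# as index terms, which breaks numeric order:
values = ["9.5", "10.5", "100.25"]
print(sorted(values))  # ['10.5', '100.25', '9.5'] -- not numeric order

def pad(v, int_digits=6, frac_digits=4):
    """Zero-pad to a fixed width so term order matches numeric order.
    Only works with a bound on integer and fractional widths."""
    whole, _, frac = v.partition(".")
    return whole.zfill(int_digits) + "." + frac.ljust(frac_digits, "0")

print(sorted(values, key=pad))  # ['9.5', '10.5', '100.25']
```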
Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
After making another try, I found it worked with <lib dir="../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" /> I'll leave this here in case someone else needs it too. Thanks From: dar...@ontrenet.com dar...@ontrenet.com To: Xue-Feng Yang just4l...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Wednesday, September 14, 2011 10:52:55 AM Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler Here's a thought. If dist is under solr.solr.home but your lib dir is set to be ../../dist. Wouldn't the lib dir be relative to solr.solr.home and therefore should just be dist? On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang just4l...@yahoo.com wrote: Hi all, I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and eclipse indigo. I have the following error: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' However, I have a line lib dir=../../dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / in solrconfig.xml and dist is under the solr.solr.home directory. If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext then this error will not appear. But this should not be right way. Any thought? Thanks.
Re: Schema fieldType y-m-d ?!?!
What we did was get the date from the db and store it in a string fieldType in the format yyyymmdd. It works fine for us; range queries work just fine. -- View this message in context: http://lucene.472066.n3.nabble.com/Schema-fieldType-y-m-d-tp3335359p3336309.html Sent from the Solr - User mailing list archive at Nabble.com.
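This works because fixed-width yyyymmdd-style strings compare lexicographically in the same order as the dates they encode, so string range queries stay correct; a quick sanity check:

```python
# Fixed-width yyyymmdd strings compare lexicographically in the same
# order as the dates they encode, so a string range query like
# date:[20110101 TO 20111231] behaves correctly:
dates = ["20110914", "20101225", "20110102"]
in_2011 = [d for d in dates if "20110101" <= d <= "20111231"]
print(in_2011)  # ['20110914', '20110102']
```

Note this only holds because every value has the same width and zero-padded parts; a mixed-width form such as 201194 would break the ordering.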
Re: DIH delta last_index_time
Thanks Rahul That sounds like a good solution, I will change the code to support different timezones. Maybe this could be included in next release of Solr since a few people mentioned this problem too. Thanks again Maria Sent from my Motorola ATRIX™ 4G on ATT -Original message- From: Rahul Warawdekar rahul.warawde...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, Sep 14, 2011 14:01:08 GMT+00:00 Subject: Re: DIH delta last_index_time Hi Maria/Gora, I see this as more of a problem with the timezones in which the Solr server and the database server are located. Is this true ? If yes, one more possibility of handling this scenario would be to customize DataImportHandler code as follows 1. Add one more configuration property named dbTimeZone at the entity level in data-config.xml file 2. While saving the lastIndexTime in the properties file, save it according to the timezone specified in the config so that it is in sync with the database server time. Basically customize the code so that all the time related updates to the dataimport.properties file should be timezone specific. On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote: On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez maria.vazq...@dexone.com wrote: Hi, How do you handle the situation where the time on the server running Solr doesn't match the time in the database? Firstly, why is that the case? NTP is pretty universal these days. I'm using the last_index_time saved by Solr in the delta query checking it against lastModifiedDate field in the database but the times are not in sync so I might lose some changes. Can we use something else other than last_index_time? Maybe something like last_pk or something. One possible way is to edit dataimport.properties, manually or through a script, to put the last_index_time back to a safe value. Regards, Gora -- Thanks and Regards Rahul A. Warawdekar
Re: Performance troubles with solr
I tried moving the age query from the filter query to the normal query but nothing really changed. But when I tried to move everything into the query itself (removed all filter queries), QTimes slowed much more. I don't have a problem with memory or CPU usage; my problem is query response times. When I send only one query, response times vary from 500 ms to 1000 ms (non-cached), and that's too much. When I send a set of random queries (10-20 queries per second), response times go crazy (8 seconds to 60+ seconds). On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote: I don't have enough experience with filter queries to advise well on when to use fq vs. putting it in the query itself, but I do know that we are not using filter queries, and with index sizes ranging from 7 Million to 27+ Million we have not seen this kind of issue. Maybe keeping 16,384 filter queries around, particularly caching the ones with random age ranges is eating your memory up -- so perhaps try moving just that particular fq into q instead (since it is random) and just cache the ones where the number of options is limited? What happens if you try your test without the filter queries? What happens if you put the additional criteria that are in your filter query into the query itself? JRJ -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:54 AM To: solr-user@lucene.apache.org Subject: Re: Performance troubles with solr Thank you for your reply. I tried to give most of the information i can but obviously i missed some. 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? Test script only sends random queries. 2. If the test script is doing updates, how are those updates being fed to Solr? There are no updates right now, as i failed on performance. 3. What version of Solr are you running? I'm using Solr 3.3.0 4.
Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). I was trying everything before asking here. 5. Machine characteristics, particularly operating system and physical memory on the machine. OS = Debian 6.0, Physcal Memory = 32 gb, CPU = 2x Intel Quad Core On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: I think folks are going to need a *lot* more information. Particularly 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? 2. If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr are you running? 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). 5. Machine characteristics, particularly operating system and physical memory on the machine. Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional guidance in using the mailing list to get help. -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:19 AM To: solr-user@lucene.apache.org Subject: Performance troubles with solr Hi, i'm having performance troubles with solr. I don't know if i'm expection too much from solr or i missconfigured solr. When i run a single query its QTime is 500-1000~ ms (without any use of caches). When i run my test script (with use of caches) QTime increases exponentially, reaching 8000~ to 6~ ms. And Cpu usage also increases to %550~ My solr-start script: java -Duser.timezone=EET -Xmx6000m -jar ./start.jar 2,000,000~ documents , currently there aren't any commits but in future there will be 5,000~ updates/additions to documents every 3-5~ min via delta import. 
Search Query sort=userscore+desc start=0 q=photo_id:* AND gender:true AND country:MALAWI AND online:false fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges ) fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] ) fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] ) rows=150 Schema field name=id type=long indexed=true stored=true required=true/ field name=username type=string indexed=true stored=false required=true/ field name=namesurname type=string indexed=true stored=false/ field name=network type=int indexed=true stored=false/ field name=photo_id type=int indexed=true stored=false/ field name=gender type=boolean indexed=true stored=false/ field name=country type=string indexed=true stored=false/ field name=birth type=tdate indexed=true stored=false/ field name=lastlogin type=tdate indexed=true stored=false/ field name=online type=boolean indexed=true stored=false/ field name=userscore type=int
Re: Schema fieldType y-m-d ?!?!
If you don't need date-specific functions and/or faceting, you can store it as an int, like 20110914, and parse it in your application, but I don't recommend it... As a rule of thumb, dates should be stored as dates; the millennium bug (Y2K bug) was all about 'saving some space', remember?
Index not getting refreshed
Hi, I am using Solr 3.2 on a live website. I get live users' data, about 2000 records per day, and I do an incremental index every 8 hours, but my search results always show the same results in the same sort order. When I check the same search against the corresponding db, it always gives me different results (as new data regularly gets added). Please suggest what might be the issue. Is there any cache-related problem at the Solr level? thanks pawan
Re: Managing solr machines (start/stop/status)
On Sep 13, 2011, at 5:05 PM, Jamie Johnson wrote: I know this isn't a solr specific question but I was wondering what folks do in regards to managing the machines in their solr cluster? Are there any recommendations for how to start/stop/manage these machines? Any suggestions would be appreciated. One thing I use is csshx (http://code.google.com/p/csshx/) on my Mac when dealing with the various boxes in our cluster. You can issue commands in one terminal and they are duplicated in all other windows. Very useful for global stop/starts and updates.
Re: Index not getting refreshed
Hi Pawan, Can you please share more details on the indexing mechanism ? (DIH, SolrJ or any other) Please let us know the configuration details. On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira pawan.dar...@gmail.comwrote: Hi I am using Solr 3.2 on a live website. i get live user's data of about 2000 per day. I do an incremental index every 8 hours. but my search results always show the same result with same sorting order. when i check the same search from corresponding db, it gives me different results always (as new data regularly gets added) please suggest what might be the issue. is there any cache related problem at SOLR level thanks pawan -- Thanks and Regards Rahul A. Warawdekar
RE: DIH delta last_index_time
The solution that I am currently using is converting the last_index_time to UTC before comparing to the LastModified field in the DB. LastModified > DATEADD(Hour, DATEDIFF(Hour, GETDATE(), GETUTCDATE()), '${dataimporter.last_index_time}') This may be another option if the LastModified date in the DB is stored in UTC. Regards, Claudia -Original Message- From: Vazquez, Maria (STM) [mailto:maria.vazq...@dexone.com] Sent: Wednesday, September 14, 2011 9:27 AM To: solr-user@lucene.apache.org Subject: Re: DIH delta last_index_time Thanks Rahul That sounds like a good solution, I will change the code to support different timezones. Maybe this could be included in next release of Solr since a few people mentioned this problem too. Thanks again Maria Sent from my Motorola ATRIX™ 4G on ATT -Original message- From: Rahul Warawdekar rahul.warawde...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, Sep 14, 2011 14:01:08 GMT+00:00 Subject: Re: DIH delta last_index_time Hi Maria/Gora, I see this as more of a problem with the timezones in which the Solr server and the database server are located. Is this true ? If yes, one more possibility of handling this scenario would be to customize DataImportHandler code as follows 1. Add one more configuration property named dbTimeZone at the entity level in data-config.xml file 2. While saving the lastIndexTime in the properties file, save it according to the timezone specified in the config so that it is in sync with the database server time. Basically customize the code so that all the time related updates to the dataimport.properties file should be timezone specific. On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote: On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez maria.vazq...@dexone.com wrote: Hi, How do you handle the situation where the time on the server running Solr doesn't match the time in the database? Firstly, why is that the case? NTP is pretty universal these days.
I'm using the last_index_time saved by Solr in the delta query checking it against lastModifiedDate field in the database but the times are not in sync so I might lose some changes. Can we use something else other than last_index_time? Maybe something like last_pk or something. One possible way is to edit dataimport.properties, manually or through a script, to put the last_index_time back to a safe value. Regards, Gora -- Thanks and Regards Rahul A. Warawdekar
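The same shift can also be done on the application side before the comparison; a minimal sketch, assuming the DB column is in UTC and the server's offset is supplied explicitly (`local_to_utc` is a hypothetical helper, and DST handling is left to the caller):

```python
from datetime import datetime, timedelta

def local_to_utc(local_str, utc_offset_hours, fmt="%Y-%m-%d %H:%M:%S"):
    """Shift a last_index_time string by the Solr host's UTC offset so
    it can be compared against a UTC LastModified column (sketch: the
    offset is passed in explicitly rather than detected)."""
    local = datetime.strptime(local_str, fmt)
    return (local - timedelta(hours=utc_offset_hours)).strftime(fmt)

# A host at UTC+3 that recorded 10:00 local corresponds to 07:00 UTC:
print(local_to_utc("2011-09-14 10:00:00", 3))  # 2011-09-14 07:00:00
```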
how would I use the new join feature given my schema.
I've been reading the information on the new join feature and am not quite sure how I would use it given my schema structure. I have User docs and BlogPost docs, and I want to return all BlogPosts that match the fulltext title cool that belong to Users that match the description solr. Here are the 2 docs I have:
<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="class_name">User</field>
    <field name="login_s">jtoy</field>
    <field name="user_id_i">192123</field>
    <field name="description_text">a solr user</field>
  </doc>
  <doc>
    <field name="class_name">BlogPost</field>
    <field name="user_id_i">192123</field>
    <field name="body_text">this is the description</field>
    <field name="title_text">this is a cool title</field>
  </doc>
</add>
<?xml version="1.0" encoding="UTF-8"?><commit/>
Is it possible to do this with the join functionality? If not, how would I do this? I'd appreciate any pointers or help on this. Jason
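For reference, the trunk join parser (SOLR-2272) can express this kind of query by nesting a `{!join}` clause inside the main query via the `_query_` nested-query syntax. A sketch that only builds the request URL; the host, port, handler path, and the exact nesting are assumptions about the setup, not a confirmed answer:

```python
from urllib.parse import urlencode

# Select BlogPost docs whose title matches "cool", restricted to posts
# whose user_id_i joins to a User doc matching description_text:solr.
params = {
    "q": 'class_name:BlogPost AND title_text:cool AND '
         '_query_:"{!join from=user_id_i to=user_id_i}'
         '(class_name:User AND description_text:solr)"',
    "rows": 10,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```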
Re: DIH delta last_index_time
On Wed, Sep 14, 2011 at 9:56 PM, Vazquez, Maria (STM) maria.vazq...@dexone.com wrote: Thanks Rahul That sounds like a good solution, I will change the code to support different timezones. Maybe this could be included in next release of Solr since a few people mentioned this problem too. [...] If it was indeed a timezone issue, Solr can hardly be automatically aware of a difference in timezones. At least, IMHO, this is an implementation issue, and a timezone difference should normally be easy to diagnose. Regards, Gora
RE: Performance troubles with solr
How about this: Start with just what you had in your query (q) without the filter queries. Then add the fq's back in one at a time to see what is giving you problems -- leaving the birth filter query to the very last. Others on the list more experienced with filter queries might have a more direct answer... JRJ -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 11:31 AM To: solr-user@lucene.apache.org Subject: Re: Performance troubles with solr I tried moving age query from filter query to normal query but nothing really changed. But when i try to move everything into query itself ( removed all filter queries) QTimes slowed much more. I don't have problem with memory or cpu usage, my problem is query response times. When i send only one query respond times vary from 500 ms to 1000 ms (non cached) and its too much. When i send a set of random queries (10-20 queries per second) response times goes crayz ( 8 seconds to 60+ seconds). On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote: I don't have enough experience with filter queries to advise well on when to use fq vs. putting it in the query itself, but I do know that we are not using filter queries, and with index sizes ranging from 7 Million to 27+ Million we have not seen this kind of issue. Maybe keeping 16,384 filter queries around, particularly caching the ones with random age ranges is eating your memory up -- so perhaps try moving just that particular fq into q instead (since it is random) and just cache the ones where the number of options is limited? What happens if you try your test without the filter queries? What happens if you put the additional criteria that are in your filter query into the query itself? 
JRJ -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:54 AM To: solr-user@lucene.apache.org Subject: Re: Performance troubles with solr Thank you for your reply. I tried to give most of the information i can but obviously i missed some. 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? Test script only sends random queries. 2. If the test script is doing updates, how are those updates being fed to Solr? There are no updates right now, as i failed on performance. 3. What version of Solr are you running? I'm using Solr 3.3.0 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). I was trying everything before asking here. 5. Machine characteristics, particularly operating system and physical memory on the machine. OS = Debian 6.0, Physcal Memory = 32 gb, CPU = 2x Intel Quad Core On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: I think folks are going to need a *lot* more information. Particularly 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? 2. If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr are you running? 4. Why did you increase the default for jetty (around 384m) to 6000m, particularly given your relatively modest number of documents (2,000,000). 5. Machine characteristics, particularly operating system and physical memory on the machine. Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional guidance in using the mailing list to get help. -Original Message- From: Yusuf Karakaya [mailto:karakaya...@gmail.com] Sent: Wednesday, September 14, 2011 9:19 AM To: solr-user@lucene.apache.org Subject: Performance troubles with solr Hi, i'm having performance troubles with solr. 
I don't know if i'm expection too much from solr or i missconfigured solr. When i run a single query its QTime is 500-1000~ ms (without any use of caches). When i run my test script (with use of caches) QTime increases exponentially, reaching 8000~ to 6~ ms. And Cpu usage also increases to %550~ My solr-start script: java -Duser.timezone=EET -Xmx6000m -jar ./start.jar 2,000,000~ documents , currently there aren't any commits but in future there will be 5,000~ updates/additions to documents every 3-5~ min via delta import. Search Query sort=userscore+desc start=0 q=photo_id:* AND gender:true AND country:MALAWI AND online:false fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY] ( Random age ranges ) fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options, [* TO NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] ) fq=userscore:[500 TO *] ( Only 2 options, [500 TO *] or [* TO 500] ) rows=150 Schema field name=id type=long indexed=true stored=true required=true/ field name=username type=string indexed=true stored=false
Re: Can index size increase when no updates/optimizes are happening?
What is the machine used for? Was your user looking at a master? Slave? Something used for both? Measuring the size of all the files in the index? Or looking at memory? The index files shouldn't be getting bigger unless there were indexing operations going on. Is it at all possible that DIH was configured to run automatically (or any other indexing job for that matter) and your user didn't realize it? Best Erick 2011/9/13 Yury Kats yuryk...@yahoo.com: One of my users observed that the index size (in bytes) increased over night. There was no indexing activity at that time, only querying was taking place. Running optimize brought the index size back down to what it was when indexing finished the day before. What could explain that?
Re: select query does not find indexed pdf document
You can use copyField to put data from separate fields into a common search field. This page will help you get started on what mods you'd need to make on a fieldType to analyze it as you wish: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters But as a start, think about WhitespaceTokenizer followed by LowerCaseFilterFactory AsciiFoldingFilterFactory NGramFilterFactory Pay attention to the note at the top that directs you to the full list; the page above contains a partial list. For instance, NGramFilterFactory isn't on that page, it's on the page that's linked to: http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html Best Erick On Tue, Sep 13, 2011 at 10:46 PM, Michael Dockery dockeryjava...@yahoo.com wrote: Thank you for your informative reply. I would like to start simple by combining both filename and content into the same default search field ...which my default schema xml calls text ... defaultSearchFieldtext/defaultSearchField ... also: -case and accent insensitive -no splits on numb3rs -no highlights -text processing same for index and search however I do like: -ngrams preferably (partial/prefix word/token search) what schema mods would be needed? also what curl syntax to submit/index a pdf (with filename and content combined into the default search field)? From: Bob Sandiford bob.sandif...@sirsidynix.com To: Michael Dockery dockeryjava...@yahoo.com Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Monday, September 12, 2011 1:38 PM Subject: RE: select query does not find indexed pdf document Hi, Michael. Well, the stock answer is, 'it depends' For example - would you want to be able to search filename without searching file contents, or would you always search both of them together? If both, then copy both the file name and the parsed file content from the pdf into a single search field, and you can set that up as the default search field. 
Or - what kind of processing / normalizing do you want on this data? Case insensitive? Accent insensitive? If a 'word' contains camel case (e.g. TheVeryIdea), do you want that split on the case changes? (but then watch out for things like iPad) If a 'word' contains numbers, do you want them left together, or separated? Do you want stemming (where searching for 'stemming' would also find 'stem', 'stemmed', that sort of thing)? Is this always English, or are other languages involved? Do you want the text processing to be the same for indexing vs searching? Do you want to be able to find hits based on the first few characters of a term? (ngrams) Do you want to be able to highlight text segments where the search terms were found? Probably you want to read up on the various tokenizers and filters that are available. Do some prototyping and see how it looks. Here's a starting point: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Basically, there is no 'one size fits all' here. Part of the power of Solr / Lucene is its configurability to achieve the results your business case calls for. Part of the drawback of Solr / Lucene - especially for new folks - is its configurability to achieve the results your business case calls for. :) Anyone got anything else to suggest for Michael? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/ From: Michael Dockery [mailto:dockeryjava...@yahoo.com] Sent: Monday, September 12, 2011 1:18 PM To: Bob Sandiford Subject: Re: select query does not find indexed pdf document thank you. that worked. Any tips for very very basic setup of the schema xml? or is the default basic enough? 
I basically only want to search on filename and file contents From: Bob Sandiford bob.sandif...@sirsidynix.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Michael Dockery dockeryjava...@yahoo.com Sent: Monday, September 12, 2011 10:04 AM Subject: RE: select query does not find indexed pdf document Um - looks like you specified your id value as pdfy, which is reflected in the results from the *:* query, but your id query is searching for vpn, hence no matches... What does this query yield? http://www/SearchApp/select/?q=id:pdfy Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.commailto:bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Michael Dockery [mailto:dockeryjava...@yahoo.commailto:dockeryjava...@yahoo.com] Sent: Monday, September 12, 2011 9:56 AM To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org Subject: Re: select query does not find indexed pdf document http://www/SearchApp/select/?q=id:vpn yields this: ?xml version=1.0 encoding=UTF-8 ? - response -
RegexTransformer - need help with regex value
Hello, Feel free to point me to alternate sources of information if you deem this question unworthy of the Solr list :) But until then please hear me out! When my config is something like: field column=imageUrl regex=.*img src=.(.*)\.gif..alt=.* sourceColName=description / I don't get any data. But when my config is like: field column=imageUrl regex=.*img src=.(.*)..alt=.* sourceColName=description / I get the following data as the value for imageUrl: http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif; width=64 As the result shows, this is a string that should be able to match even on the 1st regex=.*img src=.(.*)\.gif..alt=.* and produce a result like: http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_ But it doesn't! Can anyone tell me why that would be the case? Is it something about the way RegexTransformer is wired or is it just my regex value that isn't right?
Re: query - part default OR and part default AND
: phrase1 - solr is the best fts ever : phrase2 - let us all contribute to open source for a better world : : now I want to perform the next query: : : field1:( phrase1) AND field2:(phrase2) : : my default operator is AND, but I want to search within field1 with AND : operator between the tokens and within field2 with OR operator. ... : field1:(+solr +is +the +best +fts +ever) AND field2:(let us all contribute : to open source for a better world) First off -- be careful about your wording. You are calling these phrases, but in these examples what you are really doing is searching for a set of terms. There's no such thing as an OR phrase search -- when searching for phrases the entire phrase is mandatory; your only option is whether you want to include any slop in how far apart the individual terms may be. Having said that: if what you want to do is search all of a set of terms in field1, and any of a set of terms in field2, you can use localparams and the _query_ hook in the LuceneQParser to split this up into multiple params where you specify a different default op... q=_query_:"{!q.op=AND df=field1 v=$f1}" _query_:"{!q.op=OR df=field2 v=$f2}" f1=solr is the best fts ever f2=let us all contribute to open source for a better world : what I already tried is to split phrase1 by whitespaces, changing the : default search operator to OR in the schema and add + signs before each : word: ... : this query is not good.. because when I am splitting phrase1 it is not how : the index time tokenizer splits it... 
so I am not getting the results I expect. Second: this isn't how the Lucene QueryParser works: whitespace is a metacharacter for the queryparser, it splits on (unescaped) whitespace to determine individual clauses (which are then used to build boolean queries) before it ever consults the analyzer for the specified field (it actually doesn't even know which field it should use until it evaluates the whitespace it's parsing) Reading between the lines, I *think* what you are saying is that you want to search for an exact phrase on field1, but any of the words in field2, which is as simple as... field1:"solr is the best fts ever" AND field2:(let us all contribute to open source for a better world) ...of course, if it's easier for your client to specify those as distinct params, you can still use the _query_ hook and local params, along with the field QParser.. q=_query_:"{!field f=field1 v=$f1}" _query_:"{!q.op=OR df=field2 v=$f2}" f1=solr is the best fts ever f2=let us all contribute to open source for a better world ...Lots of options. https://wiki.apache.org/solr/SolrQuerySyntax -Hoss
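The two-parser request Hoss describes can be sketched as an actual HTTP query string (a hedged sketch: the nested _query_ clauses are quoted for transport, which is how nested queries are usually written and may simply have been lost in the archived message; field1/field2/f1/f2 are the names from the thread):

```python
from urllib.parse import urlencode

# Each _query_:"{!...}" clause is one sub-query with its own default
# operator; $f1/$f2 dereference the extra request parameters, so the
# client can pass the raw terms without any query-syntax escaping.
params = {
    "q": '_query_:"{!q.op=AND df=field1 v=$f1}" '
         '_query_:"{!q.op=OR df=field2 v=$f2}"',
    "f1": "solr is the best fts ever",
    "f2": "let us all contribute to open source for a better world",
}
query_string = urlencode(params)
```

Appending this to /select? yields a single request where AND is the default for field1 and OR for field2.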
Re: math with date and modulo
: I try to get a diff of today and a dateField. From this diff, I want to do a : modulo from another field with values of 1,3,6,12 ... : (DIFF(Month of Today - Month of Search) MOD interval) = 0 a) it looks like modulus was never implemented as a function ... probably overlooked because it has no java.lang.Math.* static method ... please file a bug, it should be fairly trivial to add. b) even with a mod(a,b) function, I'm not sure that you could do what it seems like you want to do -- the ms() function will easily let you compute the number of milliseconds between two date fields, but there's no function that will give you the numeric value for the month of year (or day of month, or hour of day, etc...) for a date field ... even if there was, I don't think your calculation would work when the current month is before the month indexed in the date field. If I understand your goal, you are probably better off indexing the month as its own field (either numerically or just as a simple string) and then computing the list of matches you care about in the client (ie: fq=month:(feb, may, aug, nov) ) -Hoss
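The client-side computation suggested at the end can be sketched like this (assuming a month field indexed as a lowercase string, per the suggestion above; the interval semantics are my reading of the original question, not something Solr provides):

```python
import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def month_filter(interval, today=None):
    """Build an fq matching months whose distance from the current
    month is a multiple of `interval` (1, 3, 6, or 12)."""
    today = today or datetime.date.today()
    hits = [MONTHS[m - 1] for m in range(1, 13)
            if (today.month - m) % interval == 0]
    return "month:(" + " OR ".join(hits) + ")"
```

With the current month November and an interval of 3 this produces month:(feb OR may OR aug OR nov), matching the fq in the reply above.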
Re: RegexTransformer - need help with regex value
Thanks a bunch, got it working with a reluctant quantifier and the use of quot; as the escaped representation of double quotes within the regex value, so that the config file doesn't crash and burn: field column=imageUrl regex=.*?img src=quot;(.*?)quot;.* sourceColName=description / Cheers, - Pulkit On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, Feel free to point me to alternate sources of information if you deem this question unworthy of the Solr list :) But until then please hear me out! When my config is something like: field column=imageUrl regex=.*img src=.(.*)\.gif..alt=.* sourceColName=description / I don't get any data. But when my config is like: field column=imageUrl regex=.*img src=.(.*)..alt=.* sourceColName=description / I get the following data as the value for imageUrl: http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif; width=64 As the result shows, this is a string that should be able to match even on the 1st regex=.*img src=.(.*)\.gif..alt=.* and produce a result like: http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_ But it doesn't! Can anyone tell me why that would be the case? Is it something about the way RegexTransformer is wired or is it just my regex value that isn't right?
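The greedy-vs-reluctant behavior behind both the failure and the fix is easy to reproduce outside of DataImportHandler. A minimal sketch in Python (whose re quantifier semantics match Java's for these patterns), with a made-up description string standing in for the Amazon data:

```python
import re

desc = 'stars: <img src="http://example.com/stars-5-0.gif" width="64" alt="5 stars">'

# The first pattern from the thread: ".(.*)\.gif..alt=" allows exactly two
# characters between ".gif" and "alt=", but the data has ' width="64" '
# there, so nothing matches at all -- hence "I don't get any data".
assert re.search(r'.*img src=.(.*)\.gif..alt=.*', desc) is None

# Greedy "(.*)" extends to the last position that still lets "..alt=" match,
# so the capture swallows the closing quote and the width attribute.
greedy = re.search(r'.*img src=.(.*)..alt=.*', desc).group(1)
# -> 'http://example.com/stars-5-0.gif" width="64'

# Reluctant "(.*?)" stops at the first closing quote -- the working fix.
lazy = re.search(r'.*?img src="(.*?)"', desc).group(1)
# -> 'http://example.com/stars-5-0.gif'
```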
Norms - scoring issue
Hi All, I hope someone could shed some light on the issue I'm facing with Solr 3.1.0. It looks like it's computing different fieldNorm values despite my configuration that aims to ignore them. field name=item_name type=textgen indexed=true store=true omitNorms=true omitTermFrequencyAndPositions=true / field name=item_description type=textTight indexed=true store=true omitNorms=true omitTermFrequencyAndPositions=true / field name=item_tags type=text indexed=true stored=true multiValued=true omitNorms=true omitTermFrequencyAndPositions=true / I also have a custom class that extends DefaultSimilarity to override the idf method. Query: str name=qitem_name:octopus seafood OR item_description:octopus seafood OR item_tags:octopus seafood/str str name=sortscore desc,item_ranking desc/str The first 2 results are: doc float name=score0.5217492/float str name=item_nameGrilled Octopus/str arr name=item_tagsstrSeafood, tapas/str/arr /doc doc float name=score0.49379835/float str name=item_nameoctopus marisco/str arr name=item_tagsstrAppetizer, Mexican, Seafood, food/str/arr /doc Does anyone know why they get a different score? I'm expecting them to have the same scoring because both matched the two search terms. I checked the debug information and it seems that the difference involves the fieldNorm values. 
1) Grilled Octopus 0.52174926 = (MATCH) product of: 0.7826238 = (MATCH) sum of: 0.4472136 = (MATCH) weight(item_name:octopus in 69), product of: 0.4472136 = queryWeight(item_name:octopus), product of: 1.0 = idf(docFreq=2, maxDocs=449) 0.4472136 = queryNorm 1.0 = (MATCH) fieldWeight(item_name:octopus in 69), product of: 1.0 = tf(termFreq(item_name:octopus)=1) 1.0 = idf(docFreq=2, maxDocs=449) 1.0 = fieldNorm(field=item_name, doc=69) 0.1118034 = (MATCH) weight(text:seafood in 69), product of: 0.4472136 = queryWeight(text:seafood), product of: 1.0 = idf(docFreq=8, maxDocs=449) 0.4472136 = queryNorm 0.25 = (MATCH) fieldWeight(text:seafood in 69), product of: 1.0 = tf(termFreq(text:seafood)=1) 1.0 = idf(docFreq=8, maxDocs=449) 0.25 = fieldNorm(field=text, doc=69) 0.1118034 = (MATCH) weight(text:seafood in 69), product of: 0.4472136 = queryWeight(text:seafood), product of: 1.0 = idf(docFreq=8, maxDocs=449) 0.4472136 = queryNorm 0.25 = (MATCH) fieldWeight(text:seafood in 69), product of: 1.0 = tf(termFreq(text:seafood)=1) 1.0 = idf(docFreq=8, maxDocs=449) 0.25 = fieldNorm(field=text, doc=69) 0.1118034 = (MATCH) weight(text:seafood in 69), product of: 0.4472136 = queryWeight(text:seafood), product of: 1.0 = idf(docFreq=8, maxDocs=449) 0.4472136 = queryNorm 0.25 = (MATCH) fieldWeight(text:seafood in 69), product of: 1.0 = tf(termFreq(text:seafood)=1) 1.0 = idf(docFreq=8, maxDocs=449) 0.25 = fieldNorm(field=text, doc=69) 0.667 = coord(4/6) 2) octopus marisco 0.49379835 = (MATCH) product of: 0.7406975 = (MATCH) sum of: 0.4472136 = (MATCH) weight(item_name:octopus in 81), product of: 0.4472136 = queryWeight(item_name:octopus), product of: 1.0 = idf(docFreq=2, maxDocs=449) 0.4472136 = queryNorm 1.0 = (MATCH) fieldWeight(item_name:octopus in 81), product of: 1.0 = tf(termFreq(item_name:octopus)=1) 1.0 = idf(docFreq=2, maxDocs=449) 1.0 = fieldNorm(field=item_name, doc=81) 0.09782797 = (MATCH) weight(text:seafood in 81), product of: 0.4472136 = queryWeight(text:seafood), product 
of: 1.0 = idf(docFreq=8, maxDocs=449) 0.4472136 = queryNorm 0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of: 1.0 = tf(termFreq(text:seafood)=1) 1.0 = idf(docFreq=8, maxDocs=449) 0.21875 = fieldNorm(field=text, doc=81) 0.09782797 = (MATCH) weight(text:seafood in 81), product of: 0.4472136 = queryWeight(text:seafood), product of: 1.0 = idf(docFreq=8, maxDocs=449) 0.4472136 = queryNorm 0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of: 1.0 = tf(termFreq(text:seafood)=1) 1.0 = idf(docFreq=8, maxDocs=449) 0.21875 = fieldNorm(field=text, doc=81) 0.09782797 = (MATCH) weight(text:seafood in 81), product of: 0.4472136 = queryWeight(text:seafood), product of: 1.0 = idf(docFreq=8, maxDocs=449) 0.4472136 = queryNorm 0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of: 1.0 = tf(termFreq(text:seafood)=1) 1.0 = idf(docFreq=8, maxDocs=449) 0.21875 = fieldNorm(field=text, doc=81) 0.667 = coord(4/6) Thanks in advance, Adolfo.
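One thing visible in the explain output above: fieldNorm is 1.0 on item_name (what omitNorms=true produces), but 0.25 vs 0.21875 on the text field that the bare seafood terms were parsed against -- so the differing norms come from that default field, which evidently does not have omitNorms. Those two values are consistent with the default lengthNorm of 1/sqrt(number of terms) stored in a single byte with roughly 3 mantissa bits. A rough sketch (a simplification of Lucene's SmallFloat encoding; the field lengths below are guesses for illustration):

```python
import math

def length_norm(num_terms):
    # DefaultSimilarity: lengthNorm = 1 / sqrt(number of terms in the field)
    return 1.0 / math.sqrt(num_terms)

def quantize(x):
    # One-byte norm storage keeps only ~3 mantissa bits: truncate down to
    # the nearest representable value of the form (m/8) * 2**e.
    e = math.floor(math.log2(x))
    m = math.floor(x / 2 ** e * 8) / 8
    return m * 2 ** e
```

A 16-term field stores exactly 0.25, while a 20-term field's 0.2236... truncates to 0.21875 -- so the second document most likely just has a few more terms in its catch-all field.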
[ANNOUNCE] Apache Solr 3.4.0 released
September 14 2011, Apache Solr™ 3.4.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.4.0. Apache Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below). If you are already using Apache Solr 3.1, 3.2 or 3.3, we strongly recommend you upgrade to 3.4.0 because of an index corruption bug that can strike on OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0. See the CHANGES.txt file included with the release for a full list of details. Solr 3.4.0 Release Highlights: * Bug fixes and improvements from Apache Lucene 3.4.0, including a major bug (LUCENE-3418) whereby a Lucene index could easily become corrupted if the OS or computer crashed or lost power. * SolrJ client can now parse grouped and range facets results (SOLR-2523). * A new XsltUpdateRequestHandler allows posting XML that's transformed by a provided XSLT into a valid Solr document (SOLR-2630). * Post-group faceting option (group.truncate) can now compute facet counts for only the highest ranking documents per-group. (SOLR-2665). * Add commitWithin update request parameter to all update handlers that were previously missing it. This tells Solr to commit the change within the specified amount of time (SOLR-2540). * You can now specify NIOFSDirectory (SOLR-2670). * New parameter hl.phraseLimit speeds up FastVectorHighlighter (LUCENE-3234). 
* The query cache and filter cache can now be disabled per request. See http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters (SOLR-2429). * Improved memory usage, build time, and performance of SynonymFilterFactory (LUCENE-3233). * Added omitPositions to the schema, so you can omit position information while still indexing term frequencies (LUCENE-2048). * Various fixes for multi-threaded DataImportHandler. Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Apache Lucene/Solr Developers
Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler
: References: 41dfe0136ddf091e98d45dea9f0da1ab@localhost : cab_8yd9obtkvkdktqpfnuzmey-afbzajyvgahh58+mccgiq...@mail.gmail.com : Message-ID: 1316011545.626.yahoomail...@web110411.mail.gq1.yahoo.com : Subject: glassfish, solrconfig.xml and SolrException: Error loading : DataImportHandler : In-Reply-To: : cab_8yd9obtkvkdktqpfnuzmey-afbzajyvgahh58+mccgiq...@mail.gmail.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Performance troubles with solr
: q=photo_id:* AND gender:true AND country:MALAWI AND online:false photo_id:* does not mean what you probably think it means. You most likely want photo_id:[* TO *] given your current schema, but I would recommend adding a new has_photo boolean field and using that instead. That alone should explain a big part of why those queries are slow. You didn't describe how your q param varies in your test queries (just your fq). I'm assuming gender and online can vary, and that you sometimes don't use the photo_id clauses, and that the country clause can vary, but that these clauses are always all mandatory. In which case I would suggest using fq for all of them individually, and leaving your q param as *:* (unless you sometimes sort on the actual solr score, in which case leave it as whatever part of the query you actually want to contribute to the score) Lastly: I don't remember off the top of my head how int and tint are defined in the example schema files, but you should consider your usage of them carefully -- particularly with the precisionStep and which fields you do range queries on. -Hoss
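The restructuring suggested above -- every always-mandatory clause in its own cacheable fq, q left as *:*, and photo_id:* replaced by a dedicated boolean -- could be assembled like this (a sketch; has_photo is the hypothetical new field, not something in the poster's current schema):

```python
from urllib.parse import urlencode

params = [
    ("q", "*:*"),                  # nothing left that needs scoring
    ("fq", "has_photo:true"),      # hypothetical boolean replacing photo_id:*
    ("fq", "gender:true"),
    ("fq", "country:MALAWI"),
    ("fq", "online:false"),
    ("fq", "birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]"),
    ("fq", "lastlogin:[* TO NOW-6MONTHS/DAY]"),
    ("fq", "userscore:[500 TO *]"),
    ("sort", "userscore desc"),
    ("rows", "150"),
]
query_string = urlencode(params)
```

Each fq entry is cached independently in the filterCache, so across many random test queries only the clauses that actually change get recomputed.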
facet.method=fc
Is the parameter facet.method=fc still needed ? Thank you. Patrick.
Re: facet.method=fc
: Is the parameter facet.method=fc still needed ? https://wiki.apache.org/solr/SimpleFacetParameters#facet.method The default value is fc (except for BoolField) since it tends to use less memory and is faster when a field has many unique terms in the index. -Hoss
Re: Index not getting refreshed
: I am using Solr 3.2 on a live website. I get live user's data of about 2000 : per day. I do an incremental index every 8 hours. but my search results : always show the same result with same sorting order. when I check the same Are you committing? Are you using replication? Are you using a sort order that might not make it obvious that the new docs are actually there? (ie: sort=timestamp asc) -Hoss
Document frequency for all documents found by a query
Hi there I'm using Solr to do some category mapping, and part of this process consists of finding frequently occurring terms for each category id. My index consists of a number of documents (mostly containing between 1 and 4 tokens), and a category id that this document belongs to. Ideally I'd like to generate document frequencies for each term restricted by category, but when I use the following http request it gives me the frequencies over the whole index (ignoring the category ids). http://localhost:8983/solr/select?qt=tvrh&q=category_id:9&fl=x&tv.all=true&rows=1000 Is it possible to make Solr return document frequency over just the documents returned from the query? If not what is the proper way to do this? Thanks, Tomek Rej
Re: Document frequency for all documents found by a query
Nevermind, I just discovered faceting, which does exactly what I want. Sorry about that. On Thu, Sep 15, 2011 at 11:31 AM, Tomek Rej tomek@roamz.com wrote: Hi there I'm using Solr to do some category mapping, and part of this process consists of finding frequently occurring terms for each category id. My index consists of a number of documents (mostly containing between 1 and 4 tokens), and a category id that this document belongs to. Ideally I'd like to generate document frequencies for each term restricted by category, but when I use the following http request it gives me the frequencies over the whole index (ignoring the category ids). http://localhost:8983/solr/select?qt=tvrh&q=category_id:9&fl=x&tv.all=true&rows=1000 Is it possible to make Solr return document frequency over just the documents returned from the query? If not what is the proper way to do this? Thanks, Tomek Rej
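For reference, the facet request that replaces the term-vector approach could look like this (a sketch; q and facet.field come from the original request, while the facet.limit/facet.mincount values are arbitrary choices):

```python
from urllib.parse import urlencode

params = [
    ("q", "category_id:9"),
    ("rows", "0"),               # only the counts are needed, not documents
    ("facet", "true"),
    ("facet.field", "x"),        # the tokenized field from the tvrh request
    ("facet.limit", "100"),      # top 100 terms by document frequency
    ("facet.mincount", "2"),     # drop terms occurring in a single doc
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

Facet counts are document frequencies within the q/fq result set, which is exactly the per-category restriction the term-vector handler did not provide.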
Re: any docs on using the GeoHashField?
When I retrieve the value, the lat/lon pair that comes out is not exactly the same as what I indexed, which made me think it was actually stored as the hash and then transformed back? Anyhow - I'm trying to understand the actual use case for the field as it exists - essentially you are saying I could query with a geohash and use data in this field type to do a distance-based filter from the lat,lon point corresponding to the geohash? -Peter On Thu, Sep 8, 2011 at 5:34 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I would think I could index a lat,lon pair into a GeoHashField (that : works) and then retrieve the field value to see the computed geohash. ... : What am I missing - how can I retrieve the hash? I don't think it's designed to work that way. GeoHashField provides GeoHash-based search support for lat/lon values through its internal (indexed) representation -- much like TrieLongField provides efficient range queries using trie encoding -- but the stored value is still the lat/lon pair (just as a TrieLongField is still the long value) If you want to store/retrieve a raw GeoHash string, I think you have to compute it yourself (or put the logic in an UpdateProcessor). org.apache.lucene.spatial.geohash.GeoHashUtils should take care of all the heavy lifting for you. -Hoss -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 Get a free, hosted Drupal 7 site: http://www.drupalgardens.com
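For the "compute it yourself" route, the encoding GeoHashUtils performs is small enough to sketch directly (a stdlib-only Python version of the standard geohash algorithm, not Solr's actual code). It also shows why a round-tripped lat/lon can differ slightly from what was indexed: the hash only pins the point down to a bounding cell.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=11):
    """Standard geohash: alternately bisect longitude and latitude,
    emitting one bit per bisection, then base32-encode 5 bits/char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # a geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

# Classic worked example: (42.605, -5.603) encodes to "ezs42" at 5 chars.
```

Truncating the hash only widens the cell, which is the basis of geohash prefix filtering.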
Complex Fields, Indexing & Storing
Hi all, I have a quick question about how complex fields (subFields etc) interact with indexing and storage. Let's say I have this (partial) schema: fieldType name=location class=solr.LatLonType subFieldSuffix=_coordinate/ … field name=pnt type=location indexed=true stored=true required=true / dynamicField name=*_coordinate type=tdouble indexed=true stored=false/ (slightly modified from the example/ project). If my goals were to: 1) Always query the location by lat,lng (never the subfields directly) 2) Never need to pull out the 'pnt' value from results (I always just need the document's ID back, not its contents) 3) Not have any unnecessary indexes Is there anything that I should change about the set-up above? Will the schema as it stands index both the two dynamic fields that get created, as well as the 'pnt' field above them (in effect creating 3 indexes)? Or is the indexed='true' on the 'pnt' field just needed so it properly connects to the two indexes created for the dynamic fields? Thanks in advance, Mike
Re: Index not getting refreshed
I am committing but not doing replication now. My sort order also includes last login timestamp. The new profiles are being reflected in my Solr admin DB, but they are not listed on my website. On Thu, Sep 15, 2011 at 4:25 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am using Solr 3.2 on a live website. I get live user's data of about 2000 : per day. I do an incremental index every 8 hours. but my search results : always show the same result with same sorting order. when I check the same Are you committing? Are you using replication? Are you using a sort order that might not make it obvious that the new docs are actually there? (ie: sort=timestamp asc) -Hoss
Re: Index not getting refreshed
I have written simple Java code to index my data. I am creating XML documents and adding them to the index. Sorry, but due to company policy I cannot share the configuration details here. On Wed, Sep 14, 2011 at 10:42 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Hi Pawan, Can you please share more details on the indexing mechanism? (DIH, SolrJ or any other) Please let us know the configuration details. On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am using Solr 3.2 on a live website. I get live user's data of about 2000 per day. I do an incremental index every 8 hours. but my search results always show the same result with same sorting order. when I check the same search from the corresponding db, it gives me different results always (as new data regularly gets added) please suggest what might be the issue. is there any cache related problem at SOLR level thanks pawan -- Thanks and Regards Rahul A. Warawdekar
Re: indexing data from rich documents - Tika with solr3.1
Hi Erick Erickson, We have many file formats (doc, ppt, pdf, ...), and the files' purpose is to let us search the detailed educational content in them. Because I am new to Solr, maybe I don't understand Apache Tika deeply enough yet. At the moment I can't index PDF files from HTTP, though with one file it is OK. Thanks for your attention. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3337963.html Sent from the Solr - User mailing list archive at Nabble.com.