Re: Solr and DateTimes - bug?
Hi Mauricio,

Thanks for the suggestions :) I'm already running Mono 2.10.5, so I should be safe. And thanks to everybody for the quick answers and friendly attitude.

Best regards,
Nicklas

On 2011-09-13 03:01, Mauricio Scheffer wrote:
Hi Nicklas,

Use a nullable DateTime type instead of MinValue. It's semantically more correct, and SolrNet will do the right mapping. I also heard that Mono had a bug in date parsing, where it didn't behave quite like .NET: https://github.com/mausch/SolrNet/commit/f3a76ea5535633f4b301e644e25eb2dc7f0cb7ef IIRC this bug was fixed in Mono 2.10 or so, so make sure you're running the latest version.

Finally, there's a specific mailing list for questions about SolrNet: http://groups.google.com/group/solrnet

Cheers,
Mauricio

On Mon, Sep 12, 2011 at 7:54 AM, Nicklas Overgaard <nick...@isharp.dk> wrote:
I see. I'm using that date to flag that my entity has not yet ended. I can just use another constant which Solr is capable of returning in the correct format. The nice thing about DateTime.MinValue is that it's just part of the .NET framework :) I hope the issue is resolved at some point. I'm wondering if it would be possible for you (or someone else) to fix the issue with years from 1 to 999 being formatted incorrectly, and then create a new ticket for the issue with negative years?

Best regards,
Nicklas

On 2011-09-12 07:02, Chris Hostetter wrote:
: The XML output when performing a query via the solr interface is like this:
: <datename="endDate">1-01-01T00:00:00Z</date>

i think you mean: <date name="endDate">1-01-01T00:00:00Z</date>

: So my question is: Is this a bug in the solr output engine, or should mono
: be able to parse the date as given from solr? I have not yet tried it out
: on .net as I do not have access to a windows machine at the moment.

it is in fact a bug in Solr that not a lot of people have been overly concerned with, since most people don't deal with dates that far back:
https://issues.apache.org/jira/browse/SOLR-1899

...I spent a little time working on it at one point but got sidetracked by other things, since there are a couple of related issues with the canonical ISO 8601 date format around year 0 that made it non-obvious what the ideal solution was.

-Hoss
Re: question about Field Collapsing/ grouping
Isn't that what the parameter group.ngroups=true is for?
Re: question about Field Collapsing/ grouping
Yup, it seems the group count feature is included now, as mentioned by Klein.

Regards,
Jayendra

On Tue, Sep 13, 2011 at 8:27 AM, O. Klein <kl...@octoweb.nl> wrote:
Isn't that what the parameter group.ngroups=true is for?
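For context, a grouped request with the group count enabled looks roughly like this (the grouping field name here is just illustrative):

q=*:*&group=true&group.field=manufacturer&group.ngroups=true

With group.ngroups=true the response reports the number of distinct groups that matched, in addition to the usual per-group document lists.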
Re: can indexing information stored in db rather than filesystem?
Thanks for your reply, guys. As suggested, I agree that we are losing many of the benefits of Solr/Lucene, but I still want to store the index output (index files) in a db table. Please suggest what steps I need to follow to configure the db with the Solr engine. (Just as we point at the filesystem in solrconfig.xml with <dataDir>${solr.data.dir:}</dataDir>, I would similarly like to give the path for the db table.)
Re: can indexing information stored in db rather than filesystem?
I'm curious; what benefits do you think you'll get by storing the files in some DB table?

On Tuesday 13 September 2011 15:51:19 kiran.bodigam wrote:
Thanks for your reply, guys. As suggested, I agree that we are losing many of the benefits of Solr/Lucene, but I still want to store the index output (index files) in a db table. Please suggest what steps I need to follow to configure the db with the Solr engine. (Just as we point at the filesystem in solrconfig.xml with <dataDir>${solr.data.dir:}</dataDir>, I would similarly like to give the path for the db table.)

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
RE: can indexing information stored in db rather than filesystem?
I don't think you understand. Solr does not have the code to do that. It just isn't there, nor would I expect it will ever be there.

Solr is open source, though. You could look at the code and figure out how to do it (though why anyone would do that remains beyond my ability to understand). As the saying goes: knock yourself out.

(Happy Programmers' Day to all. http://en.wikipedia.org/wiki/Programmers'_Day )

JRJ

-----Original Message-----
From: kiran.bodigam [mailto:kiran.bodi...@gmail.com]
Sent: Tuesday, September 13, 2011 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information stored in db rather than filesystem?

Thanks for your reply, guys. As suggested, I agree that we are losing many of the benefits of Solr/Lucene, but I still want to store the index output (index files) in a db table. Please suggest what steps I need to follow to configure the db with the Solr engine. (Just as we point at the filesystem in solrconfig.xml with <dataDir>${solr.data.dir:}</dataDir>, I would similarly like to give the path for the db table.)
Re: can indexing information stored in db rather than filesystem?
On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:
As suggested, I agree that we are losing many of the benefits of Solr/Lucene, but I still want to store the index output (index files) in a db table. Please suggest what steps I need to follow to configure the db with the Solr engine.

The steps are:
1. Write the Java code to do that.
2. Submit it as contrib, because it is such a bad idea that I doubt it will be added to the common code.

wunder
--
Walter Underwood
using a function query with OR and spaces?
I had queries breaking on me when there were spaces in the text I was searching for. Originally I had:

fq=state_s:New York

and that would break. I found a workaround by using:

fq={!raw f=state_s}New York

My problem now is doing this with an OR query. This is what I have now, but it doesn't work:

fq=({!raw f=country_s}United States OR {!raw f=city_s}New York
Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)
Hello Everyone,

I've been investigating and I understand that using the RegexTransformer is an option for identifying and extracting data to multiple fields from a single RSS value source. But rather than hack something together, I once again wanted to check with the community: is there another option for navigating the HTML DOM tree using some well-tested transformer or Tika or something?

Thanks!
- Pulkit

On Mon, Sep 12, 2011 at 1:45 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote:
Given an RSS raw feed source link such as the following:
http://persistent.info/cgi-bin/feed-proxy?url=http%3A%2F%2Fwww.amazon.com%2Frss%2Ftag%2Fblu-ray%2Fnew%2Fref%3Dtag_rsh_hl_ersn

I can easily get to the value of the description for an item like so:

<field column="description" xpath="/rss/item/description" />

But the content of description happens to be in HTML, and sadly it is this HTML chunk that has some pretty decent information that I would like to import as well.

1) For example, it has the image for the item:
<img src="http://ecx.images-amazon.com/images/I/51yyAAoYzKL._SL160_SS160_.jpg" ... />

2) It has the price for the item:
<span class="tgProductPrice">$13.99</span>

And many other useful pieces of data that aren't in a proper rss format but are simply thrown together inside the HTML chunk that is served as the value for xpath=/rss/item/description.

So, how can I configure DIH to start importing this HTML information as well? Is Tika the way to go? Can someone give a brief example of what a config file with both Tika config and RSS config would/should look like?

Thanks!
- Pulkit
Re: using a function query with OR and spaces?
On Sep 13, 2011, at 8:37 AM, Jason Toy wrote:
I had queries breaking on me when there were spaces in the text I was searching for. Originally I had:
fq=state_s:New York
and that would break. I found a workaround by using:
fq={!raw f=state_s}New York
My problem now is doing this with an OR query. This is what I have now, but it doesn't work:
fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

Couldn't you do:

fq=(country_s:(United States) OR city_s:(New York))

I think that should work, though you probably will need to surround the queries with quotes to get the exact phrase match.
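For illustration, the quoted-phrase form this reply alludes to would be:

fq=(country_s:"United States" OR city_s:"New York")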
Re: using a function query with OR and spaces?
: Subject: using a function query with OR and spaces?

First off, what you are asking about is a filter query, not a function query:
https://wiki.apache.org/solr/CommonQueryParameters#fq

: I had queries breaking on me when there were spaces in the text I was
: searching for. Originally I had:
:
: fq=state_s:New York
: and that would break. I found a workaround by using:
:
: fq={!raw f=state_s}New York

assuming the field is a StrField, the raw or term QParsers will work, or you can quote the value using something like fq=state_s:"New York"

: My problem now is doing this with an OR query, this is what I have now, but
: it doesn't work:
...
: fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

That's because:

a) local params (ie: the {! ...} syntax) must come at the start of a Solr param as an instruction of how to parse it.

b) the raw and term QParsers don't support *any* query markup/syntax (like OR modifiers).

If you want to build a complex query using multiple clauses that are constructed using specific QParsers, you need to build them up using multiple query params and/or the _query_ hook in the LuceneQParser...

fq=_query_:"{!term f=state_s}New York" OR _query_:"{!term f=country_s}United States"

https://wiki.apache.org/solr/LocalParams
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

-Hoss
Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)
: I've been investigating and I understand that using the RegexTransformer is
: an option for identifying and extracting data to multiple
: fields from a single RSS value source ... But rather than hack together
: something I once again wanted to check with the community: Is there another
: option for navigating the HTML DOM tree using some well-tested transformer
: or Tika or something?

I don't think so ... if it's a *really* well-formed feed, then the description will actually be xhtml nodes (with the appropriate namespace) that are already part of the Document's DOM. But if it's just a blob of CDATA that happens to contain well-formed HTML, then I think a regex is currently your best option -- you'll probably want something tailor-made for the subtleties of the site whose RSS you're scraping anyway, since things like "are chars in the URLs html escaped?" are going to vary from site to site.

It would probably be possible to write a DIH Transformer based on something like tagsoup to actually produce a DOM from an arbitrary html string in an entity, so you could then treat it as a subentity and use the XPathEntityProcessor -- but i don't think i've seen anyone talk about doing anything like that before.

-Hoss
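As a rough illustration of the custom-Transformer route (a sketch only: the class name, field names, and regex are hypothetical, and it uses a plain regex rather than tagsoup):

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Pulls a price like "$13.99" out of the HTML blob in the "description"
// column and adds it to the row as a separate "price" column.
public class PriceExtractTransformer extends Transformer {
    private static final Pattern PRICE = Pattern.compile("\\$(\\d+\\.\\d{2})");

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object desc = row.get("description");
        if (desc != null) {
            Matcher m = PRICE.matcher(desc.toString());
            if (m.find()) {
                row.put("price", m.group(1));
            }
        }
        return row;
    }
}

It would then be registered on the entity with transformer="com.example.PriceExtractTransformer" (the package name again being hypothetical) alongside the other transformers.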
Highlight compounded word instead of part
I am using DictionaryCompoundWordTokenFilterFactory and want to highlight the whole word instead of the part that matched the dictionary. So when the query "word" matches on "compoundedword", the whole word "compoundedword" should be highlighted, instead of just "word". Any ideas?
Re: Solr messing up the UK GBP (pound) symbol in response, even though the Java environment variable has file encoding set to UTF-8....
: Any idea why solr is unable to return the pound sign as-is?
:
: I tried typing in £ 1 million in Solr admin GUI and got following response.
...
: <str name="q">£ 1 million</str>
...
: Here is my Java Properties I got also from admin interface:
...
: catalina.home = /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/

Looks like you are using tomcat, so I suspect you are getting bit by this...
https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

If that's not the problem, please try running the example/exampledocs/test_utf8.sh script against your Solr instance (you'll need to change the URL variable to match your host:port)

-Hoss
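For reference, the Tomcat-side fix that the wiki page describes boils down to setting URIEncoding on the HTTP connector in Tomcat's conf/server.xml (the port and other attributes here are just illustrative):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>

Without it, Tomcat decodes query-string parameters as ISO-8859-1, which mangles multi-byte characters like £.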
How to plug a new ANTLR grammar
Hi,

The standard lucene/solr parsing is nice but not really flexible. I saw questions and discussions about ANTLR, but unfortunately never a working grammar, so... maybe you find this useful:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene objects, and the parser is not mixed with Java code. At first it produces structures like this:
https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem. I don't know if I should use the query parsing framework in contrib. It seems that the qParser in contrib can use different parser generators (the default JavaCC, but also ANTLR), but I am confused and I don't understand this new queryParser from contrib. It is really very confusing to me. Is there any benefit in trying to plug the ANTLR tree into it? Looking at the AST pictures, it seems that with a relatively simple tree walker we could build the same queries as the current standard lucene query parser, and it would be much simpler and more flexible. Does it bring something new? I have a feeling I miss something...

Many thanks for help,

Roman
Re: How to search on specific file types?
1- How can I put the file extension into my index? I'm using Nutch to crawl web pages and sending Nutch's data to Solr for indexing, and I have no idea how to put the file extension into my index.
2- Please give me some help links about mime types. I'm new to Solr and don't know anything about mime types. Please note that I should index Nutch's data, and I couldn't find useful commands in the Nutch tutorial for advanced indexing!
Thank you very much

On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT <jay.jae...@dot.wi.gov> wrote:
Some possibilities:
1) Put the file extension into your index (that is what we did when we were testing indexing documents with Solr)
2) Put a mime type for the document into your index.
3) Put the whole file name / URL into your index, and match on part of the name. This will give some false positives.

JRJ

-----Original Message-----
From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
Sent: Monday, September 12, 2011 5:58 AM
To: solr-user@lucene.apache.org
Subject: Fwd: How to search on specific file types?

Hello
I want to search on articles, so I need to find only specific files like doc, docx, and pdf. I don't need any html pages. Thus the result of our search should only consist of doc, docx, and pdf files. Can you help me?
Get field value in custom searchcomponent (solr 3.3)
What is the best way to get a float field value from a docID? I tried the following code, but when it runs it throws an exception ("For input string: `??eI") at the line float lat = Float.parseFloat(tlat);

schema.xml:
...
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
...
<field name="latitude" type="float" indexed="true" stored="true" multiValued="false" />

component.java:

@Override
public void process(ResponseBuilder rb) throws IOException {
    DocSet docs = rb.getResults().docSet;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    FieldCache.StringIndex slat = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), "latitude");
    DocIterator iter = docs.iterator();
    while (iter.hasNext()) {
        int docID = iter.nextDoc();
        String tlat = slat.lookup[slat.order[docID]];
        if (tlat != null) {
            float lat = Float.parseFloat(tlat); // Exception!
        }
    }
}

Thanks,
Pablo
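A likely way around this, sketched against the Lucene 3.x FieldCache API: trie fields index binary-encoded terms, so reading them back through the StringIndex yields unparseable bytes, while asking the FieldCache for floats directly avoids the string round-trip.

// Assumes the same "docs" and "searcher" as in the snippet above.
// getFloats handles trie-encoded terms via the numeric parser.
float[] lats = FieldCache.DEFAULT.getFloats(searcher.getReader(), "latitude");
DocIterator iter = docs.iterator();
while (iter.hasNext()) {
    float lat = lats[iter.nextDoc()]; // 0.0f for documents with no value
    // ... use lat ...
}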
Re: How to search on specific file types?
: 1- How can I put the file extension into my index? I'm using Nutch to
: crawl web pages and sending Nutch's data to Solr for indexing, and I have
: no idea how to put the file extension into my index.
: 2- Please give me some help links about mime types. I'm new to Solr and don't
: know anything about mime types. Please note that I should index Nutch's data,
: and I couldn't find useful commands in the Nutch tutorial for advanced indexing!
: Thank you very much

I think you need to ask on the Nutch users list about the type of schema Nutch uses when indexing into Solr, whether it creates a specific field for the file extension, and/or how you can modify the Nutch indexer to create a field like that for you.

Assuming you get Nutch to create a field named "extension", you can query Solr for only docs that have a certain extension by adding it as an fq...

q=what i want&fq=extension:doc

-Hoss
Re: How to search on specific file types?
1- How can I put the file extension into my index? I'm using Nutch to crawl web pages and sending Nutch's data to Solr for indexing, and I have no idea how to put the file extension into my index.

To get the file extension in a separate field you can copyField the url and use Solr's char pattern replace filter to strip away everything up to the last dot, if there is any.

2- Please give me some help links about mime types. I'm new to Solr and don't know anything about mime types. Please note that I should index Nutch's data, and I couldn't find useful commands in the Nutch tutorial for advanced indexing! Thank you very much

Use Nutch's index-more plugin. By default it will add two or three values to a multi-valued field (type); both sub-types and the complete mime-type, if I'm not mistaken. There's a configuration directive to have it index only the complete mime-type.

On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT <jay.jae...@dot.wi.gov> wrote:
Some possibilities:
1) Put the file extension into your index (that is what we did when we were testing indexing documents with Solr)
2) Put a mime type for the document into your index.
3) Put the whole file name / URL into your index, and match on part of the name. This will give some false positives.

JRJ

-----Original Message-----
From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
Sent: Monday, September 12, 2011 5:58 AM
To: solr-user@lucene.apache.org
Subject: Fwd: How to search on specific file types?

Hello
I want to search on articles, so I need to find only specific files like doc, docx, and pdf. I don't need any html pages. Thus the result of our search should only consist of doc, docx, and pdf files. Can you help me?
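The copyField-plus-pattern-replace approach described above might look something like this in schema.xml (a sketch; it assumes the crawled URL is indexed in a field named "url"):

<fieldType name="extension" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- keep only what follows the last dot, e.g. "pdf" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=".*\.([A-Za-z0-9]+)$" replacement="$1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="extension" type="extension" indexed="true" stored="true"/>
<copyField source="url" dest="extension"/>

A query like fq=extension:pdf would then match crawled URLs ending in ".pdf".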
Re: using a function query with OR and spaces?
I wrote the title wrong; it's a filter query, not a function query. Thanks for the correction. The field is a string. I had tried fq=state_s:"New York" before and that did not work; I'm puzzled as to why it didn't. I tried out your (b) suggestion and that worked, thanks!

On Tue, Sep 13, 2011 at 9:00 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
: Subject: using a function query with OR and spaces?

First off, what you are asking about is a filter query, not a function query:
https://wiki.apache.org/solr/CommonQueryParameters#fq

: I had queries breaking on me when there were spaces in the text I was
: searching for. Originally I had:
:
: fq=state_s:New York
: and that would break. I found a workaround by using:
:
: fq={!raw f=state_s}New York

assuming the field is a StrField, the raw or term QParsers will work, or you can quote the value using something like fq=state_s:"New York"

: My problem now is doing this with an OR query, this is what I have now, but
: it doesn't work:
...
: fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

That's because:

a) local params (ie: the {! ...} syntax) must come at the start of a Solr param as an instruction of how to parse it.

b) the raw and term QParsers don't support *any* query markup/syntax (like OR modifiers).

If you want to build a complex query using multiple clauses that are constructed using specific QParsers, you need to build them up using multiple query params and/or the _query_ hook in the LuceneQParser...

fq=_query_:"{!term f=state_s}New York" OR _query_:"{!term f=country_s}United States"

https://wiki.apache.org/solr/LocalParams
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

-Hoss
Re: Adding Query Filter custom implementation to Solr's pipeline
: If you do need to implement something truly custom, writing it as your
: own QParser to trigger via an fq can be advantageous so it can be cached
: and re-used by many queries.

I forgot to mention a very cool new feature that is about to be released in Solr 3.4. You can now instruct Solr that an fq filter query should not be cached, in which case Solr will only consult it after executing the main query -- which can be handy if you have some filtering logic that is very expensive to compute for each document, and you only want to evaluate it for documents that have already been matched by the main query and all other filter queries.

Details are on the wiki...
https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

-Hoss
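For illustration, the new local params look roughly like this on a filter (the field and value are placeholders):

fq={!cache=false cost=100}expensive_field:some_value

cache=false keeps the filter out of the filterCache, and a high cost pushes its evaluation after the cheaper filters and the main query.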
Re: How to plug a new ANTLR grammar
I'd love to see the progress on this.

On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla <roman.ch...@gmail.com> wrote:
Hi,

The standard lucene/solr parsing is nice but not really flexible. I saw questions and discussions about ANTLR, but unfortunately never a working grammar, so... maybe you find this useful:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene objects, and the parser is not mixed with Java code. At first it produces structures like this:
https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem. I don't know if I should use the query parsing framework in contrib. It seems that the qParser in contrib can use different parser generators (the default JavaCC, but also ANTLR), but I am confused and I don't understand this new queryParser from contrib. It is really very confusing to me. Is there any benefit in trying to plug the ANTLR tree into it? Looking at the AST pictures, it seems that with a relatively simple tree walker we could build the same queries as the current standard lucene query parser, and it would be much simpler and more flexible. Does it bring something new? I have a feeling I miss something...

Many thanks for help,

Roman

--
- sent from my mobile
6176064373
Out of memory
I have Solr running on a machine with 18Gb RAM, with 4 cores. One of the cores is very big, containing 77516851 docs; the stats for the searcher are given below:

searcherName : Searcher@5a578998 main
caching : true
numDocs : 77516851
maxDoc : 77518729
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
indexVersion : 1308817281798
openedAt : Tue Sep 13 18:59:52 GMT 2011
registeredAt : Tue Sep 13 19:00:55 GMT 2011
warmupTime : 63139

. Is there a way to reduce the number of docs loaded into memory for this core?
. At any given time I don't need data older than the past 15 days, unless someone queries for it explicitly. How can this be achieved?
. Will it be better to go for Solr replication or distribution if there is little option left?

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg
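On the 15-days point: the usual pattern is a date-range filter query rather than loading less of the index (a sketch, assuming a Solr date field named "timestamp"):

fq=timestamp:[NOW/DAY-15DAYS TO NOW]

Queries that genuinely need older data can simply omit or widen that filter.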
Lucene Grid question
Hi,

I have a huge Lucene index which I'd like to split between machines (a grid). E.g., say I have a chain of book stores in different countries, and I'm aiming for the following:
- Each country has its own index file, on its own machine (e.g. books from Japan are indexed on machine japan1)
- Most users search only within their own country (e.g. search only the japan1 index)
- But sometimes they might ask to search the entire chain (all countries), meaning some sort of map/reduce (= collect data from all countries).

The main challenge is the entire-chain search, especially if I want reasonable ranking. After some investigation (+ great help from the Hibernate Search forum), I've seen the following suggestions:

1) Implement a Lucene Directory that transparently spreads across several machines. I'm not sure how the search would work - can I ask each index for *relevant* data only? Or would I need to maintain one huge combined file, allowing random access for the Searcher?

2) Run an IndexReader on each machine. They tell me each reader can report its relevant term-frequencies, and based on that I can fetch relevant results from each machine. Apparently the ranking won't be perfect (for the overall result), but bearable.

Now, I'm not familiar with Lucene internals, and would really appreciate your views on it.
- Any good articles on Lucene gridding?
- Any idea whether approach #1 makes any sense (IMHO it's not very sensible if I need to merge everything to a single huge file)?
- Any good implementations (of either approach)? So far I found Hibernate Search 4, and Solandra.

Thanks very much.
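For what it's worth, Solr covers the "entire chain" case out of the box with distributed search: each country runs its own core, and a query fans out to all of them via the shards parameter (host names here are illustrative):

http://japan1:8983/solr/select?q=tolkien&shards=japan1:8983/solr,us1:8983/solr,de1:8983/solr

Per-country searches just hit the local core without the shards parameter. The ranking caveat mentioned in approach #2 applies here too: by default each shard scores with its own term statistics rather than global IDF.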
Using the contrib flexible query parser in Solr
Has anyone used the Flexible Query Parser (https://issues.apache.org/jira/browse/LUCENE-1567) in Solr? I'm just starting to look at it for the first time and was wondering if it is something that can be dropped into Solr fairly easily, or if more extensive changes are needed. I thought perhaps someone had already done this, but I couldn't find anything in the Solr bug tracker. -Michael
Re: How to plug a new ANTLR grammar
Roman,

I'm not familiar with the contrib, but you can write your own Java code to create Query objects from the tree produced by your lexer and parser, something like this:

// Feed the query string through the ANTLR-generated lexer and parser
// (the StandardLuceneGrammar* classes are generated from the grammar)
StandardLuceneGrammarLexer lexer = new StandardLuceneGrammarLexer(new ANTLRReaderStream(new StringReader(queryString)));
CommonTokenStream tokens = new CommonTokenStream(lexer);
StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
StandardLuceneGrammarParser.query_return ret = parser.mainQ();
CommonTree t = (CommonTree) ret.getTree();
parseTree(t);

parseTree(Tree t) {
    // recursively walk the tree, visiting each node
    visit(node);
}

visit(Tree node) {
    switch (node.getType()) {
        case StandardLuceneGrammarParser.AND:
            // Create BooleanQuery, push onto stack
            ...
    }
}

I use the stack to build up the final Query from the queries produced in the tree parsing.

Hope this helps.

Peter

On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy <jason...@gmail.com> wrote:
I'd love to see the progress on this.

On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla <roman.ch...@gmail.com> wrote:
[...]
Re: OOM issue
Multiple webapps will not help you; they still draw on the same underlying memory. In fact, it'll make matters worse since they won't share resources.

So the questions become:

1. Why do you have 10 cores? Putting 10 cores on the same machine doesn't really do much. It can make lots of sense to put 10 cores on the same machine for *indexing*, then replicate them out. But putting 10 cores on one machine in hopes of making better use of memory isn't useful. It may be useful to just go to one core.

2. Indexing, reindexing and searching on a single machine is requiring a lot from that machine. Really you should consider having a master/slave setup.

3. But assuming more hardware of any sort isn't in the cards, sure: reduce your cache sizes. Look at ramBufferSizeMB and make it small.

4. Consider indexing with Tika via SolrJ and only sending the finished document to Solr.

Best
Erick

On Mon, Sep 12, 2011 at 5:42 AM, Manish Bafna <manish.bafna...@gmail.com> wrote:
Reducing the number of caches is definitely going to reduce heap usage. Can you run those xlsx files separately with Tika and see if you are getting the OOM issue?

On Mon, Sep 12, 2011 at 3:09 PM, abhijit bashetti <abhijitbashe...@gmail.com> wrote:
I am facing the OOM issue. Other than increasing the RAM, can we change some other parameters to avoid the OOM issue, such as minimizing the filter cache size, document cache size, etc.? Can you suggest some other options to avoid the OOM issue?

Thanks in advance!

Regards,
Abhijit
Re: Document row in solr Result
Not sure if it really applies, but consider the QueryElevationComponent. It can force the display of certain documents (identified by search term) to the top of the results list.

Best
Erick

On Mon, Sep 12, 2011 at 5:44 AM, Eric Grobler <impalah...@googlemail.com> wrote:
Hi Pierre,

Great idea, that will speed things up! Thank you very much.

Regards
Ericz

On Mon, Sep 12, 2011 at 10:19 AM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
Hi Eric,

If you want a query informing one customer of its product row at any given time, the easiest way is to filter on submission date greater than this customer's and return the result count. If you have 500 products with an earlier submission date, your row number is 501.

Hope this helps,

Pierre

-----Original Message-----
From: Eric Grobler [mailto:impalah...@googlemail.com]
Sent: Monday, September 12, 2011 11:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Document row in solr Result

Hi Manish,

Thank you for your time.

For upselling reasons I want to inform the customer that: "your product is on the last page of the search result. However, click here to put your product back on the first page..."

Here is an example: I have a phone with productid 635001 in the iphone category. When I sort this category by submissiondate, this product will be near the end of the result (on row 9863 in this example). At the moment I have to scan nearly 10000 rows in the client to determine the position of this product. Is there a more efficient way to find the position of a specific document in a resultset without returning the full result?

q=category:iphone
fl=productid
sort=submissiondate desc
rows=10000

row productid submissiondate
1 656569 2011-09-12 08:12
2 656468 2011-09-12 08:03
3 656201 2011-09-11 23:41
...
9863 635001 2011-08-11 17:22
...
9922 634423 2011-08-10 21:51

Regards
Ericz

On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna <manish.bafna...@gmail.com> wrote:
You might not be able to find the row index. Can you post your query in detail - the kind of inputs and outputs you are expecting?

On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler <impalah...@googlemail.com> wrote:
Hi Manish,

Thanks for your reply - but how will that return me the row index of the original query?

Regards
Ericz

On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna <manish.bafna...@gmail.com> wrote:
fq - the filter query parameter searches within the results.

On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <impalah...@googlemail.com> wrote:
Hi Solr experts,

If you have a site with products sorted by submission date, the product of a customer might be on page 1 on the first day, and then move down to page x as other customers submit newer entries.

To find the row of a product you can of course run the query and loop through the result until you find the specific productid, like:

q=category:myproducttype
fl=productid
sort=submissiondate desc
rows=10000

But is there perhaps a more efficient way to do this? Maybe a special syntax to search within the result.

Thanks
Ericz
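For illustration, Pierre's counting approach comes down to a count-only request like this (field names follow the example above; the date is the customer's own submission date, assuming submissiondate is a Solr date field):

q=category:iphone&fq=submissiondate:{2011-08-11T17:22:00Z TO *}&rows=0

numFound in the response is then the number of products listed above the customer's, and the row is numFound + 1, with no documents needing to be returned.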
RE: can indexing information stored in db rather than filesystem?
Nicely put. ;^)

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, September 13, 2011 9:16 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information stored in db rather than filesystem?

On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:
As suggested, I agree that we are losing many of the benefits of Solr/Lucene, but I still want to store the index output (index files) in a db table. Please suggest what steps I need to follow to configure the db with the Solr engine.

The steps are:
1. Write the Java code to do that.
2. Submit it as contrib, because it is such a bad idea that I doubt it will be added to the common code.

wunder
--
Walter Underwood
RE: Out of memory
numDocs is not the number of documents in memory. It is the number of documents currently in the index (which is kept on disk). The same goes for maxDoc, except that it is a count of all of the documents that have ever been in the index since it was created or optimized (including deleted documents).

Your subject indicates that something is giving you some kind of out-of-memory error. We might better be able to help you if you provide more information about your exact problem.

JRJ

-----Original Message-----
From: Rohit [mailto:ro...@in-rev.com]
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have Solr running on a machine with 18Gb RAM, with 4 cores. One of the cores is very big, containing 77516851 docs; the stats for the searcher are given below:

searcherName : Searcher@5a578998 main
caching : true
numDocs : 77516851
maxDoc : 77518729
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
indexVersion : 1308817281798
openedAt : Tue Sep 13 18:59:52 GMT 2011
registeredAt : Tue Sep 13 19:00:55 GMT 2011
warmupTime : 63139

. Is there a way to reduce the number of docs loaded into memory for this core?
. At any given time I don't need data older than the past 15 days, unless someone queries for it explicitly. How can this be achieved?
. Will it be better to go for Solr replication or distribution if there is little option left?

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg
Can index size increase when no updates/optimizes are happening?
One of my users observed that the index size (in bytes) increased overnight. There was no indexing activity at that time; only querying was taking place. Running optimize brought the index size back down to what it was when indexing finished the day before. What could explain that?
Document Boost not evaluated when using standard Query Type?
Hey all,

I want to show all documents of a certain type. The documents should be ordered by the index-time document boost. So I expected that this would work:

/select?debugQuery=on&q=doctype:music&q.op=OR&qt=standard

But in fact every document gets the same score:

0.7306 = (MATCH) fieldWeight(doctype:music in 1), product of:
  1.0 = tf(termFreq(doctype:music)=1)
  0.7306 = idf(docFreq=37138, maxDocs=37138)
  1.0 = fieldNorm(field=doctype, doc=1)

So I am a bit confused now. When is the (index-time) document boost evaluated? (My understanding was that at indexing time the document field values are multiplied by the boost - and that during search this results in higher scores?)

Is there a better way to get a list of all documents (matching a simple where clause) sorted by document boost?

Thanks for any hints.

Daniel
Re: DIH load only selected documents with XPathEntityProcessor
This solution doesn't seem to be working for me. I am using Solr trunk, and I have the same question as Bernd with a small twist: the field that should NOT be empty happens to be a derived field called price; see the config below:

<entity ... transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow">
  <field column="description" xpath="/rss/channel/item/description" />
  <field column="price" regex=".*\$(\d*.\d*)" sourceColName="description" />
  ...
</entity>

I have also changed the sample script to check the price field instead of the link field that was being used as an example in this thread earlier:

<script><![CDATA[
function skipRow(row) {
  var price = row.get('price');
  if (price == null || price == '') {
    row.put('$skipRow', 'true');
  }
  return row;
}
]]></script>

Does anyone have any thoughts on what I'm missing?

Thanks!
- Pulkit

On Mon, Jan 10, 2011 at 3:06 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
Hi Gora,

thanks a lot, very nice solution, works perfectly. I will dig more into ScriptTransformer; it seems to be very powerful.

Regards,
Bernd

Am 08.01.2011 14:38, schrieb Gora Mohanty:
On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
Hello list,

is it possible to load only selected documents with XPathEntityProcessor? While loading docs I want to drop/skip/ignore documents with a missing URL. Example:

<documents>
  <document>
    <title>first title</title>
    <id>identifier_01</id>
    <link>http://www.foo.com/path/bar.html</link>
  </document>
  <document>
    <title>second title</title>
    <id>identifier_02</id>
    <link></link>
  </document>
</documents>

The first document should be loaded; the second document should be ignored because it has an empty link (this should also work for a missing link field).
[...]

You can use a ScriptTransformer, along with $skipRow/$skipDoc. E.g., something like this for your data import configuration file:

<dataConfig>
  <script><![CDATA[
    function skipRow(row) {
      var link = row.get('link');
      if (link == null || link == '') {
        row.put('$skipRow', 'true');
      }
      return row;
    }
  ]]></script>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/home/gora/test" fileName=".*xml"
            newerThan="'NOW-3DAYS'" recursive="true"
            rootEntity="false" dataSource="null">
      <entity name="top" processor="XPathEntityProcessor"
              forEach="/documents/document"
              url="${f.fileAbsolutePath}"
              transformer="script:skipRow">
        <field column="link" xpath="/documents/document/link"/>
        <field column="title" xpath="/documents/document/title"/>
        <field column="id" xpath="/documents/document/id"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Regards,
Gora
Re: DIH load only selected documents with XPathEntityProcessor
Oh, and I'm sure that I'm using Java 6, because the properties from the Solr webpage spit out:
java.runtime.version = 1.6.0_26-b03-384-10M3425

On Tue, Sep 13, 2011 at 4:15 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote:
[...]
Managing solr machines (start/stop/status)
I know this isn't a Solr-specific question, but I was wondering what folks do to manage the machines in their Solr cluster. Are there any recommendations for how to start/stop/manage these machines? Any suggestions would be appreciated.
DIH skipping imports with skipDoc vs skipRow
Hello,

1) The documented explanation of $skipDoc and $skipRow is not enough for me to discern the difference between them:

$skipDoc : Skip the current document. Do not add it to Solr. The value can be String true/false.
$skipRow : Skip the current row. The document will be added with rows from other entities. The value can be String true/false.

Can someone please elaborate and help me out with an example?

2) I am working off the Solr trunk (4.x) and nothing I do seems to make the import for a given row/doc get skipped. As proof, I've added these tests to my data import xml and all the rows are still getting indexed!!! If anyone sees something wrong with my config, please tell me. Make sure to take note of the blatant use of row.put('$skipDoc', 'true'); and <field column="$skipDoc" template="true"/>. Yet stuff still gets imported; this is beyond me. Need a fresh pair of eyes :)

<dataConfig>
  <dataSource type="URLDataSource" />
  <script><![CDATA[
    function skipRow(row) {
      row.put('$skipDoc', 'true');
      return row;
    }
  ]]></script>
  <document>
    <entity name="amazon"
            pk="link"
            url="http://www.amazon.com/gp/rss/new-releases/apparel/1040660/ref=zg_bsnr_1040660_rsslink"
            processor="XPathEntityProcessor"
            forEach="/rss/channel | /rss/channel/item"
            transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow,TemplateTransformer">
      <field column="description" xpath="/rss/channel/item/description" />
      <field column="price" regex=".*\$(\d*.\d*)" sourceColName="description" />
      <field column="$skipDoc" template="true"/>
      <field column="link" xpath="/rss/channel/item/link" />
    </entity>
  </document>
</dataConfig>

Thanks!
- Pulkit
Re: select query does not find indexed pdf document
Thank you for your informative reply.

I would like to start simple by combining both filename and content into the same default search field, which my default schema.xml calls text:

<defaultSearchField>text</defaultSearchField>

Also:
- case- and accent-insensitive
- no splits on numb3rs
- no highlights
- text processing the same for index and search

However, I do want:
- ngrams preferably (partial/prefix word/token search)

What schema mods would be needed? Also, what curl syntax to submit/index a pdf (with filename and content combined into the default search field)?

From: Bob Sandiford <bob.sandif...@sirsidynix.com>
To: Michael Dockery <dockeryjava...@yahoo.com>
Cc: solr-user@lucene.apache.org
Sent: Monday, September 12, 2011 1:38 PM
Subject: RE: select query does not find indexed pdf document

Hi, Michael.

Well, the stock answer is, 'it depends'.

For example - would you want to be able to search filename without searching file contents, or would you always search both of them together? If both, then copy both the file name and the parsed file content from the pdf into a single search field, and you can set that up as the default search field.

Or - what kind of processing / normalizing do you want on this data? Case insensitive? Accent insensitive? If a 'word' contains camel case (e.g. TheVeryIdea), do you want that split on the case changes? (but then watch out for things like iPad) If a 'word' contains numbers, do you want them left together, or separated? Do you want stemming (where searching for 'stemming' would also find 'stem', 'stemmed', that sort of thing)? Is this always English, or are other languages involved? Do you want the text processing to be the same for indexing vs searching? Do you want to be able to find hits based on the first few characters of a term? (ngrams) Do you want to be able to highlight text segments where the search terms were found?

Probably you want to read up on the various tokenizers and filters that are available. Do some prototyping and see how it looks. Here's a starting point:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Basically, there is no 'one size fits all' here. Part of the power of Solr / Lucene is its configurability to achieve the results your business case calls for. Part of the drawback of Solr / Lucene - especially for new folks - is its configurability to achieve the results your business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 1:18 PM
To: Bob Sandiford
Subject: Re: select query does not find indexed pdf document

thank you. that worked.

Any tips for a very very basic setup of the schema.xml? Or is the default basic enough? I basically only want to search on filename and file contents.

From: Bob Sandiford <bob.sandif...@sirsidynix.com>
To: solr-user@lucene.apache.org; Michael Dockery <dockeryjava...@yahoo.com>
Sent: Monday, September 12, 2011 10:04 AM
Subject: RE: select query does not find indexed pdf document

Um - looks like you specified your id value as pdfy, which is reflected in the results from the *:* query, but your id query is searching for vpn, hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

-----Original Message-----
From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: select query does not find indexed pdf document

http://www/SearchApp/select/?q=id:vpn

yields this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
    <lst name="params">
      <str name="q">id:vpn</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

http://www/SearchApp/select/?q=*:*

yields this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">*.*</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="author">doc</str>
      <arr name="content_type">
        <str>application/pdf</str>
      </arr>
      <str name="id">pdfy</str>
      <date name="last_modified">2011-05-20T02:08:48Z</date>
      <arr name="title">
        <str>dmvpndeploy.pdf</str>
      </arr>
    </doc>
  </result>
</response>

From: Jan Høydahl <jan@cominvent.com>
To:
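On the curl question: the stock ExtractingRequestHandler is typically invoked along these lines (a sketch; the core URL, id, and filename are illustrative, and fmap.content assumes the parsed body should land in the "text" default search field):

curl "http://localhost:8983/solr/update/extract?literal.id=dmvpndeploy.pdf&fmap.content=text&commit=true" -F "myfile=@dmvpndeploy.pdf"

literal.* sets fixed field values (the id doubles as the filename here), and fmap.* remaps Tika's output fields to schema fields.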
Re: Document Boost not evaluated when using standard Query Type?
: I want to show all documents of a certain type. The documents
: should be ordered by the index-time document boost.
...
: But in fact every document gets the same score:
:
: 0.7306 = (MATCH) fieldWeight(doctype:music in 1), product of:
:   1.0 = tf(termFreq(doctype:music)=1)
:   0.7306 = idf(docFreq=37138, maxDocs=37138)
:   1.0 = fieldNorm(field=doctype, doc=1)

Index boosts are folded into the fieldNorm. By the looks of it, you are using omitNorms="true" on the field doctype.

: Is there a better way to get a list of all documents (matching a simple
: where clause) sorted by document boost?

fieldNorms are very coarse. In my opinion, if you have a weighting you want to use to affect score sort, it's better to index that weight as a numeric field, and explicitly factor it into the score using a function query...

q={!boost b=yourWeightField v=$qq}&qq=doctype:music

More info...
https://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
http://www.lucidimagination.com/blog/2011/06/20/solr-powered-isfdb-part-10/
https://github.com/lucidimagination/isfdb-solr/commit/75f830caa1a11fd97ab48d6428096cf63f53cb3b

-Hoss
where is the SOLR_HOME ?
Hi,

This page (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory) says:

"Note: to use this filter, see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib"

I can't find SOLR_HOME/lib!

1- Is it apache-solr-3.3.0\example\solr? There is no directory named lib there. I created an example/solr/lib directory, copied the jar files to it, and tested these expressions in solrconfig.xml:

<lib dir="../../example/solr/lib" />
<lib dir="./lib" />
<lib dir="../../../example/solr/lib" /> (for more assurance!!!)

but it doesn't work and still gives the following errors!

2- Or apache-solr-3.3.0\? There is no directory named lib there either.

3- Or apache-solr-3.3.0\example? There is a lib directory there. I copied the 4 libraries that exist in solr/contrib/analysis-extras/ to apache-solr-3.3.0\example\lib, but some errors still occur when loading the page http://localhost:8983/solr/admin:

I use Nutch to crawl the web and fetch web pages, and I send Nutch's data to Solr for indexing. According to the Nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch) I should copy Nutch's schema.xml to the conf directory of Solr, so I added all of my required analyzers, like ICUNormalizer2FilterFactory, to this new schema.xml.

This is the schema.xml (the ICU and Persian fieldTypes below are the ones I added):

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="nutch" version="1.3">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_icu" class="solr.TextField" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="icu_sort_en" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUCollationKeyFilterFactory" locale="en" strength="primary"/>
      </analyzer>
    </fieldType>
    <fieldType name="normalized" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf" mode="compose"/>
      </analyzer>
    </fieldType>
    <fieldType name="folded" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="transformed" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_fanormal" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter
Re: Document Boost not evaluated when using standard Query Type?
Thanks, that helped!

On Sep 14, 2011, at 4:56 PM, Chris Hostetter wrote:
fieldNorms are very coarse. In my opinion, if you have a weighting you want to use to affect score sort, it's better to index that weight as a numeric field, and explicitly factor it into the score using a function query...

q={!boost b=yourWeightField v=$qq}&qq=doctype:music

More info...
https://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
http://www.lucidimagination.com/blog/2011/06/20/solr-powered-isfdb-part-10/
https://github.com/lucidimagination/isfdb-solr/commit/75f830caa1a11fd97ab48d6428096cf63f53cb3b

-Hoss
Re: Document Boost not evaluated when using standard Query Type?
fieldNorms are very coarse. In my opinion, if you have a weighting you want to use to affect score sort, it's better to index that weight as a numeric field, and explicitly factor it into the score using a function query...

I see that in this use case this makes the most sense - thanks. But why are fieldNorms in general very coarse?

Thanks,
Daniel
DIH delta last_index_time
Hi,

How do you handle the situation where the time on the server running Solr doesn't match the time in the database? I'm using the last_index_time saved by Solr in the delta query, checking it against the lastModifiedDate field in the database, but the times are not in sync, so I might lose some changes. Can we use something else other than last_index_time? Maybe something like a last_pk or something.

Thanks in advance.

Maria
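One common workaround (a sketch only; the SQL is MySQL-flavored and the table/column names are placeholders) is to build a safety margin into the delta query so that clock skew between the Solr box and the database cannot hide changes:

deltaQuery="SELECT id FROM items
            WHERE lastModifiedDate > SUBDATE('${dataimporter.last_index_time}', INTERVAL 15 MINUTE)"

Rows inside the overlap window are simply re-imported, which is harmless because Solr overwrites documents on their uniqueKey.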