Re: NRT and warmupTime of filterCache
it'll negatively impact the desired goal of low latency new index readers? - Yes, I think so; that's why I don't understand the wiki article ... I set the autowarmCount to 500 and got no error messages saying Solr isn't available ... but the stats.jsp page shows me warmupTime : 12174. Why? Is the warmup setting in solrconfig.xml the maximum time in ms for autowarming, or what does it really mean? --- System: one server, 12 GB RAM, 2 Solr instances, 7 cores; 1 core with 31 million documents, the other cores ~100,000. Solr1 for search requests - commit every minute - 5GB Xmx. Solr2 for update requests - delta import every minute - 4GB Xmx.
Re: NRT and warmupTime of filterCache
Okay, so it's not the time ... it's the number of items ...
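For reference: autowarmCount in solrconfig.xml is a number of cache entries to re-populate from the previous searcher, not a time limit, and the warmupTime shown on stats.jsp is simply the measured time in milliseconds that warming took. A minimal filterCache declaration with the count discussed above (the size values are illustrative):

  <filterCache class="solr.FastLRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="500"/>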
Re: SolrJ and digest authentication
I figured it out. Since this Solr server does not have an SSL interface, I had to change the following line from 443 to 80: AuthScope scope = new AuthScope(host, 80, resin); Erlend On 09.03.11 17.09, Erlend Garåsen wrote: I'm trying to do a search with SolrJ using digest authentication, but I'm getting the following error: org.apache.solr.common.SolrException: Unauthorized I'm setting up SolrJ this way:

  HttpClient client = new HttpClient();
  List<String> authPrefs = new ArrayList<String>();
  authPrefs.add(AuthPolicy.DIGEST);
  client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY, authPrefs);
  AuthScope scope = new AuthScope(host, 443, resin);
  client.getState().setCredentials(scope, new UsernamePasswordCredentials(username, password));
  client.getParams().setAuthenticationPreemptive(true);
  SolrServer server = new CommonsHttpSolrServer(server, client);

Is this something which is not supported by SolrJ, or have I written something wrong in the code above? Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
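A note on the fix: commons-httpclient's AuthScope also accepts AuthScope.ANY_PORT, which sidesteps the port mismatch entirely. A minimal sketch, assuming the same host/realm/credential variables as above:

  AuthScope scope = new AuthScope(host, AuthScope.ANY_PORT, resin);
  client.getState().setCredentials(scope, new UsernamePasswordCredentials(username, password));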
Possible to sort in .xml file?
Hi, I'm trying to set up Solr so that we can sort using: document_views asc, score ... is this possible via the solrconfig.xml/schema.xml file? I know it's possible to do by adding sort= to the query, but the Perl module (WebService::Solr) doesn't seem to offer the option to pass in this value :( TIA -- Andy Newby a...@ultranerds.com
Re: Possible to sort in .xml file?
Is there no generic parameter store in the Solr module you can use for passing the sort parameter? If not, you can define your sort parameter as a default in the request handler you use in solrconfig. See the shipped config for examples. On Thursday 10 March 2011 11:25:01 Andy Newby wrote: Hi, I'm trying to set up Solr so that we can sort using: document_views asc, score ... is this possible via the solrconfig.xml/schema.xml file? I know it's possible to do by adding sort= to the query, but the Perl module (WebService::Solr) doesn't seem to offer the option to pass in this value :( TIA -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Possible to sort in .xml file?
No, look for request handlers.

  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
    </lst>
    <!-- In addition to defaults, "appends" params can be specified
         to identify values which should be appended to the list of
         multi-val params from the query (or the existing "defaults"). -->
    <!-- In this example, the param fq=instock:true would be appended to any
         query time fq params the user may specify, as a mechanism for
         partitioning the index, independent of any user selected filtering
         that may also be desired (perhaps as a result of faceted searching).
         NOTE: there is *absolutely* nothing a client can do to prevent these
         "appends" values from being used, so don't use this mechanism
         unless you are sure you always want it. -->
    <!--
    <lst name="appends">
      <str name="fq">inStock:true</str>
    </lst>
    -->
    etc...

You can add any valid parameter there as a default. http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml On Thursday 10 March 2011 11:34:47 Andy Newby wrote: Hi, Thanks for the quick reply! I did a quick look in the solrconfig.xml file, but can't see anything about sort, apart from:

  <!-- An optimization that attempts to use a filter to satisfy a search.
       If the requested sort does not include score, then the filterCache
       will be checked for a filter matching the query. If found, the filter
       will be used as the source of document ids, and then the sort will be
       applied to that.
  <useFilterForSortedQuery>true</useFilterForSortedQuery>
  -->

TIA Andy On Thu, Mar 10, 2011 at 10:33 AM, Markus Jelsma markus.jel...@openindex.io wrote: Is there no generic parameter store in the Solr module you can use for passing the sort parameter? If not, you can define your sort parameter as a default in the request handler you use in solrconfig. See the shipped config for examples. On Thursday 10 March 2011 11:25:01 Andy Newby wrote: Hi, I'm trying to set up Solr so that we can sort using: document_views asc, score ... is this possible via the solrconfig.xml/schema.xml file? I know it's possible to do by adding sort= to the query, but the Perl module (WebService::Solr) doesn't seem to offer the option to pass in this value :( TIA -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
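For the original question, the sort default could then be declared the same way (a sketch; note that a direction has to be added to score, and the handler must be the one the Perl client actually queries):

  <lst name="defaults">
    <str name="sort">document_views asc, score desc</str>
  </lst>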
disquery - difference qf qs / pf ps
Hi, I understand what the qf and qs parameters are, but I can't understand what pf and ps are exactly. Can someone explain them to me? For example: qf=title^2 name^1.2 surname^1 qs=3 means I search in the title field with boost 2, or in the name field with boost 1.2, or in the surname field with boost 1, and the maximum slop between terms to match is 3. Right? And what about ps and pf (phrase fields and phrase slop)? Can I use all 4 parameters together? Thanks -- Gastone Penzo
Re: disquery - difference qf qs / pf ps
Hi, I understand what the qf and qs parameters are, but I can't understand what pf and ps are exactly. Can someone explain them to me? For example: qf=title^2 name^1.2 surname^1 qs=3 means I search in the title field with boost 2, or in the name field with boost 1.2, or in the surname field with boost 1, and the maximum slop between terms to match is 3. Right? And what about ps and pf (phrase fields and phrase slop)? Can I use all 4 parameters together? Yes, you can use all 4 parameters together. Please see this similar discussion: http://search-lucene.com/m/KWkYf2kE4Ng1/
Re: disquery - difference qf qs / pf ps
Thank you very much. I understand the difference between qs and ps, but not what pf is... is it necessary to use ps? Yes you can use all 4 parameters together. Please see similar discussion: http://search-lucene.com/m/KWkYf2kE4Ng1/ -- Gastone Penzo
Re: Math-generated fields during query
Not at the moment, if I'm not mistaken. The same issue exists in Solr 3.1, where relative distances are not returned as a field value when doing spatial filtering. To retrieve the value one must use the score as a sort of pseudo field. http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance On Wednesday 09 March 2011 23:06:33 Peter Sturge wrote: Hi, I was wondering if it is possible during a query to create a returned field 'on the fly' (like a function query, but for concrete values, not score). For example, if I input this query: q=_val_:product(15,3)&fl=*,score then for every returned document, I get score = 45. If I change it slightly to add *:* like this: q=*:* _val_:product(15,3)&fl=*,score I get score = 32.526913. If I try my use case of _val_:product(qty_ordered,unit_price), I get varying scores depending on... well, depending on something. I understand this is doing relevance scoring, but it doesn't seem to tally with the FunctionQuery wiki [example at the bottom of the page]: q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score ... where score will contain the resultant volume. Is there a trick to getting not a score, but the actual value of quantity*price (e.g. product(5,2.21) == 11.05)? Many thanks -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: disquery - difference qf qs / pf ps
i understand the difference between qs and ps but not what pf is... is it necessary to use ps? pf (Phrase Fields) and ps (Phrase Slop) are related to each other. Let's say you have q=term1 term2&pf=title text&ps=10. We can think of it as if dismax adds title:"term1 term2"~10 text:"term1 term2"~10 as imaginary optional clauses to your original query. Optional means they affect the order of documents, not which documents match. http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29
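Putting the whole thread together, a complete dismax request using all four parameters could look like this (field names taken from the question; the query text is illustrative):

  /select?defType=dismax&q=mario rossi&qf=title^2 name^1.2 surname^1&qs=3&pf=title name&ps=10

qf/qs govern matching: each term must match one of the qf fields, and qs is the slop allowed on phrases the user explicitly quotes. pf/ps never change which documents match; they only add the boosting phrase clauses described above.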
Re: NRT in Solr
Bill, I think all of the improvements can be made; however, they are fairly large structural changes that would require perhaps several patches. The other issue is we'll likely land RT this year (or next), and then the cached values need to be appended to as documents are added; that, and they'll be spread across several DWPTs (see LUCENE-2324). So one could easily do the work for per-segment caching, and then need to go back and do per-segment, append caches. I'm not sure caching is needed at all, especially with the recent speed improvements, except for facets, which resemble field caches and probably should be subsumed there. Jason On Wed, Mar 9, 2011 at 8:27 PM, Bill Bell billnb...@gmail.com wrote: So it looks like it can handle adding new documents and expiring old documents. Updating a document is not part of the game. This would work well for message boards or tweet-type solutions. Solr can do this as well directly. Why wouldn't you just improve the document and facet caching so that when you append there is not a huge hit to Solr? Also, we could add an expiration to documents as well. The big issue for me is that when I update Solr I need to replicate that change quickly to all slaves. If we changed replication to stream to the slaves in near real time and not have to create a whole new index version, warming, etc., that would be awesome. That, combined with better caching smarts, and we have a near-perfect solution. Thanks. On 3/9/11 3:29 PM, Smiley, David W. dsmi...@mitre.org wrote: Zoie adds NRT to Solr: http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin I haven't tried it yet but it looks cool. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On Mar 9, 2011, at 9:01 AM, Jason Rutherglen wrote: Jae, NRT hasn't been implemented in Solr as of yet, I think partially because major features such as replication, caching, and uninverted faceting suddenly are no longer viable, e.g., it's another round of testing etc. It's doable; however, I think the best approach is a separate request call path, to avoid altering the current [working] API. On Tue, Mar 8, 2011 at 1:27 PM, Jae Joo jaejo...@gmail.com wrote: Hi, Is NRT in Solr 4.0 from trunk? I have checked out from trunk, but could not find the configuration for NRT. Regards Jae
Re: FunctionQueries and FieldCache and OOM
Well, it's quite hard to debug because the values listed on the stats page in the fieldCache section don't make much sense. Reducing precision with NOW/HOUR, however, does seem to make a difference. It is hard (or impossible) to reproduce this in a test setup with the same index but without continuous updates and without stress tests. Firing manual queries with different values for the bf parameter doesn't show any difference in the values listed on the stats page. Does someone care to provide an explanation? Thanks On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote: Hi, In one of the environments I'm working on (4 Solr 1.4.1 nodes with replication, 3+ million docs, ~5.5GB index size, high commit rate (~1-2min), high query rate (~50q/s), high number of updates (~1000 docs/commit)) the nodes continuously run out of memory. During development we frequently ran excessive stress tests, and after tuning JVM and Solr settings all ran fine. A while ago I added the DisMax bq parameter for boosting recent documents; documents older than a day receive 50% less boost, similar to the example but with a much steeper slope. For clarity, I'm not using the ordinal function but the reciprocal version in the bq parameter, which is warned against when using Solr 1.4.1 according to the wiki. This week we started the stress tests and nodes are going down again. I've reconfigured the nodes to have different settings for the bq parameter (or no bq parameter). It seems the bq is the cause of the misery. Issue SOLR- keeps popping up but it has not been resolved. Is there anyone who can confirm one of those patches fixes this issue before I waste hours of work finding out it doesn't? ;) Am I correct when I assume that Lucene FieldCache entries are added for each unique function query? In that case, every query is a unique cache entry because it operates on milliseconds. If all else fails I might be able to reduce precision by operating on minutes or even more instead of milliseconds. I, however, cannot use other nice math functions in the ms() parameter, so that might make things difficult. However, date math seems available (NOW/HOUR), so I assume it would also work for SOME_DATE_FIELD/HOUR as well. This way I just might prevent useless entries. My apologies for this long mail, but it may prove useful for other users, and hopefully we find the solution and can update the wiki to add this warning. Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: NRT and warmupTime of filterCache
- Yes, I think so; that's why I don't understand the wiki article ... Maybe the article is out of date? I think it's grossly inefficient to warm the searchers at all in the NRT case. Queries are performed across *all* segments, even though there should be only one new segment that may require warming. And given that the new segment is so small, there should be no reason to warm it at all? On Thu, Mar 10, 2011 at 12:14 AM, stockii stock.jo...@googlemail.com wrote: it'll negatively impact the desired goal of low latency new index readers? - Yes, I think so; that's why I don't understand the wiki article ... I set the autowarmCount to 500 and got no error messages saying Solr isn't available ... but the stats.jsp page shows me warmupTime : 12174. Why? Is the warmup setting in solrconfig.xml the maximum time in ms for autowarming, or what does it really mean? --- System: one server, 12 GB RAM, 2 Solr instances, 7 cores; 1 core with 31 million documents, the other cores ~100,000. Solr1 for search requests - commit every minute - 5GB Xmx. Solr2 for update requests - delta import every minute - 4GB Xmx.
DIH : modify document in sibling entity of root entity
Dear all, in DIH, is it possible to have two sibling entities where: - the first one is the root entity that creates the documents by iterating over a table that has one row per document. - the second one is executed after the completion of the first entity's iteration, and it provides more data that is added to the newly created documents. I've set up such a DIH configuration, and the second entity is executed, but no data is written into the index apart from the data extracted by the root entity (= no document is modified?). Documents are identified by the unique key 'id', which is defined by pk=id on both entities. Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question of why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. Anyway, the main reason I tried this is because I want to know whether it works. I'm still not sure whether it should work and I'm just doing something wrong... Thanks! Chantal
Re: FunctionQueries and FieldCache and OOM
Alright, I can now confirm the issue has been resolved by reducing precision. The garbage collector on nodes without reduced precision has a real hard time keeping up and clearly shows a very different graph of heap consumption. Consider using MINUTE, HOUR or DAY as the precision in case you suffer from excessive memory consumption: recip(ms(NOW/PRECISION,DATE_FIELD),TIME_FRACTION,1,1) On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote: Well, it's quite hard to debug because the values listed on the stats page in the fieldCache section don't make much sense. Reducing precision with NOW/HOUR, however, does seem to make a difference. It is hard (or impossible) to reproduce this in a test setup with the same index but without continuous updates and without stress tests. Firing manual queries with different values for the bf parameter doesn't show any difference in the values listed on the stats page. Does someone care to provide an explanation? Thanks On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote: Hi, In one of the environments I'm working on (4 Solr 1.4.1 nodes with replication, 3+ million docs, ~5.5GB index size, high commit rate (~1-2min), high query rate (~50q/s), high number of updates (~1000 docs/commit)) the nodes continuously run out of memory. During development we frequently ran excessive stress tests, and after tuning JVM and Solr settings all ran fine. A while ago I added the DisMax bq parameter for boosting recent documents; documents older than a day receive 50% less boost, similar to the example but with a much steeper slope. For clarity, I'm not using the ordinal function but the reciprocal version in the bq parameter, which is warned against when using Solr 1.4.1 according to the wiki. This week we started the stress tests and nodes are going down again. I've reconfigured the nodes to have different settings for the bq parameter (or no bq parameter). It seems the bq is the cause of the misery. Issue SOLR- keeps popping up but it has not been resolved. Is there anyone who can confirm one of those patches fixes this issue before I waste hours of work finding out it doesn't? ;) Am I correct when I assume that Lucene FieldCache entries are added for each unique function query? In that case, every query is a unique cache entry because it operates on milliseconds. If all else fails I might be able to reduce precision by operating on minutes or even more instead of milliseconds. I, however, cannot use other nice math functions in the ms() parameter, so that might make things difficult. However, date math seems available (NOW/HOUR), so I assume it would also work for SOME_DATE_FIELD/HOUR as well. This way I just might prevent useless entries. My apologies for this long mail, but it may prove useful for other users, and hopefully we find the solution and can update the wiki to add this warning. Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
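A concrete instance of the pattern above, assuming a date field named last_modified and the one-year fraction used in the Solr FunctionQuery examples (3.16e-11 is about 1 / milliseconds-per-year):

  bf=recip(ms(NOW/HOUR,last_modified),3.16e-11,1,1)

With NOW/HOUR the function's "now" only changes once per hour, so all queries within the same hour can reuse the same cached entries instead of each millisecond-distinct query creating a new one.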
Re: NRT and warmupTime of filterCache
Maybe the article is out of date? - Maybe ... I don't know. In my case it makes no sense, so I use another configuration ...
Error on string searching # [STRANGE]
I have a text field indexed using WordDelimiter. Indexed in this way:

  <doc>
    <field name="myfield">S.#L.W.VI.37</field>
    ...
  </doc>

Searching in this way: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37") makes this error: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.': Lexical error at line 1, column 17. Encountered: <EOF> after : "\"S." It seems that # is a wrong character for a query... I tried URL-encoding, or adding a slash before it, or removing the quotes, but other errors come up: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37) org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': Encountered "<EOF>" at line 1, column 15. Was expecting one of: "AND" ... "OR" ... "NOT" ... "+" ... "-" ... "(" ... ")" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... Any idea how to solve this? Maybe a bug? Or probably I'm missing something. Dario.
Re: Error on string searching # [STRANGE]
I think that the problem is with the # symbol, because it has a special meaning when used inside a URL. Try replacing it with %23, like this: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37") Regards, * Juan G. Grande* -- Solr Consultant @ http://www.plugtree.com -- Blog @ http://juanggrande.wordpress.com On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin dario.rigo...@comperio.it wrote: I have a text field indexed using WordDelimiter. Indexed in this way: <doc><field name="myfield">S.#L.W.VI.37</field>...</doc> Searching in this way: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37") makes this error: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.': Lexical error at line 1, column 17. Encountered: <EOF> after : "\"S." It seems that # is a wrong character for a query... I tried URL-encoding, or adding a slash before it, or removing the quotes, but other errors come up: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': Encountered "<EOF>" at line 1, column 15. Was expecting one of: "AND" ... "OR" ... "NOT" ... "+" ... "-" ... "(" ... ")" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... Any idea how to solve this? Maybe a bug? Or probably I'm missing something. Dario.
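If the query is built in code rather than pasted into a browser, encoding the whole parameter value avoids this class of problem. A minimal Java sketch using java.net.URLEncoder (note encode() declares a checked UnsupportedEncodingException):

  // '#' becomes %23, quotes become %22, etc.
  String q = URLEncoder.encode("myfield:(\"S.#L.W.VI.37\")", "UTF-8");
  String url = "http://192.168.3.3:8983/solr3.1/core0/select?q=" + q;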
Re: Math-generated fields during query
As a workaround, can you not have a search component run after the QueryComponent, have qty_ordered and unit_price as stored fields returned with the fl parameter, and have your custom component do the calculation - unless you need to sort by this value too? Dan On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge peter.stu...@gmail.com wrote: Hi, I was wondering if it is possible during a query to create a returned field 'on the fly' (like a function query, but for concrete values, not score). For example, if I input this query: q=_val_:product(15,3)&fl=*,score then for every returned document, I get score = 45. If I change it slightly to add *:* like this: q=*:* _val_:product(15,3)&fl=*,score I get score = 32.526913. If I try my use case of _val_:product(qty_ordered,unit_price), I get varying scores depending on... well, depending on something. I understand this is doing relevance scoring, but it doesn't seem to tally with the FunctionQuery wiki [example at the bottom of the page]: q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score ... where score will contain the resultant volume. Is there a trick to getting not a score, but the actual value of quantity*price (e.g. product(5,2.21) == 11.05)? Many thanks
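For anyone landing here later, a rough sketch of such a component against the Solr 3.x plugin API (untested; the class name and the line_totals response key are invented for illustration, and qty_ordered/unit_price must be stored fields as Dan says):

  import java.io.IOException;
  import org.apache.lucene.document.Document;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.search.DocIterator;
  import org.apache.solr.search.SolrIndexSearcher;

  public class LineTotalComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // nothing to do before the main query runs
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // assumes QueryComponent already ran, i.e. this component is
      // registered in the handler's last-components list
      SolrIndexSearcher searcher = rb.req.getSearcher();
      NamedList<Object> totals = new NamedList<Object>();
      DocIterator it = rb.getResults().docList.iterator();
      while (it.hasNext()) {
        int docId = it.nextDoc();
        Document d = searcher.doc(docId); // stored fields of this hit
        String qty = d.get("qty_ordered");
        String price = d.get("unit_price");
        if (qty != null && price != null) {
          totals.add(String.valueOf(docId),
              Double.parseDouble(qty) * Double.parseDouble(price));
        }
      }
      rb.rsp.add("line_totals", totals); // extra section in the response
    }

    @Override public String getDescription() { return "adds qty*price per document"; }
    @Override public String getSource() { return "$URL$"; }
    @Override public String getSourceId() { return "$Id$"; }
    @Override public String getVersion() { return "1.0"; }
  }

It would be registered with a searchComponent element in solrconfig.xml and appended to the handler's last-components list so it runs after the QueryComponent.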
Re: Error on string searching # [STRANGE]
On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote: I think that the problem is with the # symbol, because it has a special meaning when used inside a URL. Try replacing it with %23, like this: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37") If I do the URL-encoding and change it to %23, I get this error:

  java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:185)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208)
    at org.apache.lucene.search.Searcher.search(Searcher.java:88)

Regards, * Juan G. Grande* -- Solr Consultant @ http://www.plugtree.com -- Blog @ http://juanggrande.wordpress.com On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin dario.rigo...@comperio.it wrote: I have a text field indexed using WordDelimiter. Indexed in this way: <doc><field name="myfield">S.#L.W.VI.37</field>...</doc> Searching in this way: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37") makes this error: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.': Lexical error at line 1, column 17. Encountered: <EOF> after : "\"S." It seems that # is a wrong character for a query... I tried URL-encoding, or adding a slash before it, or removing the quotes, but other errors come up: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': Encountered "<EOF>" at line 1, column 15. Was expecting one of: "AND" ... "OR" ... "NOT" ... "+" ... "-" ... "(" ... ")" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... Any idea how to solve this? Maybe a bug? Or probably I'm missing something. Dario.
Re: DIH : modify document in sibling entity of root entity
Hi Chantal, I'm not sure if I understood you correctly (if at all). Two entities, not arranged as sub-entities, but using values from the previous entity? Could you paste your dataimport config and the relevant part of the logging output? Regards Stefan On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Dear all, in DIH, is it possible to have two sibling entities where: - the first one is the root entity that creates the documents by iterating over a table that has one row per document. - the second one is executed after the completion of the first entity's iteration, and it provides more data that is added to the newly created documents. I've set up such a DIH configuration, and the second entity is executed, but no data is written into the index apart from the data extracted by the root entity (= no document is modified?). Documents are identified by the unique key 'id', which is defined by pk=id on both entities. Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question of why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. Anyway, the main reason I tried this is because I want to know whether it works. I'm still not sure whether it should work and I'm just doing something wrong... Thanks! Chantal
Re: True master-master fail-over without data gaps (choosing CA in CAP)
Hi, - Original Message From: Jake Luciani jak...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, March 9, 2011 8:07:00 PM Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP) Yeah sure. Let me update this on the Solandra wiki. I'll send across the link Excellent. You could include ES there, too, if you feel extra adventurous. ;) I think you hit the main two shortcomings atm. - Grandma, why are your eyes so big? - To see you better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -Jake On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Jake, Maybe it's time to come up with the Solandra/Solr matrix so we can see Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think I saw a mention of some big indices?) or missing feature (e.g. no delete by query), etc. Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Jake Luciani jak...@gmail.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wed, March 9, 2011 6:04:13 PM Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP) Jason, It's predecessor did, Lucandra. But Solandra is a new approach that manages shards of documents across the cluster for you and uses solrs distributed search to query indexes. Jake On Mar 9, 2011, at 5:15 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Doesn't Solandra partition by term instead of document? On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. dsmi...@mitre.org wrote: I was just about to jump in this conversation to mention Solandra and go fig, Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake. I haven't dug into the code yet but Solandra strikes me as a killer way to scale Solr. I'm looking forward to playing with it; particularly looking at disk requirements and performance measurements. ~ David Smiley On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote: Hi Otis, Have you considered using Solandra with Quorum writes to achieve master/master with CA semantics? -Jake On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Original Message From: Robert Petersen rober...@buy.com Can't you skip the SAN and keep the indexes locally? Then you would have two redundant copies of the index and no lock issues. I could, but then I'd have the issue of keeping them in sync, which seems more fragile. I think SAN makes things simpler overall. Also, Can't master02 just be a slave to master01 (in the master farm and separate from the slave farm) until such time as master01 fails? Then No, because it wouldn't be in sync. It would always be N minutes behind, and when the primary master fails, the secondary would not have all the docs - data loss. master02 would start receiving the new documents with an indexes complete up to the last replication at least and the other slaves would be directed by LB to poll master02 also... Yeah, complete up to the last replication is the problem. It's a data gap that now needs to be filled somehow. 
Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, March 09, 2011 9:47 AM To: solr-user@lucene.apache.org Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP) Hi, - Original Message From: Walter Underwood wun...@wunderwood.org On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote: You mean it's not possible to have 2 masters that are in nearly real-time sync? How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this could be doable with Solr masters, too, no? If you add fault-tolerant, you run into the CAP Theorem. Consistency, availability, partition: choose two. You cannot have it all. Right, so I'll take Consistency and Availability, and I'll put my 2 masters in the same rack (which has redundant switches, power supply,
question regarding proper placement of geofilt in fq=
Hi, I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7. The doc set is sharded over 11 shards. Currently, I have all the shards running in a single tomcat. Please see the bottom of the email for the bits of my schema.xml and solrconfig.xml that might help you understand my configuration. I am seeing what I think is strange behavior when I try to use geofilt in a filter query. Here's what I am seeing:

1. If I put the {!geofilt} as the last argument of the fq= parameter and send the following distributed query to my sharded index: /select?start=0&rows=30&q=food&fq=b_type:shops AND {!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... I get a syntax error. Which seems odd to me.

2. If I move the {!geofilt} to the first position in the fq= and send the following distributed query: /select?start=0&rows=30&q=food&fq={!geofilt} AND b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... then only the geofilt is applied, not the b_type:T01. Which seems odd to me; I would expect both filters to be applied.

3. Finally, when I submit this query as: /select?start=0&rows=30&q=food&fq=_query_:"{!geofilt}" AND b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... this works as I had hoped, i.e. both the geofilt and the b_type filters are applied.

Am I trying to use geofilt the wrong way, or is this possibly a bug? Thanks, Jerry Mindek

  <!-- schema.xml -->
  <field name="cn" type="text" indexed="true" stored="true" required="true" />
  <field name="dn" type="string" indexed="true" stored="true" required="false" />
  <field name="t1" type="text" indexed="true" stored="true" />
  <field name="ts" type="string" indexed="true" stored="true"/>
  <field name="lb" type="text" indexed="true" stored="false" />
  <field name="sim" type="string" indexed="true" stored="true" />
  <field name="s4_s" type="text" indexed="true" stored="false" />
  <field name="stat" type="string" indexed="true" stored="true" />
  <field name="pst" type="text" indexed="true" stored="true" />
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  ...
  <field name="b_type" type="string" indexed="true" stored="true"/>
  <field name="lat_long" type="location" indexed="true" stored="true" />
  <!-- end snippet schema.xml -->

  <!-- solrconfig.xml -->
  <requestHandler name="spatialdismax" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="sort">score desc</str>
      <str name="facet">true</str>
      <str name="facet.mincount">1</str>
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <float name="tie">0.01</float>
      <str name="qf">cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0</str>
      <str name="pf">cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0</str>
      <str name="fl">dn, cn, t1, stat, pst, pct, ts, sv, score</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>
  <!-- end snippet solrconfig.xml -->
Re: Error on string searching # [STRANGE] [FIX]
On Thursday, March 10, 2011 04:58:43 pm Dario Rigolin wrote: It seems fixed by setting catenateWords=0 and catenateNumbers=0 on the WordDelimiter filter, instead of 1 on both... Nice to know... On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote: I think that the problem is with the # symbol, because it has a special meaning when used inside a URL. Try replacing it with %23, like this: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37") If I do the URL-encoding and change it to %23, I get this error: java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:185) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208) at org.apache.lucene.search.Searcher.search(Searcher.java:88) Regards, * Juan G. Grande* -- Solr Consultant @ http://www.plugtree.com -- Blog @ http://juanggrande.wordpress.com On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin dario.rigo...@comperio.it wrote: I have a text field indexed using WordDelimiter. Indexed in this way: <doc><field name="myfield">S.#L.W.VI.37</field>...</doc> Searching in this way: http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37") makes this error: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.': Lexical error at line 1, column 17. Encountered: <EOF> after : "\"S." It seems that # is a wrong character for a query... I tried URL-encoding, or adding a slash before it, or removing the quotes, but other errors come up: org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': Encountered "<EOF>" at line 1, column 15. Was expecting one of: "AND" ... "OR" ... "NOT" ... "+" ... "-" ... "(" ... ")" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... Any idea how to solve this? Maybe a bug? Or probably I'm missing something. Dario.
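For reference, in a stock Solr schema those switches sit on solr.WordDelimiterFilterFactory inside the fieldtype's analyzer chain; a minimal sketch of the relevant piece (the other attributes shown are common values from the example schema, not from this thread):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0"
          catenateAll="0" splitOnCaseChange="1"/>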
Re: DIH : modify document in sibling entity of root entity
On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: [...] Is this supposed to work at all? I haven't found anything so far on the net but I could have used the wrong keywords for searching, of course. As answer to the maybe obvious question why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per each document. [...] I think that what you are after can be handled by Solr's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor Two major caveats here: * I am not 100% sure that I have understood your requirements. * The documentation for CachedSqlEntityProcessor needs to be improved. Will see if I can test it, and come up with a better example. As I have not actually used this, it could be that I have misunderstood its purpose. Regards, Gora
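The wiki's usage pattern, adapted to the entity names from this thread (an untested sketch; the CONTRIBUTOR_ID join column is an assumption, since the real join in this thread goes through a regex on SUBVALUE):

  <entity name="appearance" pk="id"
          processor="CachedSqlEntityProcessor"
          query="select CONTRIBUTOR_ID, CONTENTID as contentid, SUBVALUE from CONTENT_VALUE where ID_ATTRIBUTE=170"
          where="CONTRIBUTOR_ID=contributor.id">

This runs the inner query once, caches the rows keyed on CONTRIBUTOR_ID, and then serves each parent row's lookup from the cache instead of issuing one query per document.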
Re: disquery - difference qf qs / pf ps
On 3/10/2011 8:15 AM, Gastone Penzo wrote: Thank you very much. I understand the difference between qs and ps, but not what pf is... is it necessary to use ps? It's not necessary to use anything, including Solr. pf: will take the entire query the user entered, make it into a single phrase, and boost documents within the already existing result set that match that phrase. pf does not change the result set, it just changes the ranking. ps: will set phrase query slop on that pf query of the entire entered search string, which affects the boosting.
Re: DIH : modify document in sibling entity of root entity
Hi Stefan, thanks for your time! No, the second entity is not reusing values from the previous one. It just provides more fields for it, and, of course, the unique identifier - which in the case of the second entity is not unique:

  <document name="contributor">
    <entity name="contributor" pk="id" rootEntity="true"
            query="select CONTRIBUTOR_ID as id, CONTRIBUTOR_NAME as name, EXT_ID as extid from DIM_CONTRIBUTOR" />
    <entity name="appearance" pk="id" rootEntity="false" transformer="RegexTransformer"
            query="select CONTENTID as contentid, SUBVALUE from CONTENT_VALUE where ID_ATTRIBUTE=170">
      <field column="ignore" sourceColName="SUBVALUE" groupNames="id,type,pos,character"
             regex="(\d+);(\d+);(\d+);([^;]*);\d*;[A-Z0-9]*;\d*" />
    </entity>
  </document>

and here are the fields:

  <field name="id" type="slong" indexed="true" stored="true" required="true" />
  <field name="name" type="string" indexed="true" stored="true" required="true" termVectors="true" />
  <field name="contentid" type="slong" indexed="true" stored="true" multiValued="true" />
  <field name="character" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" />
  <field name="type" type="sint" indexed="true" stored="true" multiValued="true" />

(For the sake of simplicity I've removed some fields that would be created using copyField instructions and transformers.) I'm currently trying to run this using a subentity with the SQL restriction SUBVALUE like '${contributor.id};%', but this takes ages... The other one finished in under a minute (and it did actually process the second entity, I think, it just didn't modify the index). The current one has been running for about 30 min, and has only processed 22,000 documents out of more than 390,000. (Of course, there is probably no index on that column.) Thanks for any suggestions! Chantal On Thu, 2011-03-10 at 17:13 +0100, Stefan Matheis wrote: Hi Chantal, I'm not sure if I understood you correctly (if at all). Two entities, not arranged as sub-entities, but using values from the previous entity? Could you paste your dataimport config and the relevant part of the logging output? Regards Stefan On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Dear all, in DIH, is it possible to have two sibling entities where: - the first one is the root entity that creates the documents by iterating over a table that has one row per document. - the second one is executed after the completion of the first entity's iteration, and it provides more data that is added to the newly created documents. I've set up such a DIH configuration, and the second entity is executed, but no data is written into the index apart from the data extracted by the root entity (= no document is modified?). Documents are identified by the unique key 'id', which is defined by pk=id on both entities. Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question of why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. Anyway, the main reason I tried this is because I want to know whether it works. I'm still not sure whether it should work and I'm just doing something wrong... Thanks! Chantal
Re: Math-generated fields during query
Hi Dan, Yes, you're right - in fact that was precisely what I was thinking of doing! I'm also looking at SOLR-1298 and SOLR-1566, which would be good for applying functions generically rather than on a per-use-case basis. Thanks! Peter On Thu, Mar 10, 2011 at 3:58 PM, dan sutton danbsut...@gmail.com wrote: As a workaround, can you not have a search component run after the QueryComponent, have qty_ordered and unit_price as stored fields returned with the fl parameter, and have your custom component do the calculation - unless you need to sort by this value too? Dan On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge peter.stu...@gmail.com wrote: Hi, I was wondering if it is possible during a query to create a returned field 'on the fly' (like a function query, but for concrete values, not score). For example, if I input this query: q=_val_:product(15,3)&fl=*,score then for every returned document, I get score = 45. If I change it slightly to add *:* like this: q=*:* _val_:product(15,3)&fl=*,score I get score = 32.526913. If I try my use case of _val_:product(qty_ordered,unit_price), I get varying scores depending on... well, depending on something. I understand this is doing relevance scoring, but it doesn't seem to tally with the FunctionQuery wiki [example at the bottom of the page]: q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score ... where score will contain the resultant volume. Is there a trick to getting not a score, but the actual value of quantity*price (e.g. product(5,2.21) == 11.05)? Many thanks
Re: DIH : modify document in sibling entity of root entity
Hi Gora, thanks for making me read this part of the documentation again! This processor probably cannot do what I need out of the box but I will try to extend it to allow specifying a regular expression in its where attribute. Thanks! Chantal On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote: On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: [...] Is this supposed to work at all? I haven't found anything so far on the net but I could have used the wrong keywords for searching, of course. As answer to the maybe obvious question why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per each document. [...] I think that what you are after can be handled by Solr's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor Two major caveats here: * I am not 100% sure that I have understood your requirements. * The documentation for CachedSqlEntityProcessor needs to be improved. Will see if I can test it, and come up with a better example. As I have not actually used this, it could be that I have misunderstood its purpose. Regards, Gora
Custom fieldtype with sharding?
Hi all, I'm having an issue with using a custom fieldtype with distributed search. It may be the case that what I'm looking for could be accomplished in a different way, but this is my first stab at it. I'm looking to store XML in a field. What I've done, which works fine, is to: - on ingest, wrap the XML in a CDATA tag - write a simple class that extends org.apache.solr.schema.TextField, which writes an XML node much in the way that a textfield would, but without escaping the contents. It looks like this:

  public class XMLField extends TextField {
    @Override
    public void write(TextResponseWriter xmlWriter, String name, Fieldable f) throws java.io.IOException {
      Writer writer = xmlWriter.getWriter();
      writer.write("<xml name=" + '"' + name + '"' + '>');
      writer.write(f.stringValue(), 0, f.stringValue() == null ? 0 : f.stringValue().length());
      writer.write("</xml>");
    }
  }

Like I said, simple. Not especially pretty, but it does the job. Works fine for normal searching; I get back a response like:

  <xml name="xmlField"><xml-contents-unescaped/></xml>

When I try to use this with distributed searching, though, it comes back written as a normal textfield, like:

  <str name="xmlField">&lt;xml-contents-have-been-escaped/&gt;</str>

It looks like it doesn't know anything about my custom fieldtype at all, and is defaulting to writing it as a StrField or TextField instead. So, my questions: - is there a better way to do this? I'd be fine if it came back with a 'str' element name, as long as it's not escaped. - is there perhaps a different class I should extend to do this with sharded searching? - should I just bite the bullet and manually unescape the XML after receiving the response? I'd really prefer not to do this if I can get around it. Thanks in advance for any help. Peter
Re: question regarding proper placement of geofilt in fq=
Can you use 2 fq parameters? The default op is usually set to AND. Bill Bell Sent from mobile On Mar 10, 2011, at 9:33 AM, Jerry Mindek jmin...@manta.com wrote: Hi, I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7. The doc set is sharded over 11 shards. Currently, I have all the shards running in a single tomcat. I am seeing what I think is strange behavior when I try to use geofilt in a filter query. Here's what I am seeing: 1. If I put the {!geofilt} as the last argument of the fq= parameter and send the following distributed query to my sharded index: /select?start=0&rows=30&q=food&fq=b_type:shops AND {!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... I get a syntax error. Which seems odd to me. 2. If I move the {!geofilt} to the first position in the fq= and send the following distributed query: /select?start=0&rows=30&q=food&fq={!geofilt} AND b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... then only the geofilt is applied, not the b_type:T01. Which seems odd to me; I would expect both filters to be applied. 3. Finally, when I submit this query as: /select?start=0&rows=30&q=food&fq=_query_:"{!geofilt}" AND b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... this works as I had hoped, i.e. both the geofilt and the b_type filters are applied. Am I trying to use geofilt the wrong way, or is this possibly a bug? Thanks, Jerry Mindek
Re: question regarding proper placement of geofilt in fq=
Also, _query_ is the right approach when using fq with 2 boolean clauses. Just make sure you double-quote the {!geofilt} when using it that way. Bill Bell Sent from mobile On Mar 10, 2011, at 9:33 AM, Jerry Mindek jmin...@manta.com wrote: ... 3. Finally, when I submit this query as: /select?start=0&rows=30&q=food&fq=_query_:"{!geofilt}" AND b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=... this works as I had hoped, i.e. both the geofilt and the b_type filters are applied. Am I trying to use geofilt the wrong way, or is this possibly a bug? Thanks, Jerry Mindek
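Combining the two suggestions, the failing examples from the original post could be written either as two separate fq parameters:

  /select?q=food&fq={!geofilt}&fq=b_type:T01&qt=spatialdismax&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...

or as one fq with the geofilt wrapped in a quoted _query_:

  fq=_query_:"{!geofilt}" AND b_type:T01

Multiple fq parameters intersect, so the two forms select the same documents; the first also lets the two filters be cached independently.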
Re: docBoost
Okay, I think I have the idea:

  <dataConfig>
    <dataSource type="JdbcDataSource" name="animals" batchSize="-1"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/animals?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull"
                user="user" password="pass"/>
    <script><![CDATA[
      function BoostScores(row) {
        // if searching for recommendations add in the boost score
        if (some_condition) {
          row.put('$docBoost', row.get('boost_score'));
        } // end if(some_condition)
        return row;
      } // end function BoostScores(row)
    ]]></script>
    <document>
      <entity name="animal" dataSource="animals" pk="id"
              transformer="script:BoostScores" query="SELECT * FROM animals">
        <field column="id" name="id" />
        <field column="genus" name="genus" />
        <field column="species" name="species" />
        <entity name="boosters" dataSource="boosts"
                query="SELECT boost_score FROM boosts WHERE animal_id=${animal.id}">
          <field column="boost_score" name="boost_score" />
        </entity>
      </entity>
    </document>
  </dataConfig>

(Note: the transformer="script:BoostScores" attribute has been added here so the script is actually invoked, following Jayendra's example below.)

Now, am I right in thinking that the boost score is applied only when the data is loaded? If so, that's close to what I want to do, but not exactly. I would like to load all the data without boosting any scores, but store what the boost score would be. And then, depending on the search, boost scores by that value. For example, if a user searches for dog, they would get unboosted search results. However, I would also want the option to pass in a flag of some kind so that if a user searches for dog, they would get search results with the boost score factored in. Ideally it would be something like: Regular search: http://localhost/solr/search/?q=dog Boosted search: http://localhost/solr/search?q=dog&boost=true To achieve this, would it be applied in the data import handler? If so, what would I need to put in for some_condition? Thanks for all the help so far. I truly do appreciate it. Thanks, Brian Lamb

On Wed, Mar 9, 2011 at 11:50 PM, Bill Bell billnb...@gmail.com wrote: Yes, just add an if statement based on a field type and do a row.put() only if that other value is a certain value. On 3/9/11 1:39 PM, Brian Lamb brian.l...@journalexperts.com wrote: That makes sense. As a follow up, is there a way to only conditionally use the boost score? For example, in some cases I want to use the boost score and in other cases I want all documents to be treated equally. On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: You can use the ScriptTransformer to perform the boost calculation and addition. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer

  <dataConfig>
    <script><![CDATA[
      function f1(row) {
        // Add boost
        row.put('$docBoost', 1.5);
        return row;
      }
    ]]></script>
    <document>
      <entity name="e" pk="id" transformer="script:f1" query="select * from X" />
    </document>
  </dataConfig>

Regards, Jayendra On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Anyone have any clue on this one? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign higher weights to certain docs. I understand the concept behind docBoost, but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file:

  <document>
    <entity name="animal" dataSource="animals" pk="id" query="SELECT * FROM animals">
      <field column="id" name="id" />
      <field column="genus" name="genus" />
      <field column="species" name="species" />
      <entity name="boosters" dataSource="boosts"
              query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
        <field column="boost_score" name="boost_score" />
      </entity>
    </entity>
  </document>

How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
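One way to get the boost=true behavior (a suggestion, not something confirmed in this thread): $docBoost is baked into the index at import time, so a per-request switch can't come from DIH. Instead, index boost_score as a plain indexed numeric field with no docBoost, and have the application add a boost function only on boosted searches, e.g. with the dismax bf parameter:

  Regular search: /solr/select?defType=dismax&q=dog
  Boosted search: /solr/select?defType=dismax&q=dog&bf=boost_score

bf adds the field's value into the score, so documents keep identical ranking whenever it is omitted.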
Re: Sorting
Any ideas on this one? On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I know that I can add sort=score desc to the URL to sort in descending order. However, I would like to sort a MoreLikeThis response, which returns records like this:

  <lst name="moreLikeThis">
    <result name="3" numFound="113611" start="0" maxScore="0.4392774">
    <result name="2" numFound= start="0" maxScore="0.5392774">
  </lst>

I don't want them grouped by result; I would just like to have them all thrown together and then sorted according to score. I have an XSLT which does put them all together and returns the following:

  <moreLikeThis>
    <similar>
      <score>x.</score>
      <id>some_id</id>
    </similar>
  </moreLikeThis>

However, it appears that it basically applies the stylesheet to result name="3", then result name="2". How can I make it so that, with my XSLT, the results appear sorted by score?
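Not an answer from the thread, but the usual XSLT way to do this is an xsl:sort inside the loop that collects the docs. A minimal sketch against Solr's XML response format (the element tests are assumptions about the poster's exact response):

  <xsl:template match="lst[@name='moreLikeThis']">
    <moreLikeThis>
      <xsl:for-each select="result/doc">
        <xsl:sort select="float[@name='score']" data-type="number" order="descending"/>
        <similar>
          <score><xsl:value-of select="float[@name='score']"/></score>
          <id><xsl:value-of select="str[@name='id']"/></id>
        </similar>
      </xsl:for-each>
    </moreLikeThis>
  </xsl:template>

Because the for-each selects result/doc across both result elements before sorting, the docs from name="3" and name="2" are interleaved by score rather than grouped.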
Solr
Hi, I need notes and details about Solr, because I am now working with Solr and need help. Regards, Yazhini . K NCSI , M.Sc ( Software Engineering ) .
Re: Possible to sort in .xml file?
: I know it's possible to do by adding sort= , but the Perl module : (WebService::Solr) doesn't seem to offer the option to pass in this value :( according to the docs, you can pass any query params you want to the search method... http://search.cpan.org/~bricas/WebService-Solr-0.11/lib/WebService/Solr.pm#search%28_$query,_\%options_%29 All key-value pairs supplied in \%options are serialized in the request URL. -Hoss
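Concretely, per that documentation, the sort from the original question could be passed like this (a sketch based only on the linked docs):

  my $response = $solr->search( 'dog', { sort => 'document_views asc, score desc' } );

Since every key in the options hash is serialized into the request URL, this sends the sort parameter along with the query.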
Re: Solr
Start by reading http://wiki.apache.org/solr/FrontPage and the provided links (introduction, tutorial, etc.)

2011/3/10 yazhini.k vini yazhini@gmail.com: [...]
If statements in DataImportHandler?
Is it possible to conditionally load sub-entities in DataImportHandler, based on the gathered value of parent entities?
Re: New PHP API for Solr (Logic Solr API)
How about the Solr PHP Client (http://code.google.com/p/solr-php-client/)? We use this and have been quite happy with it, and it seems that it addresses all of the concerns you expressed. What advantages does yours offer? Liam

On 8 March 2011 17:02, Burak burak...@gmail.com wrote:
On 03/07/2011 12:43 AM, Stefan Matheis wrote:
Burak, what's wrong with the existing PHP-Extension (http://php.net/manual/en/book.solr.php)?

I think wrong is not the appropriate word here. But if I had to summarize why I wrote this API:
* Not everybody is enthusiastic about adding another item to an already long list of server dependencies. I just wanted a pure PHP option.
* I am not a C programmer either, so the ability to understand the source code and modify it according to my needs is another advantage.
* Yes, a PECL package would be faster. However, in 99% of the cases, after everything is said, coded, and byte-code cached, my biggest bottlenecks end up being the database and network.
* Last of all, choice is what open source means to me.
Burak

-- Liam O'Boyle IntelligenceBank Pty Ltd Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia P: +613 8618 7810 F: +613 8618 7899 M: +61 403 88 66 44
Solr and Permissions
Morning, We use Solr to index a range of content to which, within our application, access is restricted by a system of user groups and permissions. To ensure that search results don't reveal information about items the user doesn't have access to, we need to filter the results somehow; this needs to be done within Solr itself, rather than after retrieval, so that the facet and result counts are correct.

Currently we do this by creating a filter query which lists every item that may be allowed to match (e.g. id:(foo OR bar OR blarg OR ...)), but this has definite scalability problems; we're already starting to run into them, since the set of ORs is potentially unlimited in size (practically, we're sometimes hitting the low thousands). While we can adjust maxBooleanClauses upwards, I understand that this has performance implications...

So, has anyone had to implement something similar in the past? Any suggestions for a more scalable approach? Any advice on safe and sensible limits on how far I can push maxBooleanClauses? Thanks for your advice, Liam
Re: Solr and Permissions
How about assigning content types to documents in the index, and map users to a set of content types they are allowed to access? That way you will pass in fewer parameters in the fq. -sujit

On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote: [...]
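Concretely, the idea is something like this (sketch only; the field name acl_groups and the group tokens are invented for illustration). Each document stores the groups allowed to see it:

<field name="acl_groups" type="string" indexed="true" stored="false" multiValued="true"/>

and each query filters on the groups the current user belongs to, which stays short even when the user can see thousands of individual documents:

fq=acl_groups:(marketing OR sales OR managers)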
Re: If statements in DataImportHandler?
On Fri, Mar 11, 2011 at 4:48 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote:
Is it possible to conditionally load sub-entities in DataImportHandler, based on the gathered value of parent entities?

Probably the easiest way to do that is with a transformer. Please see the DIH Wiki page for details: http://wiki.apache.org/solr/DataImportHandler#Transformer

Regards, Gora
Re: If statements in DataImportHandler?
Right, but that's not within the XML, and it's unclear how to access the upper-level entities that have already been instantiated, e.g., beyond the given 'transform' row.

On Thu, Mar 10, 2011 at 8:02 PM, Gora Mohanty g...@mimirtech.com wrote: [...]
Re: Solr and Permissions
I have similar requirements. Content type is one solution, but there are also other use cases where this is not enough. Another requirement is that when the access permission is changed, we need to update the field; my understanding is that we cannot do this without re-indexing the whole document. Am I correct?

thanks, canal

From: Sujit Pal sujit@comcast.net
To: solr-user@lucene.apache.org
Sent: Fri, March 11, 2011 10:39:27 AM
Subject: Re: Solr and Permissions
[...]
Re: If statements in DataImportHandler?
On Fri, Mar 11, 2011 at 10:23 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: [...]

The second example for a ScriptTransformer in http://wiki.apache.org/solr/DataImportHandler#Transformer should give you an idea of how to proceed:

* row.get('category') gets the field 'category' from the current entity to which the ScriptTransformer is being applied.
* Fields from higher-level entities will need to be passed in using DIH variables. E.g., if you have a higher-level entity called 'parent', and are getting data from the current entity via a database select, e.g.,
  <entity ... query="select category from mytable"/>
you will need to modify the query to something like
  <entity ... transformer="script:mytrans" query="select category from mytable, ${parent.id} as id"/>
and add <field column="id"/> inside the current entity (cannot remember now if this is required, or can be dispensed with).

Regards, Gora
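Putting that together, a complete but untested sketch; every table, entity, and field name here is invented, and $skipRow is the DIH special command (per the wiki's list of special commands) that drops the current row while leaving the rest of the document alone:

<entity name="parent" query="SELECT id, type FROM parents">
  <entity name="child" transformer="script:skipUnlessTypeA"
          query="SELECT category, '${parent.type}' AS parent_type
                 FROM categories WHERE parent_id = '${parent.id}'"/>
</entity>

with the script:

<script><![CDATA[
  function skipUnlessTypeA(row) {
    // parent_type was carried down from the parent entity via the SELECT above
    if (row.get('parent_type') != 'A') {
      row.put('$skipRow', 'true'); // discard this child row only
    }
    return row;
  }
]]></script>

Note this still runs the child query for every parent and merely discards unwanted rows; truly skipping the query itself would need a custom EntityProcessor.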
Re: DIH : modify document in sibling entity of root entity
The DIH is strictly tree-structured. Data flows down the tree. If the first sibling is the root entity, nothing is used from the second sibling. This is a configuration that the DIH should fail on.

On Thu, Mar 10, 2011 at 9:14 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote:
Hi Gora, thanks for making me read this part of the documentation again! This processor probably cannot do what I need out of the box, but I will try to extend it to allow specifying a regular expression in its where attribute. Thanks! Chantal

On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote:
On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote:
[...] Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question of why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. [...]

I think that what you are after can be handled by Solr's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Two major caveats here:
* I am not 100% sure that I have understood your requirements.
* The documentation for CachedSqlEntityProcessor needs to be improved. Will see if I can test it, and come up with a better example. As I have not actually used this, it could be that I have misunderstood its purpose.

Regards, Gora

-- Lance Norskog goks...@gmail.com
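For anyone following along, the CachedSqlEntityProcessor pattern from the wiki looks roughly like this (sketch; the table and column names are invented). The sub-entity's query runs once, its rows are cached in memory, and the where attribute joins cached rows to each parent row instead of issuing one query per parent:

<entity name="item" query="SELECT id, title FROM item">
  <entity name="feature" processor="CachedSqlEntityProcessor"
          query="SELECT item_id, description FROM feature"
          where="item_id=item.id">
    <field column="description" name="features"/>
  </entity>
</entity>

This is why Chantal's where-with-a-regex idea requires extending the processor: out of the box, where only supports this simple equality join against the cache key.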
Re: Solr and Permissions
As Canal points out, grouping into types is not always possible. In our case, permissions are not on a per-type level, but either per folder (of which there can be hundreds) or, in some cases, per item (of which there can be... any number at all).

Reindexing is also too slow to really be an option; some of the items use Tika to extract content, which means that we would need to re-extract the content (which takes a variable length of time; the average is about half a second, but on some documents it will sit there until the connection times out). Querying the document, modifying it, then resubmitting it without rerunning content extraction is faster, but involves sending even more data over the network; either way is relatively slow.

Liam

On 11 March 2011 16:24, go canal goca...@yahoo.com wrote: [...]

-- Liam O'Boyle IntelligenceBank Pty Ltd Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia P: +613 8618 7810 F: +613 8618 7899 M: +61 403 88 66 44
Re: Solr and Permissions
To be fair, I think there is a slight difference between a content management system and a search engine. Access control at the per-document level or per-type level, support for dynamic role changes, etc. are more like content management use cases, whereas a search solution like Solr focuses on a different set of use cases. But in the real world, any content management system needs full-text search, so the question is how to support search with permission control.

JackRabbit integrates with Lucene/Tika; this could be one solution, but I do not know its performance and scalability. CouchDB also integrates with Lucene/Tika; another option? I have yet to see a search engine that provides the sort of content management features we are discussing here (Solr, Elastic Search?).

Then the last option is probably to build an application that works with a document repository providing all the necessary content management features, plus Solr providing search capability, and handle the permissions outside Solr?

thanks, canal

From: Liam O'Boyle liam.obo...@intelligencebank.com
To: solr-user@lucene.apache.org
Cc: go canal goca...@yahoo.com
Sent: Fri, March 11, 2011 2:28:19 PM
Subject: Re: Solr and Permissions
[...]
Problem with copyfield
I want to implement a type-ahead feature for the description field. For that, I defined an ngtext fieldType. I indexed description as text and then, using copyField, indexed it into the ngtext field. But I found that it is not working. If I use ngtext directly as the field's type, without using copyField, it works fine. I am not able to understand the reason behind this.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="ngtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="description" type="text" indexed="true" stored="true"/>

<copyField source="id" dest="ng_text"/>
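Two things stand out in the config as posted (a guess, since only part of the schema is shown). First, copyField's dest must be an actual <field> definition, not just the fieldType named ngtext; second, the copyField above copies from id rather than description. A sketch of a setup that should work; the field name description_ngram is invented for illustration:

<field name="description" type="text" indexed="true" stored="true"/>
<field name="description_ngram" type="ngtext" indexed="true" stored="false"/>

<copyField source="description" dest="description_ngram"/>

Note that copyField copies the raw, pre-analysis source value, so the destination field is still edge-n-grammed by its own analyzer; that part of the design is fine.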