Re: Show filename in search result using a FileListEntityProcessor
You should use file instead of fileName in the column attribute: <field column="file" name="fileName"/>. Don't forget to add 'fileName' to the schema.xml in the fields section: <field name="fileName" type="string" indexed="true" stored="true"/>. Have fun, Daniel Rijkhof On Mon, May 16, 2011 at 4:20 PM, Marcel Panse marcel.pa...@gmail.com wrote: Hi, thanks for the reply. I tried a couple of things, both in the tika-test entity and in the entity named 'f'. In the tika-test entity I tried: <field column="fileName" name="${f.fileName}"/>, <field column="fileName" name="${f.file}"/>, and even <field column="fileName" name="${f.fileAbsolutePath}"/>. I also tried doing things in the entity 'f' like: <field column="fileName" name="fileName"/> and <field column="fileName" name="file"/>. None of it works. I also added fileName to the schema's fields section, like: <field name="fileName" type="string" indexed="true" stored="true"/>. Doesn't help. Can anyone provide me with a working example? I'm pretty stuck here on something that seems really trivial and simple :-( On Sat, May 14, 2011 at 22:56, kbootz kbo...@caci.com wrote: There is a JIRA item (can't recall it atm) that addresses the issue with the docs. I'm running 3.1 and per your example you should be able to get it using ${f.file}. I *think* it should also be in the entity desc., but I'm also new and that's just how I access it. GL -- View this message in context: http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html Sent from the Solr - User mailing list archive at Nabble.com.
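Putting Daniel's advice together, a minimal data-config.xml sketch might look like the following. The baseDir, the file-name pattern, and the Tika sub-entity details are illustrative assumptions, not taken from the original messages:

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- 'file' is the implicit column FileListEntityProcessor exposes
           for the file name; map it onto the schema field 'fileName' -->
      <field column="file" name="fileName"/>
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The key point from the thread is that the mapping lives on the outer 'f' entity and uses column="file", while the schema must declare a stored fileName field.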
Re: filter cache and negative filter query
: query that in fact returns the negative results. As a simple example, : I believe that, for a boolean field, -field:true is exactly the same as : +field:false, but the former is a negative query and the latter is a that's not strictly true in all cases... * if the field is multivalued=true, a doc may contain both false and true in the field, in which case it would match +field:false but it would not match -field:true * if the field is multivalued=false and required=false, a doc may contain no value at all, in which case it would match -field:true but it would not match +field:false You're totally right. But it was just an example. I just didn't think about specifying the field to be single-valued and required. I did some testing yesterday about how filters are cached, using the admin interface. I noticed that if I perform a facet.query on a boolean field, testing it to be true or false, it always seems to add two entries to the query cache. Maybe it also adds an entry to test for nonexistence of the value? And if I perform a facet.field on the same boolean field, three new entries are inserted into the filter cache. Maybe one for true, one for false, and one for nonexistence? I really don't know what it's exactly doing, but at first sight it doesn't look like very optimal behaviour... I'm testing on the 1.4.1 LucidWorks version of Solr, using the boolean field inStock of its example schema, with its example data.
Out of memory on sorting
Hi, We are moving to a multi-core Solr installation with each of the cores having millions of documents; documents are also added to the index on an hourly basis. Everything seems to run fine and I am getting the expected results and performance, except where sorting is concerned. I have an index of 13217121 documents; now when I want to get documents between two dates and then sort them by ID, Solr goes out of memory. This is with just me using the system, and we will also have simultaneous users. How can I improve this performance? Rohit
Re: Out of memory on sorting
Explicit Warming of Sort Fields: If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users. For example, under firstSearcher: <lst><str name="q">solr rocks</str><str name="start">0</str><str name="rows">10</str><str name="sort">empID asc</str></lst> On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote: Hi, We are moving to a multi-core Solr installation with each of the cores having millions of documents; documents are also added to the index on an hourly basis. Everything seems to run fine and I am getting the expected results and performance, except where sorting is concerned. I have an index of 13217121 documents; now when I want to get documents between two dates and then sort them by ID, Solr goes out of memory. This is with just me using the system, and we will also have simultaneous users. How can I improve this performance? Rohit
SOLR Custom datasource integration
Hi, We are trying to build an enterprise search solution using Solr; our data source is a database which is interfaced with JPA. The solution looks like: Solr Index <- JPA <- Oracle database. We need help finding out the best approach to integrate the Solr index with JPA. We tried out two approaches. Approach 1: 1) Populating SolrInputDocument with data from JPA. 2) Updating EmbeddedSolrServer with the captured data using the SolrJ API. Approach 2: 1) Customizing the dataimporthandler of HTTPSolrServer. 2) Retrieving data in the dataimporthandler using JPA entities. Functional requirements: 1) The solution should be performant for a huge magnitude of data. 2) It should be scalable. We have a few questions which will help us decide on a solution: Which is the better approach to meet our requirements? Is it a good idea to integrate with Lucene directly instead of using EmbeddedSolrServer + JPA? If the JVM crashes, will EmbeddedSolrServer content be lost on reboot? Can we get support from the Jasper Experts team? Can we buy it? How?
Re: Highlighting does not work when using !boost as a nested query
Hi, The query is generated dynamically and can be more or less complex depending on different parameters. I'm also not free to give many details of our implementation, but I'll give you the minimal query string that fails and the relevant pieces of the config. The query string is: /select?q=+id:12345^0.01 +_query_:"{!boost b=$dateboost v=$qq deftype=dismax}"&dateboost=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 text2&hl.mergeContiguous=true where id is an int and text1 and text2 are of type text. hl.fl has proven to be necessary whenever I use dismax in an inner query. Otherwise, only text2 (the default field) is highlighted, and not both fields appearing in qf. For example, q={!dismax v=$qq}... does not require hl.fl to highlight both text1 and text2, but q=+_query_:"{!dismax v=$qq}"... only highlights text2, unless I specify hl.fl. The given query is probably not minimal in the sense that some of the dismax-related parameters can be omitted and the query still fails. But the one given always fails (and adding more complexity to it does not make it work, quite obviously). Unfortunately, hl.requireFieldMatch=false does not help.
Request handler config is the following: <requestHandler name="standard" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="echoParams">explicit</str> </lst> </requestHandler> Highlighter config is the following: <highlighting> <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true"> <lst name="defaults"> <int name="hl.fragsize">100</int> </lst> </fragmenter> <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"> <lst name="defaults"> <int name="hl.fragsize">70</int> <float name="hl.regex.slop">0.5</float> <str name="hl.regex.pattern">[-\w ,/\n\&quot;']{20,200}</str> </lst> </fragmenter> <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true"> <lst name="defaults"> <str name="hl.simple.pre">&lt;em&gt;</str> <str name="hl.simple.post">&lt;/em&gt;</str> </lst> </formatter> </highlighting> If there's any other information that could be useful, just ask. Thank you very much for your help, Juan On 16/05/2011, at 23:18, Chris Hostetter wrote: : As I said in my previous message, if I issue: : q=+field1:range +field2:value +_query_:"{!dismax v=$qq}" : highlighting works. I've just discovered the problem is not just with {!boost...}. If I just add a bf parameter to the previous query, highlighting also fails. : Anybody knows what can be happening? I'm really stuck on this problem... Just a hunch, but I suspect the problem has to do with the highlighter (or maybe it's the fragment generator?) trying to determine matches from query types it doesn't understand. I thought there was a query param you could use to tell the highlighter to use an alternate query string (that would be simpler) instead of the real query ... but I'm not seeing it in the docs. hl.requireFieldMatch=false might also help (not sure). In general it would probably be helpful for folks if you could post the *entire* request you are making (full query string and all request params) along with the solrconfig.xml sections that show how your request handler and highlighter are configured. -Hoss
How do I write/build query using qf parameter of dismax handler for my use case?
Hi, How do I write/build a Solr query using the dismax handler for my application-specific use case explained below? Snippet of the fields definition from schema.xml: <field name="documentid" type="string" indexed="true" stored="true" required="true"/> <field name="companyid" type="long" indexed="true" stored="true" required="true"/> <field name="textfield1" type="text" indexed="true" stored="false" required="true"/> <field name="textfield2" type="text" indexed="true" stored="false" required="true"/> <field name="textfield3" type="text" indexed="true" stored="false" required="true"/> <uniqueKey>documentid</uniqueKey> <defaultSearchField>textfield1</defaultSearchField> Now, I want to search for documents containing solr and struts in all 3 text fields (textfield1, textfield2, textfield3), but within companyid = 100. As you can see from the above statement, companyid=100 is common here, but the search keywords should be searched only in the 3 text fields (textfield1, textfield2, textfield3). I also understand that this can be written as shown below by qualifying all the 3 text fields explicitly: http://localhost/solr/select?q=companyid:100 textfield1:(solr AND struts) textfield2:(solr AND struts) textfield3:(solr AND struts) But how do I write/build a query using the qf parameter of the dismax query handler, so that I don't need to specify all the 3 fields explicitly? The wiki says: "For each word in the query string, dismax builds a DisjunctionMaxQuery object for that word across all of the fields in the qf param." NOTE: I'm using edismax as my default query type in my Search Handler. Regards, Gnanam
RE: Out of memory on sorting
Thanks for pointing me in the right direction; now I see the configuration for firstSearcher or newSearcher. The <str name="q"> needs to be configured in advance, but in my case the q is ever changing: users can actually search for anything, and the possible queries are unlimited. How can I make this generic? -Rohit -----Original Message----- From: rajini maski [mailto:rajinima...@gmail.com] Sent: 19 May 2011 14:53 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting Explicit Warming of Sort Fields: If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users. For example, under firstSearcher: <lst><str name="q">solr rocks</str><str name="start">0</str><str name="rows">10</str><str name="sort">empID asc</str></lst>
Field collapsing patch issues
Hi All! Kindly provide me links to suitable patches that can be applied to Solr versions 1.4.1 and 3.0 so that field collapsing works properly. Thanks in advance! Isha Garg
Re: How do I write/build query using qf parameter of dismax handler for my use case?
edismax supports the full query format of the lucene parser. But you can also search using filter queries, e.g.: qf=textfield1 textfield2 textfield3&fq=textfield1:(solr AND struts)&fq=textfield2:(solr AND struts)&fq=textfield3:(solr AND struts)&fq=companyid:100 - Thanx: Grijesh www.gettinhahead.co.in
Re: filter cache and negative filter query
lookups to work with an arbitrary query, you would either need to change the cache structure from Query=>DocSet to a mapping of Query=>[DocSet,inversionBit] and store the same cache value with two keys -- both the positive and the negative; or you keep the Well, I don't know how it's working right now, but I guess that, as the positive version is being stored, when you look a negative query up you already have a similar lookup problem: either you store two keys for the same value, or you just transform the negative query into a positive canonical one before looking it up. The same could be done in this case, with the difference that, yes, you need an inversion bit stored too. The double-lookup option sounds worse, though benchmarking should be done to know for sure. Would this optimization influence only memory usage, or are smaller sets also faster to intersect, for example? Well, in any case, saving memory allows using the additional memory to speed up the application, for example with bigger caches.
Re: Highlighting does not work when using !boost as a nested query
By the way, I was wrong when saying that using bf instead of !boost did not work either. I probably hit more than one problem at the same time when I first tested that. I've retested now and this works: /select?q=+id:12345^0.01 +_query_:"{!dismax v=$qq}"&bf=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 text2&hl.mergeContiguous=true But I don't get the multiplicative boost I'd like to use... On 19/05/2011, at 11:31, Juan Antonio Farré Basurte wrote: Hi, The query is generated dynamically and can be more or less complex depending on different parameters. I'm also not free to give many details of our implementation, but I'll give you the minimal query string that fails and the relevant pieces of the config. The query string is: /select?q=+id:12345^0.01 +_query_:"{!boost b=$dateboost v=$qq deftype=dismax}"&dateboost=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 text2&hl.mergeContiguous=true where id is an int and text1 and text2 are of type text. hl.fl has proven to be necessary whenever I use dismax in an inner query. Otherwise, only text2 (the default field) is highlighted, and not both fields appearing in qf. The given query is probably not minimal in the sense that some of the dismax-related parameters can be omitted and the query still fails. But the one given always fails. Unfortunately, hl.requireFieldMatch=false does not help.
Solr book
Hello, Does anyone know if there is a v 3.1 book coming any time soon? Regards, Savvas
Re: indexing directed graph
Thank you in advance, Gora! However, I decided to create a bean for indexing, something like: ... String[] vertices; String[] edges; int[] triple_inx_levels; ... So I can search for vertex text / edge text in the vertices and edges array fields, and I hope to recover the relation from the triple_inx_levels array, where I will save indexes into the two arrays above in a specific order (with some math function I have not worked out yet). I will try it this way; I hope this will be enough for me.
Re: Solr book
Hello! Take a look at the Solr resources page on the wiki (http://wiki.apache.org/solr/SolrResources). -- Regards, Rafał Kuć http://solr.pl
RE: How do I write/build query using qf parameter of dismax handler for my use case?
edismax supports the full query format of the lucene parser. But you can also search using filter queries, e.g.: qf=textfield1 textfield2 textfield3&fq=textfield1:(solr AND struts)&fq=textfield2:(solr AND struts)&fq=textfield3:(solr AND struts)&fq=companyid:100 Is it not possible to build the query without filter queries (fq)? For example, something like this (I believe this is syntactically not correct, but something equivalent to it): q=companyid:100 AND solr AND struts&qf=textfield1,textfield2,textfield3 Basically, I'm just trying to find a way to simplify the query syntax.
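A sketch of what the thread is converging on: keep the free-text terms in q, restrict them to the three text fields with qf, and move the companyid restriction into a filter query. The parameter values come from the question; treating the default handler as a stock edismax setup is an assumption:

```text
http://localhost/solr/select?defType=edismax
    &q=solr AND struts
    &qf=textfield1 textfield2 textfield3
    &fq=companyid:100
```

Because edismax accepts full Lucene syntax, "solr AND struts" requires both terms, and each term is expanded into a DisjunctionMaxQuery across the qf fields, so none of the three fields has to be named in q itself.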
SOLR-2209
Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter parentheses themselves, I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209, but there is no activity on that issue. Does anyone know if this will get fixed some time in the future, or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt a fix? Thx
Re: Solr book
great, thanks! So, I guess the Solr In Action and Solr Cookbook will be based on 3.1.. :) 2011/5/19 Rafał Kuć ra...@alud.com.pl Hello! Take a look at the Solr resources page on the wiki (http://wiki.apache.org/solr/SolrResources). -- Regards, Rafał Kuć http://solr.pl
Re: sorting on date field in facet query
Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here are the logic and an example: each doc has a string field someStr and a date field associated with it, and the same doc id has the same value of the date field. Question: is it possible to sort the facet values given below on that date field? curl "http://localhost:8983/solr/select?q=someStr:network&facet=true&facet.field=id&facet.limit=1000&facet.mincount=1&rows=0" result excerpt: <lst name="facet_fields"> <lst name="id"> <int name="T-AS_1386229">54</int> <int name="T-AS_1386181">45</int> <int name="T-CP_1370095">36</int> <int name="T-AS_1377809">25</int> <int name="T-CP_1380207">18</int> <int name="T-CP_1373820">11</int> <int name="T-AS_1372073-1">8</int> <int name="T-AS_1367577">6</int> <int name="T-AS_1383141">5</int> <int name="T-AS_1383648-1">5</int> <int name="T-AS_1351183-1">4</int> </lst> </lst> Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on a date field in a facet query in Solr 3.1? -- Regards, Dmitry Kan
Re: Fuzzy search and solr 4.0
Well, the good news is FuzzyQuery is indeed much faster in Lucene/Solr 4.0. But the bad news is... FuzzyQuery won't do what you need here. You need some sort of FuzzyPhraseQuery, which is able to match terms similar to one another (comp/company/corporation) by some metric. I don't know of such a query in Lucene/Solr... but it'd be a nice addition. Others have asked about this before. FuzzyQuery finds terms close to other terms, as measured by edit distance; e.g., fuzzy/wuzzy/muzzy are all edit distance one from each other. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 8:03 PM, Guilherme Aiolfi grad...@gmail.com wrote: Hi, I want to do a fuzzy search that compares a phrase to a field in Solr. For example: "abc company ltda" will be compared to "abc comp", "abc corporation", "def company ltda" (nothing to match here). The thing is that it always has to return documents sorted by score. I've found some good algorithms to do that, like StrikeAMatch [1] and JaroWinkler. Using JaroWinkler with strdist() I can do exactly that. But I'd rather use StrikeAMatch, which had a patch in the Lucene JIRA that was never committed. So I contacted the author of that patch, and he told me that I should use Solr 4.0, which now has some pretty good new fuzzy search enhancements that make StrikeAMatch seem like toys for kids. Does anyone know how I can achieve that using Solr 4.0? [1] http://www.catalysoft.com/articles/StrikeAMatch.html
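Mike's edit-distance remark can be made concrete with a small standalone sketch. This is not Lucene's actual FuzzyQuery implementation, just the textbook dynamic-programming Levenshtein metric it is based on:

```java
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions turning one string into another.
public class EditDistance {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("fuzzy", "wuzzy"));   // 1
        System.out.println(distance("comp", "company"));  // 3
    }
}
```

This also shows why FuzzyQuery cannot help with the phrase case: "comp" vs "company" is already distance 3, far beyond what a per-term fuzzy match would normally allow.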
Re: Out of memory on sorting
The warming queries warm up the caches used in sorting, so just including the sort=... clause will warm the sort caches; the terms searched are not important. The same is true with facets... However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup; you'd be doing the operation that runs you out of memory on startup... So, we need more details: 1) How is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. 2) How many distinct terms? I'm guessing one per document; this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. 3) How much memory are you allocating for the JVM? 4) What other fields are you sorting on, and how many unique values are in each? Solr Admin can help you here. Best Erick On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote: Thanks for pointing me in the right direction; now I see the configuration for firstSearcher or newSearcher. The <str name="q"> needs to be configured in advance, but in my case the q is ever changing: users can actually search for anything, and the possible queries are unlimited. How can I make this generic? -Rohit -----Original Message----- From: rajini maski [mailto:rajinima...@gmail.com] Sent: 19 May 2011 14:53 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting Explicit Warming of Sort Fields: If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users.
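Erick's point, that the search terms in a warming query don't matter and only the sort clause does, suggests generic warming entries like the following solrconfig.xml sketch. The listener syntax follows the stock example config; the field names timestamp and id come from the sort criteria mentioned in this thread, and d the match-all query is an assumption:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- q is irrelevant for cache warming; only the sort fields matter -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">timestamp desc</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="sort">id asc</str>
    </lst>
  </arr>
</listener>
```

The same <arr name="queries"> block can be repeated under the firstSearcher listener so the FieldCache is also populated on initial startup.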
Re: Field collapsing patch issues
Here's the root issue, and all available patches: https://issues.apache.org/jira/browse/SOLR-236 I confess I have no clue what's what here, so you're largely on your own. There are some encouraging titles (note you can sort the patches by date, which might help in figuring out which to use). Best Erick On Thu, May 19, 2011 at 6:43 AM, Isha Garg isha.g...@orkash.com wrote: Hi All! Kindly provide me links to suitable patches that can be applied to Solr versions 1.4.1 and 3.0 so that field collapsing works properly. Thanks in advance! Isha Garg
Spatial search with SolrJ 3.1 ? How to
How do you construct a query in Java for spatial search (not via the default Solr REST interface)?
RE: Out of memory on sorting
Hi Erick, My OOM problem starts when I query the core with 13217121 documents. My schema and other details are given below. 1) How is your sort field defined? We primarily use two different sort criteria: one is a date field and the other is a string (id). I cannot change the id field, as this is also the uniqueKey for my schema. 2) How many distinct terms? Since one of the fields is a timestamp instance and the other a unique key, all are distinct. (These are tweets happening for a keyword.) 3) How much memory are you allocating for the JVM? I am starting Solr with the following command: java -Xms1024M -Xmx2048M -jar start.jar All our test cases for moving to Solr have passed; this is proving to be a big setback. Help would be greatly appreciated. Regards, Rohit -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 19 May 2011 18:21 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting The warming queries warm up the caches used in sorting, so just including the sort=... clause will warm the sort caches; the terms searched are not important. The same is true with facets... However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup; you'd be doing the operation that runs you out of memory on startup... So, we need more details: 1) How is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. 2) How many distinct terms? I'm guessing one per document; this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. 3) How much memory are you allocating for the JVM? 4) What other fields are you sorting on, and how many unique values are in each?
Solr Admin can help you here. Best Erick
Re: Spatial search with SolrJ 3.1 ? How to
On Thu, May 19, 2011 at 8:52 AM, martin_groenhof martin.groen...@yahoo.com wrote: How do you construct a query in java for spatial search ? not the default solr REST interface It depends on what you are trying to do - a spatial request (as currently implemented in Solr) is typically more than just a query... it can be filtering by a bounding box, filtering by a distance radius, or using a distance (geodist) function query in another way such as sorting by it or using it as a factor in relevance. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
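One stdlib-only way to see what "constructing the query in Java" amounts to is to build the parameter string by hand; with SolrJ you would set the same parameters on a SolrQuery/ModifiableSolrParams object instead. The field name "store", the coordinates, and the radius are placeholders, not values from the thread; the parameters themselves (sfield, pt, {!geofilt}, geodist) are the Solr spatial syntax Yonik describes:

```java
// Builds the parameter string for a Solr spatial request:
// filter by a distance radius with {!geofilt} and sort by geodist().
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SpatialQueryBuilder {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    /** field: the location field; lat/lon: the center point; distKm: radius in km. */
    public static String buildGeofilt(String field, double lat, double lon, double distKm) {
        String pt = lat + "," + lon;
        return "q=" + enc("*:*")
             + "&sfield=" + enc(field)              // geofilt/geodist read these
             + "&pt=" + enc(pt)                     // global params
             + "&fq=" + enc("{!geofilt d=" + distKm + "}")
             + "&sort=" + enc("geodist() asc");     // nearest first
    }

    public static void main(String[] args) {
        System.out.println(buildGeofilt("store", 45.15, -93.85, 5.0));
    }
}
```

With SolrJ the equivalent would be calls like query.set("sfield", "store") and query.addFilterQuery("{!geofilt d=5}") on a SolrQuery, followed by server.query(query).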
Re: Fuzzy search and solr 4.0
Do you, or any other list member, know a good fuzzy string matching library to recommend? On Thu, May 19, 2011 at 9:39 AM, Michael McCandless luc...@mikemccandless.com wrote: Well, the good news is FuzzyQuery is indeed much faster in Lucene/Solr 4.0. But the bad news is... FuzzyQuery won't do what you need here. You need some sort of FuzzyPhraseQuery, which is able to match terms similar to one another (comp/company/corporation) by some metric. I don't know of such a query in Lucene/Solr... but it'd be a nice addition. Others have asked about this before. FuzzyQuery finds terms close to other terms, as measured by edit distance; e.g., fuzzy/wuzzy/muzzy are all edit distance one from each other. Mike http://blog.mikemccandless.com
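For reference, the StrikeAMatch metric from the article linked earlier in the thread is small enough to implement directly: compare the multisets of adjacent-letter pairs of the two strings. This is a sketch following that article, not the never-committed Lucene patch:

```java
// StrikeAMatch similarity: 2 * |common letter pairs| / (|pairs1| + |pairs2|).
import java.util.ArrayList;
import java.util.List;

public class StrikeAMatch {
    // Adjacent letter pairs of each word, upper-cased (word breaks yield no pairs).
    private static List<String> letterPairs(String text) {
        List<String> pairs = new ArrayList<String>();
        for (String word : text.toUpperCase().split("\\s+")) {
            for (int i = 0; i < word.length() - 1; i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    public static double similarity(String s1, String s2) {
        List<String> pairs1 = letterPairs(s1);
        List<String> pairs2 = letterPairs(s2);
        int union = pairs1.size() + pairs2.size();
        if (union == 0) return 0.0;
        int common = 0;
        for (String pair : pairs1) {
            if (pairs2.remove(pair)) common++; // multiset intersection
        }
        return 2.0 * common / union;
    }

    public static void main(String[] args) {
        System.out.println(similarity("France", "French")); // 0.4
        System.out.println(similarity("abc company", "abc comp"));
    }
}
```

The France/French example (40%) is the one worked through in the StrikeAMatch article; scores are in [0,1], so results can be ranked by similarity.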
[Announce] White paper describing Near Real Time Implementation with Solr and RankingAlgorithm
Hi! I would like to announce a white paper that describes the technical details of Near Real Time implementation with Solr and the RankingAlgorithm. The paper discusses the modifications made to enable NRT. You can download the white paper from here: http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf The modified src can also be downloaded from here: http://solr-ra.tgels.com Regards, - Nagendra Nagarajayya http://solr-ra.tgels.com
Re: sorting on date field in facet query
The only two ways to influence facet order are by count and alphabetically. facet.sort=index will sort by alpha; the default is facet.sort=count. All that said, I still don't quite understand what you're asking for. Facets are simply a count of the documents that have unique values for, in your case, the id field. It doesn't make sense to sort the returned facets by some other field. You can facet on the other field and sort *that*. Sorting the documents returned is unrelated, but I don't think that's what you're asking... Or I completely miss the point... Best Erick On Thu, May 19, 2011 at 8:24 AM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logic and an example: Each doc has a string field someStr and a date field associated with it, and the same doc id has the same value of the date field. Question: is it possible to sort the facet values given below on that date field?

curl "http://localhost:8983/solr/select?q=someStr:network&facet=true&facet.field=id&facet.limit=1000&facet.mincount=1&rows=0"

result excerpt:

<lst name="facet_fields">
  <lst name="id">
    <int name="T-AS_1386229">54</int>
    <int name="T-AS_1386181">45</int>
    <int name="T-CP_1370095">36</int>
    <int name="T-AS_1377809">25</int>
    <int name="T-CP_1380207">18</int>
    <int name="T-CP_1373820">11</int>
    <int name="T-AS_1372073-1">8</int>
    <int name="T-AS_1367577">6</int>
    <int name="T-AS_1383141">5</int>
    <int name="T-AS_1383648-1">5</int>
    <int name="T-AS_1351183-1">4</int>
  </lst>
</lst>

Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: sorting on date field in facet query
Dmitry, how should that work? Take this short sample data:

id | date
T-AS_1386229 | 1995-12-31T23:59:59Z
T-AS_1386181 | 1996-12-31T23:59:59Z
T-AS_1386229 | 1997-12-31T23:59:59Z

So, you'll have two facets for the ids .. but how should they be sorted? One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.comwrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: Out of memory on sorting
See below: On Thu, May 19, 2011 at 9:06 AM, Rohit ro...@in-rev.com wrote: Hi Erick, My OOM problem starts when I query the core with 13217121 documents. My schema and other details are given below. Hmmm, how many cores are you running and what are they doing? They all use the same memory pool, so you may be getting some carry-over. So one strategy would be just to move this core to a dedicated machine. 1 how is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. We primarily use two different sort criteria: one is a date field and the other a string (id). I cannot change the id field, as this is also the uniqueKey for my schema. OK, but can you use a separate field just for sorting? Populate it with a copyField and sort on that rather than ID. This is only helpful if you can make a compact representation, e.g. integer. 2 How many distinct terms? I'm guessing one/document; actually, this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. Since one of the fields is a timestamp instance and the other a unique key, all are distinct. (These are tweets happening for a keyword) Not one, but two fields where all values are distinct. Although I don't think the timestamp is much of a problem, assuming you're storing it as one of the numeric types (I'd especially make sure it was one of the Trie types, specifically tdate if you're going to do range queries). There are tricks for dealing with this, but your id field will get you a bigger bang for the buck, so concentrate on that first. 3 How much memory are you allocating for the JVM? I am starting Solr with the following command: java -Xms1024M -Xmx2048M start.jar Well, you can bump this higher if you're on a 64-bit OS. The other possibility is to shard your index. But really, with 13M documents this should fit on one machine. What does your statistics page tell you, especially about cache usage? 
All our test cases for moving to Solr have passed; this is proving to be a big setback. Help would be greatly appreciated. Regards, Rohit -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 19 May 2011 18:21 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting The warming queries warm up the caches used in sorting. So just including the sort=... will warm the sort caches; the terms searched are not important. The same is true with facets... However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup, since you'd be doing the operation that runs you out of memory on startup... So, we need more details: 1 how is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. 2 How many distinct terms? I'm guessing one/document; actually, this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. 3 How much memory are you allocating for the JVM? 4 What other fields are you sorting on and how many unique values in each? Solr Admin can help you here. Best Erick On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote: Thanks for pointing me in the right direction. Now I see the configuration for firstSearcher or newSearcher; the <str name="q"> entries need to be configured ahead of time. In my case the q is ever changing: users can search for anything, and the possible queries are unlimited. How can I make this generic? -Rohit -----Original Message----- From: rajini maski [mailto:rajinima...@gmail.com] Sent: 19 May 2011 14:53 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting Explicit Warming of Sort Fields If you do a lot of field based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users. 
firstSearcher example:

<lst>
  <str name="q">solr rocks</str>
  <str name="start">0</str>
  <str name="rows">10</str>
  <str name="sort">empID asc</str>
</lst>

On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote: Hi, We are moving to a multi-core Solr installation, with each of the cores having millions of documents; documents are also added to the index on an hourly basis. Everything seems to run fine and I'm getting the expected results and performance, except where sorting is concerned. I have an index size of 13217121 documents. Now when I want to get documents between two dates and then sort them by ID, Solr goes out of memory. This is with just me using the system; we may also have simultaneous users. How can I improve this performance? Rohit
Re: sorting on date field in facet query
Hi, Thanks for the questions, guys, and sorry for the confusion. I should start with a broader picture of what we are trying to achieve. The only problem is that I cannot speak about the specifics of the task we are solving the way we do. We currently sort the facets on the client side, having the date values at hand (done by a boolean query to SOLR with a list of ids). However, sometimes we have glitches: since we limit the facets to the first facet.limit ones, and there is no date boosting, some facet counts may end up beyond the returned range, and that's sad. One way around it would be to facet with pagination, where a page would correspond to a date subrange in the range of required dates. But we haven't tried it yet; first we want to investigate what can be done inside SOLR (by modifying its source code, if needed). So, as said, every solr doc that has some id in the solr index (this id is used to combine several solr docs logically, only that purpose; this design comes from the task definition) has a date field, and the value of that date field is always the same for a given doc id across all the solr docs with the same doc id. Now, taking Stefan's example, I would like to sort desc the facets by date (yes, date boosting during the facet gathering process) that were calculated against the someStr field:

<int name="T-AS_1386181">45</int>
<int name="T-AS_1386229">54</int>

So the SOLR facet component would ignore the counts and sort the facets by dates desc (in reverse chronological order). Is it possible to implement such a solution through some class inheritance in the facet component? Regards, Dmitry On Thu, May 19, 2011 at 4:25 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Dmitry, how should that work? Take this short sample data: id | date T-AS_1386229 | 1995-12-31T23:59:59Z T-AS_1386181 | 1996-12-31T23:59:59Z T-AS_1386229 | 1997-12-31T23:59:59Z So, you'll have two facets for the ids .. but how should they be sorted? 
One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: SOLR-2209
What version of Solr are you using? Because this works fine for me. Could you attach the results of adding debugQuery=on in both instances? The parsed form of the query is identical in 1.4.1 as far as I can tell. The bug you're referencing is a peculiarity of the not (-) operator, I think. Best Erick On Thu, May 19, 2011 at 7:25 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedtech.com wrote: Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter parentheses themselves I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209, but there is no activity on this bug. Does anyone know if this will get fixed some time in the future, or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt a fix? Thx
Re: Spatial search with SolrJ 3.1 ? How to
I don't care about the method, I just want results within, let's say, 10km of a lat,lng? (I can do this with REST) but don't know how to with a Java API.

[code]
SpatialOptions spatialOptions = new SpatialOptions(
    company.getLatitude() + "," + company.getLongitude(),
    10, new SchemaField("geolocation", null), searchName, 20,
    DistanceUnits.KILOMETERS);
LatLonType latLonType = new LatLonType();
Query query = latLonType.createSpatialQuery(
    new SpatialFilterQParser(searchString.toString(), solrq, solrq, null, true),
    spatialOptions);
[/code]

(I am trying with this, but it does not seem to be compatible with Solr, only Lucene) Any example will do, Thx -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial-search-with-SolrJ-3-1-How-to-tp2961136p2961452.html Sent from the Solr - User mailing list archive at Nabble.com.
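A 10 km radius filter does not need internal classes like SpatialFilterQParser; it is expressed as ordinary request parameters that any client (SolrJ via SolrQuery.set(...), or a raw HTTP call) can send. A sketch of the parameter set, assuming a LatLonType field named "geolocation" as in the post above and an example lat,lng:

```python
from urllib.parse import urlencode

# Sketch: the same parameters the REST interface takes, built as a query
# string. In SolrJ these would be SolrQuery.set("fq", "{!geofilt}") etc.
# The point 52.37,4.89 is an example value, not from the original post.
params = {
    "q": "*:*",
    "fq": "{!geofilt}",       # filter by distance radius
    "sfield": "geolocation",  # the LatLonType field from the schema
    "pt": "52.37,4.89",       # lat,lng of the center point
    "d": "10",                # distance in km
}
qs = urlencode(params)
print(qs)
```

Under these assumptions, the resulting query string is what you would append to `/select?` to get only documents within 10 km of the point.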
Facetting: Some questions concerning method:fc
Hey all! I have a few questions concerning the field cache method for faceting. The wiki says for the enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. And for the fc method: This was the default method for single valued fields prior to Solr 1.4. I just ran into a problem using fc for a field which can have multiple terms per document. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. Question 1: The quotes above say prior to Solr 1.4. Has this changed? Is there another method for multi-valued faceting since Solr 1.4? Question 2: Very weird is another observation: When faceting on another field, namely the text field holding a large variety of terms and especially a lot of different terms in one single field, the fc method seems to count everything correctly. In fact, the results between fc and enum don't seem to differ. The field in which the fc and enum faceting results differ consists of a lot of terms which all have start and end offsets of 0, 0 and position increment 1. Could this be a problem? Best regards, Erik
how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
Hello, i want to index some date fields with this date format: yyyy-mm-dd. Solr throws an exception like this: can not be represented as java.sql.Date. i am using ...transformer=DateFormatTransformer and ...zeroDateTimeBehavior=convertToNull. How can i tell DIH to convert these fields into the correct format? thx - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961481.html Sent from the Solr - User mailing list archive at Nabble.com.
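What the DateFormatTransformer conversion amounts to can be illustrated in a standalone sketch: parse the bare yyyy-mm-dd value and write it back with the time component Solr's date fields expect. This is an illustration only, not DIH code:

```python
from datetime import datetime

# Illustration of the transform a dateTimeFormat="yyyy-MM-dd" setting
# performs: parse the bare date, emit it in Solr's ISO date form.
# Note: MySQL zero dates like 0000-00-00 cannot be parsed at all --
# they must be nulled out first (which is what
# zeroDateTimeBehavior=convertToNull does on the JDBC side).
def to_solr_date(s: str) -> str:
    d = datetime.strptime(s, "%Y-%m-%d")
    return d.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("2011-05-18"))  # 2011-05-18T00:00:00Z
```

The time-of-day defaults to midnight, since a bare date carries no time information.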
Re: Facetting: Some questions concerning method:fc
On Thu, May 19, 2011 at 9:56 AM, Erik Fäßler erik.faess...@uni-jena.de wrote: I have a few questions concerning the field cache method for faceting. The wiki says for enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. . And for fc method: This was the default method for single valued fields prior to Solr 1.4. . I just ran into the problem of using fc for a field which can have multiple terms for one field. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. That doesn't sound right... the results should always be identical between facet.method=fc and facet.method=enum. Are you sure you didn't index a multi-valued field and then change the fieldType in the schema to be single valued? Are you sure the field is indexed the way you think it is? If so, is there an easy way for someone to reproduce what you are seeing? -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: sorting on date field in facet query
Oh, isn't that ducky. The facet.sort parameter only sorts ascending as far as I can tell, which is exactly the reverse of what you want. Would it work to cleverly encode the facet field to do what you want just by a lexical sort? Something like: use a very large constant, subtract the date for each record from that, and then put that in a new field that you facet/sort by. Then un-transform it for display. Let's say you have a range from 0-9. Then your facet field could be something like:

original doc values:
doc 1: 2 (oldest)
doc 2: 5
doc 3: 8 (newest)

You'd store values like these in facetme ((9 - orig value) + text):
doc1: 7_docid1
doc2: 4_docid2
doc3: 1_docid3

Now a natural ordering (facet.sort=index) would return them in date order. If this was a well-defined process, you could easily transform it back for proper display. Although watch out for leading zeros! Thinking off the top of my head here. Erick On Thu, May 19, 2011 at 9:46 AM, Dmitry Kan dmitry@gmail.com wrote: Hi, Thanks for the questions, guys, and sorry for the confusion. I should start with a broader picture of what we are trying to achieve. The only problem is that I cannot speak about specifics of the task we are solving the way we do. We currently sort the facets on the client side, having the date values at hand (done by a boolean query to SOLR with a list of ids). However, sometimes we have glitches, that is since we limit the facets to first facet.limit ones, and there is no date boosting we may have some facet counts end up beyond the facet counts range and that's sad. One way around it would be to facet with pagination, where a page would correspond to a date subrange in the range of required dates. But we haven't tried it yet before we investigate what can be done inside SOLR (by modifying its source code, if needed). 
So as said every solr doc that has some id in the solr index (this id is used to combine several solr docs logically, only that purpose; this design comes from the task definition) has a date field, and the value of that date field is always same for a given doc id across all the solr docs with the same doc id. Now, taking the Stefan's example, I would like to sort desc the facets by date (yes, date boosting during the facet gathering process) that were calculated against someStr field: int name=T-AS_1386181 45 /int int name=T-AS_1386229 54 /int So SOLR facet component would ignore the counts and sort the facets by dates desc (in reverse chronological order). Is it possible to implement such a solution through some class inheritance in facet component? Regards, Dmitry On Thu, May 19, 2011 at 4:25 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Dmitry, how should that work? Take a this short sample-data: id | date T-AS_1386229 | 1995-12-31T23:59:59Z T-AS_1386181 | 1996-12-31T23:59:59Z T-AS_1386229 | 1997-12-31T23:59:59Z So, you'll have two facets for the ids .. but how should they be sorted? One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? 
curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
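Erick's encoding trick above can be sketched concretely, including the leading-zeros caveat he mentions: subtract each date's epoch value from a constant and zero-pad to a fixed width, so that ascending lexical order (facet.sort=index) equals reverse chronological order. MAX and the field name "facetme" are hypothetical choices, not from the thread:

```python
# Sketch of the encoding: (MAX - epoch_seconds), zero-padded to a fixed
# width, prefixed to the doc id. Lexically ascending == date descending.
MAX = 10**10  # any constant larger than every epoch value you will index

def encode(epoch_seconds: int, doc_id: str) -> str:
    # :010d zero-pads to 10 digits so lexical and numeric order agree
    return f"{MAX - epoch_seconds:010d}_{doc_id}"

def decode(token: str):
    inv, doc_id = token.split("_", 1)
    return MAX - int(inv), doc_id

newer = encode(1_300_000_000, "T-AS_1386229")
older = encode(1_200_000_000, "T-AS_1386181")
# the newer document now sorts first under a plain lexical sort
print(sorted([older, newer]))
```

Decoding on the client side recovers the original epoch value and doc id for display, which is the "un-transform" step Erick describes.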
Re: sorting on date field in facet query
Thanks Erick, this sounds solid to me! It of course will require the repost of the entire index (pretty big one, sharded), but that's not an issue as we periodically do that anyway. Thanks and regards, Dmitry On Thu, May 19, 2011 at 5:08 PM, Erick Erickson erickerick...@gmail.comwrote: Oh, isn't that ducky. The facet.sort parameter only sorts ascending as far as I can tell. Which is exactly the reverse of what you want. Would it work to cleverly encode the facet field to do what you want just by a lexical sort? Something like use a very large constant, subtract the date for each record from that and then put that in a new field that you facet/sort by? Then un-transform it for display? Let's say you have a range from 0-9. Then your facet field could be something like original doc values doc 1: 2 - oldest doc 2: 5 doc 3: 8 - newest You'd store values like these in facetme (9 - orig value) + text doc1: 7_docid1 doc2: 4_docid2 doc3: 1_docid3 Now a natural ordering (facet.sort=index) wold return them in date order. If this was a well-defined process you could easily transform it back for proper display. Although watch out for leading zeros! Thinking off the top of my head here Erick On Thu, May 19, 2011 at 9:46 AM, Dmitry Kan dmitry@gmail.com wrote: Hi, Thanks for the questions, guys, and sorry for the confusion. I should start with a broader picture of what we are trying to achieve. The only problem is that I cannot speak about specifics of the task we are solving the way we do. We currently sort the facets on the client side, having the date values at hand (done by an boolean query to SOLR with a list of ids). However, sometimes we have glitches, that is since we limit the facets to first facet.limit ones, and there is no date boosting we may have some facet counts end up beyond the facet counts range and that's sad. One way around it would be to facet with pagination, where a page would correspond to a date subrange in the range of required dates. 
But we haven't tried it yet before we investigate what can be done inside SOLR (by modifying its source code, if needed). So as said every solr doc that has some id in the solr index (this id is used to combine several solr docs logically, only that purpose; this design comes from the task definition) has a date field, and the value of that date field is always same for a given doc id across all the solr docs with the same doc id. Now, taking the Stefan's example, I would like to sort desc the facets by date (yes, date boosting during the facet gathering process) that were calculated against someStr field: int name=T-AS_1386181 45 /int int name=T-AS_1386229 54 /int So SOLR facet component would ignore the counts and sort the facets by dates desc (in reverse chronological order). Is it possible to implement such a solution through some class inheritance in facet component? Regards, Dmitry On Thu, May 19, 2011 at 4:25 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Dmitry, how should that work? Take a this short sample-data: id | date T-AS_1386229 | 1995-12-31T23:59:59Z T-AS_1386181 | 1996-12-31T23:59:59Z T-AS_1386229 | 1997-12-31T23:59:59Z So, you'll have two facets for the ids .. but how should they be sorted? One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? 
curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information?
Re: lucene parser, negative OR operands
On 5/18/2011 9:07 PM, Chris Hostetter wrote: You could implement a parser like that relatively easily -- just make sure you put a MatchAllDocsQuery in every BooleanQuery object that you construct, and only ever use the PROHIBITED and MANDATORY clause types (never OPTIONAL) ... the thing is, a parser like that isn't as useful as you think it might be when dealing with search results. OPTIONAL clauses are where most of the useful factors of scoring documents come into play. Thanks for the background and ideas, very helpful. Hmm, but what if it DID use OPTIONAL clause types but just turned all pure-negative clauses into the alternative combination with MatchAllDocsQuery ( *:* AND $pure_negative )? Just like the lucene query parser does now, but not only for top-level clauses. Seems like that might maintain the power of optional clauses for scoring, but still allow negative clauses to work the 'boolean logic' way people expect -- the same rationale that has the query parser doing this at top level, so why not do it for sub-clauses as well? Does that have any promise, do you think? Jonathan
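The semantics under discussion can be shown at the set level. A BooleanQuery containing only prohibited clauses has nothing to subtract from and matches no documents, while wrapping it with MatchAllDocsQuery matches the complement, which is what users expect from boolean logic. A set-based illustration, not Lucene code:

```python
# Set-level illustration of the pure-negative rewrite (not Lucene code):
# a BooleanQuery with only PROHIBITED clauses starts from an empty match
# set, so subtracting from it yields nothing; wrapping with *:*
# (MatchAllDocsQuery) yields the complement instead.
all_docs = {1, 2, 3, 4, 5}          # *:*  (MatchAllDocsQuery)
field_x  = {2, 4}                    # docs matching field:x

pure_negative = set() - field_x      # only PROHIBITED clauses: empty
rewritten     = all_docs - field_x   # *:* AND -field:x

print(pure_negative)  # set()
print(rewritten)      # {1, 3, 5}
```

This is exactly the top-level rewrite the Lucene query parser already performs; Jonathan's question is whether applying it to sub-clauses as well would keep scoring intact.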
Re: how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
Try this in your query: TIME_FORMAT(timeDb, '%H:%i') as timefield http://www.java2s.com/Tutorial/MySQL/0280__Date-Time-Functions/TIMEFORMATtimeformat.htm -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961591.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting on date field in facet query
This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match the facet count? Just wondering. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-on-date-field-in-facet-query-tp2956540p2961612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting on date field in facet query
Hi, 1. Is it possible to produce the collapsed docs count in the same query? 2. What is the performance of Field Collapsing versus Facet Search? Dmitry On Thu, May 19, 2011 at 5:36 PM, kenf_nc ken.fos...@realestate.com wrote: This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match the facet count? Just wondering. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-on-date-field-in-facet-query-tp2956540p2961612.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Dmitry Kan
Re: how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
did you mean something like this? DATE_FORMAT(cp.field, '%Y-%m-%d %H:%i:%s') AS field ??? i think i need to add the timestamp to my date fields? or not? why can't DIH handle this? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961684.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting on date field in facet query
Oooh, that's clever. The glitch is that field collapsing is scheduled for 3.2, but that probably means the patch is close to being applicable to 3.1 — though I don't know that for sure. Erick On Thu, May 19, 2011 at 10:36 AM, kenf_nc ken.fos...@realestate.com wrote: This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match the facet count? Just wondering. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-on-date-field-in-facet-query-tp2956540p2961612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
Offhand, I don't think the problem is DIH since your stack trace specifies a SQL error. What is the SQL you're using? And the DIH configuration? Best Erick On Thu, May 19, 2011 at 10:53 AM, stockii stock.jo...@googlemail.com wrote: did you mean something like this ? DATE_FORMAT(cp.field, '%Y-%m-%di %H:%i:%s') AS field ??? i think i need to add the timestamp to my date fields? or not ? why cannot DIH handle with this ? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961684.html Sent from the Solr - User mailing list archive at Nabble.com.
New release of Python/Solr library Sunburnt
Hi, I'd like to announce the release of a new version of my Python-Solr library, sunburnt: http://pypi.python.org/pypi/sunburnt/0.5 Documentation and tutorial examples are available at: http://opensource.timetric.com/sunburnt/ and there's a mailing list for discussion at http://groups.google.com/group/python-sunburnt Sunburnt was written initially for use with the Timetric platform (http://timetric.com) and is in use by several other internet-scale sites. Toby -- http://timetric.com 2nd Floor, White Bear Yard, 144a Clerkenwell Road, London EC1R 5DF phone: +44 20 3286 0677 (office), +44 7747 603618 (mobile) twitter: @timetric, @tow21 | skype: tobyohwhite
Re: how to convert YYYY-MM-DD to YYY-MM-DD hh:mm:ss - DIH
<entity name="foo" pk="cp_id" transformer="DateFormatTransformer" query="SELECT ..., ...some fields ... cp.start_date_1, cp.start_date_2, cp.end_date_1, cp.end_date_2, .. some other fields .. FROM ..."> ... </entity> That does not work for fields with the value 0000-00-00 OR/AND 2011-05-18. I'd tried with: <field column="start_date_1" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" /> but solr always says that these fields have a wrong format! i try my sql-selects before i post them here ;-) - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961787.html Sent from the Solr - User mailing list archive at Nabble.com.
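For what it's worth, the failure mode described in this thread can be reproduced outside Solr: a zero date like MySQL's 0000-00-00 is simply not a valid date under any year/month/day pattern. A small Python sketch (the values are illustrative; this is not DIH code):

```python
from datetime import datetime

def parse_mysql_date(value, fmt="%Y-%m-%d"):
    """Parse a MySQL DATE string; return None for unparseable values
    such as MySQL's zero date '0000-00-00'."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

print(parse_mysql_date("2011-05-18"))  # a normal date parses fine
print(parse_mysql_date("0000-00-00"))  # None: year/month/day 0 are invalid
```

Any dateTimeFormat on the DIH side hits the same wall, so zero dates have to be turned into NULL (or a sentinel) in the SQL itself, e.g. with something like NULLIF in the query.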
Re: how to convert YYYY-MM-DD to YYY-MM-DD hh:mm:ss - DIH
okay, i found the problem. i had put the fields into my data-config twice ;-) - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961834.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication - replicated failed at the same time?
Hm, anyone? On Sat, May 14, 2011 at 7:11 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Hi Guys, while working on the UI for Replication, i've got confused sometimes because of the following response (from /replication?command=details):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="details">
    <lst name="slave">
      <!-- .. -->
      <str name="indexReplicatedAt">Sat May 14 16:25:53 UTC 2011</str>
      <arr name="indexReplicatedAtList">
        <str>Sat May 14 16:25:53 UTC 2011</str>
      </arr>
      <str name="replicationFailedAt">Sat May 14 16:25:53 UTC 2011</str>
      <arr name="replicationFailedAtList">
        <str>Sat May 14 16:25:53 UTC 2011</str>
      </arr>
      <!-- .. -->
    </lst>
  </lst>
</response>

To reproduce that: Start with a Solr instance (with a clean index), trigger replication, abort the fetch - look at the details. Does not really make sense to me? If it's okay .. please let me know how & why - especially interested in how to display that information in the UI (Current State: http://files.mathe.is/solr-admin/10_replication.png). Regards Stefan
Re: Facetting: Some questions concerning method:fc
On 19.05.2011 16:07, Yonik Seeley wrote: On Thu, May 19, 2011 at 9:56 AM, Erik Fäßler erik.faess...@uni-jena.de wrote: I have a few questions concerning the field cache method for faceting. The wiki says for the enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. . And for the fc method: This was the default method for single valued fields prior to Solr 1.4. . I just ran into the problem of using fc for a field which can have multiple terms per document. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. That doesn't sound right... the results should always be identical between facet.method=fc and facet.method=enum. Are you sure you didn't index a multi-valued field and then change the fieldType in the schema to be single valued? Are you sure the field is indexed the way you think it is? If so, is there an easy way for someone to reproduce what you are seeing? -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco Thanks a lot for your help: Changing the field type to multiValued did the trick. The point is, I built the index using Lucene directly (I need to for some special manipulation of offsets and position increments). So my question is what requirements a Lucene field has to fulfill so that Solr's faceting works correctly. Particular question: in Lucene terms, what exactly is denoted by a multiValued field? I thought that would result in multiple Lucene Field instances with the same name for a single document. But I think my field has only one instance per document (but I could check that again). Thanks again for your quick and helpful answer! Erik
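The mismatch Erik saw can be illustrated with a toy counter - this is only a sketch of the symptom (one counted term per document vs. all terms), not of Solr's actual field-cache internals:

```python
from collections import Counter

# Hypothetical indexed terms per document for a multi-term field.
docs = [["java", "solr"], ["java"], ["solr", "lucene"]]

# multiValued faceting: every term of every document is counted.
multi_counts = Counter(term for doc in docs for term in doc)

# A single-valued field cache keeps one term per document, so only
# the first term of each document contributes to the counts.
single_counts = Counter(doc[0] for doc in docs)

print(multi_counts)   # java and solr counted twice, lucene once
print(single_counts)  # lucene is never counted at all
```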
DIH Response
Hello, We have configured solr for delta processing through DIH and we kick off the index request from within a batch process. However, we somehow need to know whether our indexing request succeeded or not because we want to be able to rollback a db transaction if that step fails. By looking at the SolrServer API we weren't able to find a method that could help us with that, so the only solution we see is by constantly polling the server and parsing the response for the idle or Rolledback words. What we noticed though is that the response also contains a message saying This response format is experimental. It is likely to change in the future. Does this mean that we can't rely on this response to build our module? Is there a better way? Thank you, Savvas
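Until there is a structured status API, polling really does mean parsing the XML by hand. A minimal sketch of such a check - the sample response and field names are modeled on the DIH status messages quoted elsewhere in this digest and may change, as the experimental-format warning says:

```python
import xml.etree.ElementTree as ET

# Hypothetical /dataimport status snippet (the format is experimental).
SAMPLE = """<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Documents Processed">0</str>
  </lst>
</response>"""

def import_finished_ok(xml_text):
    """Crude success check: the handler is idle again and at least
    one document was actually processed."""
    root = ET.fromstring(xml_text)
    status = root.findtext("str[@name='status']")
    processed = root.findtext(".//str[@name='Total Documents Processed']")
    return status == "idle" and int(processed or 0) > 0

print(import_finished_ok(SAMPLE))  # False: idle, but nothing was processed
```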
Similarity class for an individual field
Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands:

$ cd <your Solr trunk checkout dir>
$ svn up
$ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch
$ patch -p0 -i SOLR-2338.patch

And I did not get any errors. I then created my own similarity class, listed below because it isn't very large:

package org.apache.lucene.misc;

import org.apache.lucene.search.DefaultSimilarity;

public class SimpleSimilarity extends DefaultSimilarity {
    public SimpleSimilarity() {
        super();
    }

    public float idf(int dont, int care) {
        return 1;
    }
}

As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. Next, I make a change to the schema.xml file:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SimpleSimilarity"/>
</fieldType>

And apply that to the field in question:

<field name="string_noidf" multiValued="true" type="string_noidf" indexed="true" stored="true" required="false" omitNorms="true" />

But I think something did not get applied correctly to the patch. I restarted and did a full import but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
</fieldType>

But the scores remained unchanged even in that case. At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class but given that the SweetSpot similarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
Re: Similarity class for an individual field
Also, I've tried adding: similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ To the end of the schema file so that it is applied globally but it does not appear to change the score either. What am I doing incorrectly? Thanks, Brian Lamb On Thu, May 19, 2011 at 2:45 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd your Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass listed below because it isn't very large: package org.apache.lucene.misc; import org.apache.lucene.search.DefaultSimilarity; public class SimpleSimilarity extends DefaultSimilarity { public SimpleSimilarity() { super(); } public float idf(int dont, int care) { return 1; } } As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. Next, I make a change to the schema.xml file: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SimpleSimilarity/ /fieldType And apply that to the field in question: field name=string_noidf multiValued=true type=string_noidf indexed=true stored=true required=false omitNorms=true / But I think something did not get applied correctly to the patch. I restarted and did a full import but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ /fieldType But the scores remained unchanged even in that case. 
At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class but given that the SweetSpot similarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
Re: Similarity class for an individual field
I tried editing the SweetSpotSimilarity class located at lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java to just return 1 for each function and the score does not change at all. This has led me to believe that it does not recognize similarity at all. At this point, all I have for similarity is the line at the end of the file to apply similarity to all searches but that does not even work. So where am I going wrong? Thanks, Brian Lamb On Thu, May 19, 2011 at 3:41 PM, Brian Lamb brian.l...@journalexperts.comwrote: Also, I've tried adding: similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ To the end of the schema file so that it is applied globally but it does not appear to change the score either. What am I doing incorrectly? Thanks, Brian Lamb On Thu, May 19, 2011 at 2:45 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd your Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass listed below because it isn't very large: package org.apache.lucene.misc; import org.apache.lucene.search.DefaultSimilarity; public class SimpleSimilarity extends DefaultSimilarity { public SimpleSimilarity() { super(); } public float idf(int dont, int care) { return 1; } } As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. 
Next, I make a change to the schema.xml file: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SimpleSimilarity/ /fieldType And apply that to the field in question: field name=string_noidf multiValued=true type=string_noidf indexed=true stored=true required=false omitNorms=true / But I think something did not get applied correctly to the patch. I restarted and did a full import but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ /fieldType But the scores remained unchanged even in that case. At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class but given that the SweetSpot similarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
Re: solr sorting on multiple conditions, please help
: sort=query({!v=area_id: 78153}) desc, score desc : : What I want to achieve is sort by if there is a match with area_id, then : sort by the actual score I think you can use the map function here to map all scores greater than zero (matching docs) to some fixed value. something like this should work... qq=area_id:78153 sort=map(query($qq,-1),0,,1) desc, score desc http://wiki.apache.org/solr/FunctionQuery#map -Hoss
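Spelled out as request parameters, Hoss's suggestion looks roughly like the sketch below (the main query is hypothetical, and the elided map() bound from the post is filled with a placeholder upper limit):

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",  # hypothetical main query
    # dereferenced via $qq inside the sort function
    "qq": "area_id:78153",
    # matching docs score > 0 and fall into the mapped bucket; docs that
    # don't match get query()'s default of -1 and sort after them.
    # NOTE: 100000 is a placeholder upper bound, elided in the post.
    "sort": "map(query($qq,-1),0,100000,1) desc, score desc",
}
print(urlencode(params))
```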
How to get Error caught in SOLR layer to SOLRj layer
Hi, I have a code logic to push documents to SOLR using SOLRj APIs. Due to an error in schema, i get appropriate error in SOLR logs printed in catalina.log inside tomcat. Here is a snippet: SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field suggestion: E:\Files\lpsimdev.inf at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:288) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:104) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555) at java.lang.Thread.run(Unknown Source) But in my JAVA logs, i simply get this snippet: ### 13 05/19 17:27:52:333 ### Runner@9be1041:: (SOLR failed with SolrException for DocId = [2dac611a5bb7ce87831dc0245ffcb66a] and detailed Exception: [org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://dm2search2.dm2.commvault.com:27000/solr/update/extract?fmap.content=bodyliteral.contentid=2dac611a5bb7ce87831dc0245ffcb66aliteral.jid=5literal.afln=27009287literal.conv=lpsimdev.infliteral.cvowner=SX1X5X32X544literal.cvreadacls=SX1X5X32X544;SX1X5X18;SX1X5X32X544;SX1X5X32X545literal.mtmstr=1000502032literal.afofstr=43 02 25 29 17 literal.bktm=2011-5-19T14:56:3Zliteral.mtm=2001-9-14T21:13:52Zliteral.afof=4302252917literal.atyp=33literal.clid=2literal.cijid=22literal.afid=1literal.szkb=822literal.ccn=-1literal.apid=6literal.url=E:\Files\lpsimdev.infliteral.cistate=1wt=javabinversion=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245) at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:202) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:33) at com.commvault.commclient.ciengine.CVRequestWrapper.processRequests(CVRequestWrapper.java:551) at com.commvault.commclient.ciengine.solr.SOLRHTTPConnector$Runner.run(SOLRHTTPConnector.java:692) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) How can i get the same error on my solrj side, so that i can debug easily? Thanks a lot for your time & help, Geeta -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-Error-caught-in-SOLR-layer-to-SOLRj-layer-tp2963446p2963446.html Sent from the Solr - User mailing list archive at Nabble.com.
Help, Data Import not indexing in solr.
Newbie at SOLR, When I ran through my test data config, it was able to find my 91 rows sample test. However, it didn't add any into my index. Can someone help me and tell me why? Please find the data config below:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost\TESTSERVER:4317;databaseName=Northwind;user=sa;password=datapassword" />
  <document>
    <entity name="Customers" query="select * from Customers">
      <field column="CustomerID" name="customerid" />
      <field column="CompanyName" name="companyname" />
      <field column="ContactName" name="contactname" />
      <field column="Address" name="address" />
      <field column="City" name="city" />
      <field column="ContactTitle" name="contacttitle" />
    </entity>
  </document>
</dataConfig>

Here is the result when I run http://localhost:8983/solr/dataimport? (after I ran the full import):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataconfig.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">91</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-05-19 15:09:56</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2011-05-19 15:09:57</str>
    <str name="Optimized">2011-05-19 15:09:57</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:1.765</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Please help. Thx. -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Data-Import-not-indexing-in-solr-tp2963450p2963450.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on distance in Solr: how do you generate links that search withing a given range of distance?
: It is fairly simple to generate facets for ranges or 'buckets' of : distance in Solr: : http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance. : What isn't described is how to generate the links for these facets any query you specify in a facet.query to generate a constraint count can be specified in an fq to actually apply that constraint. So if you use... facet.query={!frange l=5.001 u=3000}geodist() ...to get a count of 34, and the user wants to constrain to those docs, you would add... fq={!frange l=5.001 u=3000}geodist() ...to the query to do that. -Hoss
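In other words, the same local-params expression travels from the counting stage to the filtering stage unchanged; a small sketch of the two parameter sets (the main query here is a hypothetical *:*):

```python
from urllib.parse import urlencode

DIST_BUCKET = "{!frange l=5.001 u=3000}geodist()"

# Stage 1: get the constraint count for the distance bucket.
count_params = [("q", "*:*"), ("facet", "true"), ("facet.query", DIST_BUCKET)]

# Stage 2: the facet link the UI generates applies the very same
# expression as a filter query.
filter_params = [("q", "*:*"), ("fq", DIST_BUCKET)]

print(urlencode(count_params))
print(urlencode(filter_params))
```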
Re: Embedded Solr Optimize under Windows
: Thanks for the reply. I'm at home right now, or I'd try this myself, but is : the suggestion that two optimize() calls in a row would resolve the issue? it might ... I think the situations in which it happens have evolved a bit over the years as IndexWriter has gotten smarter about knowing when it really needs to touch the disk to reduce IO. there's a relatively new explicit method (IndexWriter.deleteUnusedFiles) that can force this... https://issues.apache.org/jira/browse/LUCENE-2259 ...but it's only on trunk, and there isn't any user level hook for it in Solr yet (i opened SOLR-2532 to consider adding it) -Hoss
Re: SOLR Custom datasource integration
What is JPA? You are better off pulling from JPA yourself than coding with the DataImportHandler. It will be much easier. EmbeddedSolr is just like web solr: when you commit data it is on the disk. If you crash during indexing, it may or may not be available to commit. EmbeddedSolr does not do anything special with index storage. Lance On Thu, May 19, 2011 at 2:08 AM, amit.b@gmail.com amit.b@gmail.com wrote: Hi, We are trying to build an enterprise search solution using SOLR; our data source is a database which is interfaced with JPA. The solution looks like SOLR INDEX - JPA - Oracle database. We need help to find out the best approach to integrate the Solr index with JPA. We tried two approaches. Approach 1 - 1 Populating SolrInputDocument with data from JPA 2 Updating EmbeddedSolrServer with captured data using the SolrJ API. Approach 2 - 1 Customizing the dataimporthandler of HTTPSolrServer 2 Retrieving data in the dataimporthandler using a JPA entity. Functional requirements - 1 The solution should be performant for a huge magnitude of data 2 Should be scalable We have a few questions which will help us decide on a solution. We would like to know which approach is better to meet our requirements. Is it a good idea to integrate with Lucene directly instead of using EmbeddedSolrServer + JPA? If the JVM crashes, will EmbeddedSolrServer content be lost on reboot? Can we get support from the Jasper Experts team? Can we buy it? How? -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Custom-datasource-integration-tp2960475p2960475.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Embedded Solr Optimize under Windows
Ahh, thanks. I might try a basic commit() then and see, although it's not a huge deal for me. It occurred to me that two optimize() calls would probably leave exactly the same problem behind. On 20 May 2011 09:52, Chris Hostetter hossman_luc...@fucit.org wrote: : Thanks for the reply. I'm at home right now, or I'd try this myself, but is : the suggestion that two optimize() calls in a row would resolve the issue? it might ... I think the situations in which it happens have evolved a bit over the years as IndexWRiter has gotten smarter about knowing when it really needs to touch the disk to reduce IO. there's a relatively new explicit method (IndexWriter.deleteUnusedFiles) that can force this... https://issues.apache.org/jira/browse/LUCENE-2259 ...but it's only on trunk, and there isn't any user level hook for it in Solr yet (i opened SOLR-2532 to consider adding it) -Hoss
Mysql vs Postgres DIH
Hi, i run the same query to import my data with mysql and with postgres, but only postgres indexes all the data (17090). Mysql indexes 17086, then 197085, then 17087... never 17090. But the response tells me that it has skipped 0 documents. I don't understand! Help me please, i would like to use Mysql for my application... Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2963822.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Mysql vs Postgres DIH
Excuse me, i was wrong to write 197085; the correct number is 17085. But it is never the same count... -- View this message in context: http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2963824.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: error while doing full import
thank you dan... i have checked the code that produces the XML for solr and fixed the &nbsp; problem - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/error-while-doing-full-import-tp2951185p2963832.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too slow indexing while using 2 different data sources
hi Gora, i guess you are right; i have checked and the url seems to be serving data slowly... maybe it's because of the crappy test env too... thank you so much - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2963833.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR-2209
I'm using Solr 1.4... I thought I had a case without a NOT but it seems to work now :S It might be a glitch on my server. The problem is easily reproducible with the NOT operator: http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20(-title:programmer) http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20(-(title:programmer)) both queries return 0 results while... http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20-(title:programmer) (note the position of the negation operator) returns more than 50 000 results -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: May-19-11 9:53 AM To: solr-user@lucene.apache.org Subject: Re: SOLR-2209 What version of Solr are you using? Because this works fine for me. Could you attach the results of adding debugQuery=on in both instances? The parsed form of the query is identical in 1.4.1 as far as I can tell. The bug you're referencing is a peculiarity of the not (-) operator, I think. Best Erick On Thu, May 19, 2011 at 7:25 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedtech.com wrote: Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter the parentheses themselves I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209 but there is no activity on that bug. Does anyone know if this will get fixed some time in the future, or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt a fix? Thx
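For what it's worth, a workaround often used while waiting on fixes in this area is to give the parenthesised negation a positive anchor, since a sub-query containing only prohibited clauses matches nothing: (-title:programmer) becomes (*:* -title:programmer). A hypothetical client-side rewrite helper, not a Solr API:

```python
def anchor_negation(clause):
    """Rewrite a parenthesised pure-negative clause so it contains a
    positive anchor: '(-title:programmer)' alone matches nothing as a
    sub-query, while '(*:* -title:programmer)' matches every document
    except those with the negated term."""
    inner = clause.strip()
    if inner.startswith("(") and inner[1:].lstrip().startswith("-"):
        return "(*:* " + inner[1:].lstrip()
    return clause

print(anchor_negation("(-title:programmer)"))   # (*:* -title:programmer)
print(anchor_negation("(title:engineer)"))      # unchanged
```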
Re: Problem about Solrj
you mean you have changed the code of the solr admin page to remove all indexes? and also, when you say indexes are gone, do you mean they are deleted, or that solr sees no indexes when you run it? a little bit of a confusing post :) - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-about-Solrj-tp2952009p2963901.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem about Solrj
sorry for the typos in the prev msg... a little bit drowsy still... so if you can make your problem a little bit clearer, we can help you - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-about-Solrj-tp2952009p2963935.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Boost fields for a sum total score.
Apologies if this is obvious, but I've been banging my head against a wall. I can define a query like the following: http://HOST_NAME/solr/select?q=$search_term&bq=boost_high:$search_term^1.5&bq=boost_medium:$search_term^1.3&bq=boost_max:$search_term^1.7&bq=boost_low:$search_term^1.1 This does precisely what I'm looking for (assuming $search_term is a string like dinosaur) The search term is found in the default/defined search fields, and then a boost is applied if this term is also found in one of the defined boost fields. What I'm looking to do is define this setup in solrconfig.xml such that I need only hit a URL like: http://HOST_NAME/solr/select?q=$search_term I can define a bq <str> in solrconfig, but seem unable to reference the q query parameter in order to boost only when the search term is found. Any help would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2963986.html Sent from the Solr - User mailing list archive at Nabble.com.
Please Unsubscribe
Could you please unsubscribe me. From: ronveenstra ron-s...@agathongroup.com Reply-To: solr-user@lucene.apache.org Date: Thu, 19 May 2011 18:52:52 -0700 (PDT) To: solr-user@lucene.apache.org Subject: Re: Using Boost fields for a sum total score. Apologies if this is obvious, but I've been banging my head against a wall. I can define a query like the following: http://HOST_NAME/solr/select?q=$search_termbq=boost_high:$search_term^1.5b q=boost_medium:$search_term^1.3bq=boost_max:$search_term^1.7bq=boost_low:$ search_term^1.1 This does precisely what I'm looking for (assuming $search_term is a string like dinosaur) The search term is found in the default/defined search fields, and then a boost is applied if this term is also found in one of the defined boost fields. What I'm looking to do is define this setup in solrconfig.xml such that I need only hit a URL like: http://HOST_NAME/solr/select?q=$search_term I can define a bq str in solrconfig, but seem unable to reference the q query parameter in order to boost only when the search term is found. Any help would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score- tp2958968p2963986.html Sent from the Solr - User mailing list archive at Nabble.com.
How can I query mutlitcore with solrJ
Dear team. I installed two cores on my tomcat: http://localhost:8983/solr/fund_dih/admin/ http://localhost:8983/solr/fund_tika/admin/ How can I send one query request via Solrj to these URLs? Thanks and Regards Zane
How to get right facet counts?
We are having an issue with facet counts and grouping. We have multiple doctors with addresses. How do I search these lat/longs? 1. Using SOLR 3.1, I can duplicate all fields except lat_long, and use group.field for the key. 2. I can use David Smiley's solution for multiple points (but it seems to be abandoned?) 3. I can add a parameter that will calculate facet.field after the group by (can I get some help?) 4. Others? The whole thing does not sound good, since there is sooo much duplication. It would be perfect to support a multiValued field with many lat_longs for a row in SOLR directly. Ideas? Thanks.
Re: Using Boost fields for a sum total score.
Put everything except q in solrconfig... Then just use qt=<name in solrconfig>&q=... On 5/19/11 7:52 PM, ronveenstra ron-s...@agathongroup.com wrote: Apologies if this is obvious, but I've been banging my head against a wall. I can define a query like the following: http://HOST_NAME/solr/select?q=$search_term&bq=boost_high:$search_term^1.5&bq=boost_medium:$search_term^1.3&bq=boost_max:$search_term^1.7&bq=boost_low:$search_term^1.1 This does precisely what I'm looking for (assuming $search_term is a string like dinosaur) The search term is found in the default/defined search fields, and then a boost is applied if this term is also found in one of the defined boost fields. What I'm looking to do is define this setup in solrconfig.xml such that I need only hit a URL like: http://HOST_NAME/solr/select?q=$search_term I can define a bq str in solrconfig, but seem unable to reference the q query parameter in order to boost only when the search term is found. Any help would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2963986.html Sent from the Solr - User mailing list archive at Nabble.com.
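As a sketch of what "everything except q in solrconfig" could look like: this swaps the literal bq strings for dismax per-field boosts via qf, since defaults in solrconfig cannot interpolate the user's search term into a bq string. The handler name, default field, and boost values are illustrative only, mirroring the boost fields from the question:

```xml
<!-- Hypothetical handler: call it with .../solr/select?qt=boosted&q=dinosaur -->
<requestHandler name="boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- the user's q is matched against every field listed here, with
         per-field boosts standing in for the old bq parameters -->
    <str name="qf">text boost_low^1.1 boost_medium^1.3 boost_high^1.5 boost_max^1.7</str>
  </lst>
</requestHandler>
```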