Re: Autocomplete with Solr 3.1
Nobody can help me -- View this message in context: http://lucene.472066.n3.nabble.com/Autocomplete-with-Solr-3-1-tp3202214p3206095.html Sent from the Solr - User mailing list archive at Nabble.com.
Collapsing MultiValue fields
Hello, I understand collapsing is not yet possible for multi-valued fields, but I still wonder what the best way is to solve the issue I am having. I have the following document data fields:
1. Title (max 200 chars)
2. Abstract (max 2000 chars)
3. Body (can be quite long)
4. Author (multi-valued)
5. Link (multi-valued)
6. more fields
I would like to collapse search results based on the different links, but currently can't. My solution for now is to have a single Solr document for each link+title combination, but that multiplies my data tremendously, and I am also noticing that Solr does no smart deduplication of stored fields. Some of the fields I am storing (abstract, body) are huge and I would like to avoid duplicating them. Any ideas? Thanks in advance, Fattie
I can't pass the unit tests when compiling from apache-solr-3.3.0-src
I just go to apache-solr-3.3.0/solr and run 'ant test'. I find that the JUnit tests always fail and tell me 'BUILD FAILED', but if I type 'ant dist', I get an apache-solr-3.3-SNAPSHOT.war with no warnings. Is this a problem just for me? My server: CentOS 5.6 64-bit / apache-ant-1.8.2 / junit and JDK (both JRockit and Sun JDK 1.6 fail).
Re: Dealing with keyword stuffing
On Thu, Jul 28, 2011 at 08:31, Chris Hostetter hossman_luc...@fucit.org wrote: : Presumably, they are doing this by increasing tf (term frequency), : i.e., by repeating keywords multiple times. If so, you can use a custom : similarity class that caps term frequency, and/or ensures that the scoring : increases less than linearly with tf. Please see In some cases, yes, they are repeating keywords multiple times, stuffing different combinations - Solr, Solr Lucene, Solr Search, Solr Apache, Solr Guide. In particular, using something like SweetSpotSimilarity tuned to know what values make sense for good content in your domain can be useful, because it can actually penalize documents that are too short/long or have term freqs that are outside of a reasonable expected range. I am not a Solr expert, but I was thinking in this direction: the ratio of tokens/total_length would be nearer to 1 for a stuffed document, while it would be nearer to 0 for a bogus document; somewhere between the two lie documents that are more likely to be meaningful. I am not sure how to use SweetSpotSimilarity. I am googling this, but any useful insights are much appreciated.
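The capped-tf idea Hoss describes can be sketched in plain Java. This is only an illustration of sublinear, capped term-frequency scoring, not the actual SweetSpotSimilarity API; the cap value of 10 is an arbitrary assumption:

```java
// Sketch: score growth in term frequency can be capped so that repeating
// a keyword hundreds of times stops helping. NOT the SweetSpotSimilarity
// API -- just the idea. TF_CAP = 10 is an arbitrary, assumed value.
public class CappedTf {
    static final int TF_CAP = 10;

    // DefaultSimilarity-style tf: sqrt(freq), grows without bound,
    // so keyword stuffing keeps paying off
    static double defaultTf(int freq) {
        return Math.sqrt(freq);
    }

    // Capped variant: beyond TF_CAP occurrences, repetition adds nothing
    static double cappedTf(int freq) {
        return Math.sqrt(Math.min(freq, TF_CAP));
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(100)); // 10.0 -- stuffing still pays off
        System.out.println(cappedTf(100));  // same score as 10 occurrences
    }
}
```

A real similarity would also fold in the length-normalization penalties Hoss mentions; this only shows the tf half.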
Index time boosting with DIH
Can someone point me to an example of using index-time boosting with the DataImportHandler?
Re: Index time boosting with DIH
On Thu, Jul 28, 2011 at 3:56 PM, Bürkle, David david.buer...@irix.ch wrote: Can someone point me to an example of using index-time boosting with the DataImportHandler? You can use the special flag variable $docBoost to add an index-time boost. http://wiki.apache.org/solr/DataImportHandler#Special_Commands -- Regards, Shalin Shekhar Mangar.
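Following the wiki page Shalin links, $docBoost is typically set from a transformer; a minimal DIH sketch (the table and column names here are made up for illustration):

```xml
<dataConfig>
  <script><![CDATA[
    // assumption: a numeric "popularity" column drives the boost
    function setBoost(row) {
      row.put('$docBoost', row.get('popularity'));
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" transformer="script:setBoost"
            query="SELECT id, title, popularity FROM items"/>
  </document>
</dataConfig>
```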
Re: Dealing with keyword stuffing
On Thu, Jul 28, 2011 at 3:48 PM, Pranav Prakash pra...@gmail.com wrote: [...] I am not sure how to use SweetSpotSimilarity. I am googling this, but any useful insights are much appreciated. Replace the existing DefaultSimilarity class in schema.xml (look towards the bottom of the file) with the SweetSpotSimilarity class, e.g., have a line like: <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/> Regards, Gora
Reusing SolrServer instances when swapping cores
Hi all, We work with two cores (active and passive) and swap them when reindexing is finished. Is it allowed to reuse the same instance of the SolrServer (both Embedded and Common)? I.e., do they point to the other core after the swapping? Regards Michael -- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22 http://www.basis06.ch - source of smart business
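For reference, the swap Michael describes is done through the CoreAdmin HTTP API; a sketch (host, port, and core names are assumptions):

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1
```

As far as I can tell, a URL-based server such as CommonsHttpSolrServer addresses a core by name, so after a SWAP the same URL serves whichever index now holds that name; whether the embedded case behaves the same is exactly the open question here.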
Re: please help explaining debug output
IDF is the frequency of the term in that field for the entire index, not the specific document. So it means that the term is in that field for some document somewhere, but not in that particular document, I believe... which leads me to wonder if the document is getting indexed as you expect, although there's nothing in the data you've provided that I can point to as the culprit; it all looks like it *should* work. If you can get a copy of Luke and look at the document in question, and/or look at the schema browser for that particular field, it might help, but frankly I'm at a loss to understand what the problem is... Sorry I can't be of more help. Erick On Tue, Jul 26, 2011 at 1:04 PM, Robert Petersen rober...@buy.com wrote: That didn't help. Seems like another case where I should get matches but don't, and this time it is only for some documents; others with similar content do match just fine. The debug output 'explain other' section for a non-matching document seems to say the term frequency is 0 for my problematic term, although I know it is in the content. I ended up making a synonym to do what the analysis stack *should* be doing: splitting LaserJet on case changes, i.e. putting 'LaserJet, laser jet' in synonyms at index time makes this work. I don't know why, though. Question: does this debug output mean it is matching the terms but the term frequency vector is returning 0 for the frequency of this term, i.e. that the term is in the doc but not in the tf array?
0.0 = no match on required clause (moreWords:"laser jet")
  0.0 = weight(moreWords:"laser jet" in 32497), product of:
    0.60590804 = queryWeight(moreWords:"laser jet"), product of:
      14.597603 = idf(moreWords: laser=26731 jet=12685)
      0.041507367 = queryNorm
    0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:
      0.0 = tf(phraseFreq=0.0)
      14.597603 = idf(moreWords: laser=26731 jet=12685)
      0.078125 = fieldNorm(field=moreWords, doc=32497)
-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, July 25, 2011 3:28 PM To: solr-user@lucene.apache.org Subject: Re: please help explaining debug output Hmmm, I can't find a convenient 1.4.0 to download, but re-indexing is a good idea since this seems like it *should* work. Erick On Mon, Jul 25, 2011 at 5:32 PM, Robert Petersen rober...@buy.com wrote: I'm still on Solr 1.4.0 and the analysis page looks like they should match, and other products with the same content do in fact match. I'm reindexing the non-matching ones to rule that out. -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, July 25, 2011 1:58 PM To: solr-user@lucene.apache.org Subject: Re: please help explaining debug output Hmmm, I'm assuming that moreWords is your default text field, yes? But it works for me (tm), using 1.4.1. What version of Solr are you on? Also, take a glance at the admin/analysis page, that might help... Gotta run Erick On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen rober...@buy.com wrote: Sorry, to clarify: a search for P1102W matches all three docs, but a search for p1102w LaserJet only matches the second two. Someone asked me a question while I was typing and I got distracted, apologies for any confusion.
-----Original Message----- From: Robert Petersen [mailto:rober...@buy.com] Sent: Monday, July 25, 2011 1:42 PM To: solr-user@lucene.apache.org Subject: please help explaining debug output

I have three documents with the following product titles in a text field called moreWords, with an analysis stack matching the Solr example text field definition:

1. HP LaserJet P1102W Monochrome Laser Printer
   http://www.buy.com/prod/hp-laserjet-p1102w-monochrome-laser-printer/q/loc/101/213824965.html
2. HP CE285A (85A) Remanufactured Black Toner Cartridge for LaserJet M1212nf, P1102, P1102W Series
   http://www.buy.com/prod/hp-ce285a-85a-remanufactured-black-toner-cartridge-for-laserjet/q/loc/101/217145536.html
3. Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet M1130, LaserJet M1132, LaserJet M1210
   http://www.buy.com/prod/black-hp-ce285a-toner-cartridge-for-laserjet-p1102w-laserjet-m1130/q/loc/101/222045267.html

A search for P1102W matches (2) and (3), but not (1) above. Can someone explain the debug output? It looks like I am getting a non-match on (1) because term frequency is zero? Am I reading that right? If so, how could that be? The searched terms are equivalently in all three docs. I don't get it.

<lst name="debug">
  <str name="rawquerystring">p1102w LaserJet</str>
  <str name="querystring">p1102w LaserJet</str>
  <str name="parsedquery">+PhraseQuery(moreWords:"p 1102 w") +PhraseQuery(moreWords:"laser jet")</str>
  <str name="parsedquery_toString">+moreWords:"p 1102 w" +moreWords:"laser jet"</str>
  <lst name="explain">
    <str name="222045267">3.64852 = (MATCH) sum
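Robert's synonym workaround corresponds to an index-time expansion entry in synonyms.txt plus a SynonymFilterFactory in the index analyzer. A sketch (the filter placement follows the stock example schema; treating this as the right fix rather than the WordDelimiterFilter config is an assumption):

```xml
<!-- synonyms.txt would contain the line:
       LaserJet, laser jet
     and the index-time analyzer chain would include: -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```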
Re: Exact match not the first result returned
That's a clever idea. I'll put something together and see how it turns out. Thanks for the tip. On Wed, Jul 27, 2011 at 10:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : With your solution, RECORD 1 does appear at the top but I think that's just : blind luck more than anything else, because RECORD 3 shows as having the same : score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd : like all three records returned with RECORD 1 being the first listing. with omitNorms, RECORD1 and RECORD3 have the same score because only the tf() matters, and both docs contain the term frank exactly twice. the reason RECORD1 isn't scoring higher, even though it (as you put it) matches 'Fred' exactly, is that from a term perspective, RECORD1 doesn't actually match myname:Fred exactly, because there are in fact other terms in that field, because it's multivalued. one way to indicate that you *only* want documents where the entire field value matches your input (ie: RECORD1 but no other records) would be to use a StrField instead of a TextField, or an analyzer that doesn't split up tokens (ie: something using KeywordTokenizer). that way a query on myname:Frank would not match a document where you had indexed the value Frank Stalone, but a query for myname:"Frank Stalone" would. in your case, you don't want *only* the exact field value matches, but you want them boosted, so you could do something like copyField myname into myname_str and then do... q=+myname:Frank myname_str:Frank^100 ...in which case a match on myname is required, but a match on myname_str will greatly increase the score. dismax (and edismax) are really designed for situations like this... defType=dismax qf=myname pf=myname_str^100 q=Frank -Hoss
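Hoss's copyField approach might look like this in schema.xml (field and type names follow his example; the boost values are illustrative, not prescriptive):

```xml
<field name="myname_str" type="string" indexed="true" stored="false"/>
<copyField source="myname" dest="myname_str"/>
<!-- then query with either:
       q=+myname:Frank myname_str:Frank^100
     or, with dismax:
       defType=dismax&qf=myname&pf=myname_str^100&q=Frank -->
```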
Possible to use quotes in dismax qf?
I want to do a dismax search that searches for the original query and for this query as a phrase query: q=sail boat needs to be converted to the dismax query q=sail boat "sail boat" qf=title^10 content^2 What is the best way to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Possible-to-use-quotes-in-dismax-qf-tp3206762p3206762.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: slave data files way bigger than master
My utter and complete shot in the dark is that the slave isn't getting its data from the master you think it is. I know it's a silly comment, but I've chased my tail this way more than once <G>... None of the files match. None of the dates match, etc. I'm assuming that bouncing the slave doesn't make any of the files go away. This shouldn't be necessary, but it might do to test to see if somehow the slave Solr has the files open. They *should* be removed when the slave reopens its searchers, but we're looking for really odd things here... Are you totally sure that the slave has attempted a replication? Take a look in the logs to see if there are errors being reported for the replication process. You can also issue the replication command via HTTP, see: http://wiki.apache.org/solr/SolrReplication#HTTP_API This is all speculation, but something is massively not right with the file lists you've posted. Best Erick On Tue, Jul 26, 2011 at 3:45 PM, Jonathan Rochkind rochk...@jhu.edu wrote: So I've got Solr 1.4. I've got replication going on. Once a day, before replication, I optimize on master. Then I replicate. I'd expect optimization before replication would basically replace all files on the slave; this is expected. But that means I'd also expect that the index files on the slave would be identical, and the same size, as on the master after replication -- this is the point of replication, yes? But they are not. The master is only 12G, the slave is 39G. The index files in slave and master have completely different filenames too; I don't know if that's expected, but it's not what I expected. I'll post complete file lists below. Anyone have any idea what's going on? Also... I wonder if these extra index files on the slave are just extra, not even looked at by the slave Solr, or if instead they actually ARE included in the indexes!
If the latter, and we have 'ghost' documents in the index, that could explain some weird problems I'm having with the slave getting Java out of heap space errors due to huge uninverted indexes, even though the index is basically the same with the same solrconfig.xml settings as it has been for a while, without such problems. Greatly appreciate if anyone has any ideas. MASTER: ls -lh master_index total 12G -rw-rw-r-- 1 tomcat tomcat 3.0G Jul 26 06:37 _24p.fdt -rw-rw-r-- 1 tomcat tomcat 15M Jul 26 06:37 _24p.fdx -rw-rw-r-- 1 tomcat tomcat 836 Jul 26 06:33 _24p.fnm -rw-rw-r-- 1 tomcat tomcat 1.2G Jul 26 06:44 _24p.frq -rw-rw-r-- 1 tomcat tomcat 49M Jul 26 06:44 _24p.nrm -rw-rw-r-- 1 tomcat tomcat 1.1G Jul 26 06:44 _24p.prx -rw-rw-r-- 1 tomcat tomcat 7.8M Jul 26 06:44 _24p.tii -rw-rw-r-- 1 tomcat tomcat 660M Jul 26 06:44 _24p.tis -rw-rw-r-- 1 tomcat tomcat 2.1G Jul 26 08:54 _2k4.fdt -rw-rw-r-- 1 tomcat tomcat 7.6M Jul 26 08:54 _2k4.fdx -rw-rw-r-- 1 tomcat tomcat 836 Jul 26 08:51 _2k4.fnm -rw-rw-r-- 1 tomcat tomcat 719M Jul 26 08:59 _2k4.frq -rw-rw-r-- 1 tomcat tomcat 25M Jul 26 08:59 _2k4.nrm -rw-rw-r-- 1 tomcat tomcat 797M Jul 26 08:59 _2k4.prx -rw-rw-r-- 1 tomcat tomcat 5.0M Jul 26 08:59 _2k4.tii -rw-rw-r-- 1 tomcat tomcat 436M Jul 26 08:59 _2k4.tis -rw-rw-r-- 1 tomcat tomcat 211M Jul 26 09:25 _2n3.fdt -rw-rw-r-- 1 tomcat tomcat 774K Jul 26 09:25 _2n3.fdx -rw-rw-r-- 1 tomcat tomcat 836 Jul 26 09:25 _2n3.fnm -rw-rw-r-- 1 tomcat tomcat 72M Jul 26 09:26 _2n3.frq -rw-rw-r-- 1 tomcat tomcat 2.5M Jul 26 09:26 _2n3.nrm -rw-rw-r-- 1 tomcat tomcat 78M Jul 26 09:26 _2n3.prx -rw-rw-r-- 1 tomcat tomcat 668K Jul 26 09:26 _2n3.tii -rw-rw-r-- 1 tomcat tomcat 53M Jul 26 09:26 _2n3.tis -rw-rw-r-- 1 tomcat tomcat 186M Jul 26 09:49 _2q6.fdt -rw-rw-r-- 1 tomcat tomcat 774K Jul 26 09:49 _2q6.fdx -rw-rw-r-- 1 tomcat tomcat 836 Jul 26 09:49 _2q6.fnm -rw-rw-r-- 1 tomcat tomcat 60M Jul 26 09:50 _2q6.frq -rw-rw-r-- 1 tomcat tomcat 2.5M Jul 26 09:50 _2q6.nrm -rw-rw-r-- 1 tomcat tomcat 64M Jul 
26 09:50 _2q6.prx -rw-rw-r-- 1 tomcat tomcat 562K Jul 26 09:50 _2q6.tii -rw-rw-r-- 1 tomcat tomcat 45M Jul 26 09:50 _2q6.tis -rw-rw-r-- 1 tomcat tomcat 246M Jul 26 10:16 _2t9.fdt -rw-rw-r-- 1 tomcat tomcat 774K Jul 26 10:16 _2t9.fdx -rw-rw-r-- 1 tomcat tomcat 836 Jul 26 10:16 _2t9.fnm -rw-rw-r-- 1 tomcat tomcat 68M Jul 26 10:17 _2t9.frq -rw-rw-r-- 1 tomcat tomcat 2.5M Jul 26 10:17 _2t9.nrm -rw-rw-r-- 1 tomcat tomcat 89M Jul 26 10:17 _2t9.prx -rw-rw-r-- 1 tomcat tomcat 602K Jul 26 10:17 _2t9.tii -rw-rw-r-- 1 tomcat tomcat 53M Jul 26 10:17 _2t9.tis -rw-rw-r-- 1 tomcat tomcat 221M Jul 26 10:45 _2wc.fdt -rw-rw-r-- 1 tomcat tomcat 774K Jul 26 10:45 _2wc.fdx -rw-rw-r-- 1 tomcat tomcat 836 Jul 26 10:45 _2wc.fnm -rw-rw-r-- 1 tomcat tomcat 69M Jul 26 10:46 _2wc.frq -rw-rw-r-- 1 tomcat tomcat 2.5M Jul 26 10:46 _2wc.nrm -rw-rw-r-- 1 tomcat tomcat 82M Jul 26 10:46 _2wc.prx -rw-rw-r-- 1 tomcat tomcat 613K Jul 26 10:46 _2wc.tii -rw-rw-r-- 1 tomcat tomcat 53M Jul 26 10:46 _2wc.tis -rw-rw-r-- 1
Re: how to get solr core information using solrj
hi Stefan, thanks for your advice. I wrote a JSP file to obtain that information, which looks like: CoreContainer cores = (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer"); then cores.getCores() gets the core information. Later I translate the info to JSON format. On the client side I use HttpClient to request this page, then parse the JSON into Java objects. Finally I got what I want. I am not familiar with Solr, so I'm not sure whether there are other good interfaces. On http://wiki.apache.org/solr/CoreAdmin#STATUS, I have not found a direct method to get information about core names, paths, etc. Many thanks for your advice. On Wed, Jul 20, 2011 at 3:01 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Jiang, what about http://wiki.apache.org/solr/CoreAdmin#STATUS? Regards Stefan On 20.07.2011 05:40, Jiang mingyuan wrote: hi all, Our solr server contains two cores: core0, core1, and they both work well. Now I'm trying to find a way to get information about core0 and core1. Can solrj or another API do this? thanks very much.
Re: how to get solr core information using solrj
Hi Erick: On the page you showed me, I found some useful methods, but it seems it does not contain methods for obtaining core names and core paths. So I followed the Solr index page's method and wrote a JSP page, like: CoreContainer cores = (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer"); then cores.getCores() gets the core information. Later I translate the info to JSON format. On the client side I use HttpClient to request this page, then parse the JSON into Java objects. Finally I got what I want. Thanks again. On Mon, Jul 25, 2011 at 9:40 PM, Erick Erickson erickerick...@gmail.com wrote: http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/CoreAdminRequest.html That should get you started. Best Erick On Tue, Jul 19, 2011 at 11:40 PM, Jiang mingyuan mailtojiangmingy...@gmail.com wrote: hi all, Our solr server contains two cores: core0, core1, and they both work well. Now I'm trying to find a way to get information about core0 and core1. Can solrj or another API do this? thanks very much.
Re: Solr DataImport with multiple DBs
Often, the easiest solution when DIH gets really complex is to do one of two things: 1) Use SolrJ instead -- you can often do complex things more easily than with DIH. 2) Consider using a custom Transformer in conjunction with your primary delta query to access the second table; see: http://wiki.apache.org/solr/DIHCustomTransformer Best Erick On Tue, Jul 26, 2011 at 7:27 PM, spravin spravin.li...@gmail.com wrote: Hi All, I am stuck with an issue with delta-import while configuring Solr in an environment where multiple databases exist. My schema looks like this: id, name, keyword. Names exist in one DB and keywords in a table in the other DB (with id as foreign key). For delta import, I would need to check against the updated column in both tables, but they are in two different databases, so I can't do this in a single deltaQuery, and so I'm not able to detect if the field in the second database has changed. The relevant part of my data-config XML looks like this:

<dataConfig>
  <dataSource name="ds1" ... />
  <dataSource name="ds2" ... />
  <document>
    <entity name="name" dataSource="ds1"
            query="SELECT ID, Name, Updated FROM records"
            deltaImportQuery="SELECT ID, Name, Updated FROM records WHERE ID = '${dataimporter.delta.ID}'"
            deltaQuery="SELECT ID FROM records WHERE Updated > '${dataimporter.last_index_time}'">
      <entity name="keywords" dataSource="ds2"
              query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords WHERE ID = '${name.ID}'"/>
    </entity>
  </document>
</dataConfig>

I'm hoping someone on this list could point me to a solution: a way to specify a deltaQuery across multiple databases. (In the above example, I would like to add OR ID IN (SELECT ID FROM keywords WHERE Updated > '${dataimporter.last_index_time}') to the deltaQuery, but this table can be accessed only from a different dataSource.)
Thanks - PS -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-DataImport-with-multiple-DBs-tp3201843p3201843.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what data type for geo fields?
Thanks for the feedback. I'll have a look at how geohash works. Looking at the sample schema more closely, I see:

<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

So in fact double is also Trie, just with precisionStep 0 in the example. -Peter On Wed, Jul 27, 2011 at 9:57 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jul 27, 2011 at 9:01 AM, Peter Wolanin peter.wola...@acquia.com wrote: Looking at the example schema: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml the solr.PointType field type uses double (is this just an example field, or is it used for geo search?) While you could possibly use PointType for geo search, it doesn't have good support for it (it's more of a general n-dimensional point). The LatLonType has all the geo support currently. , while the solr.LatLonType field uses tdouble, and it's unclear how the geohash is translated into lat/lon values, or if the geohash itself might typically be used as a copyField and used just for matching a query on a geohash? There's no geohash used in LatLonType. It is indexed as a lat and a lon under the covers (using the suffix _d). Is there an advantage in terms of speed to using Trie fields for solr.LatLonType? Currently only for explicit range queries... like point:[10,10 TO 20,20] I would assume so, e.g. for bbox operations. It's a bit of an implementation detail, but bbox doesn't currently use range queries. -Yonik http://www.lucidimagination.com -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 978-296-5247 Get a free, hosted Drupal 7 site: http://www.drupalgardens.com
Re: what data type for geo fields?
On Thu, Jul 28, 2011 at 10:24 AM, Peter Wolanin peter.wola...@acquia.com wrote: Thanks for the feedback. I'll have a look at how geohash works. Looking at the sample schema more closely, I see: <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> So in fact double is also Trie, just with precisionStep 0 in the example. Right, which means it's a normal numeric field with one token indexed per value (i.e. no tradeoff to speed up range queries by increasing index size). -Yonik http://www.lucidimagination.com
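The tradeoff Yonik describes is controlled by precisionStep. For comparison, the two variants as they appear in the example schema (tdouble indexes extra precision-reduced tokens per value, trading index size for range-query speed):

```xml
<!-- one token per value: smallest index, slower range queries -->
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
           omitNorms="true" positionIncrementGap="0"/>
<!-- extra tokens per value: bigger index, faster range queries -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
```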
Re: Possible to use quotes in dismax qf?
Hi, You can use the pf parameter of the DismaxQParserPlugin: http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29 This parameter receives a list of fields using the same syntax as the qf parameter. After determining the list of matching documents, DismaxQParserPlugin will boost the docs where the terms of the query match as a phrase in one of those fields. You can also use the ps parameter to set a phrase slop and boost docs where the terms appear in close proximity instead of as an exact phrase. Regards, *Juan* On Thu, Jul 28, 2011 at 11:00 AM, O. Klein kl...@octoweb.nl wrote: I want to do a dismax search that searches for the original query and for this query as a phrase query: q=sail boat needs to be converted to the dismax query q=sail boat "sail boat" qf=title^10 content^2 What is the best way to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Possible-to-use-quotes-in-dismax-qf-tp3206762p3206762.html Sent from the Solr - User mailing list archive at Nabble.com.
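Putting Juan's suggestion together for the original query, the request might look like this (the pf boost values are just examples, not recommendations):

```
q=sail boat
&defType=dismax
&qf=title^10 content^2
&pf=title^50 content^10
&ps=0
```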
RE: Solr DataImport with multiple DBs
Would it be possible to just run two separate deltas, one that updates records that changed in ds1 and another that updates records that changed in ds2? Of course this would be inefficient if a lot of records typically change in both places at the same time. With this approach, you might have to run the deltas using command=full-import / clean=false as shown here: http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, July 28, 2011 9:14 AM To: solr-user@lucene.apache.org Subject: Re: Solr DataImport with multiple DBs Often, the easiest solution when DIH gets really complex is to do one of two things: 1) Use SolrJ instead -- you can often do complex things more easily than with DIH. 2) Consider using a custom Transformer in conjunction with your primary delta query to access the second table; see: http://wiki.apache.org/solr/DIHCustomTransformer Best Erick On Tue, Jul 26, 2011 at 7:27 PM, spravin spravin.li...@gmail.com wrote: Hi All, I am stuck with an issue with delta-import while configuring Solr in an environment where multiple databases exist. My schema looks like this: id, name, keyword. Names exist in one DB and keywords in a table in the other DB (with id as foreign key). For delta import, I would need to check against the updated column in both tables, but they are in two different databases, so I can't do this in a single deltaQuery, and so I'm not able to detect if the field in the second database has changed. The relevant part of my data-config XML looks like this:

<dataConfig>
  <dataSource name="ds1" ... />
  <dataSource name="ds2" ... />
  <document>
    <entity name="name" dataSource="ds1"
            query="SELECT ID, Name, Updated FROM records"
            deltaImportQuery="SELECT ID, Name, Updated FROM records WHERE ID = '${dataimporter.delta.ID}'"
            deltaQuery="SELECT ID FROM records WHERE Updated > '${dataimporter.last_index_time}'">
      <entity name="keywords" dataSource="ds2"
              query="SELECT Keyword, Updated AS KeywordUpdated FROM keywords WHERE ID = '${name.ID}'"/>
    </entity>
  </document>
</dataConfig>

I'm hoping someone on this list could point me to a solution: a way to specify a deltaQuery across multiple databases. (In the above example, I would like to add OR ID IN (SELECT ID FROM keywords WHERE Updated > '${dataimporter.last_index_time}') to the deltaQuery, but this table can be accessed only from a different dataSource.) Thanks - PS -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-DataImport-with-multiple-DBs-tp3201843p3201843.html Sent from the Solr - User mailing list archive at Nabble.com.
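The two-separate-deltas idea could be wired up as two DIH request handlers in solrconfig.xml, each with its own data-config file, each invoked with command=full-import&clean=false. Handler names and file names here are assumptions:

```xml
<requestHandler name="/dataimport-ds1"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults"><str name="config">data-config-ds1.xml</str></lst>
</requestHandler>
<requestHandler name="/dataimport-ds2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults"><str name="config">data-config-ds2.xml</str></lst>
</requestHandler>
```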
Re: colocated term stats
Not sure if this will do what you want, but one way might be using facets. Take the term you are interested in, and apply it as an fq. Now the result set will include only documents that include that term. So also request facets for that result set, the top 10 facets are the top 10 terms that appear in that result set -- which is the top 10 terms that appear in documents together with your fq constraint. (Okay, you might need to look at 11, because one of the facet values will be the same term you fq constrained). You don't need to look at actual documents at all (rows=0), just facet response. Make sense? Does that do what you want? On 7/27/2011 9:12 PM, Twomey, David wrote: Given a query term, is it possible to get from the index the top 10 collocated terms in the index. ie: return the top 10 terms that appear with this term based on doc count. A plus would be to add some constraints on how near the terms are in the docs.
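A sketch of the faceting approach described above, as a request (assuming the terms live in an indexed field called text; facet.limit=11 allows for the constrained term itself showing up in the counts):

```
http://localhost:8983/solr/select?q=*:*&fq=text:solr&rows=0
    &facet=true&facet.field=text&facet.limit=11&facet.mincount=1
```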
Re: Exact match not the first result returned
Keep in mind that if you use a field type that includes spaces (eg StrField, or KeywordTokenizer), then if you're using the dismax or lucene query parsers, the only way to find matches in this field on queries that include spaces will be to do explicit phrase searches with double quotes. These fields will, however, work fine with pf in dismax/edismax as per Hoss's example. But yeah, I do what Hoss recommends -- I've got a KeywordTokenizer copy of my searchable field. I use a pf on that field with a very high boost to try and boost truly complete matches, that match the entirety of the value. It's not exactly 'exact', I still do some normalization, including flattening unicode to ascii, and normalizing one or more spaces-or-punctuation to exactly one space using a char regex filter. It seems to pretty much work -- this is just one of various relevancy tweaks I've got going on, to the extent that my relevancy has become pretty complicated and hard to predict and doesn't always do what I'd expect/intend, but this particular aspect seems to mostly pretty much work. On 7/27/2011 10:55 PM, Chris Hostetter wrote: : With your solution, RECORD 1 does appear at the top but I think that's just : blind luck more than anything else, because RECORD 3 shows as having the same : score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd : like all three records returned with RECORD 1 being the first listing. with omitNorms, RECORD1 and RECORD3 have the same score because only the tf() matters, and both docs contain the term frank exactly twice. the reason RECORD1 isn't scoring higher, even though it (as you put it) matches 'Fred' exactly, is that from a term perspective, RECORD1 doesn't actually match myname:Fred exactly, because there are in fact other terms in that field, because it's multivalued.
one way to indicate that you *only* want documents where the entire field value matches your input (ie: RECORD1 but no other records) would be to use a StrField instead of a TextField, or an analyzer that doesn't split up tokens (ie: something using KeywordTokenizer). that way a query on myname:Frank would not match a document where you had indexed the value Frank Stalone, but a query for myname:"Frank Stalone" would. in your case, you don't want *only* the exact field value matches, but you want them boosted, so you could do something like copyField myname into myname_str and then do... q=+myname:Frank myname_str:Frank^100 ...in which case a match on myname is required, but a match on myname_str will greatly increase the score. dismax (and edismax) are really designed for situations like this... defType=dismax qf=myname pf=myname_str^100 q=Frank -Hoss
Re: Possible to use quotes in dismax qf?
It's not clear to me why you would try to do that; I'm not sure it makes a lot of sense. You want to find all documents that have "sail boat" as a phrase AND have sail somewhere in them AND have boat somewhere in them? That's exactly the same as just all documents that have "sail boat" as a phrase -- such documents will necessarily include sail and boat, right? So why not just ask for q="sail boat"? What are you actually trying to do? Maybe dismax 'pf', which relevancy-boosts documents that have your input as a phrase, is what you really want? Then you'd just search for q=sail boat, but documents that included "sail boat" as a phrase would be boosted, at the boost you specify. On 7/28/2011 10:00 AM, O. Klein wrote: I want to do a dismax search that searches for the original query and for this query as a phrase query: q=sail boat needs to be converted to the dismax query q=sail boat "sail boat" qf=title^10 content^2 What is the best way to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Possible-to-use-quotes-in-dismax-qf-tp3206762p3206762.html Sent from the Solr - User mailing list archive at Nabble.com.
about the Solr request filter
Hello, dear friends, I have a problem developing with Solr. My application must send multiple queries to the Solr server after the page is loaded, and I found a problem: some requests return statusCode:0 and QTime:0 -- Solr has accepted the request, but it does not return any result documents. If I send each request one by one manually, it returns results. But if I send the requests frequently within a very short time, it returns nothing, only statusCode:0 and QTime:0. I think this may be a strategy in Solr, but I can't find any documents or discussions on the Internet, so I hope you can help me. -- Surely, you will always be the best!
Re: Store complete XML record (DIH XPathEntityProcessor)
Hi g, have a look at the PlainTextEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor You will have to call the URL twice that way, but I don't think you can get the complete document (the root element with all structure) via XPath - so the XPathEntityProcessor cannot help you. If calling the URL twice slows your indexer down in unacceptable ways, you can always subclass XPathEntityProcessor (knowing Java is helpful, though...). There surely is a way to make it return what you need. Or maybe an entity processor that caches the content and uses XPath EP and PlainText EP to accomplish your needs (not sure whether the API allows for that). Cheers, Chantal On Thu, 2011-07-28 at 05:53 +0200, solruser@9913 wrote: I am trying to use DIH to import an XML-based file with multiple XML records in it. Each record corresponds to one document in Lucene. I am using the DIH FileListEntityProcessor (to get the file list) followed by the XPathEntityProcessor to create the entities. It works perfectly and I am able to map XML elements to fields; however, I also need to store the entire XML record as a separate 'full text' field. Is there any way the XPathEntityProcessor provides a variable like 'rawLine' or 'plainText' that I can map to a field? I tried to use the plain text processor after this - but that does not recognize the XML boundaries and just gives the whole XML file.

<entity name="x" rootEntity="true" dataSource="logfilereader"
        processor="XPathEntityProcessor" url="${logfile.fileAbsolutePath}"
        stream="false" forEach="/xml/myrecord" transformer="...">
  <field column="mycol1" xpath="/xml/myrecord/@something"/>
  and so on ...

This works perfectly. However I also need something like...

  <field column="fullxmlrecord" name="plainText"/>

Any help is much appreciated.
I am a newbie and may be missing something obvious here -g -- View this message in context: http://lucene.472066.n3.nabble.com/Store-complete-XML-record-DIH-XPathEntityProcessor-tp3205524p3205524.html Sent from the Solr - User mailing list archive at Nabble.com.
Exception in thread main org.apache.solr.common.SolrException: No such core: core1
Hi, I am very new to Solr, in fact just started today, so forgive my lack of knowledge on the subject. Everything went fine until the point where I started to get the exception Exception in thread "main" org.apache.solr.common.SolrException: No such core: core1 and I have been stuck at the same point for a couple of hours now. *Below is the test code :* public class UtilSolR { private static EmbeddedSolrServer embeddedSolrServer = null; private static SolrServer httpSolrServer = null; /** * @param args */ public static void main(String[] args) { //SolrServer server = getHttpSolRServer(); SolrServer server = getEmbeddedSolRServer(); SolrInputDocument doc1 = new SolrInputDocument(); doc1.addField("tenant_id", "tenant_id", 1.0f); doc1.addField("displayas", "displayas", 1.0f); doc1.addField("btel", "btel", 1.0f); doc1.addField("htel", "htel", 1.0f); //SolrInputDocument doc2 = new SolrInputDocument(); //doc2.addField("id", "id2", 1.0f); //doc2.addField("name", "doc2", 1.0f); //doc2.addField("price", 20); Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>(); docs.add(doc1); //docs.add(doc2); try { server.add(docs); server.commit(); } catch (SolrServerException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } System.out.println("Done !!"); } public static EmbeddedSolrServer getEmbeddedSolRServer() { if (embeddedSolrServer == null) { CoreContainer coreContainer; System.setProperty("solr.solr.home", "/home/automata/solr/apache-solr-3.3.0/example/solr/"); CoreContainer.Initializer initializer = new CoreContainer.Initializer(); try { coreContainer = initializer.initialize(); embeddedSolrServer = new EmbeddedSolrServer(coreContainer, "core1"); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (ParserConfigurationException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (SAXException e) { // TODO Auto-generated catch block e.printStackTrace(); } }
return embeddedSolrServer; } } *The solr.xml file is as follows :* <solr persistent="false"> <cores adminPath="/admin/cores" defaultCoreName="collection1"> <core name="collection1" instanceDir="." /> <core name="core1" instanceDir="core1" /> </cores> </solr> The structure of the example folder is standard (just as the one supplied by Apache) and no change has been made to it. The Solr admin interface doesn't mention any core names there, but it does not throw a 404 on opening the admin page. Any help resolving the problem would be really great. Please let me know if I can provide any more information. thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Exception-in-thread-main-org-apache-solr-common-SolrException-No-such-core-core1-tp3206610p3206610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Possible to use quotes in dismax qf?
I removed the post as it might confuse people. But because of analyzers combining two words into a phrase query using shingles and a position filter, and the usage of dismax, I need q to be the original query plus the original query as a phrase query. That way the combined words are also highlighted and I get the results I need. qf is not the place to do this, it seems. Is there any way to do this in Solr? -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Possible-to-use-quotes-in-dismax-qf-tp3206891p3206986.html Sent from the Solr - User mailing list archive at Nabble.com.
question about exception in faceting
If I get an exception during faceting (e.g. an undefined field), Solr doesn't return HTTP 400 but 200, with the exception stack trace in an <arr name="exception">...</arr> tag. Why is it implemented this way? I checked Solr 1.1 and saw the same behavior. Unlike FacetComponent, take HighlightComponent for example: if I use a bad regex pattern for RegexFragmenter, HighlightComponent throws an exception and Solr returns 400. Thank you! koji -- Check out Query Log Visualizer http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
Re: Store complete XML record (DIH XPathEntityProcessor)
Thanks Chantal. I am OK with the second call and I already tried using that. Unfortunately it reads the whole file into a field. My file is as below, for example: <xml> <record> ... </record> <record> ... </record> <record> ... </record> </xml> Now the XPath does the 'for each /record' part. For each record I also need to store the raw log in there. If I use the PlainTextEntityProcessor then it gives me the whole file (from <xml> to </xml>) and not each <record> ... </record>. Am I using the PlainTextEntityProcessor wrong? Thanks, g -- View this message in context: http://lucene.472066.n3.nabble.com/Store-complete-XML-record-DIH-XPathEntityProcessor-tp3205524p3207203.html Sent from the Solr - User mailing list archive at Nabble.com.
ShingleFilterFactory class error
Hi, I am trying to create shingles with minShingleSize=10, but it also returns bi-grams. Here is my schema definition: <filter class="solr.ShingleFilterFactory" minShingleSize="10" maxShingleSize="25" outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" " /> For the input string "Apple - iPad 3G Wi-Fi - 32GB", it breaks it into bi-grams such as "Apple -" and "- iPad". My understanding is that it should produce a 10-gram token. Is this a bug, or does some configuration need to be added? Thank you in advance. Pradeep
RE: ShingleFilterFactory class error
Pradeep, As indicated on the wiki http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory, the minShingleSize option is not available in Solr versions prior to 3.1. What version of Solr are you using? (By the way, I am only replying on the solr-user@lucene.apache.org mailing list - the d...@lucene.apache.org mailing list is for the development of Lucene/Solr, not for questions about using the products; please ask first on solr-user@lucene.apache.org if you think you have found a bug. If you don't get an answer in a day or two, then it makes sense to escalate to d...@lucene.apache.org.) Steve -Original Message- From: Pradeep Pujari [mailto:prade...@rocketmail.com] Sent: Thursday, July 28, 2011 1:43 PM To: solr-user@lucene.apache.org Subject: ShingleFilterFactory class error Hi, I am trying to create shingles with minShingleSize=10, but it also returns bi-grams. Here is my schema definition: <filter class="solr.ShingleFilterFactory" minShingleSize="10" maxShingleSize="25" outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" " /> For the input string "Apple - iPad 3G Wi-Fi - 32GB", it breaks it into bi-grams such as "Apple -" and "- iPad". My understanding is that it should produce a 10-gram token. Is this a bug, or does some configuration need to be added? Thank you in advance. Pradeep
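To make the size options concrete, here is a small stand-alone sketch (plain Java, not Solr's actual ShingleFilter implementation) that emits all token n-grams between a minimum and maximum size. With the 7-token input from the post, honoring minShingleSize=10 yields no shingles at all, while a filter that ignores the option and falls back to bi-grams produces exactly the "Apple -" / "- iPad" output Pradeep observed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of shingle (token n-gram) generation: emits every
// run of n consecutive tokens with min <= n <= max, joined by a separator.
public class ShingleSketch {
    static List<String> shingles(List<String> tokens, int min, int max, String sep) {
        List<String> out = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int n = min; n <= max && start + n <= tokens.size(); n++) {
                out.add(String.join(sep, tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Apple", "-", "iPad", "3G", "Wi-Fi", "-", "32GB");
        // With minShingleSize=10 honored (Solr >= 3.1), a 7-token input
        // is too short to form any shingle:
        System.out.println(shingles(tokens, 10, 25, " "));  // prints []
        // If the option is silently ignored (pre-3.1) the default bi-grams
        // come out instead:
        System.out.println(shingles(tokens, 2, 2, " "));
        // prints [Apple -, - iPad, iPad 3G, 3G Wi-Fi, Wi-Fi -, - 32GB]
    }
}
```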
field with repeated data in index
Hello all I created an index consisting of orders and the names of the salesmen who are responsible for the order. As you can imagine, the same name can be associated with many different orders. No problem. Until I try to do a faceted search on the salesman name field. Right now, I have the data indexed as follows: <field name="PRIMARY_AC" type="string" indexed="false" stored="true" required="true" default="PRIMARY_AC unavailable"/> My faceted search gives me the following response: response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}} Which just isn't right. I KNOW there's data in there, but am confused as to how to properly identify it to Solr. Any suggestions? Mark
RE: field with repeated data in index
You need to index the field you want to facet on. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, July 28, 2011 3:50 PM To: solr-user@lucene.apache.org Subject: field with repeated data in index Hello all I created an index consisting of orders and the names of the salesmen who are responsible for the order. As you can imagine, the same name can be associated with many different orders. No problem. Until I try to do a faceted search on the salesman name field. Right now, I have the data indexed as follows: field name=PRIMARY_AC type=string indexed=false stored=true required=true default=PRIMARY_AC unavailable/ My faceted search gives me the following response: response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}} Which just isn't right. I KNOW there's data in there, but am confused as to how to properly identify it to Solr. Any suggestions? Mark
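In schema.xml terms, the fix is to turn on indexing for the facet field: faceting reads the indexed terms, not the stored values, so indexed="false" produces the empty facet counts shown above. A sketch of the corrected definition (field name and default copied from the post; the documents must be reindexed after the change):

```xml
<field name="PRIMARY_AC" type="string" indexed="true" stored="true"
       required="true" default="PRIMARY_AC unavailable"/>
```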
Re: field with repeated data in index
James Wow. That was fast. Thanks! But I thought you couldn't index a field that has duplicate values? Mark On Thu, Jul 28, 2011 at 4:53 PM, Dyer, James james.d...@ingrambook.comwrote: You need to index the field you want to facet on. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, July 28, 2011 3:50 PM To: solr-user@lucene.apache.org Subject: field with repeated data in index Hello all I created an index consisting of orders and the names of the salesmen who are responsible for the order. As you can imagine, the same name can be associated with many different orders. No problem. Until I try to do a faceted search on the salesman name field. Right now, I have the data indexed as follows: field name=PRIMARY_AC type=string indexed=false stored=true required=true default=PRIMARY_AC unavailable/ My faceted search gives me the following response: response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}} Which just isn't right. I KNOW there's data in there, but am confused as to how to properly identify it to Solr. Any suggestions? Mark
RE: field with repeated data in index
I'm not sure what you're getting at when you mention duplicate values, but pretty much any way I interpret it, it's allowed. The only case it wouldn't be is if the field is your primary key and you try to index a second document with the same key as an existing document. In that case the second document will replace the first. It might save you some time in the long run, if you haven't already, to go through the step-by-step tutorial at http://lucene.apache.org/solr/tutorial.html . There are links there also for the Solr Book and the Lucid reference guide. These are both excellent detailed tutorials and should help you get up to speed pretty fast. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, July 28, 2011 3:56 PM To: solr-user@lucene.apache.org Subject: Re: field with repeated data in index James Wow. That was fast. Thanks! But I thought you couldn't index a field that has duplicate values? Mark On Thu, Jul 28, 2011 at 4:53 PM, Dyer, James james.d...@ingrambook.comwrote: You need to index the field you want to facet on. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, July 28, 2011 3:50 PM To: solr-user@lucene.apache.org Subject: field with repeated data in index Hello all I created an index consisting of orders and the names of the salesmen who are responsible for the order. As you can imagine, the same name can be associated with many different orders. No problem. Until I try to do a faceted search on the salesman name field.
Right now, I have the data indexed as follows: field name=PRIMARY_AC type=string indexed=false stored=true required=true default=PRIMARY_AC unavailable/ My faceted search gives me the following response: response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}} Which just isn't right. I KNOW there's data in there, but am confused as to how to properly identify it to Solr. Any suggestions? Mark
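James's distinction between duplicate keys and duplicate values can be modeled with an ordinary map (a toy sketch, not SolrJ): Solr keeps at most one document per uniqueKey, so re-adding a key replaces the earlier document, while any non-key field may repeat across documents freely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of Solr's uniqueKey behavior: the index holds at most one
// document per key; re-adding a key replaces the earlier document.
public class UniqueKeyModel {
    private final Map<String, String> docsById = new LinkedHashMap<String, String>();

    public void add(String id, String body) {
        docsById.put(id, body); // same id -> earlier doc is replaced
    }

    public int numDocs() { return docsById.size(); }
    public String get(String id) { return docsById.get(id); }

    public static void main(String[] args) {
        UniqueKeyModel index = new UniqueKeyModel();
        index.add("order-1", "salesman: Smith");
        index.add("order-2", "salesman: Smith"); // duplicate *value* is fine
        index.add("order-1", "salesman: Jones"); // duplicate *key* replaces
        System.out.println(index.numDocs());      // prints 2
        System.out.println(index.get("order-1")); // prints salesman: Jones
    }
}
```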
[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7
Hello Apache Lucene Apache Solr users, Hello users of other Java-based Apache projects, Oracle released Java 7 today. Unfortunately it contains hotspot compiler optimizations, which miscompile some loops. This can affect code of several Apache projects. Sometimes JVMs only crash, but in several cases, results calculated can be incorrect, leading to bugs in applications (see Hotspot bugs 7070134 [1], 7044738 [2], 7068051 [3]). Apache Lucene Core and Apache Solr are two Apache projects, which are affected by these bugs, namely all versions released until today. Solr users with the default configuration will have Java crashing with SIGSEGV as soon as they start to index documents, as one affected part is the well-known Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be miscompiled, too, leading to index corruption (especially on Lucene trunk with pulsing codec; other loops may be affected, too - LUCENE-3346 [5]). These problems were detected only 5 days before the official Java 7 release, so Oracle had no time to fix those bugs, affecting also many more applications. In response to our questions, they proposed to include the fixes into service release u2 (eventually into service release u1, see [6]). This means you cannot use Apache Lucene/Solr with Java 7 releases before Update 2! If you do, please don't open bug reports, it is not the committers' fault! At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruptions. Please note: Also Java 6 users are affected, if they use one of those JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts It is strongly recommended not to use any hotspot optimization switches in any Java version without extensive testing! In case you upgrade to Java 7, remember that you may have to reindex, as the unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing). 
For more information, read JRE_VERSION_MIGRATION.txt in your distribution package! On behalf of the Lucene project, Uwe [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134 [2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738 [3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051 [4] https://issues.apache.org/jira/browse/LUCENE-3335 [5] https://issues.apache.org/jira/browse/LUCENE-3346 [6] http://s.apache.org/StQ - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
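As a concrete example, the workaround Uwe mentions is a single JVM flag; with the stock Solr example it would look like this (a sketch only; adjust for your own container and startup scripts):

```
# Java 7 before Update 2: disable loop predication to avoid the hotspot bug
java -XX:-UseLoopPredicate -jar start.jar
```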
solr.TrieFloatField with multiValued=false treated as `UnInverted multi-valued field`
Hi! I have a problem coding my own SearchComponent. My schema.xml is: ... <fieldType name="decimal" class="solr.TrieFloatField" precisionStep="2" omitNorms="true" positionIncrementGap="0" /> ... <field name="price_min" type="decimal" indexed="true" stored="true" multiValued="false" /> ... When I use this value, my code leaves this in the log: Jul 28, 2011 4:29:04 PM org.apache.solr.request.UnInvertedField uninvert INFO: UnInverted multi-valued field {field=price_min,memSize=13758712,tindexSize=28852,time=2407,phase1=2398,nTerms=184366,bigTerms=0,termInstances=3248049,uses=0} So it suggests that `price_min` is multiValued, but it isn't, and I'm confused. In code these values are also false: SchemaField sf = searcher.getSchema().getField(field); FieldType ft = sf.getType(); sf.multiValued() || ft.multiValuedFieldCache() // is false I also don't understand why UnInvertedField is used for this field. Could anybody explain it to me? I'm also confused when I try: FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName); String termText = si.lookup[si.order[docID]]; Here `termText` is every time equal to '~' and si.order[docID] is at the end of the array. It would be great to hear any helpful idea. -- Rafał RaVbaker Piekarski.
Index
Hi All, How we can check the particular file is not INDEX in solr ? Regards, Gaurav
Re: Index
I have no idea what you mean. A file on your disk? What does INDEX in solr mean? Be more specific and clear, perhaps provide an example, and maybe someone can help you. On 7/28/2011 5:45 PM, GAURAV PAREEK wrote: Hi All, How we can check the particular;ar file is not INDEX in solr ? Regards, Gaurav
Re: Index
Do you mean, how can you check whether it has been indexed by solr, and is searchable? Nick On 7/28/2011 5:45 PM, GAURAV PAREEK wrote: Hi All, How we can check the particular;ar file is not INDEX in solr ? Regards, Gaurav
Re: Index
Yes Nick, you are correct: how can you check whether it has been indexed by Solr, and is searchable? On Fri, Jul 29, 2011 at 3:27 AM, Nicholas Chase nch...@earthlink.netwrote: Do you mean, how can you check whether it has been indexed by solr, and is searchable? Nick On 7/28/2011 5:45 PM, GAURAV PAREEK wrote: Hi All, How we can check the particular;ar file is not INDEX in solr ? Regards, Gaurav
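A common way to answer this is to search on the document's uniqueKey with rows=0 and look at numFound in the response: greater than zero means the document is indexed and searchable. A small sketch that builds such a query URL (the host, port, select path, and id field are assumptions based on the stock example setup):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a Solr select URL that checks whether a given document id was
// indexed: numFound > 0 in the response means it is searchable.
public class IndexedCheck {
    static String checkUrl(String baseUrl, String idField, String idValue)
            throws UnsupportedEncodingException {
        String q = idField + ":\"" + idValue + "\"";
        return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8") + "&rows=0";
    }

    public static void main(String[] args) throws Exception {
        // Fetch this URL and inspect <result numFound="..."> in the response.
        System.out.println(checkUrl("http://localhost:8983/solr", "id", "SOLR1000"));
    }
}
```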
Re: question about exception in faceting
Correction: Except FacetComponent; HighlightComponent for example: if I use a bad regex pattern for RegexFragmenter, HighlightComponent throws an exception, then Solr returns 400. Actually, Solr returns 500 in this case. I think it should be 400 (bad request). koji -- Check out Query Log Visualizer http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/ (11/07/29 1:18), Koji Sekiguchi wrote: If I got an exception during faceting (e.g. undefined field), Solr doesn't return HTTP 400 but 200 with the exception stack trace in <arr name="exception">...</arr> tag. Why is it implemented so? I checked Solr 1.1 and saw the same behavior. Except FacetComponent, HighlightComponent for example, if I use a bad regex pattern for RegexFragmenter, HighlightComponent throws an exception then Solr return 400. Thank you! koji
Re: question about exception in faceting
: If I got an exception during faceting (e.g. undefined field), Solr doesn't : return HTTP 400 but 200 with the exception stack trace in <arr name="exception"> : ...</arr> tag. Why is it implemented so? I checked Solr 1.1 and saw the same behavior. super historic, pre-apache, code ... the idea at the time was that some parts of the response (like faceting, highlighting, whatever...) would be optional and if there was an error computing that data it wouldn't fail the main request. that logic should really be ripped out. -Hoss
Re: question about exception in faceting
(11/07/29 8:52), Chris Hostetter wrote: : If I got an exception during faceting (e.g. undefined field), Solr doesn't : return HTTP 400 but 200 with the exception stack trace in <arr name="exception"> : ...</arr> tag. Why is it implemented so? I checked Solr 1.1 and saw the same behavior. super historic, pre-apache, code ... the idea at the time was that some parts of the response (like faceting, highlighting, whatever...) would be optional and if there was an error computing that data it wouldn't fail the main request. that logic should really be ripped out. Thank you for the response, just what I expected! I opened: https://issues.apache.org/jira/browse/SOLR-2682 koji -- Check out Query Log Visualizer http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/