Re: QueryElevationComponent not working in Distributed Search
Hi Erick,

I cannot migrate to 4.0-ALPHA or 4.0-BETA because of dependencies on the indexing configuration in solrconfig.xml and schema.xml. When I try to use the 4.0 version, a series of errors pops up. Also, I cannot change the entire set of configuration files that are available to me. So I tried applying the diffs that were attached to the issue mentioned below: https://issues.apache.org/jira/browse/SOLR-2949 . But I was still facing some issues, so I tried replacing QueryElevationComponent.java with the one from the newer versions. I still do not find elevation working for distributed search. Can you please let me know if there is any means by which I can include this fix without migrating to a newer version?

Thank you,
Vinoth

-- View this message in context: http://lucene.472066.n3.nabble.com/QueryElevationComponent-not-working-in-Distributed-Search-tp4011785p4012382.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with relating values in two multi value fields
Hi Mikhail,

sorry, my fault. This was one of my first ideas. My problem is that I have 1,000,000 documents, each with about 20 attributes. Additionally, each document has between 200 and 500 option-value pairs. So if I denormalize the data, I end up with 1,000,000 x 350 (the average of 200 and 500) = 350,000,000 documents, each with 20 attributes. Is denormalization the only way to handle this problem?

Thank you
Torben

On 06.10.2012 at 12:30, Mikhail Khludnev wrote:

Torben,

Denormalization implies copying the attributes which are common for a group into the smaller docs:

  <doc>
    <str name="setid">3</str>
    <str name="attribute_A">value</str>
    <str name="attribute_B">value</str>
    <str name="options">A</str>
    <str name="value">200</str>
  </doc>
  <doc>
    <str name="setid">3</str>
    <str name="attribute_A">value</str>
    <str name="attribute_B">value</str>
    <str name="options">B</str>
    <str name="value">400</str>
  </doc>
  <doc>
    <str name="setid">3</str>
    <str name="attribute_A">value</str>
    <str name="attribute_B">value</str>
    <str name="options">B</str>
    <str name="value">400</str>
  </doc>
  <doc>
    <str name="setid">3</str>
    <str name="attribute_A">value</str>
    <str name="attribute_B">value</str>
    <str name="options">C</str>
    <str name="value">240</str>
  </doc>

and use group.facet=true.

On Sat, Oct 6, 2012 at 2:24 AM, Torben Honigbaum torben.honigb...@neuland-bfi.de wrote:

Hi Mikhail, thank you for your answer. Maybe my sample data was not so good. The documents always have additional data which I need to use as facets, like this:

  <doc>
    <str name="id">3</str>
    <str name="attribute_A">value</str>
    <str name="attribute_B">value</str>
    <arr name="options"><str>A</str><str>B</str>...</arr>
    <arr name="value"><str>200</str><str>400</str>...</arr>
  </doc>

Torben

On 05.10.2012 at 17:20, Mikhail Khludnev wrote:

denormalize your docs into option x value tuples, identifying them by a duplicated id.
  <doc>
    <str name="setid">3</str>
    <str name="options">A</str>
    <str name="value">200</str>
  </doc>
  <doc>
    <str name="setid">3</str>
    <str name="options">B</str>
    <str name="value">400</str>
  </doc>
  <doc>
    <str name="setid">3</str>
    <str name="options">B</str>
    <str name="value">400</str>
  </doc>
  <doc>
    <str name="setid">3</str>
    <str name="options">C</str>
    <str name="value">240</str>
  </doc>

then collapse them by the setid field (it cannot be the uniqueKey).

On Fri, Oct 5, 2012 at 6:26 PM, Torben Honigbaum torben.honigb...@neuland-bfi.de wrote:

Hi Mikhail, I read the article and can't see how to solve my problem with FieldCollapsing. Any other suggestions?

Torben

On 04.10.2012 at 17:31, Mikhail Khludnev wrote:

it's a typical nested-document problem. There are several approaches. The out-of-the-box solution, as far as you need facets, is http://wiki.apache.org/solr/FieldCollapsing .

On Thu, Oct 4, 2012 at 7:19 PM, Torben Honigbaum torben.honigb...@neuland-bfi.de wrote:

Hi Jack, thank you for your answer. The problem is that I don't know the value for option A, and the values are numbers that I have to use as facets. So I need something like this:

Docs:

  <doc>
    <str name="id">3</str>
    <arr name="options"><str>A</str><str>B</str>...</arr>
    <arr name="value"><str>200</str><str>400</str>...</arr>
  </doc>
  <doc>
    <str name="id">4</str>
    <arr name="options"><str>A</str><str>E</str>...</arr>
    <arr name="value"><str>300</str><str>400</str>...</arr>
  </doc>
  <doc>
    <str name="id">6</str>
    <arr name="options"><str>A</str><str>C</str>...</arr>
    <arr name="value"><str>200</str><str>400</str>...</arr>
  </doc>

Query: …?q=options:A
Facet: 200 (2), 300 (1)

Thank you
Torben

On 04.10.2012 at 17:10, Jack Krupansky wrote:

Use a field called option_value_pairs with values like "A 200" and then query with a quoted phrase: "A 200". You could use a special character like an equals sign instead of a space, A=200, and then you don't have to quote it in the query.
-- Jack Krupansky

-----Original Message----- From: Torben Honigbaum Sent: Thursday, October 04, 2012 11:03 AM To: solr-user@lucene.apache.org Subject: Problem with relating values in two multi value fields

Hello,

I have a problem with relating values in two multi-value fields. My documents look like this:

  <doc>
    <str name="id">3</str>
    <arr name="options"><str>A</str><str>B</str><str>C</str><str>D</str></arr>
    <arr name="value"><str>200</str><str>400</str><str>240</str><str>310</str></arr>
  </doc>

My problem is that I have to search for a set of documents and display only the value for option A, for example, and use the value field as a facet field. I need a result like this:

  <doc>
    <str name="id">3</str>
    <str name="options">A</str>
    <str name="value">200</str>
  </doc>
  facet …

I think this is a use case which isn't possible, right? So can someone show me an alternative way to solve this problem? The documents each have 500 options with 500 related values.

Thank you
Torben

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
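Mikhail's denormalization step can also be sketched as plain data manipulation, outside Solr. This is a hypothetical illustration (the field names are taken from the thread; it is not Solr API code): it splits one document with parallel options/value lists into one small document per option-value pair, copying the shared attributes plus a setid for later grouping.

```python
def denormalize(doc):
    """Split one doc with parallel 'options'/'value' lists into one flat
    doc per option-value pair, copying the shared attributes."""
    shared = {k: v for k, v in doc.items() if k not in ("id", "options", "value")}
    return [
        {"setid": doc["id"], **shared, "options": opt, "value": val}
        for opt, val in zip(doc["options"], doc["value"])
    ]

src = {"id": "3", "attribute_A": "x", "options": ["A", "B"], "value": ["200", "400"]}
flat = denormalize(src)
# two small docs, each carrying setid=3 and the shared attribute_A
```

Each flat document then groups back to its source via setid, which is what group.field=setid with group.facet=true exploits on the Solr side.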
Re: Adding a new pseudo field
If I've understood you correctly, you could also achieve this with the XSLTResponseWriter; it would be pretty trivial to write an XSLT that exposes the node position in the results, containing:

  <position><xsl:value-of select="position()"/></position>

Stick that in solr/conf/xslt, and reference it with wt=xslt&tr=<name>.xsl. That way you wouldn't need to modify Solr at all. Also, look at Solr 4.0, which has calculated fields. I'm not sure whether there's scope to find the document position with a function query, though.

Upayavira

On Mon, Oct 8, 2012, at 05:02 AM, deniz wrote:

well basically i was about to explain and ask once more for your opinions, but this morning i just wanted to try something in the source code and it succeeded... so here is what i wanted and what i did to get it:

What I wanted: the exact thing I want is similar to the score field. Normally it always exists, but we can't see it in a normal query response unless we set fl=*,score. For my case, I would like to see each document's position in a pseudo field like score, so when i run a query with fl=*,position I want to see <position>5</position> for the 5th document in the result set. To make it more clear, when you search for q=name:deniz&fl=*,position,score the result set will be something like:

  <doc><position>1</position><id>986</id><score>5</score></doc>
  <doc><position>2</position><id>1002</id><score>4</score></doc>
  <doc><position>3</position><id>140</id><score>3</score></doc>

and when the user runs another query, say q=name:stephan&fl=*,position,score, the result set will be like:

  <doc><position>1</position><id>140</id><score>8</score></doc>
  <doc><position>2</position><id>986</id><score>5</score></doc>
  <doc><position>3</position><id>1002</id><score>1</score></doc>

as you see, each query will produce different scores, therefore a document's position - or ranking, whichever you prefer to say - will change according to the query.

What I did: well, after digging into the source code, I am now able to see dynamic positions for each different search..
I simply added a position function to DocIterator and implemented it in the subclasses. Then I added a check in ReturnFields for whether fl contains position. It works in a similar way to score. The last thing to do was adding a custom augmenter class, PositionAugmenter, similar to ScoreAugmenter. Then I was done :) I hope this helps if anyone faces a similar issue...

-----
Zeki ama calismiyor... Calissa yapar...

-- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-new-pseudo-field-tp4011995p4012375.html Sent from the Solr - User mailing list archive at Nabble.com.
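If patching Solr is not an option, the same information can be recovered client-side, since the position is just the document's rank in the returned list. A minimal sketch, assuming results arrive as a list of dicts and paging uses Solr's start offset:

```python
def add_positions(docs, start=0):
    """Annotate each result doc with its 1-based rank, honouring the paging offset."""
    return [{**d, "position": start + i + 1} for i, d in enumerate(docs)]

page = add_positions([{"id": 986}, {"id": 1002}, {"id": 140}], start=0)
# page[0]["position"] == 1, page[2]["position"] == 3
```

This avoids maintaining a fork of DocIterator/ReturnFields across Solr upgrades, at the cost of doing the augmentation outside the server.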
Re: Storing queries in Solr
Solr has a small query cache, but this does not hold queries for any length of time, so it won't suit your purpose. The LucidWorks Search product has (I believe) a click-tracking feature, but that is about boosting documents that are clicked on, not specific search terms. Parsing the Solr log, or pushing query terms to a different core/index, would really be the only way to achieve what you're suggesting, as far as I am aware. Processing logs would be preferable anyhow, as you don't really want to be triggering an index write during each query (assuming you have more queries than updates to your main index); also, if this is for building a suggester index, it is unlikely to need updating that regularly - every hour or every day should be more than sufficient. You could write a SearchComponent that logs queries in another format, should the existing log format not be sufficient for you.

Upayavira

On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote:

Hi! I was wondering whether there is any built-in mechanism that allows me to store the queries made to a Solr server inside the index itself. I know the suggester module exists, but as far as I know it only works with terms existing in the index, not with queries. I remember reading about using some external program to parse the Solr log and push the queries or any other interesting data into the index; is this the only way of accomplishing this? Greetings!

10th ANNIVERSARY OF THE CREATION OF THE UNIVERSITY OF INFORMATICS SCIENCES... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci
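A rough sketch of the log-parsing approach Upayavira mentions. The log line format below is an assumption (it varies across Solr versions and servlet containers), so treat the regex as a starting point rather than a drop-in parser:

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Matches q=... inside a params={...} blob or a raw query string (assumed format).
QUERY_RE = re.compile(r"[?&{]q=([^&\s}]+)")

def extract_queries(log_lines):
    """Yield the decoded q= parameter from each Solr request log line."""
    for line in log_lines:
        m = QUERY_RE.search(line)
        if m:
            yield unquote_plus(m.group(1))

log = [
    "INFO: [core0] webapp=/solr path=/select params={q=ipod+nano&rows=10}",
    "INFO: [core0] webapp=/solr path=/select params={q=ipod+nano&rows=10}",
    "INFO: [core0] webapp=/solr path=/select params={q=video&rows=10}",
]
top = Counter(extract_queries(log)).most_common()
# [('ipod nano', 2), ('video', 1)]
```

The counted terms can then be pushed to a separate suggester core on whatever schedule suits (hourly or daily, per the advice above).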
Re: Adding a new pseudo field
Good question. I know XSLT can output JSON, but you'd have to write a stylesheet that transforms the XML into JSON. I'm not sure whether you can influence the content type of the output with the XSLT response writer, though. There's also the Velocity response writer, which sits behind the /browse interface; that might help you too.

Upayavira

On Mon, Oct 8, 2012, at 08:54 AM, deniz wrote:

Could the XSLT processor be useful for a JSON response too? Because I will be using the response not in a browser but from some other jars..

-----
Zeki ama calismiyor... Calissa yapar...

-- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-new-pseudo-field-tp4011995p4012393.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: add shard to index
Given that Solr does not support distributed IDF, adding a shard without balancing the number of documents could seriously skew your scoring. If you are okay with that, then the next question is: what happens if you download the clusterstate.json from ZooKeeper, add another entry along the lines of shard3:{}, and then upload it again? My theory is that the next host you start up would become the first node of shard3. Worth a try (unless someone more knowledgeable tells us otherwise!)

Upayavira

On Mon, Oct 8, 2012, at 01:35 AM, Radim Kolar wrote:

i am reading this: http://wiki.apache.org/solr/SolrCloud section "Re-sizing a Cluster". Is it possible to add a shard to an existing index? I do not need the data redistributed; it can stay where it is. It's enough for me if new entries are distributed over the new number of shards. Restarting Solr is fine.
Re: Problem with relating values in two multi value fields
On Mon, 2012-10-08 at 08:42 +0200, Torben Honigbaum wrote:

sorry, my fault. This was one of my first ideas. My problem is that I have 1,000,000 documents, each with about 20 attributes. Additionally, each document has between 200 and 500 option-value pairs. So if I denormalize the data, I end up with 1,000,000 x 350 (the average of 200 and 500) = 350,000,000 documents, each with 20 attributes.

If you have a few hundred or fewer distinct primary attributes (the A, B, C's in your example), you could create a new field for each of them:

  <doc>
    <str name="id">3</str>
    <str name="options">A B C D</str>
    <str name="option_A">200</str>
    <str name="option_B">400</str>
    <str name="option_C">240</str>
    <str name="option_D">310</str>
    ...
  </doc>

Query for options:A and facet on the field option_A to get facets for the specific field. This normalization does increase the index size due to secondary values being duplicated between the option fields, but since our assumption is a relatively small number of primary values, it should not be too much.

Alternatively, if you have many distinct primary attributes, index the pairs as Jack suggests:

  <doc>
    <str name="id">3</str>
    <str name="options">A B C D</str>
    <str name="option">A=200</str>
    <str name="option">B=400</str>
    <str name="option">C=240</str>
    <str name="option">D=310</str>
    ...
  </doc>

Query for options:A and facet on the field option with facet.prefix=A=. Your result will be A=200 (2), A=450 (1)... so you'll have to strip the "A=" prefix before display. This normalization is potentially a lot heavier than the previous one, as we have distinct_primaries * distinct_secondaries distinct values. Worst case, where every document contains only distinct combinations of primary/secondary, we have 350M distinct option values, which is quite heavy for a single box to facet on. Whether that is better or worse than 350M documents, I don't know.

Is denormalization the only way to handle this problem?

What you are trying to do looks quite a lot like hierarchical faceting, which Solr does not support directly.
But even if you apply one of the experimental patches, it does not mitigate the potential combinatorial explosion of your primary and secondary values. So that leaves the question: how many distinct combinations of primary and secondary values do you have?

Regards,
Toke Eskildsen
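Toke's second scheme is easy to prototype outside Solr: encode each pair as option=value at index time, then strip the prefix from the facet labels at display time. A hypothetical sketch of those two transformations (plain Python, not Solr API code):

```python
def encode_pairs(options, values):
    """Turn parallel option/value lists into 'option=value' facet terms."""
    return [f"{o}={v}" for o, v in zip(options, values)]

def strip_prefix(facet_counts, prefix):
    """Keep only facet labels with the given prefix and drop it for display."""
    return {label[len(prefix):]: count
            for label, count in facet_counts.items()
            if label.startswith(prefix)}

terms = encode_pairs(["A", "B"], ["200", "400"])      # ['A=200', 'B=400']
shown = strip_prefix({"A=200": 2, "A=450": 1}, "A=")  # {'200': 2, '450': 1}
```

The strip step corresponds to cleaning up the A=200 (2), A=450 (1) labels Toke describes before showing them to users.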
Re: add shard to index
Hello!

Radim, there is a JIRA issue for this: https://issues.apache.org/jira/browse/SOLR-3755. It is work in progress, but once finished, Solr will enable you to add additional shards to a live collection and split the ones that were already created.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
Reloading ExternalFileField blocks Solr
Hi List,

We're using Solr-4.0.0-Beta with a 7M-document index running on a single host with 16 shards. We'd like to use an ExternalFileField to hold a value that changes often. However, we've discovered that the file is apparently re-read by every shard/core on *every commit*; the index is unresponsive during this period (around 20s on the host we're running on). This is unacceptable for our needs. In the future, we'd like to add other values as ExternalFileFields, which will make the problem worse. It would be better if the external file were instead read in the background, updating the previously read values for each shard as they are read in. I guess a change to the ExternalFileField code would be required to achieve this, but I have no experience here, so suggestions are very welcome.

Thanks,
/Martin Koch - Issuu - Senior Systems Architect.
Solr 4 spatial search - point intersects polygon
Hi everyone,

I've been playing around with the new spatial search functionality included in the newer versions of Solr (4.1 and trunk 5.0), and I've found something strange when I try to find a point inside a polygon (particularly inside a square). You can reproduce this problem using the spatial-solr-sandbox project, which has the following config for the fields:

  [...]
  <fieldType name="geohash" class="solr.SpatialRecursivePrefixTreeFieldType" units="degrees" />
  [...]
  <field name="geohash" type="geohash" indexed="true" stored="true" multiValued="false" />
  [...]

I'm trying to find the following document:

  <doc>
    <str name="id">G292223</str>
    <str name="name">Dubai</str>
    <str name="geohash">55.28 25.252220</str>
  </doc>

I want to test whether this point is located inside a polygon, so I'm using the following query:

  q=geohash:Intersects(POLYGON((55.18 25.352220,55.38 25.352220,55.38 25.152220,55.18 25.152220,55.18 25.352220)))

As you can see, it's a small square that contains the point described before. I get some results, but that document is not among them, and the ones returned are wrong, since they are not even inside the square.

  <result name="response" numFound="8" start="0">
    <doc>
      <str name="id">G1809498</str>
      <str name="name">Guilin</str>
      <str name="geohash">110.286390 25.281940</str>
    </doc>
    [...]
  </result>

However, if I change the shape of the square a little (just moved one corner slightly, *55.48* instead of 55.38), it returns the result as expected:

  q=geohash:Intersects(POLYGON((55.18 25.352220,*55.48* 25.352220,55.38 25.152220,55.18 25.152220,55.18 25.352220)))

Now it returns a single result and it's OK:

  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">G292223</str>
      <str name="name">Dubai</str>
      <str name="geohash">55.28 25.252220</str>
    </doc>
  </result>

If I use a bbox with the same size and position as the first square, it correctly returns the document.
  q=geohash:Intersects(55.18 25.152220 55.38 25.352220)

  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">G292223</str>
      <str name="name">Dubai</str>
      <str name="geohash">55.28 25.252220</str>
    </doc>
  </result>

If you draw another polygon, such as a triangle, it works well too. I've tested this against different points and it's always the same: if you draw a straight square (or rectangle), it can't find the point inside it, and it returns wrong results. Am I doing anything wrong?

Thanks in advance
Jorge

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-spatial-search-point-intersects-polygon-tp4012402.html Sent from the Solr - User mailing list archive at Nabble.com.
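As a sanity check that the first query should match, here is a plain ray-casting point-in-polygon test (independent of Solr) over the square and the Dubai point from the message. It confirms the point lies inside the square, so the empty result points at the search side rather than at the test data:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray extending to the left of (x, y).
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

square = [(55.18, 25.352220), (55.38, 25.352220),
          (55.38, 25.152220), (55.18, 25.152220)]
print(point_in_polygon(55.28, 25.252220, square))  # True: Dubai is inside
```

The Guilin point (110.286390, 25.281940) that Solr returned fails the same test, confirming it is outside the square.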
I don't understand
Hi,

There are two servers with the same configuration. I crawl the same URL. One of them gives the following error:

  Caused by: org.apache.solr.common.SolrException: ERROR: [doc=http://bilgisayarciniz.org/] multiple values encountered for non multiValued copy field text: bilgisayarciniz web hizmetleri

I really fail to understand why this happens.

Regards,

PS: Neither server has multiValued=true for the title field.
Re: I don't understand
Hi,

Please describe your environment better:

* How do you crawl, and using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr?
* Can you share your schema and other relevant config with us?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 8 Oct 2012 at 12:11, Tolga to...@ozses.net wrote:
solr1.4 code Example
hi,

I am unable to unzip the 5883_Code.zip file for Solr 1.4 from the Packt Publishing site. I get the error message: "End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive." Any pointers?

Regards
Sujatha
Re: I don't understand
Hi Jan, thanks for your fast reply. Below is the information you requested:

* I use Nutch, with the command: nutch crawl urls -dir crawl-$(date +%FT%H-%M-%S) -solr http://localhost:8983/solr/ -depth 10 -topN 5
* What do you mean by which RequestHandler? How can I find that out?
* 3.6.1
* Both schemas are below:

<schema name="nutch" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <!-- core fields -->
    <field name="segment" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>
    <!-- fields for index-basic plugin -->
    <field name="host" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true" required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="date" stored="true" indexed="false"/>
    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="contentLength" type="long" stored="true" indexed="false"/>
    <field name="lastModified" type="date" stored="true" indexed="false"/>
    <field name="date" type="date" stored="true" indexed="true"/>
    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>
    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>
    <!-- fields for feed plugin (tag is also used by microformats-reltag) -->
    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="date" stored="true" indexed="true"/>
    <field name="updatedDate" type="date" stored="true" indexed="true"/>
    <!-- fields for creativecommons plugin -->
    <field name="cc" type="string" stored="true" indexed="true" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

<schema name="nutch" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory
Re: QueryElevationComponent not working in Distributed Search
You shouldn't try copying files around; your comment that you tried replacing QueryElevationComponent.java leads me to think you tried that. Instead, I notice that there's a SOLR-2949.3x patch. If you want to try that, you can apply the patch to the 3.x code line. See "working with patches" at http://wiki.apache.org/solr/HowToContribute

WARNING: I have no clue whether that patch will apply cleanly, nor whether it will actually fix distributed QEV. It doesn't look like it was applied to 3.x. Also, looking at the comments, it's not clear that it _would_ work; see Mark's last comment.

What kinds of errors do you get with 4.0? It's true that a bunch has changed, but I really don't see any other reliable way to get distributed QEV working other than either using 4.0 or patching 3.6... and if you do the latter, you're kind of on your own.

Best
Erick

On Mon, Oct 8, 2012 at 2:21 AM, vasokan vaso...@andrew.cmu.edu wrote:
Re: add shard to index
Right, but even if that worked, you'd then get docs being assigned to the wrong shard. The shard assignment is something like hash(id) % numShards. So a document currently on shard 0 might be indexed next time on shard 2, leaving two live docs in your system with the same ID. Bad Things would happen then... I believe that currently your only real option is to re-index from scratch when you add more shards.

I was thinking about this at one point. Unless the guys work some magic, it will be an expensive process. Not as expensive as re-indexing, for sure, but consider 12 documents in 3 shards:

shard1 - 1, 4, 7, 10
shard2 - 2, 5, 8, 11
shard3 - 3, 6, 9, 12

Now you add a shard and the docs are redistributed:

shard1 - 1, 5, 9
shard2 - 2, 6, 10
shard3 - 3, 7, 11
shard4 - 4, 8, 12

In this simple case, only 3 of your 12 documents stayed on the same shard! All the rest had to be moved. Then the indexes have to be distributed across all replicas, then...

Now, there won't have to be any analysis done. You won't have to reconstruct all of the documents from your system of record. You won't have to do a _ton_ of the work that you originally had to do. This should be enormously faster than re-indexing. But it still won't be something to casually do on a live system under load <g>.

Disclaimer: I really may be talking through my hat here, but this _sounds_ right.

FWIW
Erick

On Mon, Oct 8, 2012 at 4:33 AM, Upayavira u...@odoko.co.uk wrote:
Upayavira On Mon, Oct 8, 2012, at 01:35 AM, Radim Kolar wrote: I am reading this: http://wiki.apache.org/solr/SolrCloud section Re-sizing a Cluster. Is it possible to add a shard to an existing index? I do not need the data redistributed; it can stay where it is. It's enough for me if new entries are distributed into the new number of shards. Restarting Solr is fine.
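Erick's back-of-the-envelope redistribution above is easy to verify with a quick sketch. Note this models the naive hash-mod-N routing discussed in the thread, not SolrCloud's actual hash-range routing; the exact shard numbering differs slightly from Erick's example, but the count of moved documents comes out the same:

```python
# Naive hash-mod-N document routing: which of 12 docs change shards
# when the shard count grows from 3 to 4? (Toy model of the thread's
# example, NOT SolrCloud's real hash-range routing.)

def shard_for(doc_id: int, num_shards: int) -> int:
    return doc_id % num_shards

docs = range(1, 13)  # documents 1..12

stayed = [d for d in docs if shard_for(d, 3) == shard_for(d, 4)]
moved = [d for d in docs if shard_for(d, 3) != shard_for(d, 4)]

print("stayed:", stayed)  # only 3 of the 12 documents stay put
print("moved: ", moved)   # the other 9 would have to be relocated
```

So three quarters of the index has to be physically relocated even though no document changed, which is why re-sharding is expensive even without re-analysis.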
Re: I don't understand
Well, the schemas are different. The first schema doesn't have a copyField directive anywhere in it and the second one does. And the copyField is in a non-standard place anyway; it's usually outside the /fields tag. It's kind of surprising it works at all there, now I've got to go figure out why G. Anyway, apparently you've edited the schemas inconsistently. And this copyField will never work unless the text field is multiValued...

Best
Erick

On Mon, Oct 8, 2012 at 7:11 AM, Tolga to...@ozses.net wrote: Hi Jan, thanks for your fast reply. Below is the information you requested: * I use nutch, using the command nutch crawl urls -dir crawl-$(date +%FT%H-%M-%S) -solr http://localhost:8983/solr/ -depth 10 -topN 5 * What do you mean which RequestHandler? How can I find that out? * 3.6.1 * Both schemas are below: schema name=nutch version=1.4 types fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ fieldType name=long class=solr.TrieLongField precisionStep=0 omitNorms=true positionIncrementGap=0/ fieldType name=float class=solr.TrieFloatField precisionStep=0 omitNorms=true positionIncrementGap=0/ fieldType name=date class=solr.TrieDateField precisionStep=0 omitNorms=true positionIncrementGap=0/ fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=url class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 
generateNumberParts=1/ /analyzer /fieldType /types fields field name=id type=string stored=true indexed=true/ !-- core fields -- field name=segment type=string stored=true indexed=false/ field name=digest type=string stored=true indexed=false/ field name=boost type=float stored=true indexed=false/ !-- fields for index-basic plugin -- field name=host type=string stored=false indexed=true/ field name=url type=url stored=true indexed=true required=true/ field name=content type=text stored=false indexed=true/ field name=title type=text stored=true indexed=true/ field name=cache type=string stored=true indexed=false/ field name=tstamp type=date stored=true indexed=false/ !-- fields for index-anchor plugin -- field name=anchor type=string stored=true indexed=true multiValued=true/ !-- fields for index-more plugin -- field name=type type=string stored=true indexed=true multiValued=true/ field name=contentLength type=long stored=true indexed=false/ field name=lastModified type=date stored=true indexed=false/ field name=date type=date stored=true indexed=true/ !-- fields for languageidentifier plugin -- field name=lang type=string stored=true indexed=true/ !-- fields for subcollection plugin -- field name=subcollection type=string stored=true indexed=true multiValued=true/ !-- fields for feed plugin (tag is also used by microformats-reltag)-- field name=author type=string stored=true indexed=true/ field name=tag type=string stored=true indexed=true multiValued=true/ field name=feed type=string stored=true indexed=true/ field name=publishedDate type=date stored=true indexed=true/ field name=updatedDate type=date stored=true indexed=true/ !-- fields for creativecommons plugin -- field name=cc type=string stored=true indexed=true multiValued=true/ /fields uniqueKeyid/uniqueKey defaultSearchFieldcontent/defaultSearchField solrQueryParser defaultOperator=OR/ /schema schema name=nutch version=1.4 types fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ 
fieldType
Re: solr 1.4.1 - 3.6.1; SOLR-758
The Extended Dismax query parser (edismax) mostly obsoletes Dismax, except in the sense that some apps prefer the restricted syntax of Dismax: http://wiki.apache.org/solr/ExtendedDisMax -- Jack Krupansky -Original Message- From: Patrick Kirsch Sent: Monday, October 08, 2012 2:32 AM To: solr-user@lucene.apache.org Subject: solr 1.4.1 - 3.6.1; SOLR-758 Regarding https://issues.apache.org/jira/browse/SOLR-758 (Enhance DisMaxQParserPlugin to support full-Solr syntax and to support alternate escaping strategies.) I'm updating from solr 1.4.1 to 3.6.1 (I'm aware that it is not beautiful). After applying the attached patches to 3.6.1 I'm experiencing this problem: - SEVERE: org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a org.apache.solr.search.QParserPlugin at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:421) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:441) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1612) [...] These patches no longer seem to be valid. Which leads me to the more experienced users here: - Although not directly mentioned in https://issues.apache.org/jira/browse/SOLR-758, is there any other (new) QParser which obsoletes DisMax? - Furthermore, I tried to make the patches apply (forward porting), but always get the error Error Instantiating QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a org.apache.solr.search.QParserPlugin, although the class dependency is linear: ./core/src/java/org/apache/solr/search/AdvancedQParserPlugin.java: [...] public class AdvancedQParserPlugin extends DisMaxQParserPlugin { [...] ./core/src/java/org/apache/solr/search/DisMaxQParserPlugin.java: [...] public class DisMaxQParserPlugin extends QParserPlugin { [...] Thanks, Patrick
Re: solr1.4 code Example
On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote: I am unable to unzip the 5883_Code.zip file for solr 1.4 from the packtpub site. I get the error message End-of-central-directory signature not found. [...] It is a corrupt ZIP file. I'm guessing you got it from http://www.packtpub.com/files/code/5883_Code.zip I tried downloading the archive and it was indeed corrupt. You can read some of the files by using jar for unpacking: 'jar xvf 5883_Code.zip'. You'll need to contact packtpub to get them to fix it properly. A quick search indicates that they've had problems before: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/% 3c4bf66e8f.4070...@shoptimax.de%3E
Re: long query response time in shards search
What release of Solr are you on? Solr 4.0 has improved wildcard support (FST automatons.) But even then, such heavy use of wildcards may be problematic. If you intend to use wildcards in that manner, you might want to create a custom stemming filter that does that stemming at index time (and query time) so you don't need to do such heavy wildcarding. Do these complex queries always run slow (the first time each is tried), or just sometimes, or only some of the queries? (Solr will cache the results of a given query so that the next time the same results can be returned without re-querying the index.) -- Jack Krupansky -Original Message- From: Jason Sent: Monday, October 08, 2012 12:26 AM To: solr-user@lucene.apache.org Subject: Re: long query response time in shards search Hi, Otis Thanks for your reply. Yes, all cores are on the same server. * what do you consider too long? Even an id (key) query response takes too long; almost all id (key) queries respond in under 10ms. example - 2012-10-05 16:38:32,078 [http-8080-exec-3979] INFO org.apache.solr.core.SolrCore - [usp00] webapp=/solr_us path=/select params={rows=1shards=usp00,usp01,usp02,usp03,usp04,usp05fl=cin,scorestart=0q=id:(US200840881A1)} status=0 QTime=164085 * how many queries are running concurrently? Approximately 5 to 10 queries, but the queries are very complex. Complex means many terms including wildcards. * can you show some example queries? example - q=(angiogenesis*+OR+neovascula*+OR+(vessel*+OR+vascula*)+N+(proliferat*+OR+growth*))+5N+(inhibit*+OR+prevent*+OR+treat*+OR+thera*+OR+medic*)+AND+(ibd+OR+crohn*+OR+behcet*+OR+inflammat*+2N+(bowel*+OR+intestin*+OR+colitis*+OR+enteritis*+OR+gastroenteritis*)+OR+ulcerative*+W+colitis*+OR+intestin*+W+behcet*+OR+macula*+W+degenerat*+OR+amd+OR+armd) * how many CPU cores does your server have? 32 cores (the server has 4 CPUs and 8 cores in each CPU.) 128G RAM. Also, the total index for all cores includes 15 million docs and its size is 400G. Are the complex queries the problem? 
-- View this message in context: http://lucene.472066.n3.nabble.com/long-query-response-time-in-shards-search-tp4012366p4012378.html Sent from the Solr - User mailing list archive at Nabble.com.
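Jack's suggestion of a custom stemming filter boils down to normalizing terms the same way at index time and query time, so that a plain term query replaces the trailing wildcard. The sketch below is a toy illustration of that idea in Python; a real Solr deployment would use an analysis-chain filter written in Java, and the suffix list here is invented purely for the example:

```python
# Toy "stemming filter": strip a few invented suffixes so that related
# word forms collapse to one indexed token. Both indexing and querying
# run the same function, so a query for "angiogenic" matches a document
# containing "angiogenesis" without any wildcard. A real stemmer
# (e.g. Porter) is far more careful than this.

SUFFIXES = ("esis", "etic", "ic", "es", "s")

def crude_stem(term: str) -> str:
    for suf in SUFFIXES:
        # only strip when a reasonable-length stem remains
        if term.endswith(suf) and len(term) > len(suf) + 3:
            return term[: -len(suf)]
    return term

print(crude_stem("angiogenesis"))  # angiogen
print(crude_stem("angiogenic"))    # angiogen -- same indexed token
```

With both sides normalized this way, the term query angiogen replaces the wildcard angiogen*, which is much cheaper to evaluate.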
search by multiple 'LIKE' operator connected with 'AND' operator
Hi. I have trouble with my SOLR configuration. I just want to implement a configuration that operates on the index like the MySQL query: field_name LIKE '%foo%' AND field_name LIKE '%bar%'. So, for example, I have 4 indexed titles: 'Kathy Lee', 'Kathy Norris', 'Kathy Davies', 'Kathy Bird', and with my query Kathy Norris I receive all of these. A quoted query gives no results at all. The latest field definition that I've tried (very simple, just for tests):

<fieldType name="text_ngram" class="solr.TextField" indexed="true" stored="true" multiValued="true" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="100"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>

I've also tried a field with ShingleFilterFactory, and ShingleFilterFactory combined with NGrams, but no results. Btw. I have the default solr configuration for the drupal search_api_solr module, just modified with a new request handler. Trying different configurations does not give the expected results. Thanks for help. -- View this message in context: http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Storing queries in Solr
Thanks for the quick response. I'm trying to build a query suggester; I find it odd that, this being a very common issue, Solr doesn't provide any built-in mechanism for query suggestions, but implementing the other components isn't so hard either. Greetings! On Oct 8, 2012, at 3:38 AM, Upayavira wrote: Solr has a small query cache, but this does not hold queries for any length of time, so it won't suit your purpose. The LucidWorks Search product has (I believe) a click tracking feature, but that is about boosting documents that are clicked on, not specific search terms. Parsing the Solr log, or pushing query terms to a different core/index, would really be the only way to achieve what you're suggesting, as far as I am aware. Processing logs would be preferable anyhow, as you don't really want to be triggering an index write during each query (assuming you have more queries than updates to your main index), and also if this is for building a suggester index, then it is unlikely to need updating that regularly - every hour or every day should be more than sufficient. You could write a SearchComponent that logs queries in another format, should the existing log format not be sufficient for you. Upayavira On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote: Hi! I was wondering if there is any built-in mechanism that allows me to store the queries made to a solr server inside the index itself. I know that the suggester module exists, but as far as I know it only works for terms existing in the index, and not with queries. I remember reading about using some external program to parse the solr log and pushing the queries or any other interesting data into the index; is this the only way of accomplishing this? Greetings! 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci
Wildcards and fuzzy/phonetic query
Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr person quit quite suddenly. I've noticed that some of our searches don't return the expected results, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we find that Solr is much faster. So far so good! :) But when we try to use wildcards things start to go wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6. Ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921. (https://issues.apache.org/jira/browse/SOLR-2921) Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries: in Oracle Text I can search for chr% (chr* in Solr..) and find both christian and kristian. As far as I understand, this is not possible in Solr; WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood something? Are there any workarounds or filter combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr* problem. As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance. Regards, Hågen
Re: search by multiple 'LIKE' operator connected with 'AND' operator
The PositionFilterFactory is probably preventing phrase queries from working. What are you expecting it to do? It basically means query if all the quoted terms occur at the same position. SQL like is comparable to Lucene wildcard, but change the % to * and _ to ?. -- Jack Krupansky -Original Message- From: gremlin Sent: Monday, October 08, 2012 10:47 AM To: solr-user@lucene.apache.org Subject: search by multiple 'LIKE' operator connected with 'AND' operator Hi. I have a trouble with SOLR configuration. Just want to implement configuration that would be operate with index like MySQL query: field_name LIKE '%foo%' AND field_name LIKE '%bar%'. So, for example, I have 4 indexed titles: 'Kathy Lee', 'Kathy Norris', 'Kathy Davies', 'Kathy Bird' and with my query Kathy Norris I receive all these indexes. Quoted query give no results at all. latest field definition that I've try (very simple, just for tests): fieldType name=text_ngram class=solr.TextField indexed=true stored=true multiValued=true positionIncrementGap=100 autoGeneratePhraseQueries=false analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.NGramFilterFactory minGramSize=2 maxGramSize=100/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.PositionFilterFactory / /analyzer /fieldType Also I've try field with ShingleFilterFactory, also ShingleFilterFactory combined with NGrams. But no results. Btw. I have default solr configuration for drupal search_api_solr module, just modified with a new request handler. Trying different configurations not give expected results. Thanks for help. -- View this message in context: http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Storing queries in Solr
Hi Jorge, As far as I know, there isn't a built-in component to achieve such a function in Solr (maybe in the latest 4.1, which I haven't explored in depth yet). However, I've done it myself in the past using different approaches. The first one is similar to Upayavira's suggestion and uses an independent index where queries and clicks were stored in order to make popular-query suggestions and/or document suggestions. My second implementation used a dedicated field on the original documents' index in order to add the terms of queries that led to a click on each particular document (i.e. re-indexing the document with a new field), and used this field for boosted terms and/or document suggestion. However, this latter solution is likely to not scale very well, especially if your document index is very dynamic (my particular case relied on an almost static document repository). Finally, remember that exploiting queries and clicks may lead to private data management issues. Since you're storing their queries, warn your users appropriately. br, gdupont On 8 October 2012 02:24, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Hi! I was wondering if there is any built-in mechanism that allows me to store the queries made to a solr server inside the index itself. I know that the suggester module exists, but as far as I know it only works for terms existing in the index, and not with queries. I remember reading about using some external program to parse the solr log and pushing the queries or any other interesting data into the index; is this the only way of accomplishing this? Greetings! 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION -- Gérard Dupont Information Processing Control and Cognition (IPCC) CASSIDIAN - an EADS company Document Learning team - LITIS Laboratory
Re: Wildcards and fuzzy/phonetic query
A regular expression term may provide what you want, but not exactly. Maybe something like: /(ch|k)r.*/ (No guarantee that will actually work.) See: http://lucene.apache.org/core/4_0_0-BETA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches And probably slower than desirable. -- Jack Krupansky -Original Message- From: Hågen Pihlstrøm Hasle Sent: Monday, October 08, 2012 11:21 AM To: solr-user@lucene.apache.org Subject: Wildcards and fuzzy/phonetic query Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr-person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we experience that Solr is much faster. So far so good! :) But when we try to use wildcards things start to to wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6. Ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921. (https://issues.apache.org/jira/browse/SOLR-2921) Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in Solr..) and find both christian and kristian. As far as I understand, this is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood anything? Are there any workarounds or filter-combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr*-problem. 
As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance. Regards, Hågen
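Jack's regex suggestion can be sanity-checked outside Solr. Lucene regexp terms are matched against whole terms (they're implicitly anchored), which corresponds to fullmatch below; this only checks the regex semantics, not Solr behavior or performance:

```python
import re

# Lucene regexp terms match the whole term (implicitly anchored),
# so /(ch|k)r.*/ behaves like fullmatch here: the term must start
# with "chr" or "kr", followed by anything.
pattern = re.compile(r"(ch|k)r.*")

names = ["christian", "kristian", "chopin", "karl"]
hits = [n for n in names if pattern.fullmatch(n)]
print(hits)  # christian and kristian match; chopin and karl do not
```

This covers the chr%-finds-both-christian-and-kristian case from the question, though as Jack notes, the regexp query may be slower than desirable on a large term dictionary.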
Re: SolrJ - IOException
I have also just run into this a few times over the weekend in a newly deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it is hosted via AWS. I have a RabbitMQ consumer that reads updates from a queue and posts updates to Solr via SolrJ. There is quite a bit of error handling around the indexing request, and even if Solr is not live the consumer application successfully logs the exception and attempts to move along in the queue. There are two consumer applications running at once, and at times they process 400 requests per minute. The high-volume times are not necessarily when this problem occurs, though. This exception is causing the entire application to hang - which is surprising considering all SolrJ logic is wrapped with try/catches. Has anyone found out more information regarding the possible keep-alive bug? Any insight is much appreciated. Thanks, Briggs Thompson Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute INFO: I/O exception (java.net.SocketException) caught when processing request: Broken pipe Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute INFO: Retrying request Oct 8, 2012 7:25:48 AM com..rabbitmq.worker.SolrWriter work SEVERE: {id:4049703,datetime:2012-10-08 07:22:05} IOException occured when talking to server at: http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon server org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon server at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:362) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:69) at 
org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:96) at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:79) at com..solr.SolrIndexService.IndexCoupon(SolrIndexService.java:57) at com..solr.SolrIndexService.Index(SolrIndexService.java:36) at com..rabbitmq.worker.SolrWriter.work(SolrWriter.java:47) at com..rabbitmq.job.Runner.run(Runner.java:84) at com..rabbitmq.job.SolrConsumer.main(SolrConsumer.java:10) Caused by: org.apache.http.client.ClientProtocolException at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:306) ... 10 more Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed. at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ... 
13 more Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:147) at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:154) at org.apache.http.impl.conn.LoggingSessionOutputBuffer.flush(LoggingSessionOutputBuffer.java:95) at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:178) at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:72) at org.apache.http.entity.mime.HttpMultipart.doWriteTo(HttpMultipart.java:206) at org.apache.http.entity.mime.HttpMultipart.writeTo(HttpMultipart.java:224) at org.apache.http.entity.mime.MultipartEntity.writeTo(MultipartEntity.java:183) at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98) at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108) at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122) at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271) at org.apache.http.impl.conn.AbstractClientConnAdapter.sendRequestEntity(AbstractClientConnAdapter.java:227) at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) at
Re: multivalued field question (FieldCache error)
Thank you very much! I've single-lined and removed the spaces from every fl field in my solrconfig, and now the app works fine. Giovanni On 05/10/12 20:49, Chris Hostetter wrote: : So extracting the attachment you will be able to track down what happens : : this is the query that shows the error, and below you can see the latest stack : trace and the qt definition Awesome -- exactly what we needed. I've reproduced your problem, and verified that it has something to do with the extra newlines which are confusing the parsing into not recognizing store_slug as a simple field name. The workaround is to modify the fl in your config to look like this... str name=flsku,store_slug/str ...or even like this... str name=fl sku, store_slug /str ...and then it should work fine. Having a newline immediately following the store_slug field name is somehow confusing things, and making it not recognize store_slug as a simple field name -- so then it tries to parse it as a function, and since bare field names can also be used as functions that parsing works, but then you get the error that the field can't be used as a function since it's multivalued. I'll try to get a fix for this into 4.0-FINAL... https://issues.apache.org/jira/browse/SOLR-3916 -Hoss
Re: search by multiple 'LIKE' operator connected with 'AND' operator
Disabling PositionFilterFactory totally breaks multiword search, and I could find titles only by a single word. A default solr.TextField field with WhitespaceTokenizerFactory returns only complete-word matches; enabling NGramFilterFactory on that field doesn't do anything for me. If I use the field described, I can find by both words, but not 'both at a time', just 'any one of them'. A TextField field copied via copyField into an NGram field also doesn't help. Maybe I'm missing something in the schema configuration? -- View this message in context: http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536p4012554.html Sent from the Solr - User mailing list archive at Nabble.com.
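For what it's worth, here is a toy model of why the index-side NGramFilter approach can emulate SQL's LIKE '%foo%' AND LIKE '%bar%': every substring (down to minGramSize) of each token is indexed, so a whole query term matches any title containing it as a substring, and requiring both terms (AND, or q.op=AND) narrows the result to documents containing both. This only illustrates the matching logic; it is not Solr's actual code:

```python
# Sketch of index-time n-grams emulating SQL "LIKE '%x%'": every
# substring of length 2..100 of each token is indexed, so a *whole*
# query term like "athy" matches the indexed grams of "kathy".
# Requiring several terms then means "all substrings present".

def ngrams(token: str, lo: int = 2, hi: int = 100) -> set:
    return {token[i:j] for i in range(len(token))
            for j in range(i + lo, min(len(token), i + hi) + 1)}

def index_doc(title: str) -> set:
    grams = set()
    for tok in title.lower().split():
        grams |= ngrams(tok)
    return grams

docs = {t: index_doc(t) for t in
        ["Kathy Lee", "Kathy Norris", "Kathy Davies", "Kathy Bird"]}

def like_and(*substrings):
    # field LIKE '%s1%' AND field LIKE '%s2%' ...
    return [t for t, grams in docs.items()
            if all(s in grams for s in substrings)]

print(like_and("kathy", "norris"))  # only one title has both substrings
```

The key points the sketch makes concrete: the query side must NOT n-gram the terms (otherwise every short gram matches almost everything, which is the "all four titles come back" symptom), and the terms must be combined with AND rather than the default OR.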
Re: Wildcards and fuzzy/phonetic query
Re: whether phonetic filters can be MultiTermAware: I'd be leery of this, as I basically don't quite know how it would behave. You'd have to ensure that the algorithms changed the first parts of the words uniformly, regardless of what followed. I'm pretty sure that _some_ phonetic algorithms do not follow this pattern, i.e. eric wouldn't necessarily have the same beginning as erickson. That said, some of the algorithms _may_ follow this rule and might be OK candidates for being MultiTermAware.

But you don't need this in order to try it out. See the Expert Level Schema Possibilities at: http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

You can define your own analysis chain for wildcards as part of your fieldType definition and include whatever you want, whether or not it's MultiTermAware, and it will be applied at query time. Use the analyzer type=query entry as a basis. _But_ you shouldn't include anything in this section that produces more than one output per input token. Note: token, not field. I.e. a really bad candidate for this section is WordDelimiterFilterFactory. If you use the admin/analysis page (which you'll get to know intimately), look at a type that has WordDelimiterFilterFactory in its chain, and put in something like erickErickson1234, you'll see what I mean. Make sure to check the verbose box.

If you can determine that some of the phonetic algorithms _should_ be MultiTermAware, please feel free to raise a JIRA and we can discuss... I suspect it'll be on a case-by-case basis.

Best
Erick

On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote: Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr-person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. 
We previously used Oracle Text for this, and we experience that Solr is much faster. So far so good! :) But when we try to use wildcards things start to to wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6. Ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921. (https://issues.apache.org/jira/browse/SOLR-2921) Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in Solr..) and find both christian and kristian. As far as I understand, this is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood anything? Are there any workarounds or filter-combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr*-problem. As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance.. Regards, Hågen
Re: SolrJ - IOException
Also note there were no exceptions in the actual Solr log, only on the SolrJ side. Thanks, Briggs On Mon, Oct 8, 2012 at 10:45 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: I have also just ran into this a few times over the weekend in a newly deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it is hosted via AWS. I have a RabbitMQ consumer that reads updates from a queue and posts updates to Solr via SolrJ. There is quite a bit of error handling around the indexing request, and even if Solr is not live the consumer application successfully logs the exception and attempts to move along in the queue. There are two consumer applications running at once, and at times processes 400 requests per minute. The high volume times is not necessarily when this problem occurs, though. This exception is causing the entire application to hang - which is surprising considering all SolrJ logic is wrapped with try/catches. Has anyone found out more information regarding the possible keep alive bug? Any insight is much appreciated. 
Thanks, Briggs Thompson

Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request: Broken pipe
Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: Retrying request
Oct 8, 2012 7:25:48 AM com..rabbitmq.worker.SolrWriter work
SEVERE: {id:4049703,datetime:2012-10-08 07:22:05} IOException occured when talking to server at: http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon server
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon server
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:362)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:69)
    at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:96)
    at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:79)
    at com..solr.SolrIndexService.IndexCoupon(SolrIndexService.java:57)
    at com..solr.SolrIndexService.Index(SolrIndexService.java:36)
    at com..rabbitmq.worker.SolrWriter.work(SolrWriter.java:47)
    at com..rabbitmq.job.Runner.run(Runner.java:84)
    at com..rabbitmq.job.SolrConsumer.main(SolrConsumer.java:10)
Caused by: org.apache.http.client.ClientProtocolException
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:306)
    ... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    ... 13 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:147)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:154)
    at org.apache.http.impl.conn.LoggingSessionOutputBuffer.flush(LoggingSessionOutputBuffer.java:95)
    at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:178)
    at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:72)
    at org.apache.http.entity.mime.HttpMultipart.doWriteTo(HttpMultipart.java:206)
    at org.apache.http.entity.mime.HttpMultipart.writeTo(HttpMultipart.java:224)
    at org.apache.http.entity.mime.MultipartEntity.writeTo(MultipartEntity.java:183)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271) at
Re: Problem with relating values in two multi value fields
Toke, You are absolutely right, concatenating terms is a possible solution. I found faceting quite complicated in this case, but it was a hot fix which we delivered to production.

Torben, this problem arises quite often. Besides the two approaches discussed here, it is also possible to use SpanQueries and TermPositions - you can check our experience here: http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html http://vimeo.com/album/2012142/video/33817062 Our current way is BlockJoin, which is really performant in the case of batched updates: http://blog.griddynamics.com/2012/08/block-join-query-performs.html. The bad thing is that there is no open facet component for block join. We have code, but are not ready to share it yet.

On Mon, Oct 8, 2012 at 12:44 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

On Mon, 2012-10-08 at 08:42 +0200, Torben Honigbaum wrote: sorry, my fault. This was one of my first ideas. My problem is that I've 1.000.000 documents, each with about 20 attributes. Additionally each document has between 200 and 500 option-value pairs. So if I denormalize the data, it means that I've 1.000.000 x 350 ((200 + 500) / 2) = 350.000.000 documents, each with 20 attributes.

If you have a few hundred or fewer distinct primary attributes (the A, B, C's in your example), you could create a new field for each of them:

<doc>
  <str name="id">3</str>
  <str name="options">A B C D</str>
  <str name="option_A">200</str>
  <str name="option_B">400</str>
  <str name="option_C">240</str>
  <str name="option_D">310</str>
  ...
</doc>

Query for options:A and facet on field option_A to get facets for the specific field. This normalization does increase the index size due to duplicated secondary values between the option-fields, but since our assumption is a relatively small amount of primary values, it should not be too much.
Alternatively, if you have many distinct primary attributes, index the pairs as Jack suggests:

<doc>
  <str name="id">3</str>
  <str name="options">A B C D</str>
  <str name="option">A=200</str>
  <str name="option">B=400</str>
  <str name="option">C=240</str>
  <str name="option">D=310</str>
  ...
</doc>

Query for options:A and facet on field option with facet.prefix=A=. Your result will be A=200 (2), A=450 (1)... so you'll have to strip the "A=" prefix before display. This normalization is potentially a lot heavier than the previous one, as we have distinct_primaries * distinct_secondaries distinct values. Worst case, where every document only contains distinct combinations of primary/secondary, we have 350M distinct option-values, which is quite heavy for a single box to facet on. Whether that is better or worse than 350M documents, I don't know.

Is denormalization the only way to handle this problem? I

What you are trying to do does look quite a lot like hierarchical faceting, which Solr does not support directly. But even if you apply one of the experimental patches, it does not mitigate the potential combinatorial explosion of your primary secondary values. So that leaves the question: How many distinct combinations of primary and secondary values do you have?

Regards, Toke Eskildsen -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
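The client-side prefix stripping Toke mentions is mechanical; a minimal Python sketch (the facet values and counts below are made up for illustration) that turns Solr's flat facet_fields list into display pairs:

```python
def strip_facet_prefix(flat_counts, prefix):
    """Turn Solr's flat facet list, e.g. ["A=200", 2, "A=450", 1],
    into (display_value, count) pairs with the prefix stripped."""
    # Solr returns value, count, value, count, ... in one flat list.
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    return [(value[len(prefix):], count)
            for value, count in pairs
            if value.startswith(prefix)]

# Hypothetical facet counts for field "option" with facet.prefix=A=
flat = ["A=200", 2, "A=450", 1, "B=400", 3]
print(strip_facet_prefix(flat, "A="))  # → [('200', 2), ('450', 1)]
```

The `startswith` filter is defensive; with facet.prefix set, Solr should only return matching values anyway.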
Re: Reloading ExternalFileField blocks Solr
Martin, Can you tell me what the content of that field is, and how it should affect the search result? On Mon, Oct 8, 2012 at 12:55 PM, Martin Koch m...@issuu.com wrote: Hi List We're using Solr-4.0.0-Beta with a 7M document index running on a single host with 16 shards. We'd like to use an ExternalFileField to hold a value that changes often. However, we've discovered that the file is apparently re-read by every shard/core on *every commit*; the index is unresponsive in this period (around 20s on the host we're running on). This is unacceptable for our needs. In the future, we'd like to add other values as ExternalFileFields, and this will make the problem worse. It would be better if the external file were instead read in in the background, updating previously read relevant values for each shard as they are read in. I guess a change in the ExternalFileField code would be required to achieve this, but I have no experience here, so suggestions are very welcome. Thanks, /Martin Koch - Issuu - Senior Systems Architect. -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Wildcards and fuzzy/phonetic query
Hi, Consider looking into synonyms and ngrams. Otis -- Performance Monitoring - http://sematext.com/spm On Oct 8, 2012 11:21 AM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote: Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr-person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we find that Solr is much faster. So far so good! :) But when we try to use wildcards things start to go wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6. Ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921. ( https://issues.apache.org/jira/browse/SOLR-2921) Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in Solr..) and find both christian and kristian. As far as I understand, this is not possible in Solr: WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood anything? Are there any workarounds or filter combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr* problem. As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance.. Regards, Hågen
Re: solr 1.4.1 - 3.6.1; SOLR-758
: Regarding https://issues.apache.org/jira/browse/SOLR-758 (Enhance : DisMaxQParserPlugin to support full-Solr syntax and to support alternate : escaping strategies.) FWIW: I'm not really sure what/how that issue relates to the problem you are seeing (or how you *think* it relates to the problem you are seeing) ... so I'm just going to focus on the specifics of your error... : After applying the attached patches to 3.6.1 I'm experiencing this problem: The mailing list typically rejects patches - none came with your message. : - SEVERE: org.apache.solr.common.SolrException: Error Instantiating : QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a : org.apache.solr.search.QParserPlugin Besides the obvious problem of not extending the expected class, the other possibility is that when compiling your AdvancedQParserPlugin you may be compiling against the wrong version of Solr -- i.e. you could get this error if the AdvancedQParserPlugin.class file you have was generated when your AdvancedQParserPlugin.java file was compiled against a different QParserPlugin.class than the one in use at runtime. -Hoss
Re: solr1.4 code Example
I did get some files by jar unpacking, but could not get the ones I wanted ... thanks anyway!! On Mon, Oct 8, 2012 at 5:56 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote: I am unable to unzip the 5883_Code.zip file for Solr 1.4 from the packtpub site. I get the error message "End-of-central-directory signature not found". [...] It is a corrupt ZIP file. I'm guessing you got it from http://www.packtpub.com/files/code/5883_Code.zip I tried downloading the archive and it was indeed corrupt. You can read some of the files by using jar for unpacking: 'jar xvf 5883_Code.zip'. You'll need to contact packtpub to get them to fix it properly. A quick search indicates that they've had problems before: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/% 3c4bf66e8f.4070...@shoptimax.de%3E
Re: Wildcards and fuzzy/phonetic query
I guess synonyms would give me a similar result to using regexes, like Jack wrote about. I've thought about that, but I don't think it would be good enough. Substituting k for ch is easy enough, but the problem is that I have to think of every possible substitution in advance. I'd like Fil* to find Phillip, I'd like Hen* to find Hansen, and so on. The possibilities are quite endless, and I can't think of them all. I can't limit myself to Norwegian names either, a lot of people living in Norway have names from other countries. I'd like Moha* to find Mouhammed, etc.. Or am I too pessimistic? I haven't read enough about Ngrams yet, so I'm not sure if I've understood it properly. It divides the word into several pieces and tries to find one or more matches? Would that really help in my Chr* example? I guess you mean the combination of synonyms and ngrams? Is it possible to combine ngrams with a fuzzy query? So that every piece of a word is matched in a fuzzy way? Could that help me? I'll certainly look into ngrams more, thanks for the suggestion. Regards, Hågen On Oct 8, 2012, at 7:23 PM, Otis Gospodnetic wrote: Hi, Consider looking into synonyms and ngrams. Otis -- Performance Monitoring - http://sematext.com/spm On Oct 8, 2012 11:21 AM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote: Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr-person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we find that Solr is much faster. So far so good! :) But when we try to use wildcards things start to go wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6.
Ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921. ( https://issues.apache.org/jira/browse/SOLR-2921) Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in Solr..) and find both christian and kristian. As far as I understand, this is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood anything? Are there any workarounds or filter-combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr*-problem. As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance.. Regards, Hågen
Re: add shard to index
Do it as it is done in the Cassandra database. Adding a new node and redistributing data can be done in a live system without problems. It looks like this: every Cassandra node has a key range assigned. Instead of assigning keys to nodes like hash(key) mod nodes, every node has its portion of the hash keyspace. The portions do not need to be the same size; some nodes can have a larger portion of the keyspace than others. Say the hash function's max possible value is 12:

shard1 - 1-4
shard2 - 5-8
shard3 - 9-12

Now let's add a new shard. In Cassandra, adding a new shard by default cuts an existing one in half, so you will have:

shard1 - 1-2
shard2 - 3-4
shard3 - 5-8
shard4 - 9-12

See? You needed to move only documents from the old shard1. Usually you are adding more than one shard during a reorganization, so you do not need to rebalance the cluster that much by moving every node to a different position in the hash keyspace.
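The range-split idea above can be sketched in a few lines of Python. The 12-slot hash space is the example from the post; shard names in the after state are kept stable for unchanged ranges (with the new shard called shard4) so we can count what actually moves:

```python
def shard_for(h, ranges):
    """Return the shard owning hash value h, given {shard: (lo, hi)} ranges."""
    for shard, (lo, hi) in ranges.items():
        if lo <= h <= hi:
            return shard
    raise ValueError(f"no shard owns hash {h}")

# Each shard owns a contiguous slice of the hash keyspace (1..12 here).
before = {"shard1": (1, 4), "shard2": (5, 8), "shard3": (9, 12)}
# Splitting shard1 in half: the new shard takes slots 3-4;
# shard2 and shard3 keep their data untouched.
after = {"shard1": (1, 2), "shard4": (3, 4),
         "shard2": (5, 8), "shard3": (9, 12)}

moved = [h for h in range(1, 13) if shard_for(h, before) != shard_for(h, after)]
print(moved)  # → [3, 4]: only documents from old shard1's upper half move
```

Compare with naive modulo sharding, where going from 3 to 4 shards changes `hash(key) mod nodes` for most keys and therefore relocates most of the index.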
Re: add shard to index
AKA Consistent Hashing: http://en.wikipedia.org/wiki/Consistent_hashing Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Mon, Oct 8, 2012 at 11:33 AM, Radim Kolar h...@filez.com wrote: Do it as it is done in the Cassandra database. Adding a new node and redistributing data can be done in a live system without problems. It looks like this: every Cassandra node has a key range assigned. Instead of assigning keys to nodes like hash(key) mod nodes, every node has its portion of the hash keyspace. The portions do not need to be the same size; some nodes can have a larger portion of the keyspace than others. Say the hash function's max possible value is 12: shard1 - 1-4, shard2 - 5-8, shard3 - 9-12. Now let's add a new shard. In Cassandra, adding a new shard by default cuts an existing one in half, so you will have: shard1 - 1-2, shard2 - 3-4, shard3 - 5-8, shard4 - 9-12. See? You needed to move only documents from the old shard1. Usually you are adding more than one shard during a reorganization, so you do not need to rebalance the cluster that much by moving every node to a different position in the hash keyspace.
Re: Wildcards and fuzzy/phonetic query
I understand that I'm quickly reaching the boundaries of my Solr-competence when I'm supposed to read about Expert Level concepts.. :) I had already read it once, but now I read it again. Twice. And I'm not sure if I understand it correctly.. So let me ask a follow-up question: If I define an analyzer of type multiterm, will every filter I include for that analyzer be applied, even if it's not MultiTermAware? To complicate this further, I'm not really sure if phonetic filters are a good match for our needs. We search for names, and these names can come from all over the world. We use DoubleMetaphone, and Wikipedia says it tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origin. So I guess it's quite good. But how about names from the Middle East, Pakistan or India? Is DoubleMetaphone a good match also for names from these countries? Are there any better algorithms? How about fuzzy searches and wildcards, are they impossible to combine? We actually do three queries for every search, one fuzzy, one phonetic and one using ngram. Because I don't have too much confidence in the phonetic algorithm, I would really like to be able to combine fuzzy queries with wildcards.. :) Regards, Hågen On Oct 8, 2012, at 6:09 PM, Erick Erickson wrote: whether phonetic filters can be multiterm aware: I'd be leery of this, as I basically don't quite know how that would behave. You'd have to ensure that the algorithms changed the first parts of the words uniformly, regardless of what followed. I'm pretty sure that _some_ phonetic algorithms do not follow this pattern, i.e. eric wouldn't necessarily have the same beginning as erickson. That said, some of the algorithms _may_ follow this rule and might be OK candidates for being MultiTermAware But, you don't need this in order to try it out.
See the Expert Level Schema Possibilities at: http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ You can define your own analysis chain for wildcards as part of your fieldType definition and include whatever you want, whether or not it's MultiTermAware, and it will be applied at query time. Use the analyzer type=query entry as a basis. _But_ you shouldn't include anything in this section that produces more than one output per input token. Note, token, not field. I.e. a really bad candidate for this section is WordDelimiterFilterFactory. If you use the admin/analysis page (which you'll get to know intimately) and look at a type that has WordDelimiterFilterFactory in its chain and put in something like erickErickson1234, you'll see what I mean.. Make sure to check the verbose box. If you can determine that some of the phonetic algorithms _should_ be MultiTermAware, please feel free to raise a JIRA and we can discuss... I suspect it'll be on a case-by-case basis. Best Erick On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote: Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr-person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we find that Solr is much faster. So far so good! :) But when we try to use wildcards things start to go wrong. We're using Solr 3.4, and I see that some of our problems are solved in 3.6. Ref SOLR-2438: https://issues.apache.org/jira/browse/SOLR-2438 But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter. I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921.
(https://issues.apache.org/jira/browse/SOLR-2921) Is it possible to make the phonetic filters MultiTermAware? Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in Solr..) and find both christian and kristian. As far as I understand, this is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood anything? Are there any workarounds or filter-combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my chr*-problem. As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking about only shows my ignorance.. Regards, Hågen
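For reference, the multiterm analyzer Erick describes is declared alongside the index and query analyzers in the fieldType. A sketch of what that could look like (field name and filter choices here are illustrative, not a recommendation; note the multiterm chain deliberately leaves the phonetic filter out, since whether it belongs there is exactly the open question in this thread):

```xml
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
  <!-- Applied to wildcard/prefix terms whether or not each factory is
       MultiTermAware, so only one-output-per-input-token filters belong here -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```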
Re: Fallout from the deprecation of setQueryType
On 9/28/2012 9:09 AM, Shawn Heisey wrote: I am planning and building up a test system with Solr 4.0, for my eventual upgrade. I have not made a lot of progress so far, but I have come across a potential problem. It's been over a week with no response to this. Please see the original email for full details. I have all but decided that I will allow the default /select handler to receive queries currently assigned to my lbcheck handler, and use a new handler called /search for everything on which I want to track statistics. There is still a possible problem. I have a broker core that has the shards parameter included in the standard request handler, so this would migrate to the new /search request handler. In the past, you could change the handler used on those shards with a shards.qt parameter, but if the qt parameter is no longer allowed to have a slash, this isn't going to work in the future. I will instead need an alternate config option that makes it use a new handler instead of /select. Does that option already exist? Thanks, Shawn
Re: Wildcards and fuzzy/phonetic query
To answer your first question, yes, you've got it right. If you define a multiterm section in your fieldType, whatever you put in that section gets applied whether the underlying class is MultiTermAware or not. Which means you can shoot yourself in the foot really badly <G>... Well, you have 6 or so possibilities out of the box...and all of them will fail at times. Fuzzy searches will also fail at times. And so will most anything else you try. The problem is these are algorithmic in nature and there are just too many cases that don't fit; human language is so endlessly variable. Whether Middle Eastern names will work well with phonetic filters, well, what's the input language? Are you indexing English (or Norwegian or...) translations? In that case things should work OK since the phonetic variations should be accounted for in the translations. If you're indexing in different languages, you can apply different phonetic filters on different fields, so you might be able to work it that way. But if you're indexing multiple languages into a _single_ field, you'll have a lot of other problems to solve before you start worrying about phonetics... All I can really say is give it a try and see how well it works since good search results are so domain dependent. Fuzzy searches + wildcards. I don't think you can do that reasonably, but I'm not entirely sure. Best Erick On Mon, Oct 8, 2012 at 2:28 PM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote: I understand that I'm quickly reaching the boundaries of my Solr-competence when I'm supposed to read about Expert Level concepts.. :) I had already read it once, but now I read it again. Twice. And I'm not sure if I understand it correctly.. So let me ask a follow-up question: If I define an analyzer of type multiterm, will every filter I include for that analyzer be applied, even if it's not MultiTermAware? To complicate this further, I'm not really sure if phonetic filters are a good match for our needs.
We search for names, and these names can come from all over the world. We use DoubleMetaphone, and Wikipedia says it tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origin. So I guess it's quite good. But how about names from the middle east, Pakistan or India? Is DoubleMetaphone a good match also for names from these countries? Are there any better algorithms? How about fuzzy-searches and wildcards, are they impossible to combine? We actually do three queries for every search, one fuzzy, one phonetic and one using ngram. Because I don't have too much confidence in the phonetic algorithm, I would really like to be able to combine fuzzy queries with wildcards.. :) Regards, Hågen On Oct 8, 2012, at 6:09 PM, Erick Erickson wrote: whether phonetic filters can be multiterm aware: I'd be leery of this, as I basically don't quite know how that would behave. You'd have to insure that the algorithms changed the first parts of the words uniformly, regardless of what followed. I'm pretty sure that _some_ phonetic algorithms do not follow this pattern, i.e. eric wouldn't necessarily have the same beginning as erickson. That said, some of the algorithms _may_ follow this rule and might be OK candidates for being MultiTermAware But, you don't need this in order to try it out. See the Expert Level Schema Possibilities at: http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ You can define your own analysis chain for wildcards as part of your fieldType definition and include whatever you want, whether or not it's MultiTermAware and it will be applied at query time. Use the analyzer type=query entry as a basis. _But_ you shouldn't include anything in this section that produces more than one output per input token. Note, token, not field. I.e. 
a really bad candidate for this section is WordDelimiterFilterFactory if you use the admin/analysis page (which you'll get to know intimately) and look at a type that has WordDelimiterFilterFactory in its chain and put something like erickErickson1234, you'll see what I mean.. Make sure and check the verbose box If you can determine that some of the phonetic algorithms _should_ be MultiTermAware, please feel free to raise a JIRA and we can discuss... I suspect it'll be on a case-by-case basis. Best Erick On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle haagenha...@gmail.com wrote: Hi! I'm quite new to Solr, I was recently asked to help out on a project where the previous Solr-person quit quite suddenly. I've noticed that some of our searches don't return the expected result, and I'm hoping you guys can help me out. We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we experience that Solr is
Re: Reloading ExternalFileField blocks Solr
Sure: We're boosting search results based on user actions which could be e.g. the number of times a particular document has been read. In future, we'd also like to boost by e.g. impressions (the number of times a document has been displayed) and other values. /Martin On Mon, Oct 8, 2012 at 7:02 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Martin, Can you tell me what's the content of that field, and how it should affect search result? On Mon, Oct 8, 2012 at 12:55 PM, Martin Koch m...@issuu.com wrote: Hi List We're using Solr-4.0.0-Beta with a 7M document index running on a single host with 16 shards. We'd like to use an ExternalFileField to hold a value that changes often. However, we've discovered that the file is apparently re-read by every shard/core on *every commit*; the index is unresponsive in this period (around 20s on the host we're running on). This is unacceptable for our needs. In the future, we'd like to add other values as ExternalFileFields, and this will make the problem worse. It would be better if the external file were instead read in in the background, updating previously read relevant values for each shard as they are read in. I guess a change in the ExternalFileField code would be required to achieve this, but I have no experience here, so suggestions are very welcome. Thanks, /Martin Koch - Issuu - Senior Systems Architect. -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
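For anyone following along, the data file an ExternalFileField reads is just key=value lines in a file named external_<fieldname> in the core's index data directory. A Python sketch of regenerating it from fresh read counts (the field name "read_count", the doc keys, and writing to the current directory are all assumptions for illustration):

```python
# Regenerate an ExternalFileField data file from per-document read counts.
# Naming convention: external_<fieldname>, one "key=value" line per doc.
reads = {"doc1": 12.0, "doc2": 3.0, "doc17": 250.0}

with open("external_read_count", "w") as f:
    # Sorted keys make diffs between regenerated files stable.
    for doc_key, count in sorted(reads.items()):
        f.write(f"{doc_key}={count}\n")
```

This only addresses producing the file; the re-read-on-every-commit behavior Martin describes is on the Solr side and is not fixed by how the file is written.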
How to efficiently find documents that have a specific value for a field OR the field does not exist at all
I'm trying to find documents using this query: field:value OR (*:* AND NOT field:[* TO *]) Which means: either field is set to value, or the field does not exist in the document. I'm running this for ~20 fields in a single query strung together with ANDs. The query time is high, averaging around 3.5s. Does anyone have suggestions on how to optimize this query? As a last resort, using technologies outside of Solr is a possibility. All suggestions are greatly appreciated! Thanks for your time and efforts, Artem PS. For the record, a colleague and I have brainstormed some ideas of our own: * Adding a meta field to each document that consists of 1s and 0s, where each character represents a field's existence (1 = yes, 0 = no). In this case the query would look like: field:value OR signature:???0??? So we are looking for a certain field (the 0) that definitely does not exist, and all the others we do not care about (wildcard). Note that this would have to be a leading wildcard query, or we could prepend a dummy character to the beginning. A bit of a hack. * Using bitwise operations to find all documents whose set of fields is a subset of the query's set of fields. This would be more work and would require writing a custom query parser or search handler.
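The signature idea from the PS can be prototyped quickly outside Solr. A Python sketch with a made-up three-field layout (in Solr itself, the equivalent check on the stored signature string would be the ???0??? wildcard pattern described above):

```python
# Bit-signature sketch of "field:value OR the field is absent".
FIELDS = ["color", "size", "weight"]  # hypothetical field-to-bit layout

def signature(doc):
    """Bit i is set exactly when FIELDS[i] exists in the document."""
    return sum(1 << i for i, f in enumerate(FIELDS) if f in doc)

def matches(doc, field, value):
    """True when field == value, or the field's signature bit is unset."""
    i = FIELDS.index(field)
    return doc.get(field) == value or not (signature(doc) >> i) & 1

docs = [{"color": "red"}, {"size": "L"}, {"color": "blue", "size": "L"}]
print([matches(d, "color", "red") for d in docs])  # → [True, True, False]
```

The second brainstormed idea (subset tests via bitwise ops) falls out of the same representation: doc_signature & query_mask == doc_signature checks that a document's fields are a subset of the query's.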
Funny behavior in facet query on large dataset
I am doing a facet query in Solr (3.4) and getting very bad performance. This is in a Solr shard with 22 million records, but I am specifically doing a small time slice. However, even if I take the time slice query out it takes the same amount of time, so it seems to be searching the entire data set. I am trying to find all documents that contain the word dude or thedude or anotherdude and count how many of these were written by eldudearino (of course names are changed here to protect the innocent...). My query is like this: http://myserver:8080/solr/select/?fq=created_at:NOW-5MINUTES&q=(+(text:(%22dude%22+%22thedude%22+%22%23anotherdude%22))+)&facet=true&indent=on&facet.mincount=1&wt=xml&version=2.2&rows=0&fl=author_username,author_id&facet.field=author_username&fq=author_username:(%22@eldudearino%22) Any ideas what I could be doing wrong? Thanks in advance! -- View this message in context: http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Funny behavior in facet query on large dataset
Faceting at that scale takes time to warm up. If you've got your caches and such configured appropriately, then successive searches will be very fast; however, you'll still need to do the cache warming (it depends on the faceting implementation you're using; in this case you're probably using the FieldCache). Faceting performance doesn't depend on the filters or query: the caches that need to be built are indeed across the entire index. Erik On Oct 8, 2012, at 16:26 , kevinlieb wrote: I am doing a facet query in Solr (3.4) and getting very bad performance. This is in a Solr shard with 22 million records, but I am specifically doing a small time slice. However, even if I take the time slice query out it takes the same amount of time, so it seems to be searching the entire data set. I am trying to find all documents that contain the word dude or thedude or anotherdude and count how many of these were written by eldudearino (of course names are changed here to protect the innocent...). My query is like this: http://myserver:8080/solr/select/?fq=created_at:NOW-5MINUTES&q=(+(text:(%22dude%22+%22thedude%22+%22%23anotherdude%22))+)&facet=true&indent=on&facet.mincount=1&wt=xml&version=2.2&rows=0&fl=author_username,author_id&facet.field=author_username&fq=author_username:(%22@eldudearino%22) Any ideas what I could be doing wrong? Thanks in advance! -- View this message in context: http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584.html Sent from the Solr - User mailing list archive at Nabble.com.
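One standard way to pay that warm-up cost before users see it is a newSearcher event listener in solrconfig.xml that fires the facet after each commit. A sketch (the facet field matches the query in this thread; treat the exact entry as illustrative):

```xml
<!-- solrconfig.xml: warm the author_username facet caches on each new
     searcher so the first real query doesn't pay the full-index cost -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">author_username</str>
    </lst>
  </arr>
</listener>
```

The trade-off is longer commit/warm-up times in exchange for consistently fast user-facing facet queries.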
Re: Reloading ExternalFileField blocks Solr
Martin, I have a kind of hack approach in mind for hiding documents from search, so it's a little bit easier than your task. I'm going to deliver a talk about it: http://www.apachecon.eu/schedule/presentation/89/ . Frankly speaking, there is no reliable out-of-the-box solution for it. I saw that DocValues have been integrated with FunctionQueries already, but DocValues updates, which sound like a doable thing, have not been delivered yet. Regards On Mon, Oct 8, 2012 at 11:54 PM, Martin Koch m...@issuu.com wrote: Sure: We're boosting search results based on user actions, which could be e.g. the number of times a particular document has been read. In the future, we'd also like to boost by e.g. impressions (the number of times a document has been displayed) and other values. /Martin On Mon, Oct 8, 2012 at 7:02 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Martin, Can you tell me what's the content of that field, and how it should affect search results? On Mon, Oct 8, 2012 at 12:55 PM, Martin Koch m...@issuu.com wrote: Hi List We're using Solr-4.0.0-Beta with a 7M document index running on a single host with 16 shards. We'd like to use an ExternalFileField to hold a value that changes often. However, we've discovered that the file is apparently re-read by every shard/core on *every commit*; the index is unresponsive during this period (around 20s on the host we're running on). This is unacceptable for our needs. In the future, we'd like to add other values as ExternalFileFields, and this will make the problem worse. It would be better if the external file were instead read in the background, updating previously read values for each shard as they are read in. I guess a change in the ExternalFileField code would be required to achieve this, but I have no experience here, so suggestions are very welcome. Thanks, /Martin Koch - Issuu - Senior Systems Architect.
-- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Funny behavior in facet query on large dataset
: a small time slice. However even if I take the time slice query out it : takes the same amount of time, so it seems to be searching the entire data : set. a) you might try using facet.method=enum - in some special cases it may be faster than the default (facet.method=fc). : I am trying to find all documents that contain the word dude or thedude : or anotherdude and count how many of these were written by eldudearino : (of course names are changed here to protect the innocent...). b) field faceting isn't really designed for this type of problem. field faceting is very suitable for questions like: find all docs matching QUERY, and for all of those docs, give me a list of the top N authors and how many docs were written by those authors. c) If you just want to query for the docs written by a single author, you can use an fq like you do in your example, and then look at the numFound to know the total; but in that case the faceting is just making extra work to generate counts of 0 for all of the other authors. d) if you want to query for an arbitrary set of documents, and then know how many of those documents were written by a particular author (or each of a particular set of authors) try facet.query instead. ...&facet=true&facet.query=author_username:(%22@eldudearino%22) -Hoss
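For several authors at once, facet.query can simply be repeated, one per author; Solr returns a separate count for each, restricted to the docs matched by q. A small illustrative sketch in Python just to show how such a request would be assembled (the server URL and field names are the ones from this thread; the script itself is not anything from Solr):

```python
from urllib.parse import urlencode

# One facet.query per author: each produces its own count in the
# facet_queries section of the response, scoped to the main q.
authors = ["@eldudearino", "@zeedudearino", "@adudearino", "@beedudearino"]

params = [
    ("q", 'text:("dude" "thedude" "#anotherdude")'),
    ("rows", "0"),          # we only want counts, not documents
    ("facet", "true"),
] + [("facet.query", 'author_username:"%s"' % a) for a in authors]

query_string = urlencode(params)
url = "http://myserver:8080/solr/select/?" + query_string
```

Because each facet.query is an arbitrary Lucene query, this avoids enumerating counts for every other author the way facet.field does.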
Re: How to efficiently find documents that have a specific value for a field OR the field does not exist at all
field:value OR (*:* AND NOT field:[* TO *]) This means: either the field is set to value, or the field does not exist in the document. Instead of field:[* TO *], you can define a default value in schema.xml, or use DefaultValueUpdateProcessorFactory in solrconfig.xml. With this, the "field does not exist in the document" part becomes field:MySpecialDefaultValue
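For the default-value route via solrconfig.xml, a minimal sketch of an update processor chain (the field name and MySpecialDefaultValue are just the placeholders from this thread; the chain still has to be wired into your update handler, e.g. via update.chain or default="true"):

```xml
<updateRequestProcessorChain name="add-defaults">
  <!-- Fills in the field at index time whenever a document omits it,
       so queries can match field:MySpecialDefaultValue instead of
       the expensive NOT field:[* TO *] clause. -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">field</str>
    <str name="value">MySpecialDefaultValue</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note the default only applies to documents indexed after the chain is in place; existing documents would need to be reindexed.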
Re: Funny behavior in facet query on large dataset
Thanks for all the replies. I oversimplified the problem for the purposes of making my post small and concise. I am really trying to find the counts of documents by a list of 10 different authors that match those keywords. Of course on looking up a single author there is no reason to do a facet query. To be clearer: Find all documents that contain the word dude or thedude or anotherdude and count how many of these were written by eldudearino and zeedudearino and adudearino and beedudearino I tried facet.query as well as facet.method=fc and neither really helped. We are constantly adding documents to the solr index and committing, every few seconds, which is probably why this is not working well. Seems we need to re-architect the way we are doing this... -- View this message in context: http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584p4012610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: long query response time in shards search
Hi, We're using Solr 4.0 to serve patent search. Patent search involves very complex queries, including wildcards. I think an Ngram or EdgeNgram filter is an alternative, but not every term in a query has a wildcard, so we can't use that filter. If I make an empty core that just merges search results and use it in front of the main cores, would that help? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/long-query-response-time-in-shards-search-tp4012366p4012628.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Funny behavior in facet query on large dataset
On 10/8/2012 4:09 PM, kevinlieb wrote: We are constantly adding documents to the solr index and committing, every few seconds, which is probably why this is not working well. I would definitely consider increasing the amount of time between commits. You can add documents at whatever interval you want, but if you only do commits every minute or two, your caches will be much more useful. Your time slice filter query (NOW-5MINUTES) will never be cached, because NOW is measured in milliseconds and will therefore be different for every query. You might consider doing NOW/MINUTE-5MINUTES instead, or even [NOW/MINUTE-5MINUTES TO *] so that you actually are dealing with a range. For the space of that minute (at least until the cache gets invalidated by a commit), the filter cache entry will be valid. Some general questions that may matter: How big are all your index directories on this server, how much RAM is in the server, and how much RAM are you giving to Java? I'm also curious how big your Solr caches are, what the autowarm counts are, and how long it is taking for your caches to warm up after each commit. You can get the warm times from the cache statistics in the admin interface. Thanks, Shawn
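Assuming created_at is a date field, the caching difference Shawn describes comes down to the literal text of the filter, since the filterCache is keyed on it:

```
fq=created_at:[NOW-5MINUTES TO NOW]
    NOW resolves to the current millisecond, so the filter string differs
    on every request and the cache entry is never reused.

fq=created_at:[NOW/MINUTE-5MINUTES TO *]
    NOW/MINUTE rounds down to the minute, so the filter string is identical
    for a full minute and repeated queries hit the cached filter.
```

The exact range endpoints here are a sketch; the point is the rounding operator, not the specific window.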
Re: Funny behavior in facet query on large dataset
Hi Kevin, Right, it's most likely the very frequent commits. Change commits to, say, every 60 or 120 seconds and compare the performance. I think you guys use SPM, so check the Cache graphs (hit % specifically) before and after the above change. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Mon, Oct 8, 2012 at 6:09 PM, kevinlieb ke...@politear.com wrote: We are constantly adding documents to the solr index and committing, every few seconds, which is probably why this is not working well.
Re: SolrJ 4.0 Beta maxConnectionsPerHost
Hi, Qs: * Have you tried StreamingUpdateSolrServer? * A newer version of Solr(J)? When things hang, jstack your app that uses SolrJ and Solr a few times and you should be able to see where they are stuck. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Mon, Oct 8, 2012 at 9:52 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: I am running into an issue where a multithreaded SolrJ client application used for indexing gets into a hung state. I responded to a separate thread earlier today with someone who had the same error, see http://lucene.472066.n3.nabble.com/SolrJ-IOException-td4010026.html I did some digging and experimentation and found something interesting. When starting up the application, I see the following in the Solr logs: Creating new http client, config: maxConnections=200&maxConnectionsPerHost=8 The way I instantiate the HttpSolrServer through SolrJ is the following:
HttpSolrServer solrServer = new HttpSolrServer(serverUrl);
solrServer.setConnectionTimeout(1000);
solrServer.setDefaultMaxConnectionsPerHost(100);
solrServer.setMaxTotalConnections(100);
solrServer.setParser(new BinaryResponseParser());
solrServer.setRequestWriter(new BinaryRequestWriter());
It seems as though maxConnections and maxConnectionsPerHost are not actually getting set. Has anyone seen this problem, or does anyone have an idea how to resolve it? Thanks, Briggs Thompson
Re: long query response time in shards search
Hi, We've explored this with a few clients a while back. If I remember correctly, this doesn't make much difference, and I don't expect it will make any noticeable difference for you, since all your cores are on that same 1 server. If you had 1 server with more CPU cores you would see better numbers. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Mon, Oct 8, 2012 at 9:43 PM, Jason hialo...@gmail.com wrote: If I make an empty core that just merges search results and use it in front of the main cores, would that help?
Re: Reloading ExternalFileField blocks Solr
Hi Martin, Perhaps you could make a small change in Solr to add a "don't reload the EFF if it hasn't been modified since it was last loaded" check. I assume you commit pretty often, but don't modify the EFF files that often, so this could save you some needless loading. That said, I'd be surprised if EFF doesn't already do this... I didn't check. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Mon, Oct 8, 2012 at 4:55 AM, Martin Koch m...@issuu.com wrote: We've discovered that the file is apparently re-read by every shard/core on *every commit*; the index is unresponsive in this period (around 20s on the host we're running on).
Problem with dataimporter.request
I'm quite new to Solr, and I have a question regarding the request parameters for the data importer. In my data-config.xml, I have something like this:
<entity name="content" pk="id"
        query="SELECT * FROM tableX"
        deltaQuery="SELECT max(id) AS id FROM ${dataimporter.request.dataView}"
        deltaImportQuery="SELECT * FROM tableX WHERE ${dataimporter.delta.id} &lt; id">
</entity>
However, every time I execute a delta-import (/dataimport?command=delta-import), it gives me an exception like this: Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT max(id) AS id FROM Processing Document # 1 I believe this error occurs because the system didn't recognize ${dataimporter.request.dataView}, but I don't know how to make it recognized. *I also asked the very same question at http://stackoverflow.com/questions/12793025/cannot-get-anything-from-dataimporter-request-on-updating-index, if you want to get some reputation there too, you can answer there. Thank you! -- Zakka Fauzan
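Request-scoped DataImportHandler variables like ${dataimporter.request.dataView} are filled in from parameters on the import request itself, which would explain the empty table name in the failing query above. A sketch of what the call would look like, assuming the view is named tableX and the handler is mounted at the usual /dataimport path:

```
http://localhost:8983/solr/dataimport?command=delta-import&dataView=tableX
```

Any extra parameter on the request (here dataView) becomes available to the config as ${dataimporter.request.<name>}; without it, the variable resolves to an empty string and the deltaQuery is sent to the database truncated, exactly as in the stack trace.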