solr indexing on HDFS for high query throughput
Hi, I am using Solr for indexing. The index is small, around 50GB. I need to use Solr in a high-query-throughput system: I am consuming the Twitter API and need to search each incoming tweet against Solr. How should I design such a system? Does Solr support HDFS natively? How can I index and search on an HDFS system? Thanks Vineet Yadav
Re: SOLR 4 Alpha Out Of Mem Err
I think what makes the most sense is to limit the number of connections to another host. A host only has so many CPU resources, and beyond a certain point throughput would start to suffer anyway (and then only make the problem worse). It also makes sense in that a client could generate documents faster than we can index them (either for a short period of time, or on average) and having flow control to prevent unlimited buffering (which is essentially what this is) makes sense. Nick - when you switched to HttpSolrServer, things worked because this added an explicit flow control mechanism. A single request (i.e. an add with one or more documents) is fully indexed to all endpoints before the response is returned. Hence if you have 10 indexing threads and are adding documents in batches of 100, there can be only 1000 documents buffered in the system at any one time. -Yonik http://lucidimagination.com
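For illustration, a minimal SolrJ sketch of the flow control described above -- each blocking add() of a batch bounds the documents in flight at (threads x batch size); the URL and field names are illustrative assumptions, not from the thread:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer implements Runnable {
    private static final int BATCH_SIZE = 100; // with 10 threads: at most 1000 docs buffered
    private final HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr"); // illustrative URL

    public void run() {
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Thread.currentThread().getName() + "-" + i); // illustrative fields
            doc.addField("text", "document body " + i);
            batch.add(doc);
            if (batch.size() == BATCH_SIZE) {
                try {
                    server.add(batch); // blocks until the batch is indexed: implicit flow control
                } catch (SolrServerException e) {
                    throw new RuntimeException(e);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                batch.clear();
            }
        }
    }
}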
Start solr master and solr slave with enable replication = false
Hi, Is it possible to start the Solr master and slave with the following configuration? - replication on the master disabled when we start Solr -- but the replication feature must still be available - polling on the slave disabled -- but the replication feature must still be available -- Best Regards -- Jamel
Re: Problems with elevation component configuration
Hi, Well, if I understand correctly, only the search term matters for elevation, not the whole query. Anyway, we ended up modifying the QueryElevationComponent class, extracting the search term from the query with a regex. After that, it turned out that elevation doesn't work with grouped results, so we had to separate the sorting for grouped and non-grouped cases in the prepare() method of the same class. That was not the end of the problems, because we need to show elevated results with different styling, so we upgraded to Solr 4 and now it seems to be working as expected.
change of API Javadoc interface funtionality in 4.0.x
Dear developers, while upgrading from 3.6.x to 4.x I have to rewrite some of my code and search for the new methods and/or classes. In 3.6.x and older versions the API Javadoc had an Index, which made it easy to find the appropriate methods; the button to open it was located at the top of the page, between Deprecated and Help. What was the reason for removing the Index from the API Javadoc for Lucene and Solr? Regards Bernd
Re: SOLR 4 Alpha Out Of Mem Err
Nick, to solve the out-of-memory issue, I think you can make the changes below: 1) in solrconfig.xml, reduce ramBufferSizeMB (there are two occurrences; change both) 2) in solrconfig.xml, reduce the documentCache size. To solve the issue where calling commit slows down indexing, I think you can change the newSearcher default query: in solrconfig.xml, search for <listener event="newSearcher" class="solr.QuerySenderListener"> and change

<str name="q">content:*</str>
<str name="start">0</str>
<str name="rows">10</str>

to

<str name="q">content:notexist</str>
<str name="start">0</str>
<str name="rows">10</str>
Does SolrEntityProcessor fulfill my requirements?
Hi folks, I have this case: I want to move my Solr 4.0 trunk build to Solr 4.0 Alpha. The index structure has changed, so I can't replicate. 10 cores are in use, each with 30 million docs. We assume that all fields are stored and indexed. What is the best way to export the docs from all cores on one machine running Solr 4.0 trunk to identically named cores on another machine running Solr 4.0 Alpha? SolrEntityProcessor could be one solution, but does it work with this amount of data? I want to reindex all docs at once and not in small parts, and I find no examples of bigger reindexing attempts with SolrEntityProcessor. XSLT as option two? What would be the best solution for this, what do you think? Best Regards Vadim
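For reference, a hypothetical data-config.xml sketch of the SolrEntityProcessor route being asked about -- it pulls all documents from a source core over HTTP in pages of "rows"; the url and rows value are illustrative, and whether it holds up at 30M docs per core is exactly the open question:

<dataConfig>
  <document>
    <entity name="sep"
            processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/core0"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>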
Re: Wildcard query vs facet.prefix for autocomplete?
Well, option 2 won't do you any good, so speed doesn't really matter. Your response would have a facet count for "dam", all by itself, something like

<int name="damned">2</int>
<int name="dame">1</int>

etc., which does not contain anything that lets you reconstruct the title for autosuggest. Best Erick On Tue, Jul 17, 2012 at 3:18 AM, santamaria2 aravinda@contify.com wrote: I'll consider using the other methods, but I'd like to know which would be faster among the two approaches mentioned in my opening post.
Re: Wildcard query vs facet.prefix for autocomplete?
Well silly me... you're right. On Wed, Jul 18, 2012 at 6:44 PM, Erick Erickson [via Lucene] wrote: Well, option 2 won't do you any good, so speed doesn't really matter. Your response would have a facet count for "dam", all by itself, which does not contain anything that lets you reconstruct the title for autosuggest. [...]
NGram for misspelt words
I have configured NGram indexing for some fields. Say I search for the city Ludlow: I get results (normal search). If I search for Ludlo (with the w omitted) I get results. If I search for Ludl (with ow omitted) I still get results. These are all partial strings of the main string, so the NGram works perfectly. But when I type Ludlwo (misspelt, with the characters o and w interchanged) I don't get any results; it should ideally match on Ludl and return results. I am not looking for edit-distance-based spell correctors. How can I make the NGram-based search above work? Here is my schema.xml (NGram field type):

<fieldType name="nGram" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Count is inconsistent between facet and stats
Hi Guys, Steps to reproduce: 1) Download apache-solr-4.0.0-ALPHA 2) cd example; java -jar start.jar 3) cd exampledocs; ./post.sh *.xml 4) Use the StatsComponent to get the stats info for the field 'popularity' faceted on 'cat'. The 'count' for 'electronics' is 3:

http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&stats=true&stats.field=popularity&stats.facet=cat

{
  stats_fields: {
    popularity: {
      min: 0, max: 10, count: 14, missing: 0, sum: 75,
      sumOfSquares: 503, mean: 5.357142857142857, stddev: 2.7902892835178013,
      facets: {
        cat: {
          music: { min: 10, max: 10, count: 1, missing: 0, sum: 10, sumOfSquares: 100, mean: 10, stddev: 0 },
          monitor: { min: 6, max: 6, count: 2, missing: 0, sum: 12, sumOfSquares: 72, mean: 6, stddev: 0 },
          "hard drive": { min: 6, max: 6, count: 2, missing: 0, sum: 12, sumOfSquares: 72, mean: 6, stddev: 0 },
          scanner: { min: 6, max: 6, count: 1, missing: 0, sum: 6, sumOfSquares: 36, mean: 6, stddev: 0 },
          memory: { min: 0, max: 7, count: 3, missing: 0, sum: 12, sumOfSquares: 74, mean: 4, stddev: 3.605551275463989 },
          "graphics card": { min: 7, max: 7, count: 2, missing: 0, sum: 14, sumOfSquares: 98, mean: 7, stddev: 0 },
          electronics: { min: 1, max: 7, count: 3, missing: 0, sum: 9, sumOfSquares: 51, mean: 3, stddev: 3.4641016151377544 }
        }
      }
    }
  }
}

5) Facet on 'cat'; the count for 'electronics' is 14:

http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&facet=true&facet.field=cat

{
  cat: [ electronics, 14, memory, 3, connector, 2, "graphics card", 2, "hard drive", 2, monitor, 2, camera, 1, copier, 1, "multifunction printer", 1, music, 1, printer, 1, scanner, 1, currency, 0, search, 0, software, 0 ]
}

So the StatsComponent reports the count for the 'electronics' cat as 3, while the FacetComponent reports 14 'electronics'. Is this a bug? Following is the field definition for 'cat':

<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>

Thanks, Yandong
Re: NGram for misspelt words
You are creating grams only while indexing and not while querying, hence 'ludlwo' will not match. Your analyzer will create the following grams while indexing 'ludlow':

lu lud ludl ludlo ludlow

and the whole query token 'ludlwo' matches none of them. Either create grams while querying as well, or use edit distance. On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar yhus...@firstam.com wrote: I have configured NGram indexing for some fields. Say I search for the city Ludlow: I get results. [...]
RE: NGram for misspelt words
Thanks Sahi. I have replaced my EdgeNGramFilterFactory with NGramFilterFactory, as I need substrings not just at the front or back but anywhere. As you suggested, I put the same NGramFilterFactory in both the query and index analyzers; however, now it does not return any results, not even for the basic queries. -Original Message- From: Dikchant Sahi [mailto:contacts...@gmail.com] Sent: Wednesday, July 18, 2012 7:54 PM To: solr-user@lucene.apache.org Subject: Re: NGram for misspelt words You are creating grams only while indexing and not while querying, hence 'ludlwo' would not match. [...]
Re: NGram for misspelt words
Have you tried the analysis page to debug? I believe something is wrong in the fieldType. On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar yhus...@firstam.com wrote: Thanks Sahi. I have replaced my EdgeNGramFilterFactory with NGramFilterFactory, as I need substrings not just at the front or back but anywhere. [...]
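For reference, a minimal sketch of the both-sides gram configuration this thread converges on (gram sizes are illustrative); on the analysis page, a query token like 'ludlwo' should then share grams such as 'ludl' with the indexed 'ludlow':

<fieldType name="nGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>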
Re: edismax not working in a core
the ~2 is the mm parameter, I'm pretty sure. So I'd guess your configuration has an mm parameter set on that core that isn't doing what you want. Best Erick On Tue, Jul 17, 2012 at 3:05 PM, Richard Frovarp rfrov...@apache.org wrote: On 07/14/2012 05:32 PM, Erick Erickson wrote: Really hard to say. Try executing your query on the cores with debugQuery=on and compare the parsed results (for this you can probably just ignore the explain bits of the output; concentrate on the parsed query). Okay, for the example core from the project, the query was: test OR samsung parsedquery: +(DisjunctionMaxQuery((id:test^10.0 | text:test^0.5 | cat:test^1.4 | manu:test^1.1 | name:test^1.2 | features:test | sku:test^1.5)) DisjunctionMaxQuery((id:samsung^10.0 | text:samsung^0.5 | cat:samsung^1.4 | manu:samsung^1.1 | name:samsung^1.2 | features:samsung | sku:samsung^1.5))) For my core the query was: frovarp OR fee parsedquery: +((DisjunctionMaxQuery((content:fee | title:fee^5.0 | mainContent:fee^2.0)) DisjunctionMaxQuery((content:frovarp | title:frovarp^5.0 | mainContent:frovarp^2.0)))~2) What is that ~2? That's the difference. The third core that works properly also doesn't have the ~2.
Re: Using Solr 3.4 running on tomcat7 - very slow search
bq: This index is only used for searching and being replicated every 7 sec from the master. This is a red-flag. 7 second replication times are likely forcing your app to spend all its time opening new searchers. Your cached filter queries are likely rarely being re-used because they're being thrown away every 7 seconds. This assumes you're changing your master index frequently. If you need near real time, consider Solr trunk and SolrCloud, but trying to simulate NRT with very short replication intervals is usually a bad idea. A quick test would be to disable replication for a bit (or lengthen it to, say, 10 minutes) Best Erick On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi f...@efendi.ca wrote: FWIW, when asked at what point one would want to split JVMs and shard, on the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC cost reasons. You're way above that. - his index is 75G, and Grant mentioned RAM heap size; we can use terabytes of index with 16Gb memory.
Re: Wildcard query vs facet.prefix for autocomplete?
But I did run across an idea a while ago... Either with a custom update processor or on the client side, you permute the title so you index something like:

Shadows of the Damned
of the Damned|Shadows
the Damned|Shadows of
Damned|Shadows of the

Index these with KeywordTokenizer and LowercaseFilter. Now your responses from TermsComponent (prefix) contain the entire string, and you can display them correctly by rearranging the string on the client side based on the | (or whatever delimiter). There is still an issue with proper capitalization, though, since TermsComponent only looks at the actual indexed data and it'll be lower-cased. You could use String, but then you're counting on the user to capitalize properly, always a dicey call. And TermsComponent is very fast. FWIW Erick On Wed, Jul 18, 2012 at 9:21 AM, santamaria2 aravinda@contify.com wrote: Well silly me... you're right. [...]
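An illustrative sketch of the permutation idea (run client-side or in an update processor before indexing); the "|" delimiter is an assumption, as the original message left the delimiter unspecified:

import java.util.ArrayList;
import java.util.List;

public class TitlePermuter {
    // rotations("Shadows of the Damned") ->
    //   Shadows of the Damned
    //   of the Damned|Shadows
    //   the Damned|Shadows of
    //   Damned|Shadows of the
    public static List<String> rotations(String title) {
        String[] words = title.split("\\s+");
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < words.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < words.length; j++) {
                sb.append(words[(i + j) % words.length]);
                if (j < words.length - 1) {
                    // mark the wrap point so the client can restore word order
                    sb.append((i + j) % words.length == words.length - 1 ? "|" : " ");
                }
            }
            out.add(sb.toString());
        }
        return out;
    }
}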
Re: Start solr master and solr slave with enable replication = false
See: http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node I'll admit that I haven't tried this personally, but I think it'll work. Although I'm pretty sure that if you just disable the master, disabling the polling on the slave isn't necessary. Best Erick On Wed, Jul 18, 2012 at 6:24 AM, Jamel ESSOUSSI jamel.essou...@gmail.com wrote: Hi, Is it possible to start the Solr master and slave with the following configuration? [...]
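For reference, a sketch of the enable/disable switches described on that wiki page; the property names and URLs are illustrative and would be set at startup, e.g. with -Denable.master=true:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>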
Re: Using Solr 3.4 running on tomcat7 - very slow search
Hi Erick, I totally agree; that's what I also figured out eventually. One thing I am not clear on: replication is supposed to be incremental, but it looks like it is trying to replicate the whole index. Maybe I am changing the index so frequently that it triggers an auto merge and therefore a full replication? Am I thinking in the right direction? I see that when I start the Solr search instance before I start feeding the index, my searches are fine, BUT it is using the old searcher, so I am not seeing the updates in the results. So now I am trying to change my architecture. I am going to have a core dedicated to receiving the daily updates, which will be 5 million docs and a little less than 5 GB in size -- small, so replication will be faster. I will search both cores, i.e. the old data and the daily updates, and do field collapsing on my unique id so that I do not return duplicate results. I haven't tried grouping results, so I am not sure about the performance. Any suggestions? Eventually I will have to use Solr trunk like you suggested. Thank you for your help, On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] wrote: bq: This index is only used for searching and being replicated every 7 sec from the master. This is a red flag. [...]
Solr faceting -- sort order
I have a keyword field type that I made:

<fieldType name="keyword" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_" replacement=" " maxBlockChars="5000"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_" replacement=" " maxBlockChars="5000"/>
  </analyzer>
</fieldType>

When I do a query, the results that come back retain their original case for this field, like:

doc 1 keyword: Blah Blah Blah
doc 2 keyword: Yadda Yadda Yadda

But when I pull back facets, I get:

blah blah blah (1)
yadda yadda yadda (1)

I was attempting to fix a sorting problem -- keyword would show up after keyword Zulu due to the index sorting, so I thought I could lowercase it all to get a consistent order. But now it is all lowercase, and I'd like it to retain the original style. Is there a different sort that I should use, or is there a change I can make to my keyword type that would let the facet count list show up alphabetically but ignoring case? Thanks! -- Chris
Solr grouping / facet query
Could anyone suggest options for handling the following situation? 1. Say we have 1,000 authors. 2. 65% of these authors have 10-100 titles they authored; the others have not authored any titles but provide only their biography and writing capability. 3. We want to search for authors, group the results by author, and show the 4 most relevant titles authored by each (if any) next to the author name. Since not all authors have titles, I can't simply group titles by author. Also, adding the bio to each title places a lot of duplicate data in the index. The search results would look like this:

Author A: title0, title6, title8, title3
Author G: no titles found
Author E: title4, title9, title2

Any suggestions would be appreciated!
Re: java.lang.AssertionError: System properties invariant violated.
: I am porting 3.x unit tests to the solr/lucene trunk. My unit tests are : OK and pass, but in the end fail because the new rule checks for : modified properties. I know what the problem is: I am creating new : system properties in @BeforeClass, but I think I need to do it : there, because the project loads a C library before initializing tests. The purpose of the assertion is to verify that no code being tested is modifying system properties -- if you are setting the properties yourself in some @BeforeClass methods, just use System.clearProperty to unset them in corresponding @AfterClass methods -Hoss
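A minimal sketch of that pattern (the property name, value, and config file names are illustrative):

import org.apache.solr.SolrTestCaseJ4;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class MyNativeLibTest extends SolrTestCaseJ4 {

    @BeforeClass
    public static void setUpProps() throws Exception {
        // property needed before the C library / core loads
        System.setProperty("my.native.lib.path", "/opt/native/lib");
        initCore("solrconfig.xml", "schema.xml");
    }

    @AfterClass
    public static void tearDownProps() {
        // restore the invariant the test rule checks
        System.clearProperty("my.native.lib.path");
    }
}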
How To apply transformation in DIH for multivalued numeric field?
I have a multivalued integer field and a multivalued string field defined in my schema as:

<field name="community_tag_ids" type="integer" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="community_tags" type="text" indexed="true" termVectors="true" stored="true" multiValued="true" omitNorms="true"/>

The DIH entity and field definitions for the same are:

<entity name="document" dataSource="app" onError="skip" transformer="RegexTransformer" query="...">
  <entity name="community_tags" transformer="RegexTransformer"
          query="SELECT group_concat(a.id SEPARATOR ',') AS community_tag_ids, group_concat(a.title SEPARATOR ',') AS community_tags FROM tags a JOIN tag_dets b ON a.id = b.tag_id WHERE b.doc_id = ${document.id}">
    <field column="community_tag_ids" name="community_tag_ids"/>
    <field column="community_tags" splitBy=","/>
  </entity>
</entity>

The value for the field community_tags comes through correctly as an array of strings. However the value of community_tag_ids is not proper:

<arr name="community_tag_ids"><int>[B@390c0a18</int></arr>

I tried chaining NumberFormatTransformer with formatStyle="number", but that throws DataImportHandlerException: Failed to apply NumberFormat on column. Could it be due to NULL values from the database, or because the value is not proper? How do we handle NULL in this case? *Pranav Prakash* temet nosce
Re: edismax not working in a core
On 07/18/2012 11:20 AM, Erick Erickson wrote: the ~2 is the mm parameter I'm pretty sure. So I'd guess your configuration has a mm parameter set on the core that isn't doing what you want.. I'm not setting the mm parameter or the q.op parameter. All three cores have a defaultOperator of OR. So I don't know where that would be coming from. However, if I specify a mm of 0, it appears to work just fine. I've added it as a default parameter to the select handler. Thanks for pointing me in the right direction. Richard
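For illustration, setting mm as a select-handler default might look like the sketch below (the qf values are taken from the parsed query shown earlier in the thread); note that, as the follow-up further down reports, this turned out to mask the real problem rather than fix it:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="mm">0</str>
    <str name="qf">content title^5.0 mainContent^2.0</str>
  </lst>
</requestHandler>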
Re: DIH XML configs for multi environment
That approach would work for core-dependent parameters. In my case, the params are environment-dependent. I think a simpler approach would be to pass the URL param as JVM options and have these XMLs read it from there. I haven't tried it yet. *Pranav Prakash* temet nosce On Tue, Jul 17, 2012 at 5:09 PM, Markus Klose m...@shi-gmbh.com wrote: Hi There is one more approach using the property mechanism. You could specify the datasource like this:

<dataSource name="database" driver="${sqlDriver}" url="${sqlURL}"/>

And you can specify the properties in solr.xml in your core configuration like this:

<core instanceDir="core1" name="core1">
  <property name="sqlURL" value="jdbc:hsqldb:/temp/example/ex"/>
</core>

Best regards from Augsburg, Markus Klose SHI Elektronische Medien GmbH, Curt-Frenzel-Str. 12, 86167 Augsburg -Original Message- From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com] Sent: Wednesday, July 11, 2012 11:21 To: solr-user@lucene.apache.org Subject: Re: DIH XML configs for multi environment http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource http://docs.codehaus.org/display/JETTY/DataSource+Examples On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash pra...@gmail.com wrote: That's cool. Is there something similar for Jetty as well? We use Jetty! On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Hi Pranav, If you are using Tomcat to host Solr, you can define your data source in the context.xml file under the Tomcat configuration. You refer to this datasource by the same name in all 3 environments from the DIH data-config.xml; the context.xml file is what varies across the 3 environments, holding the different credentials for dev, stag and prod. E.g. the DIH data-config.xml will refer to the datasource as listed below:

<dataSource jndiName="java:comp/env/YOUR_DATASOURCE_NAME" type="JdbcDataSource" readOnly="true"/>

and the context.xml file, located under the /TOMCAT_HOME/conf folder, will have the resource entry:

<Resource name="YOUR_DATASOURCE_NAME" auth="Container" type="" username="X" password="X" driverClassName="" url="" maxActive="8"/>

On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com wrote: The DIH XML config file has to specify a dataSource. In my case, and possibly for many others, the logon credentials as well as the MySQL server paths differ between environments (dev, stag, prod). I don't want to end up with three different DIH config files, three different handlers and so on. What is a good way to deal with this? -- Thanks and Regards Rahul A. Warawdekar
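A sketch of the JVM-option approach mentioned at the top of this message (untested; the property names and JDBC URL are illustrative):

java -DsqlURL=jdbc:mysql://dev-db:3306/app -DsqlDriver=com.mysql.jdbc.Driver -jar start.jar

with the DIH config then reading:

<dataSource name="database" driver="${sqlDriver}" url="${sqlURL}"/>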
Re: Searcher Refrence Counts
I'd guess the getSearcher call you are making is incrementing the ref count and you are not decrementing it? On Jul 18, 2012, at 12:17 PM, Karthick Duraisamy Soundararaj wrote: Hi All, SolrCore seems to keep a reference-counted searcher. I had to write a custom search handler that extends SearchHandler, and I was playing around with it. I made the following change to SearchHandler.java:

handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
    System.out.println("Reference count before search: " + req.getCore().getSearcher().getRefcount()); // in Eclipse
    ...
    System.out.println("Reference count after search: " + req.getCore().getSearcher().getRefcount()); // in Eclipse
}

Now, I am surprised to see the reference count not getting decremented at all. Following is a sample of the output I get:

Reference count before search: 1
Reference count after search: 2
...
Reference count before search: 2
Reference count after search: 3
...
Reference count before search: 3000
Reference count after search: 3001

The reference count seems to keep increasing. Wouldn't this cause a memory leak? - Mark Miller lucidimagination.com
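A sketch of the decref pattern being described -- every getSearcher() paired with a decref(), typically in a finally block (RefCounted is org.apache.solr.util.RefCounted):

// inside handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp):
RefCounted<SolrIndexSearcher> holder = req.getCore().getSearcher();
try {
    SolrIndexSearcher searcher = holder.get();
    // ... use the searcher ...
} finally {
    holder.decref(); // releases the reference taken by getSearcher()
}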
RE: How To apply transformation in DIH for multivalued numeric field?
Don't you want to specify splitBy for the integer field too? Actually, though, you shouldn't need to use GROUP_CONCAT and RegexTransformer at all. DIH is designed to handle one-to-many relations between parent and child entities by populating the child fields as multi-valued automatically. I guess your approach leads to a lot fewer rows getting sent from your db to Solr, though. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Pranav Prakash [mailto:pra...@gmail.com] Sent: Wednesday, July 18, 2012 2:38 PM To: solr-user@lucene.apache.org Subject: How To apply transformation in DIH for multivalued numeric field? I have a multivalued integer field and a multivalued string field defined in my schema. [...]
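A sketch of the plain parent/child form suggested above: the child entity returns one row per tag, and DIH collects them into the multi-valued fields automatically (no GROUP_CONCAT, no splitBy); table and column names follow the original post:

<entity name="document" dataSource="app" onError="skip" query="...">
  <entity name="community_tags"
          query="SELECT a.id AS community_tag_ids, a.title AS community_tags
                 FROM tags a JOIN tag_dets b ON a.id = b.tag_id
                 WHERE b.doc_id = '${document.id}'">
    <field column="community_tag_ids" name="community_tag_ids"/>
    <field column="community_tags" name="community_tags"/>
  </entity>
</entity>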
Re: java.lang.AssertionError: System properties invariant violated.
Thank you! I hadn't really understood LuceneTestCase.classRules before this. roman On Wed, Jul 18, 2012 at 3:11 PM, Chris Hostetter hossman_luc...@fucit.org wrote: The purpose of the assertion is to verify that no code being tested is modifying system properties -- if you are setting the properties yourself in some @BeforeClass methods, just use System.clearProperty to unset them in corresponding @AfterClass methods -Hoss
Re: edismax not working in a core
On 07/18/2012 02:39 PM, Richard Frovarp wrote: [...] However, if I specify an mm of 0, it appears to work just fine. I've added it as a default parameter to the select handler. [...] Okay, that's wrong. Term boosting isn't working either, and what I did above just turns everything into an OR query. I did figure out the problem, however: in the core that wasn't working, one of the query field names wasn't correct. No errors were ever thrown; it just made the query behave in a very odd way. I finally figured it out after debugging each field independently of the others.
Re: Solr 4 Alpha SolrJ Indexing Issue
I have realized this is not specific to SolrJ but to my instance of Solr: deleting by query with curl is not working either. Running

curl http://localhost:8983/solr/coupon/update -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

yields this in the logs:

INFO: [coupon] webapp=/solr path=/update params={stream.body=<delete><query>*:*</query></delete>} {deleteByQuery=*:*} 0 0

but the corpus of documents in the core does not change. My solrconfig is pretty barebones at this point, but I attached it in case anyone sees something strange. Does anyone have any idea why documents aren't getting deleted? Thanks in advance, Briggs Thompson On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello All, I am using 4.0 Alpha and running into an issue with indexing using HttpSolrServer (SolrJ). Relevant Java code:

HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
solrServer.setRequestWriter(new BinaryRequestWriter());

Relevant solrconfig.xml content:

<requestHandler name="/update" class="solr.UpdateRequestHandler"/>
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler"/>

Indexing documents works perfectly fine (using addBeans()); however, when trying to do deletes I am seeing issues. I tried a solrServer.deleteByQuery("*:*") followed by a commit and an optimize, and nothing is deleted. The response from the delete request is a success, and in the solr logs I see the following:

INFO: [coupon] webapp=/solr path=/update/javabin params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}

I tried removing the BinaryRequestWriter and having the request sent in the default format, and I get the following error.
SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: application/octet-stream Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

I thought that an optimize does the same thing as expungeDeletes, but in the log I see expungeDeletes=false. Is there a way to force that using SolrJ? Thanks in advance, Briggs

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<!-- This is a stripped down
Custom JUnit tests based on SolrTestCaseJ4 fail intermittently.
Hi, I am trying out the Solr Alpha release against some custom code and JUnit tests I have written. My custom JUnit tests fail once in a while. The tests are based on Solr's JUnit test code, extending SolrTestCaseJ4. My guess is that the randomized testing is running into some issue here, but I am not sure what the source of the problem is. I noticed the value of 'codec' is null for the failed cases, even though I am setting the luceneMatchVersion value in solrconfig.xml as below:

<luceneMatchVersion>${tests.luceneMatchVersion:LUCENE_CURRENT}</luceneMatchVersion>

I am including the test output for both scenarios here. Any help or pointers appreciated. Thanks, Koorosh

Here is the output of the JUnit test when it fails, run from Eclipse:

NOTE: test params are: codec=null, sim=null, locale=null, timezone=(null)
NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_21 (64-bit)/cpus=4,threads=1,free=59414480,total=63242240
NOTE: All tests run in this JVM: [TestDocsHandler]
Jul 18, 2012 3:55:25 PM com.carrotsearch.randomizedtesting.RandomizedRunner runSuite
SEVERE: Panic: RunListener hook shouldn't throw exceptions.
java.lang.NullPointerException
    at org.apache.lucene.util.RunListenerPrintReproduceInfo.reportAdditionalFailureInfo(RunListenerPrintReproduceInfo.java:159)
    at org.apache.lucene.util.RunListenerPrintReproduceInfo.testRunFinished(RunListenerPrintReproduceInfo.java:104)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:634)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)

Here is the output for the same test when it is successful:

24 T11 oas.SolrTestCaseJ4.initCore initCore Creating dataDir: C:\Users\xuser\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084
43 T11 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx)
43 T11 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: solr-gold/solr-extraction
45 T11 oasc.SolrResourceLoader.init new SolrResourceLoader for deduced Solr Home: 'solr-gold/solr-extraction\'
284 T11 oasc.SolrConfig.init Using Lucene MatchVersion: LUCENE_40
429 T11 oasc.SolrConfig.init Loaded SolrConfig: solrconfig-dow.xml
434 T11 oass.IndexSchema.readSchema Reading Solr Schema
443 T11 oass.IndexSchema.readSchema Schema name=SolvNet Common core
522 T11 oass.IndexSchema.readSchema default search field in schema is indexed_content
524 T11 oass.IndexSchema.readSchema query parser default operator is AND
525 T11 oass.IndexSchema.readSchema unique key field: id
616 T11 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx)
617 T11 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: solr-gold/solr-extraction
617 T11 oasc.SolrResourceLoader.init new SolrResourceLoader for directory: 'solr-gold/solr-extraction\'
618 T11 oasc.CoreContainer.init New CoreContainer 994682772
642 T11 oasc.SolrCore.init [collection1] Opening new SolrCore at solr-gold/solr-extraction\, dataDir=C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\
642 T11 oasc.SolrCore.init JMX monitoring not detected for core: collection1
648 T11 oasc.SolrCore.getNewIndexDir WARNING New index directory detected: old=null new=C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\index/
648 T11 oasc.SolrCore.initIndex WARNING [collection1] Solr index directory 'C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\index' doesn't exist. Creating new index...
742 T11 oasc.SolrDeletionPolicy.onCommit SolrDeletionPolicy.onCommit: commits:num=1 commit{dir=MockDirWrapper(org.apache.lucene.store.RAMDirectory@44023756 lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ed5459),segFN=segments_1,generation=1,filenames=[segments_1]
743 T11 oasc.SolrDeletionPolicy.updateCommits newest commit = 1
871 T11 oasc.RequestHandlers.initHandlersFromConfig created /update/javabin: solr.BinaryUpdateRequestHandler
875 T11 oasc.RequestHandlers.initHandlersFromConfig created standard: solr.StandardRequestHandler
878 T11 oasc.RequestHandlers.initHandlersFromConfig created /update: solr.XmlUpdateRequestHandler
878 T11 oasc.RequestHandlers.initHandlersFromConfig created /admin/: org.apache.solr.handler.admin.AdminHandlers
886 T11 oasc.RequestHandlers.initHandlersFromConfig created /update/extract: com.synopsys.ies.solr.backend.handler.extraction.SolvNetExtractingRequestHandler
891 T11 oasc.RequestHandlers.initHandlersFromConfig WARNING Multiple requestHandler registered to the same name: standard ignoring: org.apache.solr.handler.StandardRequestHandler
892 T11 oasc.RequestHandlers.initHandlersFromConfig created standard: solr.SearchHandler
892 T11 oasc.RequestHandlers.initHandlersFromConfig created employee:
Re: Using Solr 3.4 running on tomcat7 - very slow search
Replication will indeed be incremental. But if you commit too often (and committing too often is a common mistake) then merging will eventually merge everything into new segments and the whole thing will be replicated. Additionally, optimizing (or forceMerge in 4.x) will make a single segment and force the entire index to replicate. You should emphatically _not_ have to have two cores; Solr is built to handle replication etc. I suspect you're committing too often or there's some other mis-configuration, and you're creating a problem for yourself. Here's what I'd do: 1> increase the polling interval to, say, 10 minutes (or however long you can live with stale data) on the slave. 2> decrease the commits you're doing. This could involve the autocommit options you might have set in solrconfig.xml. It could be your client (I don't know how you're indexing; SolrJ?) and the commitWithin parameter. It could be that you're optimizing (if you are, stop it!). Note that ramBufferSizeMB has no influence on how often things are _committed_. When this limit is exceeded, the accumulated indexing data is written to the currently open segment, and multiple flushes can go to the _same_ segment. The write-once nature of segments means that after a segment is closed (through a commit) it is not changed, but a segment that is not closed may be written to multiple times until it's closed. HTH Erick On Wed, Jul 18, 2012 at 1:25 PM, Mou mouna...@gmail.com wrote: Hi Erick, I totally agree; that's what I also figured out eventually. [...]
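An illustrative solrconfig fragment for suggestions 1> and 2> above -- a 10-minute slave pollInterval and infrequent automatic commits; the thresholds and master URL are examples, not tuned recommendations for this index:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>300000</maxTime> <!-- ms: commit at most every 5 minutes -->
  </autoCommit>
</updateHandler>

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>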
Re: Solr 4 Alpha SolrJ Indexing Issue
Hi Briggs, I'm not sure about Solr 4.0, but do you need to commit explicitly?

curl "http://localhost:8983/solr/coupon/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

Brendan www.kuripai.com On Jul 18, 2012, at 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr: deleting by query with curl is not working either. [...]
SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: application/octet-stream  Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
I thought that an optimize does the same thing as expungeDeletes, but in the log I see expungeDeletes=false. Is there a way to force that using SolrJ?
Thanks in advance,
Briggs
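For anyone hitting this thread later, a minimal SolrJ sketch of the delete-and-commit sequence under discussion (the URL and core name are illustrative):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeleteAllDocs {
    public static void main(String[] args) throws Exception {
        // assumes a local Solr 4.x instance with a core named "coupon"
        HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/coupon");
        solrServer.deleteByQuery("*:*"); // marks every document as deleted
        solrServer.commit();             // makes the deletes visible to new searchers
        solrServer.shutdown();
    }
}

Without the explicit commit() (or a commit=true parameter on the request), the deletes sit in the update log and the old searcher keeps serving the old corpus.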
SOLR 4 ALPHA /terms /browse
When I set up a 2-shard cluster using the example and run it through its paces, I find two features that do not work as I expect. Any suggestions on adjusting my configuration or expectations would be appreciated.
/terms does not return any terms when issued as follows:
http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
but does return reasonable results when distrib is turned off, like so:
http://hostname:8983/solr/terms?terms.fl=name&terms=true&distrib=false&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
/browse returns this stack trace to the browser:
HTTP ERROR 500
Problem accessing /solr/browse. Reason:
{msg=ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode,trace=org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
	at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:99)
	at org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWriter.java:117)
	at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:40)
	at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.write(SolrCore.java:1990)
	at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:398)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
	at org.eclipse.jetty.server.Server.handle(Server.java:351)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
	at java.lang.Thread.run(Thread.java:662)
,code=500}
Best regards,
Nick Koton
Solr multiple cores activation
I am implementing a search engine with Nutch as the web crawler and Solr for searching. Since Nutch has no search user interface any more, I came across Ajax-Solr as a search front end. I implemented Ajax-Solr with no hindrance, but its search operation only searches the Reuters example data. If I want to crawl the whole web, not just the Reuters data, using Nutch and integrate it with Solr, then I have to replace Solr's schema.xml file with Nutch's schema.xml file, which will not match the Ajax-Solr configuration. After replacing the schema.xml files, Ajax-Solr *won't* work! So I found a solution to this (correct me if I am wrong), i.e. to activate multiple cores, which means integrating Solr with Nutch in one core (i.e. indexing) and using Ajax-Solr in the other. I tried activating multiple cores, but with *no luck*. I tried every single thing, every permutation and combination, but failed to set them up. I followed these links:
1) http://wiki.apache.org/solr/CoreAdmin
2) http://www.plaidpony.com/blog/post/2011/04/Multicore-SOLR-And-Tomcat-On-Windows-Server-2008-R2.aspx
but they didn't help either. Can you please tell me how to set them up? I've been stuck on this for over 2 days now. Kindly help! Are there any other search user interfaces?
Thanks
Regards
Praful Bagai
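For what it's worth, a minimal sketch of the legacy multicore layout those links describe, assuming two hypothetical cores named "nutch" and "ajaxsolr", each with its own conf/schema.xml and conf/solrconfig.xml under its instanceDir:

<!-- solr.xml in the Solr home directory -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="nutch" instanceDir="nutch" />
    <core name="ajaxsolr" instanceDir="ajaxsolr" />
  </cores>
</solr>

With this in place the Nutch core is reachable at /solr/nutch and the Ajax-Solr core at /solr/ajaxsolr, so each keeps its own schema and the two never conflict.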
RE: Could I use Solr to index multiple applications?
Yury and Shashi,
Thanks very much for the help! I am studying the options you pointed out (Solr multiple cores and Elasticsearch).
Best regards, Lisheng
-----Original Message-----
From: Yury Kats [mailto:yuryk...@yahoo.com]
Sent: Tuesday, July 17, 2012 7:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?
On 7/17/2012 9:26 PM, Zhang, Lisheng wrote:
Thanks very much for the quick help! Multicore sounds interesting. I roughly read the doc, so we need to put each core name into Solr's config XML; if we add another core and change the XML, do we need to restart Solr?
You can add/create cores on the fly, without restarting. See http://wiki.apache.org/solr/CoreAdmin#CREATE
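For reference, a sketch of such an on-the-fly CREATE call against the CoreAdmin handler (host, core name, and instanceDir are illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore&config=solrconfig.xml&schema=schema.xml

The instanceDir with its conf/ files must already exist on disk; the call then registers the new core without restarting Solr.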
Re: Using Solr 3.4 running on tomcat7 - very slow search
Increasing the polling interval does help. But the requirement is to get a document indexed and searchable almost instantly (sounds like real-time search); 30 sec is acceptable. I need to look at Solr NRT and SolrCloud.
I created a new core to accept daily updates and replicate every 10 sec. Two other cores with 234 million documents are configured to replicate only once a day. I am feeding all three cores, but the two big cores are not replicating. While searching I am running group.field on my unique id and taking the most updated one. Right now it looks fine. Every day I am going to delete the previous day's records from the daily-update core. I am planning to use rsync for replication; it will be Fusion-io to Fusion-io, so hopefully it will be very fast. What do you think?
We use a Windows service (written in .NET C#) to feed the data using REST calls. That is really fast; we can feed more than 15 million documents a day to two cores easily. I am using a Solr autocommit of 5 sec.
I could not figure out how I was able to achieve those numbers in my test environment; the configuration was the same everywhere except I had a lot less memory in test! I am trying to find out what I am missing in the other configuration. My SLES kernel version is different in production, it's a 3.0.*, test was 2.6.*, but I do not think that can cause a problem.
Thank you again,
Mou
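As an aside, the field-collapsing dedup Mou describes maps onto Solr's result grouping; a sketch of such a request (the field name unique_id is hypothetical):

http://localhost:8983/solr/select?q=some+query&group=true&group.field=unique_id&group.limit=1&group.main=true

group.main=true flattens the groups back into an ordinary result list, so each unique id appears only once even when both the big core and the daily-update core hold a version of the document.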
Re: Solr 4 Alpha SolrJ Indexing Issue
On 7/18/2012 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. It could be this: https://issues.apache.org/jira/browse/SOLR-3432
Re: Quick Confirmation on LocalSolrQueryRequest close
I put my question wrong... Excuse me for spamming... It's been a tiring couple of days and I am almost sleep-typing. Please read the snippet again. This might be a dumb question, but I would like to confirm: will the following snippet cause an index searcher leak and end up in an out-of-memory exception when new searchers are created?
class MyCustomHandler extends SearchHandler {
  ...
  void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
    LocalSolrQueryRequest newReq = new LocalSolrQueryRequest(req.getCore(), ...);
    ...
    // newReq.close()   <-- will removing this lead to OOME?
  }
}
My conviction is yes. But I just want to confirm.
On Wed, Jul 18, 2012 at 11:04 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote:
This might be a dumb question, but I would like to confirm: will the following snippet cause an index searcher leak and end up in an out-of-memory exception when new searchers are created?
class MyCustomHandler extends SearchHandler {
  ...
  void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
    LocalSolrQueryRequest newReq = new LocalSolrQueryRequest(req.getCore(), ...);
    ...
    newReq.close();
  }
}
My conviction is yes. But I just want to confirm.
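For context, a sketch of the pattern that avoids the leak, assuming the 3.x/4.x API; the empty params object is just a placeholder for whatever the handler really passes:

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.request.LocalSolrQueryRequest;

// inside handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp):
LocalSolrQueryRequest newReq =
    new LocalSolrQueryRequest(req.getCore(), new ModifiableSolrParams());
try {
    // ... use newReq, e.g. hand it to another handler or component ...
} finally {
    // close() releases the searcher reference the request may be holding;
    // skipping it lets old searchers pile up as new ones are opened
    newReq.close();
}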
Re: Solr 4 Alpha SolrJ Indexing Issue
Yury,
Thank you so much! That was it. Man, I spent a good long while troubleshooting this, and I probably would have spent quite a bit more time. I appreciate your help!!
-Briggs
On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:
On 7/18/2012 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. It could be this: https://issues.apache.org/jira/browse/SOLR-3432
Frustrating differences in fieldNorm between two different versions of solr indexing the same document
Greetings,
I've been digging into this for two days now and have come up short - hopefully there is some simple answer I am just not seeing. I have a Solr 1.4.1 instance and a Solr 3.6.0 instance, both configured as identically as possible (given deprecations) and indexing the same document. For most queries the results are very close (scoring within three significant digits, almost identical positions in the results). However, for certain documents the scores are very different (causing these docs to be ranked +/- 25 positions or more in the results). Looking at debugQuery output, this seems to be due to fieldNorm values being lower for the 3.6.0 instance than for 1.4.1 (note that for most docs the fieldNorms are identical).
I have taken the field values for the example below and run them through /admin/analysis.jsp on each Solr instance. Even for the problematic docs/fields, the results are almost identical. For the example below, the t_tag values for the problematic doc:
1.4.1: 162 values
3.6.0: 164 values
Note that 1/sqrt(162) = 0.07857, which is approximately the fieldNorm for 1.4.1; however, (1/0.0625)^2 = 256, which is nowhere near 164.
Here is a particular example from 1.4.1:
1.6263733 = (MATCH) fieldWeight(t_tag:soul in 2066419), product of:
  3.8729835 = tf(termFreq(t_tag:soul)=15)
  5.3750753 = idf(docFreq=27619, maxDocs=2194294)
  0.078125 = fieldNorm(field=t_tag, doc=2066419)
And the same from 3.6.0:
1.3042576 = (MATCH) fieldWeight(t_tag:soul in 1977957), product of:
  3.8729835 = tf(termFreq(t_tag:soul)=15)
  5.388126 = idf(docFreq=27740, maxDocs=2232857)
  0.0625 = fieldNorm(field=t_tag, doc=1977957)
Here is the 1.4.1 config for the t_tag field and text type:
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldtype>
<dynamicField name="t_*" type="text" indexed="true" stored="true" required="false" multiValued="true" termVectors="true"/>
And the 3.6.0 schema config for the t_tag field and text type:
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="t_tag" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
I at first got distracted by this change between versions:
LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default. This means that terms with a position increment gap of zero do not affect the norms calculation by default.
However, this doesn't appear to be causing the issue, as, according to analysis.jsp, there is no overlap for t_tag... Can you point me to where these fieldNorm differences are coming from, and why they'd only be happening for a select few documents whose content doesn't stand out?
Thank you,
Aaron
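As background for anyone puzzling over the numbers: with DefaultSimilarity the fieldNorm is 1/sqrt(numTerms) squashed into a single byte via Lucene's SmallFloat.floatToByte315, so nearby term counts collapse onto coarse steps. A small sketch of that quantization (this illustrates the encoding only; it does not by itself explain the observed 0.0625):

import org.apache.lucene.util.SmallFloat;

public class NormQuantization {
    public static void main(String[] args) {
        for (int terms : new int[]{162, 164, 256}) {
            float raw = (float) (1.0 / Math.sqrt(terms));
            // the same 3-bit-mantissa byte encoding DefaultSimilarity uses for norms
            byte encoded = SmallFloat.floatToByte315(raw);
            float stored = SmallFloat.byte315ToFloat(encoded);
            System.out.println(terms + " terms: raw=" + raw + " stored=" + stored);
        }
    }
}

Because the encoding truncates, a stored norm of 0.0625 corresponds to a raw length norm anywhere in roughly the 203-256 term range, which is part of why the observed 164 values don't line up with it.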
Re: How To apply transformation in DIH for multivalued numeric field?
I had tried splitBy for the numeric field too, but that also did not work for me. However, I got rid of group_concat and it was all good to go. Thanks a lot!! I really had a difficult time understanding this behavior.
*Pranav Prakash*
temet nosce
On Thu, Jul 19, 2012 at 1:34 AM, Dyer, James james.d...@ingrambook.com wrote:
Don't you want to specify splitBy for the integer field too? Actually, though, you shouldn't need to use GROUP_CONCAT and RegexTransformer at all. DIH is designed to handle one-to-many relations between parent and child entities by populating all the child fields as multi-valued automatically. I guess your approach leads to a lot fewer rows getting sent from your db to Solr, though.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Pranav Prakash [mailto:pra...@gmail.com]
Sent: Wednesday, July 18, 2012 2:38 PM
To: solr-user@lucene.apache.org
Subject: How To apply transformation in DIH for multivalued numeric field?
I have a multivalued integer field and a multivalued string field defined in my schema as
<field name="community_tag_ids" type="integer" indexed="true" stored="true" multiValued="true" omitNorms="true" />
<field name="community_tags" type="text" indexed="true" termVectors="true" stored="true" multiValued="true" omitNorms="true" />
The DIH entity and field definitions for the same go as
<entity name="document" dataSource="app" onError="skip" transformer="RegexTransformer" query="...">
  <entity name="community_tags" transformer="RegexTransformer"
          query="SELECT group_concat(a.id SEPARATOR ',') AS community_tag_ids, group_concat(a.title SEPARATOR ',') AS community_tags FROM tags a JOIN tag_dets b ON a.id = b.tag_id WHERE b.doc_id = ${document.id}">
    <field column="community_tag_ids" name="community_tag_ids"/>
    <field column="community_tags" splitBy="," />
  </entity>
</entity>
The value for the field community_tags comes through correctly as an array of strings. However, the value of the field community_tag_ids is not proper:
<arr name="community_tag_ids"><int>[B@390c0a18</int></arr>
I tried chaining NumberFormatTransformer with formatStyle="number" but that throws DataImportHandlerException: Failed to apply NumberFormat on column. Could it be due to NULL values from the database, or because the value is not proper? How do we handle NULL in this case?
*Pranav Prakash*
temet nosce
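For completeness, a sketch of the one-row-per-child approach James describes, with GROUP_CONCAT and the transformers dropped (same table and column names as in the original):

<entity name="community_tags"
        query="SELECT a.id AS community_tag_ids, a.title AS community_tags
               FROM tags a JOIN tag_dets b ON a.id = b.tag_id
               WHERE b.doc_id = '${document.id}'"/>

Each matching row contributes one value, and DIH accumulates them into the multiValued community_tag_ids and community_tags fields automatically, so no splitting or number formatting is needed.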
Can I get DIH to skip fields that match empty text nodes
Hello,
I have DIH reading an XML file and getting fields with empty values. My definition is:
<field column="title" xpath="/database/document/item[@name='Title']/text" />
/text here is an actual node name, not text() (e.g. <item name="Title"><text/></item>). Right now, I get the field (of type string) with an empty value indexed/stored/returned. Plus, all the copy fields get the empties as well. Can I get DIH to skip that field if I don't have any actual text in it? I can see how to do it with a custom transformer, but it seems that this would be a common problem and I might just be missing a setting or some XPath secret. I actually tried [node()], [text()] and .../text/text() at the end, but that seems to make the XPathEntityProcessor skip the field altogether.
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
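In case it helps, a sketch of the transformer route using DIH's built-in ScriptTransformer (the function name skipEmpty is a placeholder; title is the column from the definition above):

<dataConfig>
  <script><![CDATA[
    function skipEmpty(row) {
      var v = row.get('title');
      // drop the column when it is null or whitespace-only,
      // so neither the field nor its copyFields get an empty value
      if (v == null || v.toString().trim() == '') {
        row.remove('title');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="doc" processor="XPathEntityProcessor" transformer="script:skipEmpty"
            url="..." forEach="/database/document">
      <field column="title" xpath="/database/document/item[@name='Title']/text" />
    </entity>
  </document>
</dataConfig>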
Re: Wildcard query vs facet.prefix for autocomplete?
Very interesting! Thanks for sharing, I'll ponder on it.