Re: input XSLT
There is a fundamental problem with the 'pull' approach used by DIH. Normally people want delta imports, which are done using a timestamp field. Now it may not always be possible for application servers to sync their timestamps (given protocol restrictions due to security reasons). Because of this, the Solr application is likely to miss a few records occasionally. Such a problem does not arise if applications themselves identify their records and post them. Should we not have such a feature in Solr, which would allow users to push data onto the index in whichever format they wish? This would also facilitate plugging Solr in seamlessly with all kinds of applications. Regards, CI On Wed, Mar 11, 2009 at 11:52 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: On Tue, Mar 10, 2009 at 12:17 PM, CIF Search cifsea...@gmail.com wrote: Just as you have an XSLT response writer to convert the Solr XML response to make it compatible with any application, on the input side do you have an XSLT module that will parse XML documents into Solr format before posting them to the Solr indexer? I have gone through DataImportHandler, but it works in data 'pull' mode, i.e. Solr pulls data from the given location. I would still want to work with applications 'posting' documents to the Solr indexer as and when they want. It is a limitation of DIH, but if you can put your XML in a file behind an HTTP server then you can fire a command to DIH to pull data from the URL quite easily. Regards, CI -- --Noble Paul
Re: How to correctly boost results in Solr Dismax query
Hi Pete, The bq parameter works with the q.alt query parameter. If you are passing the search criteria using the q.alt query parameter then this bq parameter comes into the picture. Also, q.alt doesn't support field boosting. If you want to boost the records with their field value then you must use the q query parameter instead of q.alt. The 'q' parameter actually uses the qf parameters from solrconfig for field boosting. Let me know if you have any questions. Thanks, Amit Garg Pete Smith-3 wrote: Hi, I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. I have implemented my search using a dismax query so it searches predetermined fields. However, my results are coming back sorted by score, which appears to be calculated by keyword relevancy only. I would like to adjust the score where fields have pre-determined values. I think I can do this with boost query and boost functions but the documentation here: http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3 is not particularly helpful. I tried adding a bq argument to my search: bq=media:DVD^2 (yes, this is an index of films!) but I find when I start adding more and more: bq=media:DVD^2&bq=media:BLU-RAY^1.5 I find the negative results - e.g. films that are DVD but are not BLU-RAY - get negatively affected in their score. In the end it all seems to even out and my score is as it was before I started boosting. I must be doing this wrong and I wonder whether boost function comes in somewhere. Any ideas on how to correctly use boost? Cheers, Pete -- Pete Smith Developer No.9 | 6 Portal Way | London | W3 6RU | T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111 LOVEFiLM.com -- View this message in context: http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22490850.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: input XSLT
On Fri, Mar 13, 2009 at 11:36 AM, CIF Search cifsea...@gmail.com wrote: There is a fundamental problem with the 'pull' approach used by DIH. Normally people want delta imports, which are done using a timestamp field. Now it may not always be possible for application servers to sync their timestamps (given protocol restrictions due to security reasons). Because of this, the Solr application is likely to miss a few records occasionally. Such a problem does not arise if applications themselves identify their records and post them. Should we not have such a feature in Solr, which would allow users to push data onto the index in whichever format they wish? This would also facilitate plugging Solr in seamlessly with all kinds of applications. You can of course push your documents to Solr using the XML/CSV update (or using the solrj client). It's just that you can't push documents with DIH. http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3 -- Regards, Shalin Shekhar Mangar.
Re: input XSLT
But these documents have to be converted to a particular format before being posted. Not just any XML document can be posted to Solr (there is no XSLT handled by Solr internally on the input side). DIH handles any XML format, but it operates in pull mode. On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: You can of course push your documents to Solr using the XML/CSV update (or using the solrj client). It's just that you can't push documents with DIH. http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3 -- Regards, Shalin Shekhar Mangar.
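The conversion step the thread describes can of course live client-side: run your own transform (XSLT or otherwise) over the source XML, then POST the result to /update. A minimal Python sketch of that "push" flow, assuming an invented source format with <record> elements and a default Solr at localhost:8983:

```python
# Client-side "push": reshape arbitrary XML into Solr's <add><doc> update
# format and POST it to the update handler. The <records>/<record> input
# schema below is invented for illustration; a real XSLT stylesheet would
# do the same reshaping.
import urllib.request
import xml.etree.ElementTree as ET

def to_solr_add(source_xml: str) -> bytes:
    """Turn each <record> child element into a Solr <field>."""
    add = ET.Element("add")
    for record in ET.fromstring(source_xml).iter("record"):
        doc = ET.SubElement(add, "doc")
        for child in record:  # each child element becomes one Solr field
            field = ET.SubElement(doc, "field", name=child.tag)
            field.text = child.text
    return ET.tostring(add)

def post_to_solr(payload: bytes, url="http://localhost:8983/solr/update"):
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "text/xml"})
    return urllib.request.urlopen(req)  # remember to send <commit/> afterwards

payload = to_solr_add(
    "<records><record><id>1</id><title>usb cable</title></record></records>")
```

With Solr running, `post_to_solr(payload)` pushes the document whenever the application chooses, which is exactly the push semantics DIH lacks.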
Re: Compound word search (maybe DisMaxQueryPaser problem)
First of all: sorry Chris, Walter .. I did not mean to put pressure on anyone. It's just that if you're stuck with something and you have that little needle stinging saying: maybe you're just too damn stupid for this ... :) So, thanks a lot for your answers. As for index time expansion using synonyms: I think this is not an option for me since it would mean that I have to a) find all such words that might cause problems and b) find every variant that might possibly be used by customers. And then in the end I have to keep all my synonym files up-to-date. But the main design goal for my search implementation is little to no maintenance. My original assumption for the DisMax handler was that it would just take the original query string and pass it to every field in its field list using the field's configured analyzer stack. Maybe in the end add some stuff for the special options and so on ... and then send the query to Lucene. Can you explain why this approach was not chosen? Thanks Tobi Chris Hostetter schrieb: : Hmmm was my mail so weird or my question so stupid ... or is there simply : noone with an answer? Not even a hint? :( patience my friend, i've got a backlog of ~500 Lucene related messages in my INBOX, and i was just reading your original email when this reply came in. In general this is a fairly hard problem ... the easiest solution i know of that works in most cases is to do index time expansion using the SynonymFilter, so regardless of whether a document contains usbcable, usb-cable or usb cable, all three variants get indexed, and then the user can search for any of them. the downside is that it can throw off your tf/idf stats for some terms (if they appear by themselves, and as part of a compound) and it can result in false positives for esoteric phrase searches (but that tends to be more of a theoretical problem than an actual one).
: But this never happens since with the DisMax searcher the parser produces a : query like this: : : ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1) ... : to deal with this compound word problem? Is there another query parser that : already does the trick? take a look at the FieldQParserPlugin ... it passes the raw query string to the analyzer of a specified field -- this would let your TokenFilters see the stream of tokens (which isn't possible with the conventional QueryParser tokenization rules) but it doesn't have any of the field/query matrix cross-product goodness of dismax -- you'd only be able to query the one field. (Hmmm i wonder if DisMaxQParser 2.0 could have an option to let you specify a FieldType whose analyzer was used to tokenize the query string instead of using the Lucene QueryParser JavaCC tokenization, and *then* the tokens resulting from that initial analyzer could be passed to the analyzers of the various qf fields ... hmmm, that might be just crazy enough to be too crazy to work) -Hoss
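Hoss's index-time expansion suggestion looks roughly like this in schema.xml (a sketch: the field type name, file name and variant list are placeholders). With an expanding SynonymFilter on the index analyzer only, every variant of the compound lands in the index, so any of them matches at query time:

```xml
<!-- sketch: index-time synonym expansion for compound-word variants -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt would contain e.g.: usbcable, usb-cable, usb cable -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-offs Hoss mentions (skewed tf/idf, rare phrase-search false positives) apply to exactly this setup.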
Re: SolrJ : EmbeddedSolrServer and database data indexing
nope .. But you can still use SolrJ to invoke DIH. Create a ModifiableSolrParams with the required request parameters, create a QueryRequest with the params, then set the path as /dataimport and invoke the command with CommonsHttpSolrServer#request(). On Fri, Mar 13, 2009 at 8:40 AM, Ashish P ashish.ping...@gmail.com wrote: Is there any api in SolrJ that calls the DataImportHandler to execute commands like full-import and delta-import? Please help.. Ashish P wrote: Is it possible to index DB data directly to solr using EmbeddedSolrServer? I tried using a data-config file and the full-import command, and it works. So I assume using CommonsHttpSolrServer will also work. But can I do it with EmbeddedSolrServer?? Thanks in advance... Ashish -- View this message in context: http://www.nabble.com/SolrJ-%3A-EmbeddedSolrServer-and-database-data-indexing-tp22488697p22489420.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
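The SolrJ calls Noble describes ultimately issue a plain HTTP request, so the same command can also be fired without SolrJ. A minimal Python sketch of that equivalent request (the host, port and /solr/dataimport path are assumptions about a default single-core setup):

```python
# Build (and optionally fire) the request that a SolrJ QueryRequest with
# path "/dataimport" would send. Base URL and parameters are assumptions;
# adjust to your deployment.
from urllib.parse import urlencode
from urllib.request import urlopen

def dih_command_url(command, base="http://localhost:8983/solr/dataimport",
                    **extra):
    """URL for a DataImportHandler command, e.g. full-import or delta-import."""
    params = {"command": command}
    params.update(extra)  # extra request parameters, like clean or commit
    return base + "?" + urlencode(params)

url = dih_command_url("delta-import", clean="false")
# With Solr running, trigger the import with: urlopen(url)
```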
Two way Synonyms in Solr
Hi, I am implementing 2-way synonyms in solr using the q query parameter. One-way synonyms are working fine with the q query parameter but 2-way is not working. For example, if I define a 2-way synonym in the file like: value1, value2 it doesn't show any result for either of the values. Please suggest. Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Two-way-Synonyms-in-Solr-tp22492439p22492439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Two way Synonyms in Solr
dabboo wrote: Hi, I am implementing 2-way synonyms in solr using the q query parameter. One-way synonyms are working fine with the q query parameter but 2-way is not working. For example, if I define a 2-way synonym in the file like: value1, value2 it doesn't show any result for either of the values. Please suggest. Thanks, Amit Garg Are you sure you have expand=true on your synonym definition? Also, you can use /admin/analysis.jsp for debugging the field. Koji
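For reference, the expand flag Koji mentions lives on the filter, and the synonyms.txt format itself distinguishes two-way groups from one-way mappings (a sketch using the values from the question):

```text
# synonyms.txt
# a comma-separated group is bidirectional when the filter has expand="true":
value1, value2
# an explicit "=>" mapping stays one-way regardless of expand:
value1 => value2
```

The corresponding filter line in the field type would be something like `<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true"/>`.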
Phrase Synonyms in solr
Hi, Can someone please tell me how to implement phrase synonyms in solr. Thanks, Amit -- View this message in context: http://www.nabble.com/Phrase-Synonyms-in-solr-tp22492440p22492440.html Sent from the Solr - User mailing list archive at Nabble.com.
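Phrase (multi-word) synonyms use the same synonyms.txt syntax, just with spaces inside the terms; a sketch with invented values:

```text
# synonyms.txt -- multi-word entries; applying these at index time with
# expand="true" is the safest bet, since query-time analysis may split the
# phrase into separate tokens before the synonym filter sees it
blu ray, blu-ray, bluray
hard disk => hard disk drive
```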
Solr: ERRORs at Startup
Hello everybody, I am currently using: - Solr v1.3.0 - Jboss jboss-5.0.1.GA - Java jdk 1.5_06 When I start Solr within Jboss I see a lot of errors in the log but Solr seems to be working (meaning I can see the admin interface but I cannot index my DB...but that is another story :-) ). Attached is the log file. Here just some of the error messages I see: ... 10:51:19,976 INFO [ConnectionFactoryBindingService] Bound ConnectionManager 'jboss.jca:service=ConnectionFactoryBinding,name=JmsXA' to JNDI name 'java:JmsXA' 10:51:20,006 INFO [TomcatDeployment] deploy, ctxPath=/ 10:51:20,126 INFO [TomcatDeployment] deploy, ctxPath=/jmx-console 10:51:20,525 INFO [TomcatDeployment] deploy, ctxPath=/solr 10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() 10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No /solr/home in JNDI 10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: /home/giovanni/development/search/solr 10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: /home/giovanni/development/search/solr/solr.xml 10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/home/giovanni/development/search/solr/' 10:51:20,710 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar' to Solr classloader 10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No /solr/home in JNDI 10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using
system property solr.solr.home: /home/giovanni/development/search/solr 10:51:20,735 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/home/giovanni/development/search/solr/' 10:51:20,736 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar' to Solr classloader 10:51:21,964 ERROR [STDERR] Mar 13, 2009 10:51:21 AM org.apache.solr.core.SolrConfig init INFO: Loaded SolrConfig: solrconfig.xml 10:51:21,977 ERROR [STDERR] Mar 13, 2009 10:51:21 AM org.apache.solr.core.SolrCore init INFO: Opening new SolrCore at /home/giovanni/development/search/solr/, dataDir=./solr/data/ 10:51:21,991 ERROR [STDERR] Mar 13, 2009 10:51:21 AM org.apache.solr.schema.IndexSchema readSchema INFO: Reading Solr Schema 10:51:22,027 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: Schema name=search 10:51:22,051 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created string: org.apache.solr.schema.StrField 10:51:22,061 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created boolean: org.apache.solr.schema.BoolField 10:51:22,067 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created integer: org.apache.solr.schema.IntField 10:51:22,472 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created ignored: org.apache.solr.schema.StrField 10:51:22,483 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: default search field is text 10:51:22,485 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: query parser default operator is OR 10:51:22,486 ERROR [STDERR] Mar 13, 2009 10:51:22 AM 
org.apache.solr.schema.IndexSchema readSchema INFO: unique key field: uri 10:51:22,541 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.core.JmxMonitoredMap init INFO: JMX monitoring is enabled. Adding Solr mbeans to JMX Server: org.jboss.mx.server.mbeanserveri...@3deff3[ defaultDomain='jboss' ] 10:51:22,543 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.core.SolrCore parseListener INFO: Searching for listeners: //listen...@event=firstSearcher] 10:51:22,564 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.core.SolrCore parseListener INFO: Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=fast_warm,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]} What am I missing? :-( Any idea? thanks in advance. Giovanni 10:50:41,204 INFO [ServerImpl] Starting JBoss (Microcontainer)... 10:50:41,207 INFO [ServerImpl] Release ID: JBoss [Morpheus] 5.0.1.GA (build: SVNTag=JBoss_5_0_1_GA date=200902231221) 10:50:41,208
Re: How to correctly boost results in Solr Dismax query
Hi Amit, Thanks very much for your reply. What you said makes things a bit clearer but I am still a bit confused. On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote: If you want to boost the records with their field value then you must use the q query parameter instead of q.alt. The 'q' parameter actually uses the qf parameters from solrconfig for field boosting. From the documentation for Dismax queries, I thought that q is simply a keyword parameter: From http://wiki.apache.org/solr/DisMaxRequestHandler: q The guts of the search defining the main query. This is designed to support raw input strings provided by users with no special escaping. '+' and '-' characters are treated as mandatory and prohibited modifiers for the subsequent terms. Text wrapped in balanced quote characters '' are treated as phrases; any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. Wildcards in this q parameter are not supported. And I thought 'qf' is a list of fields and boost scores: From http://wiki.apache.org/solr/DisMaxRequestHandler: qf (Query Fields) List of fields and the boosts to associate with each of them when building DisjunctionMaxQueries from the user's query. The format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree. But if I want to, say, search for films with 'indiana' in the title, with media=DVD scoring higher than media=BLU-RAY, then do I need to do something like: solr/select?q=indiana And in my config: <str name="qf">media^2</str> But I don't see where the actual *contents* of the media field would determine the boost. Sorry if I have misunderstood what you mean.
Cheers, Pete -- Pete Smith Developer No.9 | 6 Portal Way | London | W3 6RU | T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111 LOVEFiLM.com
Re: Solr: ERRORs at Startup
Hi Giovanni, It looks like logging is configured strangely. Those messages in my solr setup (on tomcat 6 or jetty) appear as INFO level messages. It could have something to do with your SLF4J setup, but I'm no expert on that side of things. I wouldn't worry too much, the content of the messages doesn't imply anything bad going on. Toby. On 13 Mar 2009, at 09:57, Giovanni De Stefano wrote: Hello everybody, I am currently using: Solr v1.3.0 Jboss jboss-5.0.1.GA Java jdk 1.5_06 When I start Solr within Jboss I see a lot of errors in the log but Solr seems working (meaning I can see the admin interface but I cannot index my DB...but that is another story :-) ). Attached is the log file. Here just some of the error messages I see: ...
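One plausible explanation for the ERROR [STDERR] tagging: Solr 1.3's SLF4J binding logs through java.util.logging, whose default ConsoleHandler writes to stderr, and JBoss records everything on stderr at ERROR level. If that is the cause, a JDK logging config redirecting Solr's output to a file would quiet it. A hedged sketch (the file path is a placeholder; pass it with -Djava.util.logging.config.file=/path/to/logging.properties):

```properties
# Sketch of a logging.properties that keeps Solr's JDK-logging output
# out of stderr, so the container stops tagging it as ERROR [STDERR].
handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.pattern = %t/solr.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
org.apache.solr.level = INFO
```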
Re: How to correctly boost results in Solr Dismax query
Pete, Sorry if I wasn't clear. Here is the explanation. Suppose you have 2 records and they have films and media as 2 columns. Now the first record has values like films=Indiana and media=blue ray, and the 2nd record has values like films=Bond and media=Indiana. Values for the qf parameter: <str name="qf">media^2.0 films^1.0</str> Now, search for q=Indiana .. it should display both of the records but record #2 will display above the 1st. Let me know if you still have questions. Cheers, amit Pete Smith-3 wrote: Hi Amit, Thanks very much for your reply. What you said makes things a bit clearer but I am still a bit confused. On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote: If you want to boost the records with their field value then you must use the q query parameter instead of q.alt. The 'q' parameter actually uses the qf parameters from solrconfig for field boosting. From the documentation for Dismax queries, I thought that q is simply a keyword parameter: From http://wiki.apache.org/solr/DisMaxRequestHandler: q The guts of the search defining the main query. This is designed to support raw input strings provided by users with no special escaping. '+' and '-' characters are treated as mandatory and prohibited modifiers for the subsequent terms. Text wrapped in balanced quote characters '' are treated as phrases; any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. Wildcards in this q parameter are not supported. And I thought 'qf' is a list of fields and boost scores: From http://wiki.apache.org/solr/DisMaxRequestHandler: qf (Query Fields) List of fields and the boosts to associate with each of them when building DisjunctionMaxQueries from the user's query. The format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ...
this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree. But if I want to, say, search for films with 'indiana' in the title, with media=DVD scoring higher than media=BLU-RAY, then do I need to do something like: solr/select?q=indiana And in my config: <str name="qf">media^2</str> But I don't see where the actual *contents* of the media field would determine the boost. Sorry if I have misunderstood what you mean. Cheers, Pete -- Pete Smith Developer No.9 | 6 Portal Way | London | W3 6RU | T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111 LOVEFiLM.com -- View this message in context: http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22493646.html Sent from the Solr - User mailing list archive at Nabble.com.
DIH with outer joins
I have queries with outer joins defined in some entities, and for the same root object I can have two or more lines with different objects. For example, taking the following 3 tables, and a query defined in the entity with outer joins between the tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance) but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira
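One way to get one document per Table1 row is to drop the single outer-join query and nest one DIH entity per table; DIH then emits a document per root row, and fields from the inner entities can be collected across all joined rows by declaring them multiValued in schema.xml. A data-config.xml sketch, with invented column names:

```xml
<!-- sketch: nested entities instead of one flat outer-join query;
     column/field names here are invented for illustration -->
<entity name="t1" query="select id, name from Table1">
  <entity name="t2"
          query="select t2val from Table2 where t1_id = '${t1.id}'">
    <entity name="t3"
            query="select t3val from Table3 where t2_id = '${t2.id}'"/>
  </entity>
</entity>
```

With t2val and t3val mapped to multiValued fields, each Table1 instance becomes a single document carrying the values from every joined line.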
Re: How to correctly boost results in Solr Dismax query
Hi Amit, Thanks again for your reply. I am understanding it a bit better but I think it would help if I posted an example. Say I have three records: <doc> <long name="id">1</long> <str name="media">BLU-RAY</str> <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str> </doc> <doc> <long name="id">2</long> <str name="media">DVD</str> <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str> </doc> <doc> <long name="id">3</long> <str name="media">DVD</str> <str name="title">Casino Royale</str> </doc> Now, if I search for indiana: select?q=indiana I want the first two rows to come back (not the third as it does not contain 'indiana'). I would like record 2 to be scored higher than record 1 as its media type is DVD. At the moment I have in my config: <str name="qf">title</str> And I was trying to boost by media having a specific value by using 'bq', but from what you told me that is incorrect. Cheers, Pete On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote: Pete, Sorry if I wasn't clear. Here is the explanation. Suppose you have 2 records and they have films and media as 2 columns. Now the first record has values like films=Indiana and media=blue ray, and the 2nd record has values like films=Bond and media=Indiana. Values for the qf parameter: <str name="qf">media^2.0 films^1.0</str> Now, search for q=Indiana .. it should display both of the records but record #2 will display above the 1st. Let me know if you still have questions. Cheers, amit Pete Smith-3 wrote: Hi Amit, Thanks very much for your reply. What you said makes things a bit clearer but I am still a bit confused. On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote: If you want to boost the records with their field value then you must use the q query parameter instead of q.alt. The 'q' parameter actually uses the qf parameters from solrconfig for field boosting. From the documentation for Dismax queries, I thought that q is simply a keyword parameter: From http://wiki.apache.org/solr/DisMaxRequestHandler: q The guts of the search defining the main query.
This is designed to be support raw input strings provided by users with no special escaping. '+' and '-' characters are treated as mandatory and prohibited modifiers for the subsequent terms. Text wrapped in balanced quote characters '' are treated as phrases, any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. Wildcards in this q parameter are not supported. And I thought 'qf' is a list of fields and boost scores: From http://wiki.apache.org/solr/DisMaxRequestHandler: qf (Query Fields) List of fields and the boosts to associate with each of them when building DisjunctionMaxQueries from the user's query. The format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree. But if I want to, say, search for films with 'indiana' in the title, with media=DVD scoring higher than media=BLU-RAY then do I need to do something like: solr/select?q=indiana And in my config: str name=qfmedia^2/str But I don't see where the actual *contents* of the media field would determine the boost. Sorry if I have misunderstood what you mean. Cheers, Pete Pete Smith-3 wrote: Hi, I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. I have implemented my search using a dismax query so it searches predetermined fields. However, my results are coming back sorted by score which appears to be calculated by keyword relevancy only. I would like to adjust the score where fields have pre-determined values. 
I think I can do this with boost query and boost functions but the documentation here: http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3 is not particularly helpful. I tried adding a bq argument to my search: bq=media:DVD^2 (yes, this is an index of films!) but I find when I start adding more and more: bq=media:DVD^2&bq=media:BLU-RAY^1.5 I find the negative results - e.g. films that are DVD but are not BLU-RAY get negatively affected in their score. In the end it all seems to even out and my score is as it was before I started boosting. I must be doing this wrong and I wonder whether boost function comes in somewhere. Any ideas on how to correctly use boost? Cheers, Pete -- Pete Smith Developer No.9 | 6 Portal Way | London | W3 6RU | T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111 LOVEFiLM.com -- Pete Smith Developer No.9 |
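For readers following this thread: the DisMax wiki page documents bq as an additive clause appended to the main query, so a single positive boost on the preferred value usually behaves better than a pair of competing boosts (which is the "evening out" Pete observed). A minimal sketch of the scenario; the handler name, boost value, and use of solrconfig.xml defaults are illustrative assumptions, not taken from the thread:

```xml
<!-- solrconfig.xml sketch: dismax handler that favours DVD editions.
     The boost value 2 and handler name "dismax" are assumptions. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- fields searched by the user's keywords -->
    <str name="qf">title</str>
    <!-- additive boost: DVD matches gain score; BLU-RAY docs are
         simply not boosted rather than being pushed down -->
    <str name="bq">media:DVD^2</str>
  </lst>
</requestHandler>
```

A request like select?qt=dismax&q=indiana would then return both Indiana Jones records with the DVD edition scored higher, without penalising the Blu-ray one.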
Re: How to correctly boost results in Solr Dismax query
Pete, bq works only with the q.alt query and not with q queries. So, since in your case you would be using the qf parameter for field boosting, you will have to give both fields in the qf parameter, i.e. both title and media. Try this: str name=qfmedia^1.0 title^100.0/str Pete Smith-3 wrote: Hi Amit, Thanks again for your reply. I am understanding it a bit better but I think it would help if I posted an example. Say I have three records: doc long name=id1/long str name=mediaBLU-RAY/str str name=titleIndiana Jones and the Kingdom of the Crystal Skull/str /doc doc long name=id2/long str name=mediaDVD/str str name=titleIndiana Jones and the Kingdom of the Crystal Skull/str /doc doc long name=id3/long str name=mediaDVD/str str name=titleCasino Royale/str /doc Now, if I search for indiana: select?q=indiana I want the first two rows to come back (not the third as it does not contain 'indiana'). I would like record 2 to be scored higher than record 1 as its media type is DVD. At the moment I have in my config: str name=qftitle/str And I was trying to boost by media having a specific value by using 'bq' but from what you told me that is incorrect. Cheers, Pete On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote: Pete, Sorry if that wasn't clear. Here is the explanation. Suppose you have 2 records and they have films and media as 2 columns. Now the first record has values like films=Indiana and media=blue ray and the 2nd record has values like films=Bond and media=Indiana. Values for qf parameters: str name=qfmedia^2.0 films^1.0/str Now, search for q=Indiana .. it should display both of the records but record #2 will rank above the 1st. Let me know if you still have questions. Cheers, amit Pete Smith-3 wrote: Hi Amit, Thanks very much for your reply. What you said makes things a bit clearer but I am still a bit confused. On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote: If you want to boost the records with their field value then you must use the q query parameter instead of q.alt. 
'q' parameter actually uses qf parameters from solrConfig for field boosting. From the documentation for Dismax queries, I thought that q is simply a keyword parameter: From http://wiki.apache.org/solr/DisMaxRequestHandler: q The guts of the search defining the main query. This is designed to be support raw input strings provided by users with no special escaping. '+' and '-' characters are treated as mandatory and prohibited modifiers for the subsequent terms. Text wrapped in balanced quote characters '' are treated as phrases, any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. Wildcards in this q parameter are not supported. And I thought 'qf' is a list of fields and boost scores: From http://wiki.apache.org/solr/DisMaxRequestHandler: qf (Query Fields) List of fields and the boosts to associate with each of them when building DisjunctionMaxQueries from the user's query. The format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree. But if I want to, say, search for films with 'indiana' in the title, with media=DVD scoring higher than media=BLU-RAY then do I need to do something like: solr/select?q=indiana And in my config: str name=qfmedia^2/str But I don't see where the actual *contents* of the media field would determine the boost. Sorry if I have misunderstood what you mean. Cheers, Pete Pete Smith-3 wrote: Hi, I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. I have implemented my search using a dismax query so it searches predetermined fields. However, my results are coming back sorted by score which appears to be calculated by keyword relevancy only. 
I would like to adjust the score where fields have pre-determined values. I think I can do this with boost query and boost functions but the documentation here: http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3 Is not particularly helpful. I tried adding adding a bq argument to my search: bq=media:DVD^2 (yes, this is an index of films!) but I find when I start adding more and more: bq=media:DVD^2bq=media:BLU-RAY^1.5 I find the negative results - e.g. films that are DVD but are not BLU-RAY get negatively affected in their score. In the end it all seems to even out and my score is as it was before i started boosting. I must be doing this
Solr: is there a default ClobTransformer?
Hello all, I am trying to index an Oracle DB with some Clob columns. Following the doc I see that I need to transform my entity with a ClobTransformer. Now, my log says the following: 12:05:52,901 ERROR [STDERR] Mar 13, 2009 12:05:52 PM org.apache.solr.handler.dataimport.EntityProcessorBase loadTransformers SEVERE: Unable to load Transformer: ClobTransformer java.lang.ClassNotFoundException: Unable to load ClobTransformer or org.apache.solr.handler.dataimport.ClobTransformer at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587) at org.apache.solr.handler.dataimport.EntityProcessorBase.loadTransformers(EntityProcessorBase.java:96) at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:159) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:80) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) This is pretty easy to understand: no ClobTransformer implementation is found in the classpath. The question is: is there any default ClobTransformer shipped with Solr, or do I have to implement a custom one? Thanks, Giovanni
Re: How to correctly boost results in Solr Dismax query
Hi, On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote: bq works only with q.alt query and not with q queries. So, in your case you would be using qf parameter for field boosting, you will have to give both the fields in qf parameter i.e. both title and media. try this str name=qfmedia^1.0 title^100.0/str But with that, how will it know to rank media:DVD higher than media:BLU-RAY? Cheers, Pete Pete Smith-3 wrote: Hi Amit, Thanks again for your reply. I am understanding it a bit better but I think it would help if I posted an example. Say I have three records: doc long name=id1/long str name=mediaBLU-RAY/str str name=titleIndiana Jones and the Kingdom of the Crystal Skull/str /doc doc long name=id2/long str name=mediaDVD/str str name=titleIndiana Jones and the Kingdom of the Crystal Skull/str /doc doc long name=id3/long str name=mediaDVD/str str name=titleCasino Royale/str /doc Now, if I search for indiana: select?q=indiana I want the first two rows to come back (not the third as it does not contain 'indiana'). I would like record 2 to be scored higher than record 1 as it's media type is DVD. At the moment I have in my config: str name=qftitle/str And i was trying to boost by media having a specific value by using 'bq' but from what you told me that is incorrect. Cheers, Pete On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote: Pete, Sorry, if wasnt clear. Here is the explanation. Suppose you have 2 records and they have films and media as 2 columns. Now first record has values like films=Indiana and media=blue ray and 2nd record has values like films=Bond and media=Indiana Values for qf parameters str name=qfmedia^2.0 films^1.0/str Now, search for q=Indiana .. it should display both of the records but record #2 will display above than the 1st. Let me know if you still have questions. Cheers, amit Pete Smith-3 wrote: Hi Amit, Thanks very much for your reply. What you said makes things a bit clearer but I am still a bit confused. 
On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote: If you want to boost the records with their field value then you must use q query parameter instead of q.alt. 'q' parameter actually uses qf parameters from solrConfig for field boosting. From the documentation for Dismax queries, I thought that q is simply a keyword parameter: From http://wiki.apache.org/solr/DisMaxRequestHandler: q The guts of the search defining the main query. This is designed to be support raw input strings provided by users with no special escaping. '+' and '-' characters are treated as mandatory and prohibited modifiers for the subsequent terms. Text wrapped in balanced quote characters '' are treated as phrases, any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. Wildcards in this q parameter are not supported. And I thought 'qf' is a list of fields and boost scores: From http://wiki.apache.org/solr/DisMaxRequestHandler: qf (Query Fields) List of fields and the boosts to associate with each of them when building DisjunctionMaxQueries from the user's query. The format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree. But if I want to, say, search for films with 'indiana' in the title, with media=DVD scoring higher than media=BLU-RAY then do I need to do something like: solr/select?q=indiana And in my config: str name=qfmedia^2/str But I don't see where the actual *contents* of the media field would determine the boost. Sorry if I have misunderstood what you mean. Cheers, Pete Pete Smith-3 wrote: Hi, I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. 
I have implemented my search using a dismax query so it searches predetermined fields. However, my results are coming back sorted by score which appears to be calculated by keyword relevancy only. I would like to adjust the score where fields have pre-determined values. I think I can do this with boost query and boost functions but the documentation here: http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3 Is not particularly helpful. I tried adding adding a bq argument to my search: bq=media:DVD^2 (yes, this is an index of films!) but I find when I start adding more
Re: DIH with outer joins
It is not very clear to me how it works; perhaps you can paste the queries here. You can do all the joins in the DB in one complex query and use that straight away in an entity. You do not have to do any joins inside DIH itself. On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I have queries with outer joins defined in some entities and for the same root object I can have two or more lines with different objects, for example: Taking the following 3 tables, and a query defined in the entity with outer joins between tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance) but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira -- --Noble Paul
Re: Solr: is there a default ClobTransformer?
ClobTransformer is a Solr 1.4 feature. Which one are you using? On Fri, Mar 13, 2009 at 4:39 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello all, I am trying to index an Oracle DB with some Clob columns. Following the doc I see that I need to transform my entity with a ClobTransformer. Now, my log says the following: 12:05:52,901 ERROR [STDERR] Mar 13, 2009 12:05:52 PM org.apache.solr.handler.dataimport.EntityProcessorBase loadTransformers SEVERE: Unable to load Transformer: ClobTransformer java.lang.ClassNotFoundException: Unable to load ClobTransformer or org.apache.solr.handler.dataimport.ClobTransformer at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587) at org.apache.solr.handler.dataimport.EntityProcessorBase.loadTransformers(EntityProcessorBase.java:96) at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:159) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:80) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) This is pretty easy to understand: no ClobTransformer implementation is found in the classpath. The question is: is there any default ClobTransformer shipped with Solr, or do I have to implement a custom one? Thanks, Giovanni -- --Noble Paul
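For reference, a minimal data-config.xml sketch of the Solr 1.4 ClobTransformer usage; the entity, table, and column names here are made up for illustration:

```xml
<!-- data-config.xml: ClobTransformer (Solr 1.4+) converts CLOB column
     values into plain strings before they are indexed -->
<document>
  <entity name="article"
          query="select ID, BODY from ARTICLES"
          transformer="ClobTransformer">
    <!-- clob="true" marks which column the transformer should convert -->
    <field column="BODY" clob="true"/>
  </entity>
</document>
```

The key parts are the transformer="ClobTransformer" attribute on the entity and clob="true" on each CLOB-valued field.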
Re: DIH with outer joins
I thought that I could remove the uniqueKey in Solr and then have more than one document with the same id, but then I don't know whether, during delta-imports, outdated or deleted documents are updated (the updated document is added, and then we would have both the outdated and the updated document in the index) or removed. Noble Paul നോബിള് नोब्ळ् wrote: It is not very clear to me how it works; perhaps you can paste the queries here. You can do all the joins in the DB in one complex query and use that straight away in an entity. You do not have to do any joins inside DIH itself. On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I have queries with outer joins defined in some entities and for the same root object I can have two or more lines with different objects, for example: Taking the following 3 tables, and a query defined in the entity with outer joins between tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance) but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira
Re: Two way Synonyms in Solr
Yes, I have defined expand=true for the synonym definition, but two-way synonyms are still not working. Also, is there any way to make phrase synonyms work? Koji Sekiguchi-2 wrote: dabboo wrote: Hi, I am implementing 2-way synonyms in Solr using the q query parameter. One-way synonyms are working fine with the q query parameter but 2-way is not working. For e.g. if I define 2-way synonyms in the file like: value1, value2 it doesn't show any result for either of the values. Please suggest. Thanks, Amit Garg Are you sure you have expand=true on your synonym definition? Also you can use /admin/analysis.jsp for debugging the field. Koji -- View this message in context: http://www.nabble.com/Two-way-Synonyms-in-Solr-tp22492439p22494772.html Sent from the Solr - User mailing list archive at Nabble.com.
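For anyone else debugging this, a typical schema.xml analyzer that applies expanded synonyms at index time only — a common arrangement, since multi-word (phrase) synonyms are known to misbehave when expanded at query time. The field type name here is illustrative:

```xml
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true": a line "value1, value2" in synonyms.txt maps
         each of the two terms to both of them at index time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that changing the index-time analyzer only takes effect after reindexing, and /admin/analysis.jsp (as Koji suggested) shows exactly which tokens each side produces.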
Re: DIH with outer joins
Have one root entity which just does a select id from Table1. Then have a child entity which does all the joins and returns all other columns for that 'id'. On Fri, Mar 13, 2009 at 5:10 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I thought that I could remove the uniqueKey in Solr and then have more than one document with the same id, but then I don't know whether, during delta-imports, outdated or deleted documents are updated (the updated document is added, and then we would have both the outdated and the updated document in the index) or removed. Noble Paul നോബിള് नोब्ळ् wrote: It is not very clear to me how it works; perhaps you can paste the queries here. You can do all the joins in the DB in one complex query and use that straight away in an entity. You do not have to do any joins inside DIH itself. On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I have queries with outer joins defined in some entities and for the same root object I can have two or more lines with different objects, for example: Taking the following 3 tables, and a query defined in the entity with outer joins between tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance) but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira -- --Noble Paul
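A data-config.xml sketch of the root-entity/child-entity layout Noble describes; the column and alias names are hypothetical, and only the Table1/Table2/Table3 shape is taken from the thread:

```xml
<!-- One document per Table1 row: the root entity supplies the ids,
     and the child entity runs the outer-join query once per id.
     Multiple child rows fill multiValued fields on the same document
     instead of creating extra documents. -->
<document>
  <entity name="root" query="select ID from TABLE1" pk="ID">
    <entity name="detail"
            query="select T2.NAME as t2_name, T3.NAME as t3_name
                   from TABLE1 T1
                   left outer join TABLE2 T2 on T2.T1_ID = T1.ID
                   left outer join TABLE3 T3 on T3.T2_ID = T2.ID
                   where T1.ID = '${root.ID}'"/>
  </entity>
</document>
```

For this to collect all joined values, the t2_name and t3_name fields would be declared multiValued="true" in schema.xml, and ID stays the uniqueKey, which keeps delta-import updates replacing the right document.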
Re: Solr: ERRORs at Startup
Hello Toby, thank you for your quick reply. Even setting everything to INFO through http://localhost:8080/solr/admin/logging didn't help. But considering you do not see any bad issue here, at this time I will ignore those ERROR messages :-) Cheers, Giovanni On Fri, Mar 13, 2009 at 11:16 AM, Toby Cole toby.c...@semantico.com wrote: Hi Giovanni, It looks like logging is configured strangely. Those messages in my solr setup (on tomcat 6 or jetty) appear as INFO level messages. It could have something to do with your SLF4J setup, but I'm no expert on that side of things. I wouldn't worry too much, the content of the messages doesn't imply anything bad going on. Toby. On 13 Mar 2009, at 09:57, Giovanni De Stefano wrote: Hello everybody, I am currently using: Solr v1.3.0 Jboss jboss-5.0.1.GA Java jdk 1.5_06 When I start Solr within Jboss I see a lot of errors in the log but Solr seems working (meaning I can see the admin interface but I cannot index my DB...but that is another story :-) ). Attached is the log file. Here just some of the error messages I see: ... 
10:51:19,976 INFO [ConnectionFactoryBindingService] Bound ConnectionManager 'jboss.jca:service=ConnectionFactoryBinding,name=JmsXA' to JNDI name 'java:JmsXA' 10:51:20,006 INFO [TomcatDeployment] deploy, ctxPath=/ 10:51:20,126 INFO [TomcatDeployment] deploy, ctxPath=/jmx-console 10:51:20,525 INFO [TomcatDeployment] deploy, ctxPath=/solr 10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() 10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No /solr/home in JNDI 10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: /home/giovanni/development/search/solr 10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: /home/giovanni/development/search/solr/solr.xml 10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/home/giovanni/development/search/solr/' 10:51:20,710 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar' to Solr classloader 10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No /solr/home in JNDI 10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: /home/giovanni/development/search/solr 10:51:20,735 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '/home/giovanni/development/search/solr/' 10:51:20,736 ERROR [STDERR] Mar 13, 2009 10:51:20 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 
'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar' to Solr classloader 10:51:21,964 ERROR [STDERR] Mar 13, 2009 10:51:21 AM org.apache.solr.core.SolrConfig init INFO: Loaded SolrConfig: solrconfig.xml 10:51:21,977 ERROR [STDERR] Mar 13, 2009 10:51:21 AM org.apache.solr.core.SolrCore init INFO: Opening new SolrCore at /home/giovanni/development/search/solr/, dataDir=./solr/data/ 10:51:21,991 ERROR [STDERR] Mar 13, 2009 10:51:21 AM org.apache.solr.schema.IndexSchema readSchema INFO: Reading Solr Schema 10:51:22,027 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: Schema name=search 10:51:22,051 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created string: org.apache.solr.schema.StrField 10:51:22,061 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created boolean: org.apache.solr.schema.BoolField 10:51:22,067 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created integer: org.apache.solr.schema.IntField 10:51:22,472 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created ignored: org.apache.solr.schema.StrField 10:51:22,483 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: default search field is text 10:51:22,485 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: query parser default operator is OR 10:51:22,486 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.schema.IndexSchema readSchema INFO: unique key field: uri 10:51:22,541 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.core.JmxMonitoredMap init INFO: JMX monitoring is
DIH with outer joins
I have queries with outer joins defined in some entities and for the same root object I can have two or more lines with different objects, for example: Taking the following 3 tables, and a query defined in the entity with outer joins between tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance) but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira
Re: input XSLT
Have you tried Solr Cell? http://wiki.apache.org/solr/ExtractingRequestHandler On Mar 13, 2009, at 2:49 AM, CIF Search wrote: But these documents have to be converted to a particular format before being posted. Any XML document cannot be posted to Solr (with XSLT handled by Solr internally). DIH handles any xml format, but it operates in pull mode. On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Mar 13, 2009 at 11:36 AM, CIF Search cifsea...@gmail.com wrote: There is a fundamental problem with using 'pull' approach using DIH. Normally people want a delta imports which are done using a timestamp field. Now it may not always be possible for application servers to sync their timestamps (given protocol restrictions due to security reasons). Due to this Solr application is likely to miss a few records occasionally. Such a problem does not arise if applications themseleves identify their records and post. Should we not have such a feature in Solr, which will allow users to push data onto the index in whichever format they wish to? This will also facilitate plugging in solr seamlessly with all kinds of applications. You can of course push your documents to Solr using the XML/CSV update (or using the solrj client). It's just that you can't push documents with DIH. http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3 -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
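To make the "push" option from Shalin's reply concrete: any application can POST Solr's update XML message directly to the update handler as soon as its records change, with no timestamp-based delta logic involved. The URL and field names below are illustrative:

```xml
<!-- POST this body to http://localhost:8983/solr/update with
     Content-type: text/xml, then POST <commit/> to make it searchable -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">a document pushed by the application</field>
  </doc>
</add>
```

An application whose native format is some other XML dialect would run its own XSLT (or any transformation) to produce this add/doc/field shape before posting, which is the client-side counterpart of the XSLT response writer mentioned at the start of the thread.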
Re: Solr: is there a default ClobTransformer?
Hello Paul, I must have missed that detail :-) I am currently using Solr 1.3.0. Thank you very much for your remark: I just downloaded the latest nightly build, compiled the whole thing and included the apache-solr-dataimporthandler-1.4-dev.jar in my $SOLR_HOME/lib folder. I have just been able to index an Oracle DB with CLOB columns :-) I hope Solr 1.4.0 will be released soon so that I can have a clean installation rather than a hacked one (now I am using the Solr 1.3.0 core with the addition of the mentioned dataimporthandler jar from Solr 1.4.0). Cheers, Giovanni On Fri, Mar 13, 2009 at 12:29 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: ClobTransformer is a Solr 1.4 feature. Which one are you using? On Fri, Mar 13, 2009 at 4:39 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello all, I am trying to index an Oracle DB with some Clob columns. Following the doc I see that I need to transform my entity with a ClobTransformer. Now, my log says the following: 12:05:52,901 ERROR [STDERR] Mar 13, 2009 12:05:52 PM org.apache.solr.handler.dataimport.EntityProcessorBase loadTransformers SEVERE: Unable to load Transformer: ClobTransformer java.lang.ClassNotFoundException: Unable to load ClobTransformer or org.apache.solr.handler.dataimport.ClobTransformer at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587) at org.apache.solr.handler.dataimport.EntityProcessorBase.loadTransformers(EntityProcessorBase.java:96) at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:159) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:80) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) 
This is pretty easy to understand: no ClobTransformer implementation is found in the classpath. The question is: is there any default ClobTransformer shipped with Solr, or do I have to implement a custom one? Thanks, Giovanni -- --Noble Paul
Re: fl wildcards
On Mar 12, 2009, at 1:43 PM, Schley Andrew Kutz wrote: If I wanted to hack Solr so that it has the ability to process wildcards for the field list parameter (fl), where would I look? (Perhaps I should look on the solr-dev mailing list, but since I am already on this one I thought I would start here). Thanks! One strategy that can be used (and Solr Flare, a RoR plugin, employs this) is to make a request to Solr's Luke request handler at client startup (or whenever you want to reset) to get a list of the fields actually in the index and use that to build the field list and other dynamically controlled things, like facet.field parameters. For example, Flare takes all fields returned from the luke request handler, and all that match *_facet become facet.field parameters in the search requests. Wasn't exactly an answer to your question. Wildcard support for field names in Solr is a feature that really deserves broader implementation consideration than just hacking one spot for fl. Other field list parameters, like hl.fl could use that capability too. Erik
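To illustrate Erik's Luke-handler strategy above: a client can fetch the actual field list from the index at startup and expand its own wildcards against it. A sketch of the request and an abbreviated response (exact child elements vary by Solr version):

```xml
<!-- GET http://localhost:8983/solr/admin/luke?numTerms=0
     The "fields" section lists every field present in the index;
     a client can match *_facet etc. against these names to build
     fl, hl.fl, or facet.field parameters. Response abbreviated. -->
<response>
  <lst name="fields">
    <lst name="id"> ... </lst>
    <lst name="name"> ... </lst>
    <lst name="genre_facet"> ... </lst>
  </lst>
</response>
```

numTerms=0 keeps the response small by skipping the per-field top-terms listing.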
Re: fl wildcards
Thanks. If I knew where to begin to implement this, I would. It seems to me that the constraining of field lists must occur at the very core of Solr because of the reduction in search time when specifying a restrictive set of fields to return. For example, when I return 10 entire documents the search takes a QTime of 170, which I presume is milliseconds. However, the time it takes a browser to render the data puts the actual time into seconds. When I restrict the field list with fl=id,name, the QTime is reduced to 24 -- not a small difference. So, this leads me to believe that the application of field list restrictions is not simply occurring in the response writer. Does anyone know where it *is* occurring? -- -a Ideally, a code library must be immediately usable by naive developers, easily customized by more sophisticated developers, and readily extensible by experts. -- L. Stein On Mar 13, 2009, at 7:21 AM, Erik Hatcher wrote: On Mar 12, 2009, at 1:43 PM, Schley Andrew Kutz wrote: If I wanted to hack Solr so that it has the ability to process wildcards for the field list parameter (fl), where would I look? (Perhaps I should look on the solr-dev mailing list, but since I am already on this one I thought I would start here). Thanks! One strategy that can be used (and Solr Flare, a RoR plugin, employs this) is to make a request to Solr's Luke request handler at client startup (or whenever you want to reset) to get a list of the fields actually in the index and use that to build the field list and other dynamically controlled things, like facet.field parameters. For example, Flare takes all fields returned from the luke request handler, and all that match *_facet become facet.field parameters in the search requests. Wasn't exactly an answer to your question. Wildcard support for field names in Solr is a feature that really deserves broader implementation consideration than just hacking one spot for fl. 
Other field list parameters, like hl.fl could use that capability too. Erik
Stemming in Solr
Hi, Can someone please let me know how to implement stemming in Solr? I am particularly looking for the changes I might need to make in the config files, and also whether I need to use some already-supplied libraries/factories etc. It would be a great help. Thanks, Amit Garg -- View this message in context: http://www.nabble.com/Stemming-in-Solr-tp22495961p22495961.html Sent from the Solr - User mailing list archive at Nabble.com.
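For the stemming question above, the usual answer is a schema.xml change only: add one of the stemmer token filters shipped with Solr to the field type's analyzer chain. A minimal sketch (the field type name is illustrative):

```xml
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Snowball English stemmer bundled with Solr; using the same
         analyzer at index and query time ensures "running" and "runs"
         both reduce to the same term -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

Fields using this type must be reindexed after the change, and /admin/analysis.jsp shows the stemmed tokens for spot-checking.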
Re: fl wildcards
Erik Hatcher wrote: Wasn't exactly an answer to your question. Wildcard support for field names in Solr is a feature that really deserves broader implementation consideration than just hacking one spot for fl. Other field list parameters, like hl.fl could use that capability too. I think SOLR-540 added wildcard support for hl.fl -- - Mark http://www.lucidimagination.com
Storing map in Field
All, I'm working with the sample schema, and have a scenario where I would like to store multiple prices in a map of some sort. This would be used for a scenario where a single product has different prices based on a price list. For instance:

<add>
  <doc>
    <field name="id">SKU001</field>
    <field name="name">A Sample Product</field>
    <field name="price[pricelist1]">119.99</field>
    <field name="price[pricelist2]">109.99</field>
  </doc>
</add>

Is something like this possible? Regards, -Jeff
Re: Storing map in Field
I don't think anything _quite_ like that exists, however you could use wildcard fields to achieve pretty much the same thing. You could use a post like this:

<add>
  <doc>
    <field name="id">SKU001</field>
    <field name="name">A Sample Product</field>
    <field name="price_pricelist1">119.99</field>
    <field name="price_pricelist2">109.99</field>
  </doc>
</add>

if you have a field definition in your schema.xml like:

<dynamicField name="price_*" type="float" indexed="true" stored="true"/>

Regards, Toby. On 13 Mar 2009, at 14:01, Jeff Crowder wrote: All, I'm working with the sample schema, and have a scenario where I would like to store multiple prices in a map of some sort. This would be used for a scenario where a single product has different prices based on a price list. For instance: <add> <doc> <field name="id">SKU001</field> <field name="name">A Sample Product</field> <field name="price[pricelist1]">119.99</field> <field name="price[pricelist2]">109.99</field> </doc> </add> Is something like this possible? Regards, -Jeff Toby Cole Software Engineer Semantico E: toby.c...@semantico.com W: www.semantico.com
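Mapping a price map onto Toby's price_* dynamic fields is mechanical on the client side; a sketch (the document shape and the price_<listname> naming convention are just his example, not a Solr API):

```python
def flatten_prices(doc, prices):
    """Merge a {pricelist: price} map into a flat Solr document dict,
    using one dynamic field per price list (matching a price_* wildcard
    field in the schema)."""
    flat = dict(doc)
    for pricelist, price in prices.items():
        flat["price_%s" % pricelist] = price
    return flat

doc = flatten_prices(
    {"id": "SKU001", "name": "A Sample Product"},
    {"pricelist1": 119.99, "pricelist2": 109.99},
)
```

The resulting dict can be posted as a normal document; at query time a client that knows which price list applies simply sorts or filters on the corresponding price_<listname> field.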
Re: Storing map in Field
Hmmm, what do you want to *do* with those multiple prices? Search? Display? Change all the time? Each of these operations will generate different suggestions, I daresay. Best, Erick On Fri, Mar 13, 2009 at 10:01 AM, Jeff Crowder jcrow...@tellusweb.com wrote: All, I'm working with the sample schema, and have a scenario where I would like to store multiple prices in a map of some sort. This would be used for a scenario where a single product has different prices based on a price list. For instance: <add> <doc> <field name="id">SKU001</field> <field name="name">A Sample Product</field> <field name="price[pricelist1]">119.99</field> <field name="price[pricelist2]">109.99</field> </doc> </add> Is something like this possible? Regards, -Jeff
Re: DIH with outer joins
It may be easier to make a view in the database and index the view. Databases have good tools for that. wunder On 3/13/09 2:46 AM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I have queries with outer joins defined in some entities, and for the same root object I can have two or more lines with different objects. For example, taking the following 3 tables, and a query defined in the entity with outer joins between the tables:

Table1 - Table2 - Table3

I can have the following lines returned by the query:

Table1Instance1 - Table2Instance1 - Table3Instance1
Table1Instance1 - Table2Instance1 - Table3Instance2
Table1Instance1 - Table2Instance2 - Table3Instance3
Table1Instance2 - Table2Instance3 - Table3Instance4

I wanted to have a single document per root object instance (in this case per Table1 instance), but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira
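Short of changes to DIH, the collapse Rui asks for (one document per Table1 instance, with the joined values gathered into multi-valued fields) is also easy to do as a pre-processing step before posting; a sketch with made-up column names:

```python
from collections import OrderedDict

def collapse_rows(rows, root_key):
    """Group flat outer-join rows into one document per root id,
    turning each non-root column into a multi-valued field.
    Duplicate values are dropped; first-seen order is kept."""
    docs = OrderedDict()
    for row in rows:
        doc = docs.setdefault(row[root_key], {root_key: row[root_key]})
        for col, val in row.items():
            if col == root_key or val is None:  # NULLs from the outer join
                continue
            values = doc.setdefault(col, [])
            if val not in values:
                values.append(val)
    return list(docs.values())

# The four result lines from the example above, schematically:
rows = [
    {"t1_id": 1, "t2": "A", "t3": "x"},
    {"t1_id": 1, "t2": "A", "t3": "y"},
    {"t1_id": 1, "t2": "B", "t3": "z"},
    {"t1_id": 2, "t2": "C", "t3": "w"},
]
docs = collapse_rows(rows, "t1_id")  # two documents, one per Table1 instance
```

The multi-valued fields produced this way require the corresponding schema.xml fields to be declared multiValued="true".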
rsync snappuller slowdown Qtime
Hi, After noticing significant latency during search, I turned off the cron job and tested manually. It was obvious that while snappuller runs on a slave server, the query time is a lot longer than the rest of the time. snapinstaller, on the other hand, didn't affect the query time. Without any action: around 200 msec. With snappuller: 3-6 sec. Do you have any idea? Thanks a lot, -- View this message in context: http://www.nabble.com/rsync-snappuller-slowdown-Qtime-tp22497625p22497625.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: fl wildcards
That makes sense, since hl.fl probably can get away with calculating in the writer, and not as part of the core. However, I really need wildcard (or globbing) support for field lists as part of the common query parameter fl. Again, if someone can just point me to where the Solr core is using the contents of the fl param, I am happy to implement this, if only locally for my purposes. Thanks! -- -a Ideally, a code library must be immediately usable by naive developers, easily customized by more sophisticated developers, and readily extensible by experts. -- L. Stein On Mar 13, 2009, at 8:10 AM, Mark Miller wrote: Erik Hatcher wrote: Wasn't exactly an answer to your question. Wildcard support for field names in Solr is a feature that really deserves broader implementation consideration than just hacking one spot for fl. Other field list parameters, like hl.fl could use that capability too. I think 540 added wildcard support for hl.fl -- - Mark http://www.lucidimagination.com
Re: DIH use of the ?command=full-import entity= command option
Bear in mind (and correct me if I'm wrong) that a full-import is still a full-import no matter what entity you tack onto the param. Thus I think clean=false should be appended (a friend starting off in Solr was really confused by this and could not understand why it did a delete on all documents). I'm not sure if that is clearly stated in the Wiki ... - Jon On Mar 13, 2009, at 1:34 AM, Shalin Shekhar Mangar wrote: On Fri, Mar 13, 2009 at 10:44 AM, Fergus McMenemie fer...@twig.me.uk wrote: If my data-config.xml contains multiple root level entities what is the expected action if I call full-import without an entity=XXX sub-command? Does it process all entities one after the other or only the first? (It would be useful IMHO if it only did the first.) It processes all entities one after the other. If you want to import only one, use the entity parameter. -- Regards, Shalin Shekhar Mangar.
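Jon's advice in request form: when full-importing a single entity, pass clean=false explicitly so documents from the other entities are not deleted. A sketch of building that request URL (host, port, handler path, and entity name are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical DIH handler location and entity name:
base = "http://localhost:8983/solr/dataimport"
params = {"command": "full-import", "entity": "products", "clean": "false"}
url = base + "?" + urlencode(params)
print(url)
```

Without clean=false, the import starts by deleting everything in the index, including documents that came from entities not named in the request.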
Re: rsync snappuller slowdown Qtime
On Fri, Mar 13, 2009 at 10:33 AM, sunnyfr johanna...@gmail.com wrote: And it was obvious how during snappuller on a slave server, the query time was a lot longer than the rest of the time. Did the CPU utilization drop? It could be the writing of the newly pulled files forcing parts of the current index files out of the OS cache. iostat could also help: look at how much data is actually read from disk for a certain number of queries, with and without a snappull going on. -Yonik http://www.lucidimagination.com
Re: Tomcat holding deleted snapshots until it's restarted - SOLVED!!!
Hey Yonik, I tested the last nightly build and it still happens... but I have solved it! I'll tell you my solution; it seems to be working well, but I just want to be sure that it doesn't have any bad effects, as for me this is one of the most complicated parts of the Solr source (the fact of dealing with multiple IndexSearchers in a synchronized way). I noticed that in SolrCore.java, there's a part in the getSearcher function with a comment saying:

// we are all done with the old searcher we used
// for warming...

And after that the code is:

if (currSearcherHolderF != null)
  currSearcherHolderF.decref();

The problem here is that this old SolrIndexSearcher is never closed and never removed from _searchers. What I have done:

if (currSearcherHolderF != null) {
  currSearcherHolderF.get().close(); // close the SolrIndexSearcher properly
  currSearcherHolderF.decref();
  _searchers.remove(); // remove the
}

Doing that, if I do lsof | grep tomcat I will see that tomcat is not holding deleted files anymore (as the IndexSearcher was properly closed) and the _searchers var will not accumulate infinite references... It sorts out the problem in the stats screen as well... after 5 full-imports it just shows one IndexSearcher. What do you think? -- View this message in context: http://www.nabble.com/Tomcat-holding-deleted-snapshots-until-it%27s-restarted---SOLVED%21%21%21-tp22451252p22500372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tomcat holding deleted snapshots until it's restarted - SOLVED!!!
decref() decrements the reference count and closes the searcher when it reaches 0 (no more users). Forcing it to close at the point you did is unsafe since other threads may still be using that searcher. The real issue lies somewhere else - either a stuck thread, or some code that is not decrementing the reference when it's done. It's most likely the latter. We need to get to the root cause. Can you open a JIRA bug for this? -Yonik http://www.lucidimagination.com On Fri, Mar 13, 2009 at 12:39 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey Yonik, I tested the last nightly build and still happens... but I have solved it! I tell you my solution, it seems to be working well but just want to be sure that it doesn't have any bad effects as for me this is one of the most complicated parts of the Solr source (the fact of dealing with multiple indexsearchers in a syncronized way). I noticed that in the SolrCore.java, there's a part in the function getSearcher where there is a comment saying: // we are all done with the old searcher we used // for warming... And after that the code is: if (currSearcherHolderF!=null) currSearcherHolderF.decref(); The problem here is that this old SolrIndexSearcher is never closed and never removed from _searchers What I have done: if (currSearcherHolderF!=null){ currSearcherHolderF.get().close(); //close SolrIndexSearcher proper currSearcherHolderF.decref(); _searchers.remove(); //remove the } Doing that... if I do a lsof | grep tomcat will see that tomcat is not holding deleted files anymore (as indexsearcher was proper close) and the _searchers var will not accumulate infinite references... It sorts the problem in the stats screen aswell... after 5 full-imports it just shows one IndexSearcher What do you think?
Re: Tomcat holding deleted snapshots until it's restarted - SOLVED!!!
On Fri, Mar 13, 2009 at 1:00 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Ok, I will open a bug issue now. Forcing it to close at the point you did is unsafe since other threads may still be using that searcher. Can you give me an example where other threads would be using that searcher? Any searches that started before the new searcher was registered will still be using the old searcher:

Thread A starts executing a search request with Searcher1
Thread B issues a commit
  - close the writer
  - open Searcher2
  - register Searcher2 (and decrement Searcher1 ref count)
Thread B finishes
Thread A finishes (decrement Searcher1 ref count)

-Yonik http://www.lucidimagination.com
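The lifecycle Yonik describes is plain reference counting. This toy model (not Solr's actual classes) shows why decref() is safe while a forced close() is not: the searcher only closes once the in-flight search has released its reference too.

```python
class RefCountedSearcher:
    """Toy stand-in for Solr's ref-counted searcher holder: the
    underlying searcher closes only when the last reference is released."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the registry's own reference
        self.closed = False

    def incref(self):
        self.refcount += 1
        return self

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.closed = True  # safe: nobody can still be using it

# Thread A starts a search against Searcher1:
s1 = RefCountedSearcher("Searcher1").incref()   # refcount now 2
# Thread B commits: Searcher2 is registered and the registry
# drops its Searcher1 reference:
s1.decref()                                     # refcount 1, still open
assert not s1.closed                            # A's in-flight search is unharmed
# Thread A finishes and releases its reference:
s1.decref()
assert s1.closed                                # now it really closes
```

A forced close() at the registry's decref point would correspond to setting closed while refcount is still 1, pulling the index out from under thread A mid-search, which is exactly the unsafety Yonik points out.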
Re: DIH with outer joins
The two entities resolve the problem, but add some overhead (the queries can be really big). The views don't work for me, as the queries are dynamically generated, taking into consideration a given topology. Noble Paul നോബിള് नोब्ळ् wrote: have one root entity which just does a select id from Table1. Then have a child entity which does all the joins and returns all the other columns for that 'id'. On Fri, Mar 13, 2009 at 5:10 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I thought that I could remove the uniqueKey in Solr and then have more than one document with the same id, but then I don't know if in delta-imports the outdated or deleted documents are updated (the updated document is added and then we would have both the outdated and the updated document in the index) or removed. Noble Paul നോബിള് नोब्ळ् wrote: it is not very clear to me how it works; probably you can put in the queries here. You can do all the joins in the db in one complex query and use that straightaway in an entity. You do not have to do any joins inside DIH itself. On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I have queries with outer joins defined in some entities, and for the same root object I can have two or more lines with different objects. For example, taking the following 3 tables, and a query defined in the entity with outer joins between the tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance), but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira
Wildcard query search
Hi, I am trying to perform a wildcard search using the q parameter. The query results are returned. After getting the results, I try to get the highlighting using response.getHighlighting(). It returns an empty list. But it works fine for non-wildcard searches. Any ideas please? Thanks. Karthik
Re: Wildcard query search
Fragments from the user list (search it for the full context, I don't have the URL for the searchable user list handy, but it's on the Wiki) **original post** Hi, I'm using Solr 1.3.0 and SolrJ for my Java application. I need to highlight my query words even if I use wildcards, for example q=tele* needs to highlight words such as television, telephone, etc. I found this thread http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200704.mbox/%3cof8c6e2423.f20baa06-onc12572c6.003fc377-c12572c6.00427...@ibs.se%3e but I have not understood how to solve my problem. Could anyone tell me how to solve the problem with SolrJ and with Solr web (by URL)? thanks in advance, Revenge **reply** To do it now, you'd have to switch the query parser to using the old style wildcard (and/or prefix) query, which is slower on large indexes and has max clause issues. I think I can make it work out of the box for the next release again though. see https://issues.apache.org/jira/browse/SOLR-825 On Fri, Mar 13, 2009 at 3:06 PM, Narayanan, Karthikeyan karthikeyan.naraya...@gs.com wrote: Hi, I am trying to perform a wildcard search using the q parameter. The query results are returned. After getting the results, I try to get the highlighting using response.getHighlighting(). It returns an empty list. But it works fine for non-wildcard searches. Any ideas please? Thanks. Karthik
Custom handler that forwards a request to another core
Hi, I'm writing a custom handler that forwards a request to a handler of another core. The custom handler is defined in core0, and the core I try to send the request to is core2, which has a mlt handler. Here is the code of my custom handler (extends RequestHandlerBase and implements SolrCoreAware):

public void inform(SolrCore core) {
  this.core = core;
  this.cores = core.getCoreDescriptor().getCoreContainer();
  this.multiCoreHandler = cores.getMultiCoreHandler();
}

public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse response) throws Exception {
  SolrCore coreToRequest = cores.getCore("core2");
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("q", "Lucene");
  params.set("mlt.fl", "body");
  params.set("debugQuery", "true");
  request = new LocalSolrQueryRequest(coreToRequest, params);
  SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
  coreToRequest.execute(mlt, request, response);
  coreToRequest.close();
}

I'm calling this handler from Firefox with this URL (the path of my custom handler is /nlt): http://localhost:8080/solr/core0/nlt With my debugger, I can see, after the execute() method is executed, this line in the log:

13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/nlt params={} webapp=null path=null params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125

Which seems logical: core2 is executing the request (though I'm wondering how core2 knows about the /nlt path). After, I let the debugger resume the program and I see these lines:

13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/nlt params={} webapp=null path=null params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125 status=0 QTime=141
13-Mar-2009 4:25:59 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
  at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:259)
  at org.apache.lucene.index.IndexReader.document(IndexReader.java:632)
  at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:371)
  at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:479)

It looks like core0 is also trying to handle the request. With the debugger, I discovered that the code is trying to access a document, using a doc id from core2's index, within the index of core0, which fails (SolrIndexSearcher.java:371). Any idea why there seem to be two cores trying to handle the request? -- View this message in context: http://www.nabble.com/Custom-handler-that-forwards-a-request-to-another-core-tp22501470p22501470.html Sent from the Solr - User mailing list archive at Nabble.com.
Commit is taking very long time
Hello, I am experiencing strange problems while doing commits. I am indexing every 10 min to update the index with database values. The commit is taking 7 to 10 min approximately, and my indexing is failing due to a null pointer exception: if the first thread is not completed in 10 min, the second thread starts to index data. I changed wait=false for the listener in the solrconfig.xml file. That stopped the null pointer exception, but the commit is still taking 7 to 10 min. I have approximately 70 to 90 KB of data every time.

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>

I kept all default parameter values in solrconfig.xml except the RAM buffer size, which I set to 512. Could you please tell me how I can overcome these problems? Also, sometimes I see

INFO: Failed to unregister mbean: partitioned because it was not registered
Mar 13, 2009 11:49:16 AM org.apache.solr.core.JmxMonitoredMap unregister

in my log files.
Log file ar 13, 2009 1:28:40 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=/solr path=/update params={wt=javabinwaitFlush=truecommit=truewaitSearcher=trueversion=2.2} status=0 QTime=247232 Mar 13, 2009 1:30:32 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[79827482, 79845504, 79850902, 79850913, 79850697, 79850833, 79850901, 79798207, ...(93 more)]} 0 62578 Mar 13, 2009 1:30:32 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=/solr path=/update params={wt=javabinversion=2.2} status=0 QTime=62578 Mar 13, 2009 1:30:32 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@1ba5edf main Mar 13, 2009 1:34:38 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@1ba5edf main from searc...@81f25 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@1ba5edf main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@1ba5edf main from searc...@81f25 main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=63,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming 
result for searc...@1ba5edf main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=94,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@1ba5edf main from searc...@81f25 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@1ba5edf main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@1ba5edf main Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=null path=null params={rows=10start=0q=solr} hits=0 status=0 QTime=0 Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=null path=null params={rows=10start=0q=rocks} hits=223 status=0 QTime=0 Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=null path=null params={q=static+newSearcher+warming+query+from+solrconfig.xml} hits=4297 status=0 QTime=0 Mar 13, 2009 1:34:38 PM
Re: Commit is taking very long time
From your logs, it looks like the time is spent in closing of the index. There may be some pending deletes buffered, but they shouldn't take too long. There could also be a merge triggered... but this would only happen sometimes, not every time you commit. One more relatively recent change in Lucene is to sync the index files for safety. Are you perhaps running on Linux with the ext3 filesystem? Not sure what's causing the null pointer exception... do you have a stack trace? -Yonik http://www.lucidimagination.com On Fri, Mar 13, 2009 at 9:05 PM, mahendra mahendra mahendra_featu...@yahoo.com wrote: Hello, I am experiencing strange problems while doing commit. I am doing indexing for every 10 min to update index with data base values. commit is taking 7 to 10 min approximately and my indexing is failing due to null pointer exception. If first thread is not completed in 10 min the second thread will be starting to index data. I changed wait=false for the listener from solrconfig.xml file. It stopped getting Null pointer exception but the commit is taking 7 to 10 min. I have approximately 70 to 90 kb of data every time. listener event=postCommit class=solr.RunExecutableListener str name=exesolr/bin/snapshooter/str str name=dir./str bool name=waitfalse/bool arr name=args strarg1/str strarg2/str /arr arr name=env strMYVAR=val1/str /arr /listener I kept all default parameter values in solrconfig.xml except the ramBuffersize to 512. Could you please tell me how can I overcome these problems, also some times I see INFO: Failed to unregister mbean: partitioned because it was not registered Mar 13, 2009 11:49:16 AM org.apache.solr.core.JmxMonitoredMap unregister in my log files. 
Log file ar 13, 2009 1:28:40 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=/solr path=/update params={wt=javabinwaitFlush=truecommit=truewaitSearcher=trueversion=2.2} status=0 QTime=247232 Mar 13, 2009 1:30:32 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[79827482, 79845504, 79850902, 79850913, 79850697, 79850833, 79850901, 79798207, ...(93 more)]} 0 62578 Mar 13, 2009 1:30:32 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=/solr path=/update params={wt=javabinversion=2.2} status=0 QTime=62578 Mar 13, 2009 1:30:32 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@1ba5edf main Mar 13, 2009 1:34:38 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@1ba5edf main from searc...@81f25 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@1ba5edf main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@1ba5edf main from searc...@81f25 main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=63,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming 
result for searc...@1ba5edf main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=94,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@1ba5edf main from searc...@81f25 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@1ba5edf main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@1ba5edf main Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute INFO: [EnglishAuction1-0] webapp=null path=null params={rows=10start=0q=solr} hits=0
Re: DIH with outer joins
joining entities may have some overhead. Is it prohibitive in absolute terms? On Sat, Mar 14, 2009 at 12:29 AM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: The two entities resolve the problem, but add some overhead (the queries can be really big). The views don't work for me, as the queries are dynamically generated, taking into consideration a given topology. Noble Paul നോബിള് नोब्ळ् wrote: have one root entity which just does a select id from Table1. Then have a child entity which does all the joins and returns all the other columns for that 'id'. On Fri, Mar 13, 2009 at 5:10 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I thought that I could remove the uniqueKey in Solr and then have more than one document with the same id, but then I don't know if in delta-imports the outdated or deleted documents are updated (the updated document is added and then we would have both the outdated and the updated document in the index) or removed. Noble Paul നോബിള് नोब्ळ् wrote: it is not very clear to me how it works; probably you can put in the queries here. You can do all the joins in the db in one complex query and use that straightaway in an entity.
You do not have to do any joins inside DIH itself. On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira ruipereira...@gmail.com wrote: I have queries with outer joins defined in some entities, and for the same root object I can have two or more lines with different objects. For example, taking the following 3 tables, and a query defined in the entity with outer joins between the tables: Table1 - Table2 - Table3 I can have the following lines returned by the query: Table1Instance1 - Table2Instance1 - Table3Instance1 Table1Instance1 - Table2Instance1 - Table3Instance2 Table1Instance1 - Table2Instance2 - Table3Instance3 Table1Instance2 - Table2Instance3 - Table3Instance4 I wanted to have a single document per root object instance (in this case per Table1 instance), but with the values from the different lines returned. Is it possible to have this behavior in DataImportHandler? How? Thanks in advance, Rui Pereira -- --Noble Paul
Re: DataImportHandler Robustness For Imports That Take A Long Time
Alternatively, you can do the commit yourself after marking in the db: Context#getSolrCore().getUpdateHandler().commit(). Or, as you mentioned, you can do an autocommit. On Sat, Mar 14, 2009 at 12:31 AM, Chris Harris rygu...@gmail.com wrote: Wouldn't this approach get confused if there was an error that caused DIH to do a rollback? For example, suppose this happened: * 1000 successful document adds * The custom transformer saves some marker in the DB to signal that the above docs have been successfully indexed * The next document add throws an exception * DIH, rather than doing a commit, rolls back the 1000 document adds At this point my database marker says that the 1000 docs have been successfully indexed, but the documents themselves are not actually in the Solr index. Because by hypothesis my import query is defined in terms of my DB marker, I'll never end up getting these docs into the Solr index, even if I resolve the issue that causes the exception and re-run the data import. It seems like, to do a safe equivalent of your suggestion, I'd have to somehow A) prevent DIH from doing any rollbacks, B) get DIH to do auto-commits, and C) make my custom transformer update the DB marker only immediately after an auto-commit. On Mon, Mar 9, 2009 at 9:27 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: I recommend writing a simple transformer which can write an entry into the db after n documents (say 1000), and modify your query to consider that entry so that subsequent imports will start from there. DIH does not write the last_index_time unless the import completes successfully. On Tue, Mar 10, 2009 at 1:54 AM, Chris Harris rygu...@gmail.com wrote: I have a dataset (7M-ish docs each of which is maybe 1-100K) that, with my current indexing process, takes a few days or maybe a week to put into Solr.
I'm considering maybe switching to indexing with the DataImportHandler, but I'm concerned about the impact of this on indexing robustness: If I understand DIH properly, then if Solr goes down for whatever reason during an import, then DIH loses track of what it has and hasn't yet indexed that round, and will thus probably do a lot of redundant reimporting the next time you run an import command. (For example, if DIH successfully imports row id 100, and then Solr dies before the DIH import finishes, and then I restart Solr and start a new delta-import, then I think DIH will import row id 100 again.) One implication for my dataset seems to be that, unless Solr can actually stay up for several days on end, then DIH will never finish importing my data, even if I manage to keep Solr at, say, 99% uptime. This would be fine if a full import took only a few hours. If full import could take a week, though, this is slightly unnerving. (Sometimes you just need to restart Solr. Or the machine itself, for that matter.) Are there any good ways around this with DIH? One potential option is to give each row in the database table not only a ModificationTimestamp column but also a DataImportHandlerTimestamp column, and try to get DIH to update that column whenever it finishes indexing a row. Then you'd modify the WHERE clause in the DIH config so that instead of determining which rows to index with something like WHERE ModificationTimestamp dataimporter.last_index_time you'd use something like WHERE ModificationTimestamp SolrImportTimestamp In this way, hopefully, DIH can always pick up where it left off last time, rather than trying to redo any work it might have actually managed to do last round. (I'm using something along these lines with my current, non-DIH-based indexing scheme.) Am I making sense here? Chris -- --Noble Paul -- --Noble Paul
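Chris's marker-after-commit idea can be sketched outside DIH; everything here (the callbacks, batch size, marker column) is illustrative rather than any Solr or DIH API. The key invariant is that the persistent marker only advances after a batch has been committed, so a crash at any point never skips unindexed rows — at worst a committed batch is re-imported.

```python
def import_resumably(fetch_batch, index_batch, commit, save_marker, marker):
    """Index in batches; advance the persistent marker only after the
    batch has been committed, so a crash never loses unindexed rows."""
    while True:
        rows = fetch_batch(marker)       # rows with id > marker, in id order
        if not rows:
            return marker
        index_batch(rows)
        commit()                         # make the adds durable first...
        marker = rows[-1]["id"]          # ...then record the high-water mark
        save_marker(marker)

# Simulated run over five rows in batches of two:
data = [{"id": i} for i in range(1, 6)]
indexed, saved = [], []
final = import_resumably(
    lambda m: [r for r in data if r["id"] > m][:2],  # batches of 2
    indexed.extend,
    lambda: None,                                    # commit is a no-op here
    saved.append,
    0,
)
```

If the process dies between commit() and save_marker(), the next run re-imports one already-committed batch, which is harmless as long as the Solr uniqueKey makes the adds idempotent.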
Re: what crawler do you use for Solr indexing?
Hello, I built my own crawler with Python, as I couldn't find (not complaining, probably didn't look hard enough) Nutch documentation. I use BeautifulSoup, because the site is mostly based on Python/Django, and we like Python. Writing one was good for us because we spent most of our time figuring out what to write ... how to fetch pages, which to choose, what data to store, etc. It was an awesome exercise that really narrowed the definition of our project. It helped us define our Solr schema and other parts of the project during development. If we had known exactly what sort of data to crawl, and exactly what we intended to save, I'm sure we would have pushed harder at figuring out Nutch. If I were to refactor, I would give Heritrix and Nutch a good look now. cheers gene Gene Campbell http://www.picante.co.nz gene at picante point co point nz http://www.travelbeen.com - the social search engine for travel On Tue, Mar 10, 2009 at 11:14 PM, Andrzej Bialecki a...@getopt.org wrote: Sean Timm wrote: We too use Heritrix. We tried Nutch first but Nutch was not finding all of the documents that it was supposed to. When Nutch and Heritrix were both set to crawl our own site to a depth of three, Nutch missed some pages that were linked directly from the seed. We ended up with 10%-20% fewer pages in the Nutch crawl. FWIW, from a private conversation with Sean it seems that this was likely related to the default configuration in Nutch, which collects only the first 1000 outlinks from a page. This is an arbitrary and configurable limit, introduced as a way to limit the impact of spam pages and to limit the size of LinkDb. If a page hits this limit then indeed the symptoms that you observe are missing (dropped) links. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
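The core of the homegrown-crawler approach Gene describes is link discovery. He used BeautifulSoup; the sketch below uses only Python's stdlib html.parser to show the same step (the page string is a made-up example, and fetching via urllib plus frontier management are deliberately omitted):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags -- the seed of a toy crawler."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="/docs">Docs</a><a href="http://example.com">Ext</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/docs', 'http://example.com']
```

A real crawler would wrap this in a fetch loop with a visited set and per-host politeness delays, which is exactly the design work Gene says took most of their time.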
Re: unique result
FWIW... We run a hash of the content and other bits of our docs, and then remove duplicates according to specific algorithms. (Exactly the same page content can clearly be hosted on many different urls and domains.) Then, the chosen ones are indexed. Though we toss the synonyms in the index too, so we know all its other names. cheers gene Gene Campbell http://www.picante.co.nz gene at picante point co point nz http://www.travelbeen.com - the social search engine for travel On Fri, Feb 27, 2009 at 5:53 AM, Cheng Zhang zhangyongji...@yahoo.com wrote: It's exactly what I'm looking for. Thank you Grant. - Original Message From: Grant Ingersoll gsing...@apache.org To: solr-user@lucene.apache.org Sent: Thursday, February 26, 2009 6:56:22 AM Subject: Re: unique result I presume these all have different unique ids? If you can address it at indexing time, then have a look at https://issues.apache.org/jira/browse/SOLR-799 Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236 On Feb 25, 2009, at 6:54 PM, Cheng Zhang wrote: Is it possible to have Solr remove duplicated query results? For example, instead of returning:
<result name="response" numFound="572" start="0">
  <doc><str name="productGroup_t_i_s_nm">Wireless</str></doc>
  <doc><str name="productGroup_t_i_s_nm">Wireless</str></doc>
  <doc><str name="productGroup_t_i_s_nm">Wireless</str></doc>
  <doc><str name="productGroup_t_i_s_nm">Video Games</str></doc>
  <doc><str name="productGroup_t_i_s_nm">Video Games</str></doc>
</result>
return:
<result name="response" numFound="572" start="0">
  <doc><str name="productGroup_t_i_s_nm">Wireless</str></doc>
  <doc><str name="productGroup_t_i_s_nm">Video Games</str></doc>
</result>
Thanks a lot, Kevin -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
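The hash-then-dedupe pipeline Gene describes can be sketched client-side in a few lines. The normalization, the example docs, and the alias bookkeeping are illustrative assumptions, not his actual algorithm; for doing this inside Solr at index time, SOLR-799 (a signature-based update processor) is the pointer Grant gives above.

```python
import hashlib

def content_signature(text: str) -> str:
    # Normalize whitespace and case so trivially different copies hash alike.
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

docs = [
    {"url": "http://a.example/page",   "body": "Solr  is great."},
    {"url": "http://b.example/mirror", "body": "solr is great."},
    {"url": "http://a.example/other",  "body": "Something else."},
]

seen, unique_docs = {}, []
for doc in docs:
    sig = content_signature(doc["body"])
    if sig in seen:
        # Duplicate content: remember the alternate url ("its other names").
        seen[sig].setdefault("aliases", []).append(doc["url"])
    else:
        seen[sig] = doc
        unique_docs.append(doc)

print(len(unique_docs))  # 2 -- only the chosen copies get indexed
```

Only unique_docs would be posted to Solr, with the aliases stored alongside so every known url of the page remains searchable.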
com.ctc.wstx.exc.WstxLazyException exception while passing the text content of a word doc to SOLR
Hi, I am using the Apache POI parser to parse a Word doc and extract the text content. Then I pass the text content to Solr. The Word document has many pictures, graphs and tables. But when I pass the content to Solr, it fails. Here is the exception trace:
09:31:04,516 ERROR [STDERR] Mar 14, 2009 9:31:04 AM org.apache.solr.common.SolrException log
SEVERE: [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x7) not a valid XML character at [row,col {unknown-source}]: [40,18]
at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:729)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3659)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:190)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:92)
at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:828)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:601)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:595)
Another error trace relating to POI is also thrown:
09:31:04,828 ERROR [STDERR] java.io.IOException: Unable to read entire header; 130 bytes read; expected 512 bytes
09:31:04,828 ERROR [STDERR] at org.apache.poi.poifs.storage.HeaderBlockReader.alertShortRead(HeaderBlockReader.java:130)
09:31:04,843 ERROR [STDERR] at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:94)
09:31:04,843 ERROR [STDERR] at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
09:31:04,843 ERROR [STDERR] at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:133)
09:31:04,859 ERROR [STDERR] at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:51)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.parseWordFile(SearchApplicationServlet.java:963)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.indexDirectory(SearchApplicationServlet.java:813)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.index(SearchApplicationServlet.java:747)
09:31:04,874 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.processAdd(SearchApplicationServlet.java:331)
09:31:04,874 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.doGet(SearchApplicationServlet.java:160)
09:31:04,874 ERROR [STDERR] at
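The first trace is Solr's XML parser rejecting a control character (BEL, 0x07) that POI carried over from the Word file. A common workaround is to strip characters that are illegal in XML 1.0 before building the update message. This is a hedged sketch in Python (the original poster's pipeline is Java/POI, but the character ranges come straight from the XML 1.0 spec):

```python
import re

# XML 1.0 allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
# Word extractions often contain BEL (0x07) and NUL (0x00), which Solr rejects.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def xml_safe(text: str) -> str:
    """Drop characters that are not legal in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)

extracted = "Quarterly report\x07 with charts\x00 and tables"
print(xml_safe(extracted))  # Quarterly report with charts and tables
```

The second trace is separate: POI is being handed a file that is not a valid OLE2 document (130 bytes instead of a 512-byte header), so that input needs to be skipped or fixed before extraction even starts.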
Re: Solr: ERRORs at Startup
: Even setting everything to INFO through : http://localhost:8080/solr/admin/logging didn't help. : : But considering you do not see any bad issue here, at this time I will : ignore those ERROR messages :-) I would read up more on how to configure logging in JBoss. As far as I can tell, Solr is logging messages, which are getting handled by a logger that writes them to STDERR using a fairly standard format (date, class, method, level, msg) ... except some other piece of code seems to be reading from STDERR, and assuming anything that got written there is an ERROR, so it's logging those writes to stderr using a format with a date, a level (of ERROR), and a group or some other identifier of STDERR. The problem is, if you ignore them completely, you're going to miss noticing when you really have a problem. Like I said: figure out how to configure logging in JBoss; you might need to change the slf4j adapter jar or something if it can't deal with JUL (which is the default). : 10:51:20,525 INFO [TomcatDeployment] deploy, ctxPath=/solr : 10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM : org.apache.solr.servlet.SolrDispatchFilter init : INFO: SolrDispatchFilter.init() -Hoss
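If you stay with the default JUL binding rather than swapping the slf4j adapter jar, a minimal java.util.logging configuration looks roughly like the sketch below (the file path and levels are assumptions, not anything from this thread); it keeps Solr's own formatting instead of letting the container re-wrap STDERR as ERROR:

```properties
# Minimal JUL config sketch: route records through the console handler
# with explicit levels, passed to the JVM via
#   -Djava.util.logging.config.file=/path/to/logging.properties
handlers = java.util.logging.ConsoleHandler
.level = INFO
java.util.logging.ConsoleHandler.level = INFO
org.apache.solr.level = INFO
```

Under JBoss you may still need to adjust its own log4j configuration so that container-captured STDERR is not relabeled, which is the mismatch Hoss describes above.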