DIH ConcurrentModificationException
I got a ConcurrentModificationException during a cron'd delta-import with DIH. I'm using multicore Solr, nightly from hudson 2009-04-02_08-06-47. I don't know if this stack trace may be useful to you, but here it is:

java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(Unknown Source)
        at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
        at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
        at org.apache.solr.handler.dataimport.DataImporter.getStatusMessages(DataImporter.java:384)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:210)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Of course, given the nature of this exception, I doubt it can be reproduced easily (this is the only one I've got, and the cron'd job has run many times), but maybe a synchronized block should be put somewhere? ciao, Walter
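The trace shows a request thread iterating the status map while the import thread is still writing to it. Whether this matches how DataImporter actually stores its messages internally is my assumption; the sketch below only illustrates the usual fix, i.e. returning a snapshot copied under a lock so callers can iterate safely while the writer keeps going:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatusSnapshot {
    // Hypothetical stand-in for DIH's status map; the import thread
    // mutates it, so readers must never iterate it directly.
    private final Map<String, Object> statusMessages = new LinkedHashMap<>();

    public synchronized void put(String key, Object value) {
        statusMessages.put(key, value);
    }

    // Return a defensive copy taken under the same lock; iterating the
    // copy cannot throw ConcurrentModificationException.
    public synchronized Map<String, Object> getStatusMessages() {
        return new LinkedHashMap<>(statusMessages);
    }

    public static void main(String[] args) {
        StatusSnapshot s = new StatusSnapshot();
        s.put("Total Rows Fetched", 42);
        Map<String, Object> copy = s.getStatusMessages();
        s.put("Total Documents Processed", 41); // mutation after the copy
        // The snapshot is unaffected and safe to iterate.
        System.out.println(copy.size());
    }
}
```

The cost is one map copy per status request, which is negligible next to the import itself.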
Re: Multiple Core schemas with single solr.solr.home
The only issue you may have is with software that writes files into solr-home; the only one I can think of is DIH's dataimport.properties. So if you use DIH, you may want the dataimport.properties location to be configurable dynamically, e.g. as an entry in data-config.xml; otherwise each import on one core will overwrite the file for all cores. Another (easier? safer?) option would be to use symbolic links, i.e. make a directory per core and add in each one a symbolic link to the xml files, so that they all read the same ones.

On Sat, Apr 4, 2009 at 6:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Sat, Apr 4, 2009 at 9:51 PM, Rakesh Sinha rakesh.use...@gmail.com wrote: I am planning to configure a solr server with multiple cores, each with a different schema, under a single solr.solr.home. Are there any examples in the wiki? (The ones that I see have a single schema.xml for a given solr.solr.home under the schema directory.) Thanks for helping point to the same.

It should be possible, though I don't think there are any examples. You can specify the same instanceDir for different cores but a different dataDir (specifying dataDir in solr.xml is a trunk feature). -- Regards, Shalin Shekhar Mangar.
Re: Too many open files and background merge exceptions
You may try putting true in that useCompoundFile entry; this way indexing should use far fewer file descriptors, but it will slow indexing down, see http://issues.apache.org/jira/browse/LUCENE-888. Also try to verify whether the shortage of descriptors really comes only from Solr. How are you indexing -- via solrj, or by posting xmls? Are the files being opened/parsed on the same machine as Solr?

On Mon, Apr 6, 2009 at 2:58 PM, Jarek Zgoda jarek.zg...@redefine.pl wrote: I'm indexing a set of 50 small documents. I'm adding documents in batches of 1000. At the beginning I had a setup that optimized the index each 1 documents, but quickly I had to optimize after adding each batch of documents. Unfortunately, I'm still getting the Too many open files IO error on optimize. I went from a mergeFactor of 25 down to 10, but I'm still unable to optimize the index. I have this configuration:

  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>2</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>

The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is 1.6.0_11 64-Bit, Solr is a nightly build (2009-04-02). And no, I can not change the limit of file descriptors (currently: 1024). What more can I do? -- We read Knuth so you don't have to. - Tim Peters Jarek Zgoda, RD, Redefine jarek.zg...@redefine.pl
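Applying that suggestion to the settings quoted above, the relevant solrconfig.xml section would become (all values other than useCompoundFile left as the poster had them):

```xml
<!-- solrconfig.xml: compound-file format keeps each segment in a single
     .cfs file, cutting open file descriptors at the cost of somewhat
     slower indexing -->
<useCompoundFile>true</useCompoundFile>
<ramBufferSizeMB>256</ramBufferSizeMB>
<mergeFactor>2</mergeFactor>
```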
Re: datadir issue for multicore on latest trunk
It also ignores the dataDir directive in solr.xml; in fact adding:

  <core name="core0" instanceDir="core0">
    <property name="dataDir" value="/multicore/core0"/>
  </core>

doesn't change the behavior. This seems to be a bug introduced somewhere after February 2nd. Any clue?

On Tue, Mar 3, 2009 at 5:56 PM, Walter Ferrara walters...@gmail.com wrote: There is a strange behavior which seems to affect today's hudson build (March 3rd) but not (for example) the hudson February 2nd build. Basically, when I start the multicore environment, it just creates the data dirs in the current path. To replicate: 1. download the latest trunk 2. go to the example directory

  $ ls
  README.txt example-DIH exampledocs logs solr start.jar work etc exampleAnalysis lib multicore start.bat webapps
  $ java -Dsolr.solr.home=multicore -jar start.jar
  (then kill/sleep the process)
  $ ls
  README.txt etc exampledocs multicore start.jar core0 example-DIH lib solr webapps core1 exampleAnalysis logs start.bat work

you see core0 and core1 directories where they should not be :-); SOLR-1041 doesn't fix this case. ciao, Walter
Re: datadir issue for multicore on latest trunk
-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: Reading Solr Schema 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: Schema name=example core one 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created string: org.apache.solr.schema.StrField 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: default search field is name 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: query parser default operator is OR 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: unique key field: id 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init INFO: [core1] Opening new SolrCore at multicore\core1/, dataDir=core1/data\ 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init INFO: JMX monitoring not detected for core: core1 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener INFO: [core1] Searching for listeners: //listen...@event=firstSearcher] 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener INFO: [core1] Searching for listeners: //listen...@event=newSearcher] 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore initIndex AVVERTENZA: [core1] Solr index directory 'core1\data\index' doesn't exist. Creating new index... 
4-mar-2009 12.50.55 org.apache.solr.update.SolrIndexWriter getDirectory AVVERTENZA: No lockType configured for core1/data\index/ assuming 'simple' 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created standard: org.apache.solr.handler.StandardRequestHandler 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /update: org.apache.solr.handler.XmlUpdateRequestHandler 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers 4-mar-2009 12.50.55 org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@13785d3 main 4-mar-2009 12.50.55 org.apache.solr.update.DirectUpdateHandler2$CommitTracker init INFO: AutoCommit: disabled 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.querycompon...@1e228bc 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.facetcompon...@e06940 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.morelikethiscompon...@11e0c13 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.highlightcompon...@1aae94f 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.statscompon...@1bb5c09 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding debug component:org.apache.solr.handler.component.debugcompon...@1976011 4-mar-2009 12.50.55 org.apache.solr.core.CoreContainer register INFO: registering core: core1 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore registerSearcher INFO: [core1] Registered new searcher searc...@13785d3 main 4-mar-2009 12.50.55 
org.apache.solr.servlet.SolrDispatchFilter init INFO: user.dir=d:\DEV\apache-solr-2009-03-03_08-06-53\example 4-mar-2009 12.50.55 org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() done 4-mar-2009 12.50.55 org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init() 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: JNDI not configured for solr (NoInitialContextEx) 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: multicore 4-mar-2009 12.50.55 org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init() done 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: JNDI not configured for solr (NoInitialContextEx) 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: multicore 4-mar-2009 12.50.55 org.apache.solr.servlet.SolrUpdateServlet init INFO: SolrUpdateServlet.init() done 2009-03-04 12:50:55.687::INFO: Started SocketConnector @ 0.0.0.0:8983 2009-03-04 12:51:05.953::INFO: Shutdown hook executing 2009-03-04 12:51:05.984::INFO: Shutdown hook complete

On Wed, Mar 4, 2009 at 12:36 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote:

  <property name="dataDir" value="/multicore/core0"/>

is not honored automatically: the dataDir tag has to be present in solrconfig.xml and it should use this value. But you can specify it as follows:

  <core name="core0" instanceDir="core0" dataDir="/multicore/core0"/>

then it should be fine. can you just paste the log messages as solr starts --Noble

On Wed, Mar 4, 2009 at 4:15 PM, Walter Ferrara walters...@gmail.com wrote
Re: datadir issue for multicore on latest trunk
tried with:

  <core name="core0" instanceDir="core0" dataDir="multicore/core0"/>
  <core name="core1" instanceDir="core1" dataDir="multicore/core1"/>

but no luck; the dataDir parameter seems to be ignored, no matter what is written there.

On Wed, Mar 4, 2009 at 12:58 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: looks like a bug. must reopen the issue

On Wed, Mar 4, 2009 at 5:26 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: On Wed, Mar 4, 2009 at 5:24 PM, Walter Ferrara walters...@gmail.com wrote: using:

  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" dataDir="/multicore/core0"/>
    <core name="core1" instanceDir="core1" dataDir="/multicore/core1"/>
  </cores>

doesn't work either

dataDir=/multicore/core0 means the path is absolute. where did it create? here is the output:

2009-03-04 12:50:54.890::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2009-03-04 12:50:54.968::INFO: jetty-6.1.3 4-mar-2009 12.50.55 org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: JNDI not configured for solr (NoInitialContextEx) 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: multicore 4-mar-2009 12.50.55 org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: d:\DEV\apache-solr-2009-03-03_08-06-53\example\multicore\solr.xml 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to 'multicore/' 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Reusing parent classloader 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to 'multicore\core0/' 4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Reusing parent classloader 4-mar-2009 12.50.55 org.apache.solr.core.SolrConfig init INFO: Loaded SolrConfig: solrconfig.xml 4-mar-2009 12.50.55
org.apache.solr.schema.IndexSchema readSchema INFO: Reading Solr Schema 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: Schema name=example core zero 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created string: org.apache.solr.schema.StrField 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: default search field is name 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: query parser default operator is OR 4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema INFO: unique key field: id 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init INFO: [core0] Opening new SolrCore at multicore\core0/, dataDir=core0//multicore/core0/ 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init INFO: JMX monitoring not detected for core: core0 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener INFO: [core0] Searching for listeners: //listen...@event=firstSearcher] 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener INFO: [core0] Searching for listeners: //listen...@event=newSearcher] 4-mar-2009 12.50.55 org.apache.solr.core.SolrCore initIndex AVVERTENZA: [core0] Solr index directory 'core0\multicore\core0\index' doesn't exist. Creating new index... 
4-mar-2009 12.50.55 org.apache.solr.update.SolrIndexWriter getDirectory AVVERTENZA: No lockType configured for core0//multicore/core0/index/ assuming 'simple' 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created standard: org.apache.solr.handler.StandardRequestHandler 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /update: org.apache.solr.handler.XmlUpdateRequestHandler 4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers 4-mar-2009 12.50.55 org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@1e57e8f main 4-mar-2009 12.50.55 org.apache.solr.update.DirectUpdateHandler2$CommitTracker init INFO: AutoCommit: disabled 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.querycompon...@19a32e0 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.facetcompon...@8238f4 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.morelikethiscompon...@16925b0 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform INFO: Adding component:org.apache.solr.handler.component.highlightcompon...@297ffb 4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
datadir issue for multicore on latest trunk
There is a strange behavior which seems to affect today's hudson build (March 3rd) but not (for example) the hudson February 2nd build. Basically, when I start the multicore environment, it just creates the data dirs in the current path. To replicate: 1. download the latest trunk 2. go to the example directory

  $ ls
  README.txt example-DIH exampledocs logs solr start.jar work etc exampleAnalysis lib multicore start.bat webapps
  $ java -Dsolr.solr.home=multicore -jar start.jar
  (then kill/sleep the process)
  $ ls
  README.txt etc exampledocs multicore start.jar core0 example-DIH lib solr webapps core1 exampleAnalysis logs start.bat work

you see core0 and core1 directories where they should not be :-); SOLR-1041 doesn't fix this case. ciao, Walter
Re: dataimporthandler and mysql connector jar
Shalin Shekhar Mangar wrote: Can you please open a JIRA issue for this? However, we may only be able to fix this after 1.3, because a code freeze has been decided upon in order to release 1.3 asap.

I've opened https://issues.apache.org/jira/browse/SOLR-726 Walter
dataimporthandler and multiple delta-import
I'm using DIH and its wonderful delta-import. I have a question: is the delta-import synchronized? Shouldn't multiple calls to delta-import result in all but one being refused because the status is not idle? I've noticed that calling dataimport/?command=delta-import multiple times within a second results in a strange exception:

GRAVE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select entity from testtable where last_modified > '2008-08-26 13:05:09' Processing Document # 1
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:171)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:128)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:41)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:92)
        at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:479)
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:192)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:131)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:357)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:375)
Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: No operations allowed after connection closed.
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:888)
        at com.mysql.jdbc.Connection.checkClosed(Connection.java:1930)
        at com.mysql.jdbc.Connection.createStatement(Connection.java:3094)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:159)
        ... 10 more

Calling the delta-import, waiting a bit, and calling it again works fine... thanks, Walter
Re: dataimporthandler and multiple delta-import
Shalin Shekhar Mangar wrote: Hi Walter, Indeed, there's a race condition there, because we didn't expect people to hit it concurrently. We expected that imports would be run sequentially. Thanks for noticing this. We shall add synchronization in the next release. Do you mind (again) opening an issue for this? We'll attach a patch soon.

No problem! I've opened https://issues.apache.org/jira/browse/SOLR-728 I do understand that imports should be run sequentially; the main issue I can foresee is a delta-import triggered via curl from crontab: curl has no way to know whether the previous delta-import actually finished. In my opinion, if a (delta|full)-import is already running, the handler should state that it cannot go ahead because another import process is already running. Thank you for your fast reply and for all your work on solr, Walter
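For the crontab case, one workaround until the synchronization lands is to have the cron wrapper ask the handler for its status first (a plain request to /dataimport with no command) and only fire the delta-import when it reports idle. A minimal sketch of the status check, assuming the usual DIH response shape of <str name="status">idle</str> (the surrounding fetch-and-curl plumbing is left out):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DihStatusCheck {
    // Returns true when a DIH status response reports "idle".
    static boolean isIdle(String statusXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(statusXml.getBytes("UTF-8")));
        NodeList strs = doc.getElementsByTagName("str");
        for (int i = 0; i < strs.getLength(); i++) {
            Element e = (Element) strs.item(i);
            if ("status".equals(e.getAttribute("name"))) {
                return "idle".equals(e.getTextContent().trim());
            }
        }
        return false; // no status element: assume busy and skip this run
    }

    public static void main(String[] args) throws Exception {
        // Sample response fragment, modeled on what DIH returns.
        String sample = "<response><str name=\"status\">idle</str></response>";
        System.out.println(isIdle(sample));
    }
}
```

This only narrows the race window rather than closing it, but for a cron job it avoids piling a second import onto one that is still running.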
dataimporthandler and mysql connector jar
Launching a multicore solr with dataimporthandler using a mysql driver (driver=com.mysql.jdbc.Driver) works fine if the mysql connector jar (mysql-connector-java-5.0.7-bin.jar) is on the classpath, either the jdk classpath or inside the solr.war lib dir. But putting mysql-connector-java-5.0.7-bin.jar in the core0/lib directory, or in the multicore shared lib dir (specified in the sharedLib attribute in solr.xml), results in an exception, even though the jar is correctly loaded by the classloader:

25-ago-2008 16.36.05 org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/E:/Temp/apache-solr-2008-08-25_08-06-39/example/solr/lib/mysql-connector-java-5.0.7-bin.jar' to Solr classloader
[..]
GRAVE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: null Processing Document #
        at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306)
        at org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273)
        at org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228)
        at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:98)
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:475)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.mortbay.start.Main.invokeMain(Main.java:183)
        at org.mortbay.start.Main.start(Main.java:497)
        at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: driver could not be loaded Processing Document #
        at org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:110)
        at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:63)
        at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:303)
        ... 34 more

Shouldn't it work when the jar is (only) in the core lib dir? Tested on a windows machine, with java 1.6 and today's hudson nightly build of solr. Walter
Re: Issuing queries during analysis?
Dallan Quass wrote: I have a situation where it would be beneficial to issue queries in a filter that is called during analysis. In a nutshell, I have an index of places that includes possible abbreviations, and I want to query this index during analysis to convert user-entered places to standardized places. So if someone enters Chicago, IL into a place field, I want to write a filter that first issues a query on IL to find that the standardized name for IL is Illinois, and then issues a query on places named Chicago located in Illinois to find that the standardized name is Chicago, Cook, Illinois, and then returns this string in a token.

This may sound a bit too KISS, but another approach could be based on synonyms: if the set of abbreviations is limited and well defined (all US states), you can simply define the complete state name for each abbreviation. This way Chicago, IL will be translated (...) into Chicago, Illinois during indexing and/or querying. But this may depend on the Tokenizer you use and on how your index is defined (does a search for Chicago, Illinois on a field give you a doc with Chicago, Cook, Illinois in some (other/same) field?). Have a look here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
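As a concrete sketch of the synonym approach (the field type name, file name, and particular mappings are my examples, not from the thread):

```xml
<!-- schema.xml: analyzer for the place field, expanding state
     abbreviations via a synonyms file at index and/or query time -->
<fieldType name="text_places" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="us_states.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- us_states.txt would contain one mapping per state, e.g.:
       IL => Illinois
       NY => New York
-->
```

With "=>" mappings the abbreviation is replaced outright, so both the indexed terms and user queries normalize to the full state name.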
Solr Multicore: is there any way to retrieve all the cores registered?
In solr, latest trunk version in svn: is it possible to access the core registry, or what used to be the static MultiCore object? My goal is to retrieve all the cores registered in a given (multicore) environment. It used to be MultiCore.getRegistry() initially, in the first stages of SOLR-350; but now MultiCore is no longer static, and I can't find any reference to where to pick up the multicore object initialized from multicore.xml. Any tips on how to retrieve such information now? Walter
Re: Solr Multicore: is there any way to retrieve all the cores registered?
Ryan McKinley wrote: check the status action. also, check the index.jsp page

index.jsp does:

  org.apache.solr.core.MultiCore multicore =
      (org.apache.solr.core.MultiCore) request.getAttribute("org.apache.solr.MultiCore");

which is ok in a servlet, but how should I do the same inside a handler, i.e. having just SolrQueryRequest and SolrQueryResponse? Is it something that can be extracted from SolrQueryRequest.getContext? And, looking ahead to solr 1.3, will this functionality be maintained? thanks, Walter
HTMLStripReader and script tags
I've noticed that passing html to a field using HTMLStripWhitespaceTokenizerFactory ends up keeping some javascript too. For example, using an analyzer like:

  <fieldType name="HTMLStripper2" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

with a text such as:

  <html>
  <head><title>title</title></head>
  <body>
  pre
  <SCRIPT LANGUAGE="JavaScript">
  var time = new Date();
  ordval= (time.getTime());
  </SCRIPT>
  post
  <!-- comment -->
  </body>
  </html>

analysis.jsp turns out these tokens: title pre var time = new Date(); ordval= (time.getTime()); post. While if the script in the page is commented out, everything works fine. Is this due to a design choice? Shouldn't scripts be removed in both cases? (Solr Implementation Version: 2008-03-24_09-57-01 ${svnversion} - hudson - 2008-03-24 09:59:40) Walter
Re: CorruptIndexException: unknown format version: -3
Did you create/modify the index with a newer version of lucene than the one you use in solr? In that case I doubt you can downgrade your index, but maybe you can upgrade the lucene in your solr (search this forum, there should be a thread about this), or try the latest nightly builds.

Paul Danese wrote: Hi all, Is there any way to recover from the error in the subject heading? Luke can view the index just fine (at least at a cursory level: Luke is able to open the index, give me back the # of docs, etc.), but solr throws this exception whenever I try to start it up. Any ideas on how to proceed? Can I use luke or something else to uncorrupt, modify, or save my index into a non-corrupt format? TIA!!

org.apache.solr.core.SolrException log
SEVERE: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -3
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:216)
        at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:183)
        at org.mortbay.start.Main.start(Main.java:497)
        at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.lucene.index.CorruptIndexException: Unknown format version: -3
        at org.apache.lucene.index.SegmentTermEnum.init(SegmentTermEnum.java:64)
        at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:49)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:184)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:157)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:139)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:194)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:184)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
        at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:87)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
        ... 27 more
Re: criteria for using the property stored=true and indexed=true
See: http://wiki.apache.org/solr/SchemaXml#head-af67aefdc51d18cd8556de164606030446f56554

indexed means searchable (faceting and sorting also need it); stored is needed only when you need the original text (i.e. not tokenized/analyzed) to be returned. When stored and indexed are not present, I think solr defaults both of them to true.

Dilip.TS wrote: Hi, I would like some clarification on which fields we should assign the properties stored=true and indexed=true. What is the criteria for these property assignments? What would be the impact if no field is assigned these properties? Thanks in Advance, Regards, Dilip TS Starmark Services Pvt. Ltd.
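As a concrete sketch (the field names are invented for illustration), the common combinations in schema.xml look like this:

```xml
<!-- searchable AND returned in results -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- searchable but not returned: saves index space when the raw
     value never needs to be displayed -->
<field name="body" type="text" indexed="true" stored="false"/>
<!-- returned but not searchable: display-only data, never queried,
     faceted, or sorted on -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>
```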
Re: 2GB limit on 32 bits
Isn't the Xeon 5110 64-bit? Maybe you could just put a 64-bit OS on your box. Also, take a look at http://www.spack.org/wiki/LinuxRamLimits -- Walter

Isart Montane wrote:
I've got a dual Xeon. Here is my cpuinfo. I've read the limit on a 2.6 Linux kernel is 4GB of user space and 4GB for the kernel... that's why I asked if there's any way to reach 4GB per process. Thanks anyway :(

cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz
stepping	: 6
cpu MHz		: 1596.192
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
bogomips	: 3194.21

(processors 1-3 report the same model and flags, differing only in physical id, core id, and bogomips)

On Nov 9, 2007 9:26 AM, Norberto Meijome [EMAIL PROTECTED] wrote:
On Fri, 9 Nov 2007 09:03:01 -0300, Isart Montane [EMAIL PROTECTED] wrote:
> I've read there's a kernel limitation for a 32-bit architecture of 2GB per process, and I just want to know if anybody knows an alternative to getting a new 64-bit server.

You don't say what CPU you have, but the 32-bit limit is real (it's an architecture issue, not a kernel limitation...). You could try running several servers on different ports, each managing part of your index, each with up to 2GB of RAM -- but you may be pushing your CPU/disks too hard and hit other issues; try it and see how it goes. If I were you, I'd seriously look into getting a new (64-bit) server.
B

_ {Beto|Norberto|Numard} Meijome
Solr and FieldCache
I have an index with several fields, but just one stored: ID (string, unique). I need to access that ID field for each of the top `nodes` docs in my results (this is done inside a handler I wrote); the code looks like:

Hits hits = searcher.search(query);
for (int i = 0; i < nodes; i++) {
    id[i] = hits.doc(i).get("ID");
    score[i] = hits.score(i);
}

I noticed that retrieving the ID this way is slow. If I use the FieldCache, like:

id[i] = FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID")[hits.id(i)];

then after the first execution (the initialization of the cache takes some time), it seems to run much faster. But what happens when Solr reloads the index (after a commit or an optimize, for example)? Will it refresh the cache with the new reader (in the warmup process?), or will it be the first query execution of that code (with the new reader) that forces the refresh? (This could mean that every first query after a reload will be slower.) Is there any way to tell Solr to cache and warm up this ID field when needed?

Thanks, Walter
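The speed-up comes from fetching the per-reader value array once and then indexing into it per hit, instead of fetching a stored document per hit. A self-contained sketch of that pattern, where FieldCacheDemo, getStrings, and the reader key are hypothetical stand-ins for Lucene's FieldCache.DEFAULT.getStrings(reader, field):

```java
import java.util.HashMap;
import java.util.Map;

public class FieldCacheDemo {
    // one cached String[] per reader key, mimicking FieldCache's
    // per-IndexReader cache of a field's values
    static final Map<String, String[]> CACHE = new HashMap<>();

    static String[] getStrings(String readerKey, String[] docValues) {
        // lazily filled on the first request for a given reader,
        // just as FieldCache fills on first use
        return CACHE.computeIfAbsent(readerKey, k -> docValues.clone());
    }

    public static void main(String[] args) {
        String[] indexedIds = {"doc-a", "doc-b", "doc-c"}; // value per docid
        int[] hitDocIds = {2, 0};                          // docids of the top hits

        // hoist the array lookup out of the loop...
        String[] ids = getStrings("reader-1", indexedIds);
        String[] results = new String[hitDocIds.length];
        for (int i = 0; i < hitDocIds.length; i++) {
            // ...so each hit is a plain array index, not a document fetch
            results[i] = ids[hitDocIds[i]];
        }
        System.out.println(results[0] + "," + results[1]); // doc-c,doc-a
    }
}
```

The first call per reader pays the fill cost; every later lookup is an array access, which matches the behavior described above.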
Re: Solr and FieldCache
About the stored/indexed difference: ID is a string (solr.StrField), so FieldCache gives me what I need. I'm just wondering: since this cached object could (theoretically) be pretty big, do I need to be aware of possible OOMs? I know that FieldCache uses weak maps, so I presume the cached array for the older reader(s) will be GC-ed when the reader is no longer referenced (i.e. when Solr loads the new one, after its warmup and so on) -- is that right? Thanks
--
J.J. Larrea wrote:
At 5:30 PM +0200 9/20/07, Walter Ferrara wrote:
> I have an index with several fields, but just one stored: ID (string, unique). I need to access that ID field for each of the top `nodes` docs in my results (this is done inside a handler I wrote); the code looks like:
>
> Hits hits = searcher.search(query);
> for (int i = 0; i < nodes; i++) {
>     id[i] = hits.doc(i).get("ID");
>     score[i] = hits.score(i);
> }
>
> I noticed that retrieving the ID this way is slow. If I use the FieldCache, like:
>
> id[i] = FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID")[hits.id(i)];

I assume you're putting the result of FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID") in an array outside the loop, saving two redundant method calls per iteration.

> after the first execution (the initialization of the cache takes some time), it seems to run much faster.

Do note that FieldCache.DEFAULT caches the indexed values, not the stored values. Since your field is an ID, you are probably indexing it in such a way that both are identical (e.g. with KeywordTokenizer), so you're not seeing a difference.

> But what happens when Solr reloads the index (after a commit or an optimize, for example)? Will it refresh the cache with the new reader (in the warmup process?), or will it be the first query execution of that code (with the new reader) that forces the refresh? (This could mean that every first query after a reload will be slower.)

It is refreshed by Lucene the first time the FieldCache array is requested from the new IndexReader.
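The weak-map assumption above can be sketched with plain java.util types: an entry keyed by a reader object becomes collectible as soon as nothing else references the key. Reader here is a hypothetical stand-in for an IndexReader; this illustrates the lifetime semantics, not Lucene's actual cache class.

```java
import java.util.WeakHashMap;

public class WeakCacheDemo {
    static class Reader {} // stand-in for an IndexReader

    public static void main(String[] args) {
        // weakly keyed, like the per-reader FieldCache entries
        WeakHashMap<Reader, String[]> cache = new WeakHashMap<>();

        Reader oldReader = new Reader();
        cache.put(oldReader, new String[]{"id1", "id2"});
        System.out.println(cache.size()); // 1 while the reader is referenced

        oldReader = null; // Solr drops the old reader after the new one warms up
        System.gc();      // only a hint: the entry MAY be cleared after a GC cycle
        // once the key is collected, the weak entry (and its array) goes away
    }
}
```

So yes, under this model the arrays for old readers don't leak; memory pressure is bounded by the readers Solr actually keeps alive, though nothing forces collection at a particular moment.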
> Is there any way to tell Solr to cache and warm up this ID field when needed?

Absolutely: just put a warmup query in solrconfig.xml which makes a request that invokes FieldCache.DEFAULT.getStrings on that field. Simplest would probably be to invoke your custom handler, perhaps passing arguments that limit it to processing only one document, to limit the data which gets cached; since getStrings returns the entire array, one pass through your loop is fine. If that's not easy with your handler, you could achieve the same effect by setting up a handler which facets on the ID field, sorting by index order (facet.sort=false), and asking for only a single value (facet.limit=1); the entire id[docid] array will get scanned to count references to that ID, but that ensures it gets paged in. - J.J.
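The faceting variant of that suggestion could be wired up as a newSearcher warming listener in solrconfig.xml. A sketch under the assumptions above (the field name ID matches the thread; the exact facet behavior depends on your Solr version):

```xml
<!-- hypothetical solrconfig.xml fragment: touch the ID FieldCache
     whenever a new searcher is opened, so the first real query
     after a commit/optimize doesn't pay the fill cost -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">ID</str>
      <str name="facet.limit">1</str>
      <str name="facet.sort">false</str>
    </lst>
  </arr>
</listener>
```

An equivalent firstSearcher listener would cover the initial startup as well as reloads.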