DIH ConcurrentModificationException

2009-05-04 Thread Walter Ferrara
I've got a ConcurrentModificationException during a cron-ed delta-import with
DIH; I'm using the multicore Solr nightly from Hudson (2009-04-02_08-06-47).
I don't know if this stack trace is useful to you, but here it is:

java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(Unknown Source)
    at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
    at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
    at org.apache.solr.handler.dataimport.DataImporter.getStatusMessages(DataImporter.java:384)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:210)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Of course, given the nature of this exception, I doubt it can be reproduced
easily (this is the only one I've got, and the cron-ed job has run many
times), but maybe a synchronized block should be put somewhere?
ciao,
Walter
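For what it's worth, the failure mode is easy to model: one thread iterates a LinkedHashMap (as getStatusMessages() does, per the trace) while the import thread is still writing to it. A minimal sketch — class and key names here are illustrative, not Solr's actual code — of both the fail-fast race and the usual snapshot fix:

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

public class StatusMessages {
    // Returns true if iterating the live map while it is mutated throws CME,
    // mimicking a status reader racing the import thread's writes.
    static boolean iterateWhileMutating(Map<String, String> status) {
        try {
            for (Map.Entry<String, String> e : status.entrySet()) {
                status.put("Time Elapsed", "0:0:1"); // writer racing the reader
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true; // fail-fast iterator detected the mutation
        }
    }

    // One fix: iterate a snapshot, so the live map may keep changing
    // (alternatively, guard reader and writer with the same lock).
    static boolean iterateSnapshot(Map<String, String> status) {
        for (Map.Entry<String, String> e : new LinkedHashMap<>(status).entrySet()) {
            status.put("Time Elapsed", "0:0:1"); // safe: we iterate the copy
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> status = new LinkedHashMap<>();
        status.put("Total Rows Fetched", "100");
        status.put("Total Documents Processed", "100");
        System.out.println(iterateWhileMutating(status)); // true: CME caught
        System.out.println(iterateSnapshot(status));      // true: no CME
    }
}
```

The snapshot approach is the cheap option when, as here, the reader only needs a consistent point-in-time view of the status map.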


Re: Multiple Core schemas with single solr.solr.home

2009-04-06 Thread Walter Ferrara
The only issue you may have is with software that writes files into
solr-home; the only one I can think of is DIH's dataimport.properties. So if
you use DIH, you may want to make the dataimport.properties location
dynamically configurable, e.g. as an entry in data-config.xml; otherwise each
import on one core will change the file for all cores. Another (easier?
safer?) option would be to use symbolic links, i.e. make a directory per core
and add in each one a symbolic link to the XML files, so that they all read
the same files.


On Sat, Apr 4, 2009 at 6:28 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Sat, Apr 4, 2009 at 9:51 PM, Rakesh Sinha rakesh.use...@gmail.com
 wrote:

  I am planning to configure a solr server with multiple cores with
  different schema for themselves with a single solr.solr.home . Are
  there any examples in the wiki to the wiki ( the ones that I see have
  a single schema.xml for a given solr.solr.home under schema directory.
  ).
 
  Thanks for helping pointing to the same.
 

 It should be possible, though I don't think there are any examples. You can
 specify the same instanceDir for different cores but different dataDirs
 (specifying dataDir in solr.xml is a trunk feature)

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Too many open files and background merge exceptions

2009-04-06 Thread Walter Ferrara
You may try setting that useCompoundFile entry to true; this way indexing
should use far fewer file descriptors, but it will slow indexing down, see
http://issues.apache.org/jira/browse/LUCENE-888.
Also try to check whether the shortage of descriptors really comes from Solr
alone. How are you indexing — via solrj, or by posting XML? Are the files
being opened/parsed on the same machine as Solr?
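For reference, the knob in question lives in solrconfig.xml; a hedged sketch of the relevant fragment (element placement per the stock example config of that era — check your own solrconfig.xml for the exact section):

```xml
<mainIndex>
  <!-- true: pack each segment into a single compound (.cfs) file, so merges
       and optimize hold far fewer descriptors open; the trade-off is slower
       indexing (see LUCENE-888) -->
  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>10</mergeFactor>
</mainIndex>
```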

On Mon, Apr 6, 2009 at 2:58 PM, Jarek Zgoda jarek.zg...@redefine.pl wrote:

 I'm indexing a set of 50 small documents. I'm adding documents in
 batches of 1000. At the beginning I had a setup that optimized the index
 each 1 documents, but quickly I had to optimize after adding each batch
 of documents. Unfortunately, I'm still getting the Too many open files IO
 error on optimize. I went from mergeFactor of 25 down to 10, but I'm still
 unable to optimize the index.

 I have this configuration:
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>256</ramBufferSizeMB>
    <mergeFactor>2</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>

 The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is
 1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not
 change the limit of file descriptors (currently: 1024). What more can I do?

 --
 We read Knuth so you don't have to. - Tim Peters

 Jarek Zgoda, RD, Redefine
 jarek.zg...@redefine.pl




Re: datadir issue for multicore on latest trunk

2009-03-04 Thread Walter Ferrara
It also ignores the dataDir directive in solr.xml; in fact, adding:
<core name="core0" instanceDir="core0">
  <property name="dataDir" value="/multicore/core0" />
</core>
doesn't change the behavior.

This seems to be a bug introduced somewhere after February 2nd.
Any clue?


On Tue, Mar 3, 2009 at 5:56 PM, Walter Ferrara walters...@gmail.com wrote:

 There is a strange behavior which seems to affect today's (March 3rd) Hudson
 build but not (for example) the February 2nd Hudson build.
 Basically when I start the multicore environment, it just creates the data
 dirs in the current path.
 To replicate:
 1. download latest trunk
 2. go to example directory

 $ ls
 README.txt  example-DIH  exampledocs  logs   solr  start.jar
 work
 etc exampleAnalysis  lib  multicore  start.bat  webapps

 $ java -Dsolr.solr.home=multicore -jar start.jar
 (then kill/sleep the process)

 $ ls
 README.txt  etc  exampledocs  multicore  start.jar
 core0   example-DIH  lib  solr  webapps
 core1   exampleAnalysis  logs start.bat  work

 You see the core0 and core1 directories where they should not be :-);
 SOLR-1041 doesn't fix it in this case.

 ciao,
 Walter




Re: datadir issue for multicore on latest trunk

2009-03-04 Thread Walter Ferrara
4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
INFO: Reading Solr Schema
4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
INFO: Schema name=example core one
4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created string: org.apache.solr.schema.StrField
4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
INFO: default search field is name
4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
INFO: query parser default operator is OR
4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
INFO: unique key field: id
4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init
INFO: [core1] Opening new SolrCore at multicore\core1/, dataDir=core1/data\
4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init
INFO: JMX monitoring not detected for core: core1
4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener
INFO: [core1] Searching for listeners: //listen...@event=firstSearcher]
4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener
INFO: [core1] Searching for listeners: //listen...@event=newSearcher]
4-mar-2009 12.50.55 org.apache.solr.core.SolrCore initIndex
AVVERTENZA: [core1] Solr index directory 'core1\data\index' doesn't exist.
Creating new index...
4-mar-2009 12.50.55 org.apache.solr.update.SolrIndexWriter getDirectory
AVVERTENZA: No lockType configured for core1/data\index/ assuming 'simple'
4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created standard: org.apache.solr.handler.StandardRequestHandler
4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /update: org.apache.solr.handler.XmlUpdateRequestHandler
4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
4-mar-2009 12.50.55 org.apache.solr.search.SolrIndexSearcher init
INFO: Opening searc...@13785d3 main
4-mar-2009 12.50.55
org.apache.solr.update.DirectUpdateHandler2$CommitTracker init
INFO: AutoCommit: disabled
4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
INFO: Adding
component:org.apache.solr.handler.component.querycompon...@1e228bc
4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
INFO: Adding
component:org.apache.solr.handler.component.facetcompon...@e06940
4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
INFO: Adding
component:org.apache.solr.handler.component.morelikethiscompon...@11e0c13
4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
INFO: Adding
component:org.apache.solr.handler.component.highlightcompon...@1aae94f
4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
INFO: Adding
component:org.apache.solr.handler.component.statscompon...@1bb5c09
4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler inform
INFO: Adding  debug
component:org.apache.solr.handler.component.debugcompon...@1976011
4-mar-2009 12.50.55 org.apache.solr.core.CoreContainer register
INFO: registering core: core1
4-mar-2009 12.50.55 org.apache.solr.core.SolrCore registerSearcher
INFO: [core1] Registered new searcher searc...@13785d3 main
4-mar-2009 12.50.55 org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=d:\DEV\apache-solr-2009-03-03_08-06-53\example
4-mar-2009 12.50.55 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init() done
4-mar-2009 12.50.55 org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init()
4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: JNDI not configured for solr (NoInitialContextEx)
4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: using system property solr.solr.home: multicore
4-mar-2009 12.50.55 org.apache.solr.servlet.SolrServlet init
INFO: SolrServlet.init() done
4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: JNDI not configured for solr (NoInitialContextEx)
4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: using system property solr.solr.home: multicore
4-mar-2009 12.50.55 org.apache.solr.servlet.SolrUpdateServlet init
INFO: SolrUpdateServlet.init() done
2009-03-04 12:50:55.687::INFO:  Started SocketConnector @ 0.0.0.0:8983
2009-03-04 12:51:05.953::INFO:  Shutdown hook executing
2009-03-04 12:51:05.984::INFO:  Shutdown hook complete

On Wed, Mar 4, 2009 at 12:36 PM, Noble Paul നോബിള്‍ नोब्ळ् 
noble.p...@gmail.com wrote:

 <property name="dataDir" value="/multicore/core0" />
 is not honored automatically. The dataDir tag has to be present in
 solrconfig.xml and it should use this value.

 But you can specify it as follows:
 <core name="core0" instanceDir="core0" dataDir="/multicore/core0"/>

 then it should be fine.

 Can you just paste the log messages as Solr starts?
 --Noble


 On Wed, Mar 4, 2009 at 4:15 PM, Walter Ferrara walters...@gmail.com
 wrote

Re: datadir issue for multicore on latest trunk

2009-03-04 Thread Walter Ferrara
I tried with:
<core name="core0" instanceDir="core0" dataDir="multicore/core0"/>
<core name="core1" instanceDir="core1" dataDir="multicore/core1"/>
but no luck; the dataDir parameter seems to be ignored, no matter what is
written there.

On Wed, Mar 4, 2009 at 12:58 PM, Noble Paul നോബിള്‍ नोब्ळ् 
noble.p...@gmail.com wrote:

 Looks like a bug. We must reopen the issue.

 On Wed, Mar 4, 2009 at 5:26 PM, Noble Paul നോബിള്‍  नोब्ळ्
 noble.p...@gmail.com wrote:
  On Wed, Mar 4, 2009 at 5:24 PM, Walter Ferrara walters...@gmail.com
 wrote:
  using:
   <cores adminPath="/admin/cores">
     <core name="core0" instanceDir="core0" dataDir="/multicore/core0"/>
     <core name="core1" instanceDir="core1" dataDir="/multicore/core1"/>
   </cores>
  doesn't work either
 
  dataDir="/multicore/core0" means the path is absolute.
  Where did it create the directory?
 
 
  here the output:
 
  2009-03-04 12:50:54.890::INFO:  Logging to STDERR via
  org.mortbay.log.StdErrLog
  2009-03-04 12:50:54.968::INFO:  jetty-6.1.3
  4-mar-2009 12.50.55 org.apache.solr.servlet.SolrDispatchFilter init
  INFO: SolrDispatchFilter.init()
  4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
  locateInstanceDir
  INFO: JNDI not configured for solr (NoInitialContextEx)
  4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
  locateInstanceDir
  INFO: using system property solr.solr.home: multicore
  4-mar-2009 12.50.55 org.apache.solr.core.CoreContainer$Initializer
  initialize
  INFO: looking for solr.xml:
  d:\DEV\apache-solr-2009-03-03_08-06-53\example\multicore\solr.xml
  4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader init
  INFO: Solr home set to 'multicore/'
  4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
  createClassLoader
  INFO: Reusing parent classloader
  4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader init
  INFO: Solr home set to 'multicore\core0/'
  4-mar-2009 12.50.55 org.apache.solr.core.SolrResourceLoader
  createClassLoader
  INFO: Reusing parent classloader
  4-mar-2009 12.50.55 org.apache.solr.core.SolrConfig init
  INFO: Loaded SolrConfig: solrconfig.xml
  4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
  INFO: Reading Solr Schema
  4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
  INFO: Schema name=example core zero
  4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader
 load
  INFO: created string: org.apache.solr.schema.StrField
  4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
  INFO: default search field is name
  4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
  INFO: query parser default operator is OR
  4-mar-2009 12.50.55 org.apache.solr.schema.IndexSchema readSchema
  INFO: unique key field: id
  4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init
  INFO: [core0] Opening new SolrCore at multicore\core0/,
  dataDir=core0//multicore/core0/
  4-mar-2009 12.50.55 org.apache.solr.core.SolrCore init
  INFO: JMX monitoring not detected for core: core0
  4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener
  INFO: [core0] Searching for listeners:
 //listen...@event=firstSearcher]
  4-mar-2009 12.50.55 org.apache.solr.core.SolrCore parseListener
  INFO: [core0] Searching for listeners: //listen...@event=newSearcher]
  4-mar-2009 12.50.55 org.apache.solr.core.SolrCore initIndex
  AVVERTENZA: [core0] Solr index directory 'core0\multicore\core0\index'
  doesn't exist. Creating new index...
  4-mar-2009 12.50.55 org.apache.solr.update.SolrIndexWriter getDirectory
  AVVERTENZA: No lockType configured for core0//multicore/core0/index/
  assuming 'simple'
  4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader
 load
  INFO: created standard: org.apache.solr.handler.StandardRequestHandler
  4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader
 load
  INFO: created /update: org.apache.solr.handler.XmlUpdateRequestHandler
  4-mar-2009 12.50.55 org.apache.solr.util.plugin.AbstractPluginLoader
 load
  INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
  4-mar-2009 12.50.55 org.apache.solr.search.SolrIndexSearcher init
  INFO: Opening searc...@1e57e8f main
  4-mar-2009 12.50.55
  org.apache.solr.update.DirectUpdateHandler2$CommitTracker init
  INFO: AutoCommit: disabled
  4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler
 inform
  INFO: Adding
  component:org.apache.solr.handler.component.querycompon...@19a32e0
  4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler
 inform
  INFO: Adding
  component:org.apache.solr.handler.component.facetcompon...@8238f4
  4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler
 inform
  INFO: Adding
 
 component:org.apache.solr.handler.component.morelikethiscompon...@16925b0
  4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler
 inform
  INFO: Adding
  component:org.apache.solr.handler.component.highlightcompon...@297ffb
  4-mar-2009 12.50.55 org.apache.solr.handler.component.SearchHandler
 inform

datadir issue for multicore on latest trunk

2009-03-03 Thread Walter Ferrara
There is a strange behavior which seems to affect today's (March 3rd) Hudson
build but not (for example) the February 2nd Hudson build.
Basically when I start the multicore environment, it just creates the data
dirs in the current path.
To replicate:
1. download latest trunk
2. go to example directory

$ ls
README.txt  example-DIH  exampledocs  logs   solr  start.jar
work
etc exampleAnalysis  lib  multicore  start.bat  webapps

$ java -Dsolr.solr.home=multicore -jar start.jar
(then kill/sleep the process)

$ ls
README.txt  etc  exampledocs  multicore  start.jar
core0   example-DIH  lib  solr  webapps
core1   exampleAnalysis  logs start.bat  work

You see the core0 and core1 directories where they should not be :-);
SOLR-1041 doesn't fix it in this case.

ciao,
Walter


Re: dataimporthandler and mysql connector jar

2008-08-26 Thread Walter Ferrara
Shalin Shekhar Mangar wrote:
 Can you please open a JIRA issue for this? However, we may only be able to
 fix this after 1.3 because a code freeze has been decided upon, to release
 1.3 asap.
   
I've opened https://issues.apache.org/jira/browse/SOLR-726

Walter



dataimporthandler and multiple delta-import

2008-08-26 Thread Walter Ferrara
I'm using DIH and its wonderful delta-import.
I have a question: is delta-import synchronized? Shouldn't multiple calls to
delta-import result in all but one being refused because the status is not
idle?
I've noticed, however, that calling dataimport/?command=delta-import multiple
times within a second results in a strange exception:

GRAVE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select entity from testtable where last_modified > '2008-08-26 13:05:09' Processing Document # 1
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:171)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:128)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:41)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:92)
    at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:479)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:192)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:131)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:357)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:375)
Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: No operations allowed after connection closed.
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:888)
    at com.mysql.jdbc.Connection.checkClosed(Connection.java:1930)
    at com.mysql.jdbc.Connection.createStatement(Connection.java:3094)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:159)
    ... 10 more

Calling delta-import, waiting a bit, and then calling it again works
fine, though...

thanks,
Walter
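For context, a DIH delta setup of the shape being exercised here might look like this in data-config.xml. This is a hedged sketch per the 1.3-era DIH wiki: the table, column, and connection details are illustrative, while `${dataimporter.last_index_time}` is the variable DIH substitutes into the delta query (which is where the `last_modified > '2008-08-26 13:05:09'` comparison in the failed query comes from):

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/test" user="u" password="p"/>
  <document>
    <entity name="testtable" pk="id"
            query="select * from testtable"
            deltaQuery="select id from testtable
                        where last_modified > '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>
```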



Re: dataimporthandler and multiple delta-import

2008-08-26 Thread Walter Ferrara
Shalin Shekhar Mangar wrote:
 Hi Walter,

 Indeed, there's a race condition there because we didn't expect people to
 hit it concurrently. We expected that imports would be run sequentially.

 Thanks for noticing this. We shall add synchronization to the next release.
 Do you mind (again) opening an issue for this? We'll attach a patch soon.
   
No problem! I've opened https://issues.apache.org/jira/browse/SOLR-728
I understand that imports should be run sequentially; the main issue I can
foresee is a delta-import fired via curl from a crontab: curl has no way to
know whether the previous delta-import has actually finished. In my opinion,
if a (delta|full)-import is already running, the handler should state that it
cannot proceed because another import process is already running.

thank you for your fast reply and all your work in solr,
Walter
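The guard that SOLR-728 asks for can be as small as a compare-and-set on a busy flag; a sketch of the idea (names are illustrative, not DIH's actual implementation):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A second import request arriving while one is running gets rejected
// instead of sharing the single JDBC connection mid-flight.
public class ImportGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Returns false (= "busy, not idle") if an import is already in progress. */
    public boolean tryRunDeltaImport(Runnable doImport) {
        if (!running.compareAndSet(false, true)) {
            return false; // another import owns the data source right now
        }
        try {
            doImport.run();
            return true;
        } finally {
            running.set(false); // back to idle even if the import threw
        }
    }
}
```

With this shape, the cron-driven curl call simply gets a "busy" response rather than the MySQLNonTransientConnectionException above.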



dataimporthandler and mysql connector jar

2008-08-25 Thread Walter Ferrara
Launching a multicore Solr with DataImportHandler using a MySQL driver
(driver=com.mysql.jdbc.Driver) works fine if the MySQL connector jar
(mysql-connector-java-5.0.7-bin.jar) is on the classpath, either the JDK
classpath or inside the solr.war lib dir.
But putting mysql-connector-java-5.0.7-bin.jar in the core0/lib directory,
or in the multicore shared lib dir (specified by the sharedLib attribute in
solr.xml), results in an exception, even though the jar is correctly loaded
by the classloader:

25-ago-2008 16.36.05 org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/E:/Temp/apache-solr-2008-08-25_08-06-39/example/solr/lib/mysql-connector-java-5.0.7-bin.jar' to Solr classloader
[..]
GRAVE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: null Processing Document #
    at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306)
    at org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273)
    at org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228)
    at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:98)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:475)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: driver could not be loaded Processing Document #
    at org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:110)
    at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:63)
    at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:303)
    ... 34 more

Shouldn't it work when the jar is (only) in the core lib dir?
Tested on a Windows machine, with Java 1.6 and today's Hudson nightly build
of Solr.

Walter
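A plausible explanation for this class of failure (hedged — this is the general JDBC-in-a-container pitfall, not a confirmed diagnosis of SOLR-726): driver loading typically goes through Class.forName(name), which resolves against the classloader of the calling class — here, whatever loaded DIH itself — not the per-core resource loader that registered the jar. A class can therefore be "loaded" by one loader yet invisible to the loader doing the lookup. A small sketch of how visibility depends on which loader you ask:

```java
public class DriverLoading {
    // Asks a specific classloader whether it can see a class, the way
    // Class.forName(name, initialize, loader) would during driver loading.
    public static boolean visibleTo(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // jar may exist elsewhere, but not via this loader
        }
    }

    public static void main(String[] args) {
        ClassLoader app = DriverLoading.class.getClassLoader();
        // A JDK class is visible through every loader:
        System.out.println(visibleTo("java.util.ArrayList", app));
        // A class only a *different* (e.g. child/core) loader knows is not:
        System.out.println(visibleTo("com.example.NoSuchDriver", app));
    }
}
```

The fix direction in such cases is to resolve the driver through the loader that actually holds the jar (the core's resource loader) rather than the plain one-argument Class.forName.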




Re: Issuing queries during analysis?

2008-05-30 Thread Walter Ferrara

Dallan Quass wrote:

I have a situation where it would be beneficial to issue queries in a filter
that is called during analysis. In a nutshell, I have an index of places
that includes possible abbreviations. And I want to query this index during
analysis to convert user-entered places to standardized places. So if
someone enters "Chicago, IL" into a place field, I want to write a filter
that first issues a query on "IL" to find that the standardized name for "IL"
is "Illinois", and then issues a query on places named "Chicago" located in
"Illinois" to find that the standardized name is "Chicago, Cook, Illinois",
and then returns this string in a token.
  
This may sound a bit too KISS, but another approach could be based on
synonyms: if the number of abbreviations is limited and well defined (all
US states), you can simply define the complete state name for each
abbreviation, so that "Chicago, IL" is translated into "Chicago, Illinois"
during indexing and/or querying. But this may depend on the Tokenizer you
use and how your index is defined (does a search for "Chicago, Illinois" on
a field give you a doc with "Chicago, Cook, Illinois" in some (other/same)
field?)


have a look here: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
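If the abbreviation list really is closed (e.g. US state codes), the synonyms approach boils down to a SynonymFilter mapping file. A minimal sketch of synonyms.txt entries (illustrative; the `=>` form rewrites the left-hand token into the right-hand one at index and/or query time):

```text
# synonyms.txt (illustrative entries)
IL => Illinois
NY => New York
CA => California
```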






Solr Multicore, are there any way to retrieve all the cores registered?

2008-05-09 Thread Walter Ferrara
In Solr, latest trunk version in svn, is it possible to access the core
registry, or what used to be the static MultiCore object? My goal is to
retrieve all the cores registered in a given (multicore) environment.
It used to be MultiCore.getRegistry() initially, in the first stages of
SOLR-350; but now MultiCore is no longer static, and I can't find any
reference to where to pick up the multicore object initialized from
multicore.xml.


any tips on how to retrieve such information now?
Walter



Re: Solr Multicore, are there any way to retrieve all the cores registered?

2008-05-09 Thread Walter Ferrara

Ryan McKinley wrote:

check the status action

also, check the index.jsp page

index.jsp does:
org.apache.solr.core.MultiCore multicore =
(org.apache.solr.core.MultiCore)request.getAttribute("org.apache.solr.MultiCore");


which is fine in a servlet, but how should I do the same inside a
handler, i.e. having just SolrQueryRequest and SolrQueryResponse? Is it
something that can be extracted from SolrQueryRequest.getContext? And,
looking ahead to Solr 1.3, will this functionality be maintained?


thanks,
Walter



HTMLStripReader and script tags

2008-04-10 Thread Walter Ferrara
I've noticed that passing html to a field using 
HTMLStripWhitespaceTokenizerFactory, ends up in having some javascripts too.

For example, using an analyzer like:
<fieldType name="HTMLStripper2" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

with text such as:
<html>
<head><title>title</title></head>
<body>
pre
<SCRIPT LANGUAGE="JavaScript">
 var time = new Date();
 ordval= (time.getTime());
</SCRIPT>
post <!-- comment -->
</body>
</html>

Analysis.jsp produces these tokens:
title
pre
var
time
=
new
Date();
ordval=
(time.getTime());
post

If the script in the page is commented out, everything works fine.
Is this a design choice? Shouldn't scripts be removed in both cases?
(Solr Implementation Version: 2008-03-24_09-57-01 ${svnversion} - hudson
- 2008-03-24 09:59:40)


Walter



Re: CorruptIndexException: unknown format version: -3

2008-02-25 Thread Walter Ferrara
Did you create/modify the index with a newer version of Lucene than the one
you use in Solr?
In that case I doubt you can downgrade your index, but maybe you can upgrade
the Lucene in your Solr (search this forum, there should be a thread about
this), or try the latest nightly builds.


Paul Danese wrote:

Hi all,

Is there any way to recover from such an error as listed in the subject heading?

Luke can view the index just fine (at least at a cursory level Luke is able to 
open the index, give me back the # of docs, etc.), but solr throws this 
exception whenever I try and start it up.

any ideas on how to proceed?
can I use luke or something else to uncorrupt, modify or save my index into a 
non-corrupt format?

TIA!!


 org.apache.solr.core.SolrException log
SEVERE: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -3
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:216)
    at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.lucene.index.CorruptIndexException: Unknown format version: -3
    at org.apache.lucene.index.SegmentTermEnum.init(SegmentTermEnum.java:64)
    at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:49)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:184)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:157)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:139)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:194)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:184)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
    at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:87)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:424)
    ... 27 more





   


Re: criteria for using the property stored=true and indexed=true

2007-12-12 Thread Walter Ferrara
See:
http://wiki.apache.org/solr/SchemaXml#head-af67aefdc51d18cd8556de164606030446f56554

indexed means searchable (faceting and sorting also need it); stored instead
is needed only when you need the original text (i.e. not
tokenized/analyzed) to be returned in results.
When stored and indexed are not specified, I think Solr defaults
both of them to true.
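A minimal sketch of how those attributes look in schema.xml (the field names and types here are hypothetical, just to illustrate the combinations):

```xml
<!-- hypothetical fields illustrating the indexed/stored attributes -->
<field name="id"    type="string" indexed="true"  stored="true"/>  <!-- searchable and returned in results -->
<field name="title" type="text"   indexed="true"  stored="false"/> <!-- searchable, but not returned -->
<field name="raw"   type="string" indexed="false" stored="true"/>  <!-- returned only, not searchable -->
```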

Dilip.TS wrote:
 Hi,

 I would need some clarification on which fields we should assign the properties
 stored=true and indexed=true.
 What are the criteria for these property assignments?
 What would be the impact if no field is assigned these properties?

 Thanks in Advance,

 Regards,
 Dilip TS
 Starmark Services Pvt. Ltd.


   


Re: 2GB limit on 32 bits

2007-11-09 Thread Walter Ferrara
Isn't the Xeon 5110 64-bit? Maybe you could just put a 64-bit OS on that box.
Also, take a look at http://www.spack.org/wiki/LinuxRamLimits
--
Walter

Isart Montane wrote:
 I've got a dual Xeon; here's my cpuinfo. I've read the limit on
 a 2.6 Linux kernel is 4GB of user space and 4GB for the kernel... that's
 why I asked if there's any way to reach 4GB per process.

 Thanks anyway :(

 cat /proc/cpuinfo
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU5110  @ 1.60GHz
 stepping: 6
 cpu MHz : 1596.192
 cache size  : 4096 KB
 physical id : 0
 siblings: 2
 core id : 0
 cpu cores   : 2
 fdiv_bug: no
 hlt_bug : no
 f00f_bug: no
 coma_bug: no
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
 cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
 constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
 bogomips: 3194.21

 processor   : 1
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU5110  @ 1.60GHz
 stepping: 6
 cpu MHz : 1596.192
 cache size  : 4096 KB
 physical id : 0
 siblings: 2
 core id : 1
 cpu cores   : 2
 fdiv_bug: no
 hlt_bug : no
 f00f_bug: no
 coma_bug: no
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
 cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
 constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
 bogomips: 3192.09

 processor   : 2
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU5110  @ 1.60GHz
 stepping: 6
 cpu MHz : 1596.192
 cache size  : 4096 KB
 physical id : 3
 siblings: 2
 core id : 0
 cpu cores   : 2
 fdiv_bug: no
 hlt_bug : no
 f00f_bug: no
 coma_bug: no
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
 cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
 constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
 bogomips: 3192.13

 processor   : 3
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU5110  @ 1.60GHz
 stepping: 6
 cpu MHz : 1596.192
 cache size  : 4096 KB
 physical id : 3
 siblings: 2
 core id : 1
 cpu cores   : 2
 fdiv_bug: no
 hlt_bug : no
 f00f_bug: no
 coma_bug: no
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
 cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
 constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
 bogomips: 3192.12

 On Nov 9, 2007 9:26 AM, Norberto Meijome [EMAIL PROTECTED] wrote:

   
 On Fri, 9 Nov 2007 09:03:01 -0300
 Isart Montane [EMAIL PROTECTED] wrote:

 
 I've read there's a kernel limitation for a 32-bit architecture of 2GB
 per process, and I just wanna know if anybody knows an alternative to
 getting a new 64-bit server.
   
 You don't say what CPU you have. But the 32 bit limit is real (it's an
 architecture issue, not a kernel limitation...). You could try running
 several servers on different ports, each managing part of your index, each
 up to 2 GB RAM - but you may be pushing your CPU / disks too much and hit
 other issues - try and see how it goes.

 If I were you, I'd seriously look into getting a new (64-bit) server.
 B

 _
 {Beto|Norberto|Numard} Meijome

 Too bad ignorance isn't painful.
  Don Lindsay

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet. Reading disclaimers makes you go blind. Writing them is worse. You have
 been Warned.

 

   


Solr and FieldCache

2007-09-20 Thread Walter Ferrara
I have an index with several fields, but just one stored: ID (string,
unique).
I need to access that ID field for each of the top nodes docs in my
results (this is done inside a handler I wrote); code looks like:

 Hits hits = searcher.search(query);
 for(int i=0; i<nodes; i++) {
    id[i] = hits.doc(i).get("ID");
    score[i] = hits.score(i);
 }

I noticed that retrieving the ID this way is slow.

if I use the FieldCache, like:
id[i] = FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID")[hits.id(i)];
after the first execution (the initialization of the cache takes some
time), it seems to run much faster.

But what happens when Solr reloads the index (after a commit or an
optimize, for example)?
Will it refresh the cache with the new reader (in the warmup process?), or
will it be the first query execution of that code (with the new reader)
that forces the refresh? (This would mean that every first query
after a reload will be slower.)
Is there any way to tell Solr to cache and warm up this ID field when
needed?
 
Thanks,
Walter



Re: Solr and FieldCache

2007-09-20 Thread Walter Ferrara
About the stored/indexed difference: ID is a string (solr.StrField), so
FieldCache gives me what I need.

I'm just wondering: as this cached object could theoretically be
pretty big, do I need to be aware of a possible OOM? I know that FieldCache
uses a WeakHashMap, so I presume the cached array for the older reader(s) will
be gc-ed when the reader is no longer referenced (i.e. when Solr loads
the new one, after its warmup and so on); is that right?
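The weak-reference behavior in question can be sketched with plain JDK classes. This is a toy stand-in for Lucene's internal cache, not its actual code: the key plays the role of an IndexReader, and a WeakHashMap entry becomes eligible for collection once nothing else holds a strong reference to its key.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        // Key stands in for an IndexReader; value, for a cached field array.
        Map<Object, String[]> cache = new WeakHashMap<>();
        Object reader = new Object();
        cache.put(reader, new String[] {"doc1-id", "doc2-id"});
        System.out.println(cache.size());   // 1: reader is strongly referenced

        reader = null;                      // drop the last strong reference
        for (int i = 0; i < 50 && !cache.isEmpty(); i++) {
            System.gc();                    // request collection (best effort)
            Thread.sleep(20);
        }
        // true once the key has been collected (almost always within the loop)
        System.out.println(cache.isEmpty());
    }
}
```

So yes, as long as nothing else pins the old reader, its cached array goes away with it; the caveat is that while both the old and new readers are alive (e.g. during warmup), both arrays are in memory at once.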

Thanks
--

J.J. Larrea wrote:
 At 5:30 PM +0200 9/20/07, Walter Ferrara wrote:
   
 I have an index with several fields, but just one stored: ID (string,
 unique).
  I need to access that ID field for each of the top nodes docs in my
  results (this is done inside a handler I wrote); code looks like:

  Hits hits = searcher.search(query);
  for(int i=0; i<nodes; i++) {
     id[i] = hits.doc(i).get("ID");
     score[i] = hits.score(i);
  }

  I noticed that retrieving the ID this way is slow.

  if I use the FieldCache, like:
  id[i] = FieldCache.DEFAULT.getStrings(searcher.getReader(), "ID")[hits.id(i)];
 

 I assume you're putting FieldCache.DEFAULT.getStrings(searcher.getReader(),
 "ID") in an array outside the loop, saving two redundant method calls per
 iteration.

   
  after the first execution (the initialization of the cache takes some
  time), it seems to run much faster.
 

 Do note that FieldCache.DEFAULT is caching the indexed values, not the stored 
 values.  Since your field is an ID you are probably indexing it in such a way 
 that both are identical, e.g. with KeywordTokenizer, so you're not seeing a 
 difference.

   
  But what happens when Solr reloads the index (after a commit or an
  optimize, for example)?
  Will it refresh the cache with the new reader (in the warmup process?), or
  will it be the first query execution of that code (with the new reader)
  that forces the refresh? (This would mean that every first query
  after a reload will be slower.)
 

 It is refreshed by Lucene the first time the FieldCache array is requested 
 from the new IndexReader.

   
  Is there any way to tell Solr to cache and warm up this ID field when
  needed?
 

 Absolutely, just put a warmup query in solrconfig.xml which makes a request
 that invokes FieldCache.DEFAULT.getStrings on that field.

 Simplest would probably be to invoke your custom handler, perhaps passing 
 arguments that limit it to only processing one document to limit the data 
 which gets cached; since getStrings returns the entire array, one pass 
 through your loop is fine.

 If that's not easy with your handler, you could achieve the same effect by 
 setting up a handler which facets on the ID field, sorting by ID 
 (facet.sort=false), and only asks for a single value (facet.limit=1) (the 
 entire id[docid] array will get scanned to count references to that ID, but 
 that ensures it gets paged in).

 - J.J.
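For the warmup query suggested above, a sketch of what the listener could look like in solrconfig.xml. The handler name (/myhandler) and the parameters are hypothetical placeholders; the point is that any request routed through the custom handler on the new searcher will populate the FieldCache before the searcher goes live:

```xml
<!-- hypothetical newSearcher listener; adjust handler name and params -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="qt">/myhandler</str>
      <str name="q">*:*</str>
      <str name="rows">1</str>
    </lst>
  </arr>
</listener>
```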