How to index large set data
Hi, I have about 45GB of XML files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running. My computer has 4GB of memory. Any suggestions? Thanks! JB
Phrase Search Issue
Hi, I am facing an issue with phrase queries. I am entering 'Top of the world' as my search criteria. I am expecting it to return all the records in which one field contains all of these words, in any order. But the query is being treated as OR, returning all the records which contain any of these words. I am doing this using the dismax request handler. I would appreciate it if somebody could provide some pointers. Thanks, Amit Garg
Re: Phrase Search Issue
This problem is related to the default operator in dismax. Currently OR is the default operator, and it is behaving as documented. I have changed the default operator in schema.xml to AND, and I have also changed the minimum match to 100%. But it seems that AND as the default operator doesn't work with dismax. Please suggest. Thanks, Amit Garg
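For context: the dismax parser does not consult the schema.xml defaultOperator at all; how many query terms must match is governed entirely by the mm (minimum should match) parameter. A minimal sketch of setting mm directly on a dismax handler in solrconfig.xml (the handler name and qf field list are illustrative, not taken from this thread):

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- require all query terms to match -->
      <str name="mm">100%</str>
      <!-- illustrative field list; adjust to your schema -->
      <str name="qf">title^5.0 text^0.5</str>
    </lst>
  </requestHandler>

If mm is being set anywhere other than the request or the handler defaults (for example in schema.xml), it will have no effect.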
what does the version parameter in the query mean?
Hello all, I'm using Solr 1.3.0, and when I query my index for solr using the admin page, the query string in the address bar of my browser reads like this: http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on Now, I don't know what version=2.2 means, and neither the wiki nor the docs tell me. Could someone enlighten me? Thank you, Anshuman Manur
Re: How to change the weight of the fields ?
It seems I can only search on the field 'text'. With the following URL:
http://localhost:8983/solr/select/?q=novel&qt=dismax&fl=title_s,id&version=2.2&start=0&rows=10&indent=on&debugQuery=on
I get answers, but in the debug section it seems it's only searching on the 'text' field (with or without 'qt' the results are displayed in the same order):

  <lst name="debug">
    <str name="rawquerystring">novel</str>
    <str name="querystring">novel</str>
    <str name="parsedquery">+DisjunctionMaxQuery((text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01) ()</str>
    <str name="parsedquery_toString">+(text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01 ()</str>
    <lst name="explain">
      <str name="33395">
        0.014641666 = (MATCH) sum of:
          0.014641666 = (MATCH) max plus 0.01 times others of:
            0.014641666 = (MATCH) weight(text:novel^0.5 in 114927), product of:
              0.01362607 = queryWeight(text:novel^0.5), product of:
                0.5 = boost
                3.4734163 = idf(docFreq=10634, numDocs=43213)
                0.007845918 = queryNorm
              1.0745333 = (MATCH) fieldWeight(text:novel in 114927), product of:
                1.4142135 = tf(termFreq(text:novel)=2)
                3.4734163 = idf(docFreq=10634, numDocs=43213)
                0.21875 = fieldNorm(field=text, doc=114927)
      </str>
  etc.

I should have a debug entry below with a search of the term in 'title_s' and 'id', no? Thanks for your answers! Vincent
Strange Phrase Query Issue with Dismax
Hi, I am facing a very strange issue in Solr; I'm not sure if it is already a known bug. If I search for 'Top 500' it returns all the records which contain either of these words anywhere, which is fine. But if I search for 'Top 500 Companies', it gives me only those records which contain all 3 words, in any order, in any one of the fields, irrespective of sequence. In this case it is not returning the records which contain just some of these words (which actually is my requirement). Please suggest. Thanks, Amit Garg
Re: java.lang.RuntimeException: after flush: fdx size mismatch
On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote:
Hi Mike, thanks for the quick response:
$ java -version
java version 1.6.0_11
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet!
The issue didn't spell this out very well -- I've added a comment.
The exception always reports 0 length, but the number of docs varies, heavily weighted towards one or two docs. Of the last 130 or so exceptions:
  89: 1 docs vs 0 length
  20: 2 docs vs 0 length
   9: 3 docs vs 0 length
   1: 4 docs vs 0 length
   3: 5 docs vs 0 length
   2: 6 docs vs 0 length
   1: 7 docs vs 0 length
   1: 9 docs vs 0 length
   1: 10 docs vs 0 length
Hmm... odd that it's always 0 file length. What filesystem / IO devices is the index being written to?
The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however...
I think from Lucene's standpoint this just means creating and closing lots of IndexWriters? (Which should be just fine.)
What are your documents like? Ie, how many and what type of fields? Are you adding docs from multiple threads? (Solr would do so, I believe, so I guess: is your client that's submitting docs to a given core doing so with multiple threads?)
Mike
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything? Mike

On Wed, May 20, 2009 at 2:37 AM, Michael McCandless luc...@mikemccandless.com wrote:
Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check (and exception) is designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes? Mike

On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote:
Hello all, I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time. I've noticed occasional errors such as:
SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx
  at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
  at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
  at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
  at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
  at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
  at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
  at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
  at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
  at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
  at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
  at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
during commit / optimise operations. These errors then cause cascading errors during updates on the offending cores:
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:85)
  at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
  at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
  at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
  at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remained. CheckIndex doesn't find any problems with the index, and the problems disappear after an (inconvenient, for me) restart of Solr.
Firstly, as the symptoms are so close to those in 1521, can I check that my Lucene upgrade method should work:
- unzip the Solr 1.3 war
- remove the Lucene 2.4dev jars (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries, lucene-memory, lucene-highlighter, lucene-analyzers)
- move in the Lucene 2.4.1 jars
- rezip the directory structure as solr.war
I think this has worked, as solr/default/admin/registry.jsp shows:
  <lucene-spec-version>2.4.1</lucene-spec-version>
  <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
Secondly, if this Lucene fix isn't the right solution to this problem, can anyone suggest an alternative approach? The only problems I've had up to now are to do with the number of allowed file handles, which was fixed by changing limits.conf (RHEL machine). Many thanks! James
Re: java.lang.RuntimeException: after flush: fdx size mismatch
If you're able to run a patched version of Lucene, can you apply the attached patch, run it, get the issue to happen again, and post back the resulting exception? It only adds further diagnostics to the RuntimeException you're hitting. Another thing to try is turning on assertions, which may very well catch the issue sooner. Mike
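For reference, enabling assertions requires no code changes; they can be switched on for the Lucene packages with the JVM's -ea flag when starting Solr. A sketch, assuming the stock example Jetty launcher (the package-filter syntax is standard Java; adapt the invocation to however your container is started):

  java -ea:org.apache.lucene... -jar start.jar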
Re: best way to cache base queries (before application of filters)
On Thu, May 21, 2009 at 3:30 AM, Kent Fitch kent.fi...@gmail.com wrote:
#2) Your problem might be able to be solved with field collapsing on the category field in the future (but it's not in Solr yet). Sorry - I didn't understand this
A single relevancy search, but group or collapse results based on the value of the category field such that you don't get more than 10 results for each value of category. But it's not in Solr yet... http://issues.apache.org/jira/browse/SOLR-236
- we've got one query we want filtered 5 ways to find the top scoring results matching the query and each filter
The problem is that caching the base query involves caching not only all of the matching documents, but the score for each document. That's expensive. You could also write your own HitCollector that filters the results of the base query 5 different ways simultaneously.
-Yonik
http://www.lucidimagination.com
Re: master/slave failure scenario
Just curious. What would be the disadvantages of a no-replication / multi-master (no slave) setup? The client code would have to do the updates for every master, of course, but if one machine failed I could immediately continue the indexing process, and I could also query the index on any machine for a valid result. I might be missing something...
On Thu, May 14, 2009 at 4:19 PM, nk 11 nick.cass...@gmail.com wrote: wow! that was just a couple of days old! thanks a lot!
2009/5/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: yeah there is a hack https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316
On Thu, May 14, 2009 at 6:07 PM, nk 11 wrote: sorry for the mail. I wanted to hit reply :(
On Thu, May 14, 2009 at 3:37 PM, nk 11 wrote: oh, so the configuration must be manually changed? Can't something be passed at (re)start time?
2009/5/14 Noble Paul നോബിള് नोब्ळ्:
On Thu, May 14, 2009 at 4:07 PM, nk 11 wrote: Ok so the VIP will point to the new master. But what makes a slave promoted to a master? Only the fact that it will receive add/update requests? And I suppose that this hot promotion is possible only if the slave is configured as master also...
right. By default you can set up all slaves to be masters also. It does not cost anything if it is not serving any requests. So, if you have such a setting, you will have to disable that slave from being a slave and restart it, and you will have to make the VIP point to this new slave as master. So hot promotion is still not possible.
2009/5/14 Noble Paul നോബിള് नोब्ळ्: ideally, we don't do that. You can just keep the master host behind a VIP, so if you wish to change the master, make the VIP point to the new host.
On Wed, May 13, 2009 at 10:52 PM, nk 11 wrote: This is more interesting. Such a procedure would involve taking down and reconfiguring the slave?
On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot btal...@aeriagames.com wrote: Or ... 1. Promote existing slave to new master 2. Add new slave to cluster -Bryan
On May 13, 2009, at 9:48 AM, Jay Hill wrote: - Migrate configuration files from old master (or backup) to new master. - Replicate from a slave to the new master. - Resume indexing to new master. -Jay
On Wed, May 13, 2009 at 4:26 AM, nk 11 wrote: Nice. What if the master fails permanently (like a disk crash...) and the new master is a clean machine?
2009/5/13 Noble Paul നോബിള് नोब्ळ्:
On Wed, May 13, 2009 at 12:10 PM, nk 11 wrote: Hello. I'm kind of new to Solr and I've read about replication, and the fact that a node can act as both master and slave. If a replica fails and then comes back online, I suppose that it will resync with the master.
right
But what happens if the master fails? A slave that is configured as master will kick in? What if that slave is not yet fully sync'ed with the failed master and has old data?
if the master fails you can't index the data. but the slaves will continue serving the requests with the last index. You can bring the master back up and resume indexing.
What happens when the original master comes back online? Will it remain a slave because there is another node with the master role? Thank you!
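For context, Noble's point that every slave can also be set up as a master corresponds to the Solr 1.4 ReplicationHandler carrying both roles in one configuration. A minimal sketch of such a dual-role entry in solrconfig.xml (host name and poll interval are illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <!-- this node can serve its index to slaves after each commit -->
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <!-- and it also polls the current master; repoint this (or a VIP) on promotion -->
      <str name="masterUrl">http://current-master:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>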
Customizing SOLR-236 field collapsing
Hey there, I have been testing the latest adjacent field collapsing patch in trunk and it seems to work perfectly. I am trying to modify how it works but don't know exactly how to do it. What I would like to do is, instead of collapsing the results, send them to the end of the result queue. Apparently it is not possible to do that due to the way it is implemented. I have noticed that you get a DocSet of the ids that survived the collapsing and that match the query and filters (collapseFilterDocSet = collapseFilter.getDocSet(), in CollapseComponent.java). Once that is done, the search is executed again, this time with the DocSet obtained before passed as a filter:
  DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
      collapseFilterDocSet == null ? rb.getFilters() : null,
      collapseFilterDocSet,
      rb.getSortSpec().getSort(),
      rb.getSortSpec().getOffset(),
      rb.getSortSpec().getCount(),
      rb.getFieldFlags());
The result of this search will give you the final result (with the correct offset and start). I have thought of saving the collapsed docs in another DocSet and doing something with them afterwards... but I don't know how to manage it. Any clue about how I could reach the goal? Thanks in advance
Re: How to index large set data
This isn't much data to go on. Do you have any idea what your throughput is? How many documents are you indexing: one 45GB doc or 4.5 billion 10-character docs? Have you looked at any profiling data to see how much memory is being consumed? Are you IO bound or CPU bound?
Best
Erick
On Thu, May 21, 2009 at 2:18 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, I have about 45GB of XML files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running. My computer has 4GB of memory. Any suggestions? Thanks! JB
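If the 45GB input is a set of XML files read through DataImportHandler, one common memory-saver is to stream them rather than building full DOM trees. A minimal data-config.xml sketch, assuming XPathEntityProcessor; the file path, forEach expression, and field XPaths are illustrative:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- stream="true" avoids loading each whole XML file into memory -->
      <entity name="docs"
              processor="XPathEntityProcessor"
              url="/data/xml/big-export.xml"
              forEach="/records/record"
              stream="true">
        <field column="id" xpath="/records/record/id"/>
        <field column="title" xpath="/records/record/title"/>
      </entity>
    </document>
  </dataConfig>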
Re: Plugin Not Found
Nothing else is in the lib directory but this one jar. Additionally, the logs seem to say that it finds the lib, as shown below:
INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr classloader
However, as soon as it tries the component it cannot find the class.
-- Jeff Newburn, Software Engineer, Zappos.com, jnewb...@zappos.com - 702-943-7562
From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: what else is there in the solr.home/lib other than this component?
On Wed, May 20, 2009 at 9:08 PM, Jeff Newburn jnewb...@zappos.com wrote: I tried to change the package name to com.zappos.solr. When I declared the search component with:
  <searchComponent name="facetcube" class="com.zappos.solr.FacetCubeComponent"/>
I get:
  SEVERE: org.apache.solr.common.SolrException: Unknown Search Component: facetcube
    at org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:874)
    at org.apache.solr.handler.component.SearchHandler.inform(SearchHandler.java:127)
    at ...
When I declare the component with solr.FacetCubeComponent I get the same error message. When we turned on trace we got the same exception plus:
  Caused by: java.lang.ClassNotFoundException: com.zappos.solr.FacetCubeComponent
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1360)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1206)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
    ... 27 more
From: Grant Ingersoll gsing...@apache.org: Just a wild guess here, but... Try doing one of two things: 1. change the package name to be something other than o.a.s 2. Change your config to use solr.FacetCubeComponent. You might also try turning on trace level logging for the SolrResourceLoader and report back the output. -Grant
On May 20, 2009, at 10:20 AM, Jeff Newburn wrote: Error is below. This error does not appear when I manually copy the jar file into the tomcat webapp directory, only when I try to put it in the solr.home lib directory.
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.FacetCubeComponent'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
  at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
  at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
  at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:841)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:528)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:350)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:227)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:107)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
  at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
  at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
  at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
  at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
  at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
  at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
  at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:829)
  at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:718)
  at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490)
  at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1147)
  at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
  at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSuppor
Re: master/slave failure scenario
Indexing is usually much more expensive than replication, so it won't scale well as you add more servers. Also, what would a client do if it was able to send the update to only some of the servers because others were down (for maintenance, etc)?
-Bryan
On May 21, 2009, at 6:04 AM, nk 11 wrote: Just curious. What would be the disadvantages of a no-replication / multi-master (no slave) setup? ...
RE: Creating a distributed search in a searchComponent
I was looking for an answer to the same question, and have a similar concern. It looks like any serious customization work requires developing a custom SearchComponent, but it's not clear to me how the Solr designers wanted this to be done. I am more confident either doing it at the Lucene level, or staying on the client side and using something like multi-core (as discussed here: http://wiki.apache.org/solr/MultipleIndexes).
Date: Wed, 20 May 2009 13:47:20 -0400, From: nicholas.bai...@rackspace.com: It seems I sent this out a bit too soon. After looking at the source, it seems there are two separate paths for distributed and regular queries; however, the prepare method for all components is run before the shards parameter is checked. So I can build the shards portion in the prepare method of my own search component. However, I'm not sure if this is the greatest idea in case Solr changes at some point. -Nick
-----Original Message----- From: Nick Bailey nicholas.bai...@rackspace.com, Wednesday, May 20, 2009 1:29pm: Hi, I am wondering if it is possible to basically add the distributed portion of a search query inside of a searchComponent. I am hoping to build my own component and add it as a first-component to the StandardRequestHandler. Then hopefully I will be able to use this component to build the shards parameter of the query and have the handler then treat the query as a distributed search. Anyone have any experience or know if this is possible? Thanks, Nick
Re: Plugin Not Found
Jeff Newburn wrote: Nothing else is in the lib directory but this one jar. Additionally, the logs seem to say that it finds the lib as shown below:
INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr classloader
However, as soon as it tries the component it cannot find the class.
Something must be wacky. I just did a quick custom component with 1.3 and trunk, and it loaded with no problem in both cases. Anything odd about your component? You're sure it extends SearchComponent? As Noble mentioned, you will not be able to find other classes/jars in the solr.home/lib directory from a class/jar in the solr.home/lib directory. But this, oddly, doesn't appear to be the issue you're facing. Do share if you have anything else you can add.
-- - Mark http://www.lucidimagination.com
Re: Creating a distributed search in a searchComponent
On Wed, May 20, 2009 at 10:59 PM, Nick Bailey nicholas.bai...@rackspace.com wrote: Hi, I am wondering if it is possible to basically add the distributed portion of a search query inside of a searchComponent. ...
You can also add a ServletFilter before SolrDispatchFilter and add the parameters before Solr processes the query.
-- Regards, Shalin Shekhar Mangar.
Re: Creating a distributed search in a searchComponent
Also look at SOLR-565 and see if that helps you. https://issues.apache.org/jira/browse/SOLR-565
-- Regards, Shalin Shekhar Mangar.
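For completeness, the shards parameter doesn't have to be built in code at all; it can be baked into the handler configuration. A sketch of setting it as a default in solrconfig.xml (host names are illustrative):

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <!-- every request through this handler becomes a distributed search over these shards -->
      <str name="shards">solr1:8983/solr,solr2:8983/solr</str>
    </lst>
  </requestHandler>

This only suits a fixed shard list; Nick's SearchComponent or Shalin's ServletFilter approach is needed if the list must be computed per request.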
Re: what does the version parameter in the query mean?
I was interested in this recently and also couldn't find anything on the wiki. I found this in the list archive: The version parameter determines the XML protocol used in the response. Clients are strongly encouraged to ''always'' specify the protocol version, so as to ensure that the format of the response they receive does not change unexpectedly if/when the Solr server is upgraded. Here is a link to the archive: http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg00518.html
-Jay
No sanity checks before replicating files?
Hi list, We have deployed an experimental Solr 1.4 cluster (a master/slave setup, with automatic promotion of the slave to master in case of failure) on drupal.org, to manage our medium-size index (3GB, about 400K documents). One of the problems we are facing is that there seem to be no sanity checks before downloading files. Take the following scenario:
- initial situation: s1 is master, s2 is slave
- s1 fails, the virtual IP falls back to s2
- some updates happen on s2
- suppose now that s1 gets back online; s2 tries to replicate from s1, but after replicating all the files (3GB), the commit fails because the local index has been locally updated; the replication fails, but the process restarts at the next poll (redownload all the index files, fail again...) and so on
We are considering configuring each server to replicate from the virtual IP, which should solve that issue for us, but couldn't the slave do some sanity checks before trying to download all the files from the master? Thanks in advance for any help you could provide, Damien Tournoud
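The VIP workaround mentioned above is a one-line change in each node's slave configuration; a sketch of the Solr 1.4 ReplicationHandler slave section, with the VIP host name illustrative:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- poll the virtual IP, so each node always follows whichever box currently holds the master role -->
      <str name="masterUrl">http://solr-vip:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>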
Re: master/slave failure scenario
You are right... I just don't like the idea of stopping the indexing process if the master fails until a new one is started (more or less by hand).
On Thu, May 21, 2009 at 6:49 PM, Bryan Talbot btal...@aeriagames.com wrote: Indexing is usually much more expensive than replication, so it won't scale well as you add more servers. ...
Re: Customizing SOLR-236 field collapsing
Yes, I have tried it, but I see a couple of problems doing that. I would have to do more searches, so response time would increase. The second thing is that, imagine I show the results collapsed on page one and put a button to see the non-collapsed results. If the results for the second page are then requested, some results from the non-collapsed request would be the same as results that already appeared on the first page with collapsing:
  collapsing page 1 shows docs: 1-2-3-6-7
  non-collapsing page 1 shows docs: 1-2-3-4-5
  collapsing page 2 shows docs: 8-9-10-11-12
  non-collapsing page 2 shows docs: 6-7-8-9-10
I want to avoid that and make the response as fast as possible. That is the reason I want to send the collapsed docs to the end of the queue... Thanks
Thomas Traeger-2 wrote: Is adding QueryComponent to your SearchComponents an option? When combined with the CollapseComponent, this approach would return both the collapsed and the complete result set. i.e.:
  <arr name="components">
    <str>collapse</str>
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
  </arr>
Thomas
Re: master/slave failure scenario
Hi, You should be able to do the following. Put the masters behind a load balancer (LB). Create an LB VIP and a pool with 2 masters, masterA and masterB, with a rule that all requests always go to A unless A is down. If A is down they go to B. Bring up master instances A and B on 2 servers and make them point to the shared storage.

  masterA \
           \-- shared storage
           /
  masterB /

Your indexing client doesn't talk to the servers directly. It talks through the VIP you created in the LB. At any one time only one of the masters is active. If A goes down, the LB detects it and makes B active. Your indexer may have to reconnect if it detects a failure; maybe it would need to reindex some number of documents if they didn't make it to disk before A died; maybe even some lock file cleanup might be needed. But the above should be doable with little effort.
Otis
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message From: nk 11 nick.cass...@gmail.com, Thursday, May 21, 2009 12:44:55 PM: You are right... I just don't like the idea of stopping the indexing process if the master fails until a new one is started (more or less by hand). ...
Re: No sanity checks before replicating files?
Hi Damien, Interesting; this is similar to my suggestion to another person I just replied to here on solr-user. Have you actually run into this problem? I haven't tried it, but I'd think the next replication (copying the index from s1 to s2) would not necessarily fail, but would simply overwrite any changes that were made on s2 while it was serving as the master. Is that not what happens? If that's what happens, then I think what you'd simply have to do is:
1) bring s1 back up, but don't make it a master immediately
2) take away the master role from s2
3) make s1 copy the index from s2, since s2 might have a more up-to-date index now
4) make s1 the master
Otis
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message From: Damien Tournoud dam...@tournoud.net, Thursday, May 21, 2009 12:37:10 PM: Hi list, We have deployed an experimental Solr 1.4 cluster (a master/slave setup, with automatic promotion of the slave to master in case of failure) on drupal.org... but couldn't the slave do some sanity checks before trying to download all the files from the master? Damien Tournoud
clustering SOLR-769
Hi, I built Solr from SVN this morning. I am using the clustering example. I have added my own schema.xml. The problem is that even though I change the carrot.snippet field from features to filecontent, the clustering results do not change a bit. Please note the features field is also there in my document.
  <str name="carrot.title">name</str>
  <!-- The field to cluster on -->
  <str name="carrot.snippet">features</str>
  <str name="carrot.url">id</str>
Why do I get the same clusters even though I have changed carrot.snippet? Is there some problem with my understanding? Regards, allahbaksh
Re: No sanity checks before replicating files?
Hi Otis, Thanks for your answer.
On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Interesting... Have you actually run into this problem? I haven't tried it, but I'd think the next replication (copying the index from s1 to s2) would not necessarily fail, but would simply overwrite any changes that were made on s2 while it was serving as the master. Is that not what happens?
No, it doesn't. For some reason, Solr downloads all the files of the index, but fails to commit the changes locally. At the next poll, the process restarts. Not only does this clog the network, it also unnecessarily uses resources on the newly promoted slave, until we change its configuration.
If that's what happens, then I think what you'd simply have to do is: 1) bring s1 back up, but don't make it a master immediately 2) take away the master role from s2 3) make s1 copy the index from s2, since s2 might have a more up-to-date index now 4) make s1 the master
Once s2 is the master, we want it to stay this way. We will reassign s1 as the slave at a later stage, when resources allow. What worries me is that strange behavior of Solr 1.4 replication when the slave index is fresher than the master one.
Damien
Re: How to change the weight of the fields ?
Hi, I'm not sure why the rest of the scoring explanation is not shown, but your query *was* expanded to search on the text, title_s, and id fields, so I think that expanded/rewritten query is what went to the index.
Otis
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message From: Vincent Pérès vincent.pe...@gmail.com, Thursday, May 21, 2009 4:34:00 AM: It seems I can only search on the field 'text'. ... I should have a debug entry below with a search of the term in 'title_s' and 'id', no? Thanks for your answers! Vincent
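For reference, the field boosts visible in the parsed query come from the dismax qf parameter; a sketch of handler defaults that would produce exactly that expansion (boost values read off the debug output quoted earlier in this thread):

  <str name="defType">dismax</str>
  <str name="qf">text^0.5 title_s^5.0 id^10.0</str>

Note that dismax searches all qf fields for every query, but the explain block only lists sub-queries that actually matched the document; for doc 114927, 'novel' evidently occurs only in the text field, which is why only the text clause shows up.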
Re: Phrase Search Issue
Amit, Append debugQuery=true to the search request URL and you'll see how your query string was interpreted.
Otis
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: No sanity checks before replicating files?
Aha, I see. Perhaps you can post the error message/stack trace? As for the sanity check, I bet a call to http://host:port/solr/replication?command=indexversion could be used to ensure only newer versions of the index are being pulled. We'll see what Paul says when he wakes up. :)
Otis
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Plugin Not Found
One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it won't load, since it obviously works fine if directly inserted into the WEB-INF directory. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Mark Miller markrmil...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Thu, 21 May 2009 12:19:47 -0400 To: solr-user@lucene.apache.org Subject: Re: Plugin Not Found Jeff Newburn wrote: Nothing else is in the lib directory but this one jar. Additionally, the logs seem to say that it finds the lib, as shown below: INFO: Solr home set to '/home/zetasolr/' May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr classloader However, as soon as it tries the component it cannot find the class. Something must be wacky. I just did a quick custom component with 1.3 and trunk, and it loaded no problem in both cases. Anything odd about your component? You're sure it extends SearchComponent? As Noble mentioned, you will not be able to find other classes/jars in the solr.home/lib directory from a class/jar in the solr.home/lib directory. But this, oddly, doesn't appear to be the issue you're facing. Do share if you have anything else you can add. -- - Mark http://www.lucidimagination.com
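For comparison, a bare-bones custom component looks roughly like the sketch below. This is a generic illustration of the SearchComponent contract in the 1.3/1.4 era (the package and class names are invented), not Jeff's FacetCubeComponent:

package com.example.solr;

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class MinimalComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // called before the query is executed
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // called after the query is executed; attach output to the response
        rb.rsp.add("minimal", "hello");
    }

    @Override
    public String getDescription() { return "minimal demo component"; }

    @Override
    public String getSource() { return "n/a"; }

    @Override
    public String getSourceId() { return "n/a"; }

    @Override
    public String getVersion() { return "1.0"; }
}

It would be registered in solrconfig.xml as a searchComponent whose class attribute points at com.example.solr.MinimalComponent, with the jar in solr.home/lib. If a jar loads from WEB-INF but not from solr.home/lib, comparing against a skeleton like this can help rule out a packaging or superclass issue.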
Regarding Delta-Import Query in DIH
Hi All, I understand from the details provided under http://wiki.apache.org/solr/DataImportHandler regarding delta-import that there should be an additional column *last_modified* of timestamp type in the table. Is there any other way the same can be achieved without creating the additional *last_modified* column in the tables? Please advise. Thanks in advance
Re: Plugin Not Found
Can you share your full log (at least through startup), as well as the config for both the component and the ReqHandler that is using it? -Grant On May 21, 2009, at 3:37 PM, Jeff Newburn wrote: One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it won't load, since it obviously works fine if directly inserted into the WEB-INF directory. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: clustering SOLR-769
Hi. I built Solr from SVN this morning. I am using the clustering example. I have added my own schema.xml. The problem is that even though I change the carrot.snippet field from features to filecontent, the clustering results don't change a bit. Please note the features field is also there in my document. <str name="carrot.title">name</str> <!-- The field to cluster on --> <str name="carrot.snippet">features</str> <str name="carrot.url">id</str> Why do I get the same clusters even though I have changed carrot.snippet? Is there some problem with my understanding? If you go back to the clustering dir in examples and change <str name="carrot.snippet">features</str> to <str name="carrot.snippet">manu</str>, do you see any change in clusters? Cheers, Staszek -- http://carrot2.org
Re: java.lang.RuntimeException: after flush: fdx size mismatch
Hi Mike, Documents are web pages: about 20 fields, mostly strings, a couple of integers, booleans, and one html field (for document body content). I do have a multi-threaded client pushing docs to Solr, so yes, I suppose that would mean I have several active Solr worker threads. The only exceptions I have are the RuntimeException flush errors, followed by a handful (normally 10-20) of LockObtainFailedExceptions, which I presumed were being caused by the faulty threads dying and failing to release locks. Oh wait, I am getting WstxUnexpectedCharException exceptions every now and then: SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 8)) at [row,col {unknown-source}]: [1,26070] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668) at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327) I presumed these were caused by character encoding issues, but haven't looked into them at all yet. Thanks again for your help! I'll make some time this afternoon to build some patched Lucene jars and get the results. On Thu, May 21, 2009 at 5:06 AM, Michael McCandless luc...@mikemccandless.com wrote: Another question: are there any other exceptions in your logs? Eg problems adding certain documents, or anything? Mike On Wed, May 20, 2009 at 11:18 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hi Mike, thanks for the quick response: $ java -version java version 1.6.0_11 Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode) I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not hitting that yet! The exception always reports 0 length, but the number of docs varies, heavily weighted towards 1 or 2 docs. Of the last 130 or so exceptions: 89 were "1 docs vs 0 length", 20 "2 docs", 9 "3 docs", 1 "4 docs", 3 "5 docs", 2 "6 docs", 1 "7 docs", 1 "9 docs", and 1 "10 docs". The only unusual thing I can think of that we're doing with Solr is aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a pattern between core admin operations and these exceptions, however... James On Wed, May 20, 2009 at 2:37 AM, Michael McCandless luc...@mikemccandless.com wrote: Hmm... somehow Lucene is flushing a new segment on closing the IndexWriter, and thinks 1 doc had been added to the stored fields file, yet the fdx file is the wrong size (0 bytes). This check (& exception) is designed to prevent corruption from entering the index, so it's at least good to see CheckIndex passes after this. I don't think you're hitting LUCENE-1521: that issue only happens if a single segment has more than ~268 million docs. Which exact JRE version are you using? When you hit this exception, is it always 1 docs vs 0 length in bytes? Mike On Wed, May 20, 2009 at 3:19 AM, James X hello.nigerian.spamm...@gmail.com wrote: Hello all, I'm running Solr 1.3 in a multi-core environment. There are up to 2000 active cores in each Solr webapp instance at any given time.
I've noticed occasional errors such as: SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94) at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83) at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47) at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) during commit / optimise operations. These errors then cause cascading errors during updates on the
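A note on that WstxUnexpectedCharException: CTRL-CHAR code 8 (backspace) is simply illegal in XML 1.0, so any update document containing it will be rejected by the parser before it can be indexed. One common client-side workaround, sketched here with an invented helper name under the assumption that the update XML is built on the client, is to strip illegal characters from field values first:

public class XmlCleaner {
    // Remove characters that are illegal in XML 1.0 documents.
    // Legal chars are #x9, #xA, #xD, #x20-#xD7FF, and #xE000-#xFFFD
    // (supplementary-plane handling is left out of this sketch for brevity).
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            boolean legal = c == 0x9 || c == 0xA || c == 0xD
                    || (c >= 0x20 && c <= 0xD7FF)
                    || (c >= 0xE000 && c <= 0xFFFD);
            if (legal) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "web page body\u0008 with a control char";
        System.out.println(stripInvalidXmlChars(dirty)); // control char removed
    }
}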
Re: Solr statistics of top searches and results returned
On May 20, 2009, at 4:33 AM, Shalin Shekhar Mangar wrote: On Wed, May 20, 2009 at 1:31 PM, Plaatje, Patrick patrick.plaa...@getronics.com wrote: At the moment Solr does not have such functionality. I have written a plugin for Solr though which uses a second Solr core to store/index the searches. If you're interested, send me an email and I'll get you the source for the plugin. Patrick, this will be a useful addition. However, instead of doing this with another core, we can keep running statistics which can be shown on the statistics page itself. What do you think? I think you will want some type of persistence mechanism; otherwise you will end up consuming a lot of resources keeping track of all the query strings, unless I'm missing something. Either a Lucene index (Solr core) or the option of embedding a DB. Ideally, it would be pluggable such that people could choose their storage mechanism. Most people do this kind of thing offline via log analysis, as logs can grow quite large quite quickly. A related approach for showing slow queries was discussed recently. There's an issue open which has more details: https://issues.apache.org/jira/browse/SOLR-1101 -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: clustering SOLR-769
Hi, I will try this, because when I tried it with a field declared by me there was no change. Will check this out and let you know. Is it possible to specify more than one snippet field, or should I use copyField to copy two or three fields into a single field and specify that in the snippet field? Regards, Allahbaksh On Fri, May 22, 2009 at 2:24 AM, Stanislaw Osinski stanis...@osinski.name wrote: If you go back to the clustering dir in examples and change <str name="carrot.snippet">features</str> to <str name="carrot.snippet">manu</str>, do you see any change in clusters? Cheers, Staszek -- http://carrot2.org -- Allahbaksh Mohammedali Asadullah, Software Engineering Technology Labs, Infosys Technolgies Limited, Electronic City, Hosur Road, Bangalore 560 100, India. (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927. Fax: 91-80-28520362 | Mobile: 91-9845505322.
getting all rows from SOLRJ client using setRows method
Hello, is there a way you can get all the results back from Solr when querying via the SolrJ client? My gut feeling was that this might work: query.setRows(-1). The other way is to change the configuration xml file, but that is like hard-coding the configuration, and there I also have to set some valid number; I can't say return all rows. Is there a way to do it through the query? Thanks rashid -- View this message in context: http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: getting all rows from SOLRJ client using setRows method
Careful what you ask for... what if you have a million docs? Will you get an OOM? Maybe a better solution is to run a loop where you grab a bunch of docs and then increase the start value (see the sketch below), but you can always use: query.setRows( Integer.MAX_VALUE ) ryan On May 21, 2009, at 8:37 PM, darniz wrote: Hello, is there a way you can get all the results back from Solr when querying via the SolrJ client? My gut feeling was that this might work: query.setRows(-1). Thanks rashid
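A sketch of the paging loop Ryan describes, using the 1.3-era SolrJ API (the class name, page size, and query are arbitrary choices, not anything from the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class FetchAll {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        int pageSize = 500;
        long start = 0;
        while (true) {
            query.setStart((int) start);
            query.setRows(pageSize);
            QueryResponse rsp = server.query(query);
            SolrDocumentList page = rsp.getResults();
            for (SolrDocument doc : page) {
                // process each document here
            }
            start += page.size();
            if (page.size() == 0 || start >= page.getNumFound()) {
                break; // fetched everything
            }
        }
    }
}

The empty-page check also guards against spinning forever if documents are deleted while paging.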
Re: what does the version parameter in the query mean?
Ah, I see! Thank you so much for the response! I'm using SolrJ, so I probably don't need to set the XML version, since the wiki tells me that it uses binary as the default! On Thu, May 21, 2009 at 10:00 PM, Jay Hill jayallenh...@gmail.com wrote: I was interested in this recently and also couldn't find anything on the wiki. I found this in the list archive: The version parameter determines the XML protocol used in the response. Clients are strongly encouraged to ''always'' specify the protocol version, so as to ensure that the format of the response they receive does not change unexpectedly if/when the Solr server is upgraded. Here is a link to the archive: http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg00518.html -Jay On Thu, May 21, 2009 at 1:06 AM, Anshuman Manur anshuman_ma...@stragure.com wrote: Hello all, I'm using Solr 1.3.0, and when I query my index for solr using the admin page, the query string in the address bar of my browser reads like this: http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on Now, I don't know what version=2.2 means, and the wiki or the docs don't tell me. Could someone enlighten me? Thank You Anshuman Manur
lock problem
Hi, The scenario is: I have 2 different Solr instances running at different locations concurrently. The data location for both instances is the same: \\hostname\FileServer\CoreTeam\Research\data. Both instances use EmbeddedSolrServer, and the lockType at both instances is 'single'. I am getting the following exception: Cannot overwrite: \\hostname\FileServer\CoreTeam\Research\data\index\_1.fdt at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:440) at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:64) at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:73) I tried the 'simple' lockType also, but it shows a timeout exception when writing to the index. Please help me out. Thanks, Ashish -- View this message in context: http://www.nabble.com/lock-problem-tp23663558p23663558.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regarding Delta-Import Query in DIH
The last_modified column is just one way. The delta query has to be intelligent enough to detect the delta; it doesn't matter how you do it. On Fri, May 22, 2009 at 1:32 AM, jayakeerthi s mail2keer...@gmail.com wrote: Hi All, I understand from the details provided under http://wiki.apache.org/solr/DataImportHandler regarding delta-import that there should be an additional column *last_modified* of timestamp type in the table. Is there any other way the same can be achieved without creating the additional *last_modified* column in the tables? Please advise. Thanks in advance -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: No sanity checks before replicating files?
Let us see what the desired behavior is. When s1 comes back up online, s2 must download a fresh copy of the index from s1, because s1 is the slave and s2 has a newer version of the index than s1. Are you suggesting that s2 downloads the index files and then the commit fails? The code is written as follows: boolean freshDownloadNeeded = myIndexGeneration >= mastersIndexGeneration; then it should be a problem. Can you post the stacktrace? On Thu, May 21, 2009 at 11:45 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Aha, I see. Perhaps you can post the error message/stack trace? As for the sanity check, I bet a call to http://host:port/solr/replication?command=indexversion could be used to ensure only newer versions of the index are being pulled. We'll see what Paul says when he wakes up. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: How to index large set data
Check the status page of DIH (http://host:port/solr/dataimport) and see if it is working properly, and if yes, what the rate of indexing is. On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com wrote: Hi, I have about 45GB of xml files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running. My computer has 4GB memory. Any suggestion on the solutions? Thanks! JB -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: How to index large set data
Hi Paul, Thank you so much for answering my questions. It really helped. After some adjustment, basically setting mergeFactor to 1000 from the default value of 10, I finished the whole job in 2.5 hours. I checked that during running time, only around 18% of memory is being used, and VIRT is always 1418m. I am thinking it may be restricted by the JVM memory setting. But I run the data import command through the web, i.e., http://host:port/solr/dataimport?command=full-import, so how can I set the memory allocation for the JVM? Thanks again! JB --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrote: Check the status page of DIH and see if it is working properly, and if yes, what the rate of indexing is.
Re: How to index large set data
What is the total no. of docs created? I guess it may not be memory bound; indexing is mostly an IO-bound operation. You may be able to get better perf if an SSD (solid state disk) is used. On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai djian...@yahoo.com wrote: Hi Paul, Thank you so much for answering my questions. It really helped. After some adjustment, basically setting mergeFactor to 1000 from the default value of 10, I finished the whole job in 2.5 hours. I checked that during running time, only around 18% of memory is being used, and VIRT is always 1418m. I am thinking it may be restricted by the JVM memory setting. But I run the data import command through the web, i.e., http://host:port/solr/dataimport?command=full-import, so how can I set the memory allocation for the JVM? Thanks again! JB -- - Noble Paul | Principal Engineer| AOL | http://aol.com