Re: Type converters for DocumentObjectBinder
Hi Paul, it's working for Query, but not for Updating (Add Bean). The getter method returns a Calendar (a GregorianCalendar instance). On the indexer side, a toString() or something equivalent is done and an error is thrown:

    Caused by: java.text.ParseException: Unparseable date: java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Europe/Berlin,offset=3600000,dstSavings=3600000,useDaylight=true,transitions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=3600000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=2,startMonth=2,startDay=-1,startDayOfWeek=1,startTime=3600000,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=3600000,endTimeMode=2]],firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=6,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=3600000,DST_OFFSET=0]

The code:

    public Calendar getValidFrom() {
        return validFrom;
    }

    public void setValidFrom(Calendar validFrom) {
        this.validFrom = validFrom;
    }

    @Field
    public void setValidFrom(String validFrom) {
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(dateFormat.parse(validFrom));
        } catch (ParseException e) {
            e.printStackTrace();
        }
        this.validFrom = cal;
    }

Noble Paul നോബിള് नोब्ळ्-2 wrote:

    create a setter method for the field which takes a String and apply the annotation there, example:

        private Calendar validFrom;

        @Field
        public void setValidFrom(String s){
            //convert to Calendar object and set the field
        }

On Fri, Nov 13, 2009 at 12:24 PM, paulhyo <st...@ouestil.ch> wrote:

    Hi, I would like to know if there is a way to add type converters when using getBeans.
    I need conversion when updating (Calendar -> String) and when searching (String -> Calendar). The Bean class defines:

        @Field
        private Calendar validFrom;

    but the received type within the QueryResponse is a String (2009-11-13)... Actually I get this error:

        java.lang.RuntimeException: Exception while setting value : 2009-09-16 on private java.util.Calendar ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom
            at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360)
            at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342)
            at org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55)
            at org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324)
            at ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38)
            at ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41)
            at ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at junit.framework.TestCase.runTest(TestCase.java:164)
            at junit.framework.TestCase.runBare(TestCase.java:130)
            at junit.framework.TestResult$1.protect(TestResult.java:106)
            at junit.framework.TestResult.runProtected(TestResult.java:124)
            at junit.framework.TestResult.run(TestResult.java:109)
            at junit.framework.TestCase.run(TestCase.java:120)
            at junit.framework.TestSuite.runTest(TestSuite.java:230)
            at junit.framework.TestSuite.run(TestSuite.java:225)
            at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
            at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
        Caused by: java.lang.IllegalArgumentException: Can not set java.util.Calendar field ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom to java.lang.String
            at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146)
            at ...
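For reference, the round trip Paul is after can be sketched with plain JDK classes. This is a hypothetical bean (the names and the yyyy-MM-dd pattern are assumptions, and SolrJ's @Field annotation is shown only as a comment so the sketch compiles standalone): a String setter handles values coming back from a query, and a String getter exposes the value for indexing so the server never receives Calendar.toString().

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Hypothetical bean showing the String-setter workaround. SolrJ's
// DocumentObjectBinder would inject the stored String via the annotated
// setter; the String getter keeps the outgoing value parseable when the
// bean is indexed, so a GregorianCalendar dump never reaches the server.
class PersonBean {
    // Assumed to match the stored date layout, e.g. "2009-09-16".
    private static final String PATTERN = "yyyy-MM-dd";

    private Calendar validFrom;

    // @Field("validFrom") would go here in real SolrJ code.
    public void setValidFrom(String s) {
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(new SimpleDateFormat(PATTERN).parse(s));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + s, e);
        }
        this.validFrom = cal;
    }

    // Exposed for indexing: always a plain formatted String.
    public String getValidFrom() {
        return validFrom == null ? null
                : new SimpleDateFormat(PATTERN).format(validFrom.getTime());
    }

    // Calendar view for application code.
    public Calendar getValidFromCalendar() {
        return validFrom;
    }
}
```

With this shape, getBeans() can populate validFrom from the stored "2009-11-13"-style value, and addBean() writes back the same format instead of a calendar dump.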
highlighting issue lst.name is a leaf node
Hello list, I'm new to Solr but from what I've seen while experimenting, it's awesome. I have a small issue regarding the highlighting feature. It finds stuff (as I see from the query analyzer), but the highlight list looks something like this:

    <lst name="highlighting">
      <lst name="c:\0596520107.pdf"/>
      <lst name="c:\0470511389.pdf"/>
    </lst>

(the files were added using ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); and I set the literal.id to the filename)

My solrconfig.xml request handler looks like:

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <!-- default values for query parameters -->
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <!--
        <int name="rows">10</int>
        <str name="fl">*</str>
        <str name="version">2.1</str>
        -->
        <bool name="hl">true</bool>
        <int name="hl.snippets">3</int>
        <int name="hl.fragsize">30</int>
        <str name="hl.simple.pre"><![CDATA[<span>]]></str>
        <str name="hl.simple.post"><![CDATA[</span>]]></str>
        <str name="hl.fl">*</str>
        <bool name="hl.requireFieldMatch">true</bool>
        <float name="hl.regex.slop">0.5</float>
        <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
        <bool name="hl.usePhraseHighlighter">true</bool>
      </lst>
    </requestHandler>

The schema.xml is untouched and was downloaded yesterday from the latest stable build. At first I thought it had something to do with the extraction of the PDF, but I tried the demo xml docs as well and got the same result. I'm new to this, so please help.

Thank you,
Chuck
Re: Stop solr without losing documents
Michael wrote:

    I've got a process external to Solr that is constantly feeding it new documents, retrying if Solr is not responding. What's the right way to stop Solr (running in Tomcat) so no documents are lost? Currently I'm committing all cores and then running catalina's stop script, but between my commit and the stop, more documents can come in that would need *another* commit... Lots of people must have had this problem already, so I know the answer is simple; I just can't find it! Thanks, Michael

I don't know if this is the best solution, or even if it's applicable to your situation, but we do incremental updates from a database based on a timestamp (from a simple separate SQL table filled by triggers, so deletes are measured correctly as well). We store this timestamp in Solr too. Our index script first does a simple Solr request to get the newest timestamp, and then selects the documents to update with

    SELECT * FROM document_updates WHERE timestamp >= X

where X is the timestamp returned from Solr. (We use >= for the hopefully extremely rare case where two updates happen at the same time, the index script runs at just that moment, and it only retrieved one of the updates. This will cause some documents to be updated multiple times, but as document updates are idempotent this is no real problem.)

Regards,
gwk
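gwk's watermark scheme can be sketched in miniature. This is a toy in-memory model, not real Solr or SQL code (only the table name document_updates comes from the mail; everything else is illustrative). The point is the >= comparison: the boundary row may be processed twice, which is harmless because an update simply overwrites the same document id.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of timestamp-watermark incremental indexing: the indexer asks
// the index for the newest indexed timestamp, then re-reads every source
// row with timestamp >= that watermark. Re-processing the boundary row is
// safe because indexing a document is an idempotent overwrite by id.
class IncrementalIndex {
    // id -> timestamp of the last indexed version of that document
    final Map<String, Long> index = new HashMap<>();

    long newestIndexedTimestamp() {
        return index.values().stream().max(Long::compare).orElse(0L);
    }

    // rows: id -> timestamp, standing in for "SELECT * FROM document_updates"
    void indexFrom(Map<String, Long> rows) {
        long watermark = newestIndexedTimestamp();
        for (Map.Entry<String, Long> row : rows.entrySet()) {
            if (row.getValue() >= watermark) {       // note: >=, not >
                index.put(row.getKey(), row.getValue()); // idempotent overwrite
            }
        }
    }
}
```

A row that arrives with a timestamp equal to the current watermark (the race gwk describes) is still picked up on the next run, at the cost of re-indexing the boundary document once.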
Re: Arguments for Solr implementation at public web site
Some extras for the pros list:
- Full control over which content is searchable and which is not
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed

Friendly,
Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:

    Hi, I am looking for good arguments to justify implementing search for sites which are available on the public internet. There are many sites in the "powered by Solr" section which are indexed by Google and other search engines, but still they decided to invest resources into building and maintaining their own search functionality rather than going with a [user_query site:my_site.com] Google search. Why? By no means am I saying it makes no sense to implement Solr! But I want to put together a list of reasons, possibly with examples. Your help would be much appreciated! Let's narrow the scope of this discussion to the following:
    - the search should cover several community sites running open source CMSs, JIRAs, Bugzillas ... and the like
    - all documents use open formats (no need to parse Word or Excel)
    (maybe something close to what LucidImagination does for the mailing lists of Lucene and Solr)
    My initial kick-off list would be:
    pros:
    - considering we understand the content (we understand the domain scope) we can fine-tune the search engine to provide more accurate results
    - Solr can give us facets
    - we have user search logs (valuable for analysis)
    - implementing Solr is fun
    cons:
    - requires resources (but the cost is relatively low depending on the query traffic, index size and frequency of updates)
    Regards,
    Lukas
    http://blog.lukas-vlcek.com/

--
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy
Re: Arguments for Solr implementation at public web site
Next to the faceting engine:
- MoreLikeThis
- Highlighting
- Spellchecker

But also more flexible querying using the DisMax handler, which is clearly superior. Solr can also be used to store data which can be retrieved in an instant! We have used this technique on a site and it is obviously much faster than multiple large and complex SQL statements.

On Fri, 2009-11-13 at 10:52 +0100, Lukáš Vlček wrote:

    pros:
    - considering we understand the content (we understand the domain scope) we can fine-tune the search engine to provide more accurate results
    - Solr can give us facets
    - we have user search logs (valuable for analysis)
    - implementing Solr is fun
    cons:
    - requires resources (but the cost is relatively low depending on the query traffic, index size and frequency of updates)
    Regards,
    Lukas
    http://blog.lukas-vlcek.com/
Re: highlighting issue lst.name is a leaf node
I found the solution. If somebody runs into the same problem, here is how I solved it.

- while uploading the document:

    req.setParam("uprefix", "attr_");
    req.setParam("fmap.content", "attr_content");
    req.setParam("overwrite", "true");
    req.setParam("commit", "true");

- in the query:

    http://localhost:8983/solr/select?q=attr_content:%22Django%22&rows=4

- edit solrconfig.xml and add, in the request handler params,

    <str name="fl">id,title</str>

  so that you won't get the whole text content inside the response.

Regards,
Chuck
Re: Arguments for Solr implementation at public web site
Jan-Eirik B. Nævdal schrieb:

    Some extras for the pros list:
    - Full control over which content is searchable and which is not
    - Possibility to make pages searchable almost instantly after publication
    - Control over when the site is indexed

+1, especially the last point. You can also add a robots.txt and prohibit spidering of the site to reduce traffic; Google won't index any highly dynamic content then.
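The robots.txt mentioned above could look like this (the paths are purely illustrative; the real ones depend on the site):

```
User-agent: *
Disallow: /search
Disallow: /dynamic/
```

Any well-behaved crawler then skips those paths, while the in-house Solr index still gets the content directly from the CMS.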
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote:

    I am looking for good arguments to justify implementing search for sites which are available on the public internet. There are many sites in the "powered by Solr" section which are indexed by Google and other search engines, but still they decided to invest resources into building and maintaining their own search functionality rather than going with a [user_query site:my_site.com] Google search. Why?

You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access by following navigational links. I would imagine that in a lot of cases, Solr is used to index database entities which are used to build [parts of] pages dynamically, and which might be viewable in different forms in various different pages.

Plus, with stored fields, you have the option of actually driving a website off Solr instead of directly off a database, which might make sense from a speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee when Google will decide to recrawl your site, so there may be a delay before changes show up in their index. With an in-house search engine you can reindex as often as you like.

Andrew.

--
View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
Hi, thanks for the inputs so far... however, let's put it this way: when you need to search for something Lucene- or Solr-related, which one do you use?
- generic Google
- go to a particular mailing list web site and search from there (if there is any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas

On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

    You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access by following navigational links. [...]
Data import problem with child entity from different database
Morning all, I'm having problems with joining a child entity from one database to a parent from another... My entity definitions look like this (names changed for brevity):

    <entity name="parent" dataSource="db1"
            query="select a, b, c from parent_table">
      <entity name="child" dataSource="db2" onError="continue"
              query="select c, d from child_table where c = '${parent.c}'" />
    </entity>

c is getting indexed fine (it's stored, and I can see field 'c' in the search results) but child.d isn't. I know the child table has data for the corresponding parent rows, and I've even watched the SQL queries against the child table appearing in Oracle's sqldeveloper as the DataImportHandler runs. But no content for child.d gets into the index. My schema contains a definition for a field called d like so:

    <field name="d" type="keywords_ids" indexed="true" stored="true" multiValued="true" termVectors="true" />

(keywords_ids is a conservatively-analyzed text type which has worked fine in other contexts.)

Two things occur to me:

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables is just a char(4), nothing fancy. Could something weird with character encodings be happening?

2. d isn't a primary key in either parent or child, but this shouldn't matter, should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to do in-memory table caching of child, but it didn't work then either. I got a lot of error messages like this:

    No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work around it) or an unexpected case (if so I'll file a bug report), please shout. I'm using 1.4.

Yet again, many thanks :-)

Andrew.

--
View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote: When you need to search for something Lucene or Solr related, which one do you use: - generic Google - go to a particular mail list web site and search from here (if there is any search form at all) Both of these (Nabble in the second case) in case any recent posts have appeared which Google hasn't picked up. Andrew. -- View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
For this list I usually end up at http://solr.markmail.org (which I believe also uses Lucene under the hood). Google is such a black box...

Pros: +1 Open Source (enough said :-)

There also always seems to be the notion that crawling lends itself to producing the best results, but that is rarely the case. And unless you are a special type of site, Google will not overlay your results with some type of context in the search (i.e. news or sports, etc.).

What I think really needs to happen in Solr (and is a bit missing at the moment) is a common interface for reindexing another index (if that makes sense) ... something akin to OpenSearch (http://www.opensearch.org/Community/OpenSearch_software). For example, what I would like to do is have my site, have my search index, and point Google at just my search index (and not have it crawl the site) ... the only current option for something like that is sitemaps, which I think Solr should have a contrib project for (templates), but you would have to generate these offline for sure.

- Jon

On Nov 13, 2009, at 6:00 AM, Lukáš Vlček wrote:

    Hi, thanks for the inputs so far... however, let's put it this way: when you need to search for something Lucene- or Solr-related, which one do you use?
    - generic Google
    - go to a particular mailing list web site and search from there (if there is any search form at all)
    - go to LucidImagination.com and use its search capability
    Regards,
    Lukas
Re: Selection of terms for MoreLikeThis
Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf.

Cheers,
Andrew.

Andrew Clegg wrote:

    Hi, if I run a MoreLikeThis query like the following:

        http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

    one of the terms in the results is "and" (I don't do any stopword removal on this field). However, if I look inside that document with the TermVectorComponent:

        http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords

    I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms with *much* higher tf.idf scores, e.g.:

        <lst name="aquaspirillum">
          <int name="tf">1</int>
          <int name="df">10</int>
          <double name="tf-idf">0.1</double>
        </lst>

    that *don't* appear in the MoreLikeThis list. (I tried adding mlt.maxwl=999 to the end of the MLT query but it makes no difference.) What's going on? Surely something with tf.idf = 0.1 is a far better candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4? Or does MoreLikeThis do some other heuristic magic to select good candidates, and sometimes get it wrong?

    BTW the keywords field is indexed, stored, multi-valued and term-vectored.

    Thanks, Andrew.

    -- :: http://biotext.org.uk/ ::

--
View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.
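As a side note, the quoted numbers fit a simple tf * (1/df) calculation; this is an assumption about how the TermVectorComponent derives its tf-idf figure (consistent with aquaspirillum's tf=1, df=10 giving 0.1), not a statement of Solr internals.

```java
// Toy helper reproducing the arithmetic behind the numbers quoted above.
// Assumption: the reported tf-idf is simply tf * (1 / df). Under that
// reading, a very common term like "and" (huge df) ends up with a tiny
// score such as 7.46E-4, while a rare term like aquaspirillum scores 0.1.
class TfIdf {
    static double tfIdf(int tf, int df) {
        return tf * (1.0 / df);
    }
}
```

Which makes the original question sharper: if the component's own numbers rank aquaspirillum far above "and", MoreLikeThis must be selecting terms by some other criterion.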
Re: Data import problem with child entity from different database
No obvious issues. You may post your entire data-config.xml. Do without CachedSqlEntityProcessor first and then apply that later.

On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

    Morning all, I'm having problems with joining a child entity from one database to a parent from another... [...]
--
- Noble Paul | Principal Engineer| AOL | http://aol.com
exclude some fields from copying dynamic fields | schema.xml
Hi, we are using the following entry in schema.xml to make a copy of one type of dynamic field to another:

    <copyField source="*_s" dest="*_str_s" />

Is it possible to exclude some fields from copying? We are using Solr 1.3.

~Vikrant

--
View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import problem with child entity from different database
Noble Paul നോബിള് नोब्ळ्-2 wrote:

    no obvious issues. you may post your entire data-config.xml

Here it is, exactly as the last attempt but with usernames etc. removed. Ignore the comments and the unused FileDataSource...

http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml

Noble Paul നോബിള് नोब्ळ्-2 wrote:

    do w/o CachedSqlEntityProcessor first and then apply that later

Yep, that was just a bit of a wild stab in the dark to see if it made any difference.

Thanks,
Andrew.

--
View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.3 query and index perf tank during optimize
I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It would be wonderful if from Java we could simply set a per-thread IO priority, but it'll be a looong time until that's possible. So I think for now we should make a Directory impl that emulates such behavior, e.g. Lucene could state the context (merge, flush, search, nrt-reopen, etc.) whenever it opens an IndexInput / IndexOutput, and then the Directory could hack in pausing the merge IO whenever search/nrt-reopen IO is active.

Mike

On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:

    Jerome L Quinn wrote:

        Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously.

        I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside Tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server under IBM Java 1.6. The index is sitting on a local 15K SCSI disk. There's nothing else of substance running on the box.

        Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds.

        Can anyone offer me help with fixing the problem?

        Thanks, Jerry Quinn

    Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some, though, and it adds many complications.
    Another kind of option is to use the partial optimize feature:

        <optimize maxOptimizeSegments="5"/>

    Using this, you can optimize down to n segments and take a shorter hit each time.

    Also, if optimizing is so painful, you might lower the merge factor to amortize that pain better. That's another way to slowly get there - if you lower the merge factor, as merging takes place, the new merge factor will be respected, and segments will merge down. A merge factor of 2 (the lowest) will make it so you only ever have 2 segments. Sometimes that works reasonably well - you could try 3-6 or something as well. Then when you do your partial optimizes (and eventually a full optimize perhaps), you won't have so far to go.

    --
    - Mark
    http://www.lucidimagination.com
Re: javabin in .NET?
Nope. It has to be manually ported. Not so much because of the language itself, but because of differences in the libraries.

2009/11/13 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>

    Is there any tool to directly port Java to .NET? Then we could extract out the client part of the javabin code and convert it.

    On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

        Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess, being the best choice) to handle this response format?

        Thanks, Erik

    --
    - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Selection of terms for MoreLikeThis
Hi Andrew, no idea, I'm afraid - but could you send the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent output you've already pasted.

Chantal

Andrew Clegg schrieb:

    Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf. [...]
Re: Solr 1.3 query and index perf tank during optimize
Another thing to try, is reducing the maxThreadCount for ConcurrentMergeScheduler. It defaults to 3, which I think is too high -- we should change this default to 1 (I'll open a Lucene issue). Mike On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn jlqu...@us.ibm.com wrote: Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server. under IBM java 1.6. The index is sitting on a local 15K scsi disk. There's nothing else of substance running on the box. Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds. Can anyone offer me help with fixing the problem? Thanks, Jerry Quinn
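The merge scheduler's thread count is configured in solrconfig.xml. This is a hedged sketch only — whether the scheduler accepts init args like this depends on the Solr version (the 1.3-era config may only take the class attribute), so verify against the example solrconfig.xml shipped with your release:

```xml
<!-- Inside the <indexDefaults> (or <mainIndex>) section of solrconfig.xml.
     Syntax varies by Solr version; check your release's example config.
     Limiting the merge scheduler to one concurrent merge thread reduces
     IO contention between merging/optimizing and searching. -->
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">1</int>
</mergeScheduler>
```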
Re: Solr 1.3 query and index perf tank during optimize
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote: I think we sorely need a Directory impl that down-prioritizes IO performed by merging. Presumably this prioritizing Directory impl could wrap/decorate any existing Directory. Mike
Re: javabin in .NET?
The javabin format does not have many dependencies. It may have 3-4 classes and that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Is there any tool to directly port java to .Net? then we can extract out the client part of the javabin code and convert it. On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess being the best choice) to handle this response format? Thanks, Erik -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Selection of terms for MoreLikeThis
Chantal Ackermann wrote: no idea, I'm afraid - but could you sent the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent you've already pasted. I can, but I'm afraid they're not very illuminating! http://www.cathdb.info/solr/mlt?q=id:3.40.50.720rows=0mlt.interestingTerms=detailsmlt.match.include=falsemlt.fl=keywordsmlt.mintf=1mlt.mindf=1 response lst name=responseHeader int name=status0/int int name=QTime59/int /lst result name=response numFound=280227 start=0/ lst name=interestingTerms float name=keywords:dehydrogenase1.0/float float name=keywords:reductase1.0/float float name=keywords:metabolism1.0/float float name=keywords:activity1.0/float float name=keywords:process1.0/float float name=keywords:alcohol1.0/float float name=keywords:and1.0/float float name=keywords:malate1.0/float float name=keywords:biosynthesis1.0/float float name=keywords:biosynthetic1.0/float float name=keywords:degradation1.0/float float name=keywords:precursor1.0/float float name=keywords:metabolic1.0/float float name=keywords:protein1.0/float float name=keywords:synthase1.0/float float name=keywords:acid1.0/float float name=keywords:enzyme1.0/float float name=keywords:succinyl-coa1.0/float float name=keywords:putative1.0/float float name=keywords:(nadp+)1.0/float float name=keywords:4,6-dehydratase1.0/float float name=keywords:fatty1.0/float float name=keywords:chloroplast1.0/float float name=keywords:lactobacillus1.0/float float name=keywords:glyoxylate1.0/float /lst /response Cheers, Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html Sent from the Solr - User mailing list archive at Nabble.com.
non english languages
Hello all, is there support for non-english language content indexing in Solr? I'm interested in Bulgarian, Hungarian, Romanian and Russian. Best regards, Chuck
Re: non english languages
the included snowball filters support hungarian, romanian, and russian. On Fri, Nov 13, 2009 at 9:03 AM, Chuck Mysak chuck.my...@gmail.com wrote: Hello all, is there support for non-english language content indexing in Solr? I'm interested in Bulgarian, Hungarian, Romanian and Russian. Best regards, Chuck -- Robert Muir rcm...@gmail.com
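Those Snowball stemmers are wired up per field type in schema.xml. A minimal sketch (the field type name `text_ru` is made up for illustration; the `language` attribute takes Snowball stemmer names such as "Russian", "Hungarian", "Romanian"):

```xml
<!-- Hypothetical field type in schema.xml; swap the language attribute
     for the Snowball stemmer you need. -->
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>
```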
Re: Selection of terms for MoreLikeThis
Hi Andrew, your URL does not include the parameter mlt.boost. Setting that to true made a noticeable difference for my queries. If not, there is also the parameter mlt.minwl (minimum word length below which words will be ignored). All your other terms seem longer than 3, so it might help in this case? But it seems a bit like a workaround. Cheers, Chantal Andrew Clegg schrieb: Chantal Ackermann wrote: no idea, I'm afraid - but could you send the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent you've already pasted. I can, but I'm afraid they're not very illuminating! http://www.cathdb.info/solr/mlt?q=id:3.40.50.720rows=0mlt.interestingTerms=detailsmlt.match.include=falsemlt.fl=keywordsmlt.mintf=1mlt.mindf=1 response lst name=responseHeader int name=status0/int int name=QTime59/int /lst result name=response numFound=280227 start=0/ lst name=interestingTerms float name=keywords:dehydrogenase1.0/float float name=keywords:reductase1.0/float float name=keywords:metabolism1.0/float float name=keywords:activity1.0/float float name=keywords:process1.0/float float name=keywords:alcohol1.0/float float name=keywords:and1.0/float float name=keywords:malate1.0/float float name=keywords:biosynthesis1.0/float float name=keywords:biosynthetic1.0/float float name=keywords:degradation1.0/float float name=keywords:precursor1.0/float float name=keywords:metabolic1.0/float float name=keywords:protein1.0/float float name=keywords:synthase1.0/float float name=keywords:acid1.0/float float name=keywords:enzyme1.0/float float name=keywords:succinyl-coa1.0/float float name=keywords:putative1.0/float float name=keywords:(nadp+)1.0/float float name=keywords:4,6-dehydratase1.0/float float name=keywords:fatty1.0/float float name=keywords:chloroplast1.0/float float name=keywords:lactobacillus1.0/float float name=keywords:glyoxylate1.0/float /lst /response Cheers, Andrew.
Re: javabin in .NET?
I meant the standard IO libraries. They are different enough that the code has to be manually ported. There were some automated tools back when Microsoft introduced .Net, but IIRC they never really worked. Anyway it's not a big deal, it should be a straightforward job. Testing it thoroughly cross-platform is another thing though. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com The javabin format does not have many dependencies. it may have 3-4 classes an that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Is there any tool to directly port java to .Net? then we can etxract out the client part of the javabin code and convert it. On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess being the best choice) to handle this response format? Thanks, Erik -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Resetting doc boosts
Hi, I'm trying to figure out if there is an easy way to reset all of the doc boosts you have made (for analytical purposes)... for example, if I run an index, gather a report, boost docs based on the report, and reset the boosts at the time of the next index... From just knowing how Lucene works, it would seem that I really need to reindex, since the boost is an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: Question about the message Indexing failed. Rolled back all changes.
I'm getting the same thing. The process runs, seemingly successfully, and I can even go to other SOLR pages pointing to the same server and pull queries against the index with these just-added entries. But the response to the original import says failed and rolled back, both through the XML response and also in the logs. Why is the process reporting failure and saying it did not commit/rolled back, when it actually succeeded in importing and indexing? If it rolled back, as the logs say, I would expect to not be able to pull those rows out with new queries against the index. Avlesh Singh wrote: But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-importcommit=trueclean=true, do solr search which returns meaningful results I am not sure what meaningful means. The full-import command starts an asynchronous process to start re-indexing. The response that you get in return to the above mentioned URL, (always) indicates that a full-import has been started. It does NOT know about anything that might go wrong with the process itself. and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result ... The status URL is the one which tells you what is going on with the process. The message - Indexing failed. Rolled back all changes can come because of multiple reasons - missing database drivers, incorrect sql queries, runtime errors in custom transformers etc. Start the full-import once more. Keep a watch on the Solr server log. If you can figure out what's going wrong, great; otherwise, copy-paste the exception stack-trace from the log file for specific answers. Cheers Avlesh On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen bertie.s...@gmail.com wrote: No. I did not check the logs.
But even after I successfully index data using http://host:port /solr-example/dataimport?command=full-importcommit=trueclean=true, do solr search which returns meaningful results, and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result response - lst name=responseHeader int name=status0/int int name=QTime1/int /lst - lst name=initArgs - lst name=defaults str name=configdata-config.xml/str /lst /lst str name=commandstatus/str str name=statusidle/str str name=importResponse/ - lst name=statusMessages str name=Time Elapsed0:2:11.426/str str name=Total Requests made to DataSource584/str str name=Total Rows Fetched1538/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-11-09 23:54:41/str *str name=Indexing failed. Rolled back all changes./str* str name=Committed2009-11-09 23:54:42/str str name=Optimized2009-11-09 23:54:42/str str name=Rolledback2009-11-09 23:54:42/str /lst - str name=WARNING This response format is experimental. It is likely to change in the future. /str /response On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen bertie.s...@gmail.com wrote: When I use http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to debug the indexing config file, I always see the status message on the right part str name=Indexing failed. Rolled back all changes./str, even the indexing process looks to be successful. I am not sure whether you guys have seen the same phenomenon or not. BTW, I usually check the checkbox Clean and sometimes check Commit box, and then click Debug Now button. Do you see any exceptions in the logs? -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26338287.html Sent from the Solr - User mailing list archive at Nabble.com.
scanning folders recursively / Tika
Hello. I am working with Tika 0.5 and want to scan a folder system of about 10GB. Is there a comfortable way to scan folders recursively with an existing class, or do I have to write it myself? Any tips for best practice? Greetings, Peter
Re: scanning folders recursively / Tika
Have one thread recursing depth first down the directories adding to a queue (fixed size). Have many threads reading off of the queue and doing the work. -glen http://zzzoot.blogspot.com/ 2009/11/13 Peter Gabriel zarato...@gmx.net: Hello. I am on work with Tika 0.5 and want to scan a folder system about 10GB. Is there a comfortable way to scan folders recursively with an existing class or have i to write it myself? Any tips for best practise? Greetings, Peter -- Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser -- -
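Glen's producer/consumer layout can be sketched with nothing but the JDK. A hedged sketch — the class and method names here are made up for illustration, and in real use the worker body would hand each file to a Tika parser instead of just recording the path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class FolderScanner {
    // Sentinel telling each worker that no more files are coming.
    private static final Path POISON = Paths.get("");

    // One producer thread walks the tree and feeds a bounded queue;
    // nWorkers consumer threads drain it. Here the workers only record
    // the path -- in real use this is where each file would be parsed.
    public static List<Path> collectFiles(Path root, int nWorkers) throws Exception {
        BlockingQueue<Path> queue = new ArrayBlockingQueue<>(100);
        List<Path> seen = Collections.synchronizedList(new ArrayList<>());

        Thread producer = new Thread(() -> {
            try (Stream<Path> paths = Files.walk(root)) {
                for (Path p : (Iterable<Path>) paths.filter(Files::isRegularFile)::iterator) {
                    queue.put(p); // blocks when the queue is full
                }
                for (int i = 0; i < nWorkers; i++) queue.put(POISON); // one per worker
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        ExecutorService workers = Executors.newFixedThreadPool(nWorkers);
        for (int i = 0; i < nWorkers; i++) {
            workers.submit(() -> {
                Path p;
                try {
                    while ((p = queue.take()) != POISON) {
                        seen.add(p); // real code: hand the file to Tika here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return null;
            });
        }
        producer.start();
        producer.join();
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return seen;
    }
}
```

The bounded queue (capacity 100) keeps the directory walker from racing far ahead of slow parsing work, which matters on a 10GB tree.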
Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 4:32 AM, gwk g...@eyefi.nl wrote: I don't know if this is the best solution, or even if it's applicable to your situation but we do incremental updates from a database based on a timestamp, (from a simple seperate sql table filled by triggers so deletes Thanks, gwk! This doesn't exactly meet our needs, but helped us get to a solution. In short, we are manually committing in our outside updater process (instead of letting Solr autocommit), and marking which documents have been updated before a successful commit. Now stopping solr is as easy as kill -9. Michael
how to search against multiple attributes in the index
I want to build an AND search query against field1 AND field2, etc. Both these fields are stored in an index. I am migrating Lucene code to Solr. Following is my existing Lucene code: BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery,Occur.MUST); highlighter = new Highlighter( new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQyery = new TermQuery(new Term (techGroup,searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); TermQuery searchProgramQyery = new TermQuery(new Term(techProgram,searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQyery, Occur.MUST); } What's the equivalent Solr code for the above Lucene code? Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com.
The status of Local/Geo/Spatial/Distance Solr
Hey, I am interested in using LocalSolr to do Local/Geo/Spatial/Distance search. But the wiki of LocalSolr (http://wiki.apache.org/solr/LocalSolr) points to pretty old documentation. Is there a better document I can refer to for setting up LocalSolr, and some performance analysis? I just synced the Solr codebase and found LocalSolr is still NOT in the contrib package. Is there a plan to incorporate it? I downloaded a LocalSolr lib, localsolr-1.5.jar, from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and noticed that its namespace is com.pjaol.search.*, while the LocalLucene package is in the Lucene codebase and the package name is org.apache.lucene.spatial.*. But localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with the lucene-spatial-3.0-dev.jar I built from the Lucene codebase directly. After I restart Tomcat, I cannot load the Solr admin page. The error is as follows. It looks like Solr is still looking for the old class names. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong.
If you want solr to continue after configuration errors, change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833) at org.apache.solr.core.SolrCore.init(SolrCore.java:551) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744) at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626) at 
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022) at org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:448) at org.apache.catalina.core.StandardServer.start(StandardServer.java:700) at org.apache.catalina.startup.Catalina.start(Catalina.java:552) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177) Caused by: java.lang.ClassNotFoundException: com.pjaol.search.geo.utils.DistanceFilter at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1362) at
Re: how to search against multiple attributes in the index
Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote: I want to build AND search query against field1 AND field2 etc. Both these fields are stored in an index. I am migrating lucene code to Solr. Following is my existing lucene code BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery,Occur.MUST); highlighter = new Highlighter( new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQyery = new TermQuery(new Term (techGroup,searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); TermQuery searchProgramQyery = new TermQuery(new Term(techProgram,searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQyery, Occur.MUST); } What's the equivalent Solr code for above Luce code. Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The status of Local/Geo/Spatial/Distance Solr
It looks like solr+spatial will get some attention in 1.5, check: https://issues.apache.org/jira/browse/SOLR-1561 Depending on your needs, that may be enough. More robust/scaleable solutions will hopefully work their way into 1.5 (any help is always appreciated!) On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote: Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr ) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. 
Re: The status of Local/Geo/Spatial/Distance Solr
Also: https://issues.apache.org/jira/browse/SOLR-1302 On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote: Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr ) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. 
Obtaining list of dynamic fields being available in index
Hi there! How can we retrieve the complete list of dynamic fields which are currently available in the index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com wrote: I already did dive in before. I am using solrj API and SolrQuery object to build query. but its not clear/written how to build booleanQuery ANDing bunch of different attributes in the index. Any samples please? Avlesh Singh wrote: Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote: I want to build AND search query against field1 AND field2 etc. Both these fields are stored in an index. I am migrating lucene code to Solr. Following is my existing lucene code BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery,Occur.MUST); highlighter = new Highlighter( new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQyery = new TermQuery(new Term (techGroup,searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); TermQuery searchProgramQyery = new TermQuery(new Term(techProgram,searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQyery, Occur.MUST); } What's the equivalent Solr code for above Luce code. Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Return doc if one or more query keywords occur multiple times
Anyone?

Original message -- Date: Thu, 12 Nov 2009 13:29:20 +0100 From: gistol...@gmx.de To: solr-user@lucene.apache.org Subject: Return doc if one or more query keywords occur multiple times

Hello, I am using the Dismax request handler for queries: ...select?q=foo bar foo2 bar2&qt=dismax&mm=2... With the parameter mm=2 I configure that at least 2 of the optional clauses must match, regardless of how many clauses there are. But now I want to change this to the following: list all documents that have at least 2 of the optional clauses OR that have at least one of the query terms (e.g. foo) more than once. Is this possible? Thanks, Gisto
Re: Obtaining list of dynamic fields beind available in index
Luke Request Handler? - http://wiki.apache.org/solr/LukeRequestHandler /admin/luke?numTerms=0 Cheers Avlesh On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky b...@redwerk.com wrote: Hi there! How can we retrieve the complete list of dynamic fields that are currently available in the index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
I think I found the answer. Needed to read more API documentation :-) You can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters.

Avlesh Singh wrote: For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to search against multiple attributes in the index
you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter -

    // AND between two fields
    solrQuery.setQuery("+field1:foo +field2:bar");
    // OR between two fields
    solrQuery.setQuery("field1:foo field2:bar");

Cheers Avlesh

On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev vika...@yahoo.com wrote: I think I found the answer. Needed to read more API documentation :-) You can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters.
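Putting Avlesh's starter into a compilable form, here is a minimal sketch of composing the AND query string — plain string building only, with no SolrJ dependency, so it stands alone. The helper class and method names are made up for illustration; in real code you would pass the result to SolrQuery.setQuery(). The field names come from the Lucene code earlier in this thread; the values are illustrative.

```java
// Sketch: build the Solr query-string equivalent of a Lucene BooleanQuery
// where every clause is Occur.MUST, i.e. an AND of field:value pairs.
public class AndQueryBuilder {

    // Each pair becomes a "+field:value" clause; "+" marks a MUST clause
    // in the standard Solr/Lucene query syntax.
    static String andQuery(String[][] fieldValuePairs) {
        StringBuilder sb = new StringBuilder();
        for (String[] pair : fieldValuePairs) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(pair[0]).append(':').append(pair[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = andQuery(new String[][] {
            {"techGroup", "foo"},
            {"techProgram", "bar"}
        });
        System.out.println(q); // +techGroup:foo +techProgram:bar
    }
}
```

In SolrJ this string would then go to solrQuery.setQuery(q), as in Avlesh's starter.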
Re: Reseting doc boosts
AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, I'm trying to figure out if there is an easy way to basically reset all of the doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather a report, doc boost on the report, and reset the boosts @ time of next index ... It would seem, from just knowing how Lucene works, that I would really need to reindex since it's an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: The status of Local/Geo/Spatial/Distance Solr
Hi Ian and Ryan, Thanks for the reply. Ian, I checked your pasted config; I am using the same one, including the values <int name="startTier">4</int> and <int name="endTier">25</int>. Basically I use the setup specified at http://www.gissearch.com/localsolr. But I still get the same error I pasted in the previous email. Ryan, I just checked out the lib lucene-spatial-2.9.1.jar Grant checked in today. Previously I built lucene-spatial-3.0-dev.jar from the Lucene Java code base directly. There is still no luck after the lib replacement. I do not think any other lib matters in this case.

On Fri, Nov 13, 2009 at 8:34 AM, Ian Ibbotson iani...@googlemail.com wrote: Heya.. could it be a problem with your solr config files? I seem to recall a change from the docs as they were to get this working.. I have...

    <updateRequestProcessorChain>
      <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
        <str name="latField">lat</str>
        <str name="lngField">lng</str>
        <int name="startTier">4</int>
        <int name="endTier">25</int>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory" />
      <processor class="solr.LogUpdateProcessorFactory" />
    </updateRequestProcessorChain>

    <searchComponent name="localsolr"
                     class="com.pjaol.search.solr.component.LocalSolrQueryComponent" />

    <requestHandler name="geo" class="org.apache.solr.handler.component.SearchHandler">
      <arr name="components">
        <str>localsolr</str>
        <str>facet</str>
        <str>mlt</str>
        <str>highlight</str>
        <str>debug</str>
      </arr>
    </requestHandler>

Does that tie up with your config? I'd basically interpreted the current packaging as: what used to be locallucene has definitely merged into lucene-spatial in this build, no more locallucene. However, you still need to build localsolr for now...
My solr jars are: commons-beanutils-1.8.0.jar commons-codec-1.4.jar commons-dbcp-1.2.2.jar commons-fileupload-1.2.1.jar commons-httpclient-3.1.jar commons-io-1.3.2.jar commons-logging-1.1.1.jar commons-pool-1.5.3.jar geoapi-nogenerics-2.1M2.jar geronimo-stax-api_1.0_spec-1.0.1.jar gt2-referencing-2.3.1.jar jsr108-0.01.jar localsolr-1.5.2-rc1.jar log4j-1.2.13.jar lucene-analyzers-2.9.1-ki-rc3.jar lucene-core-2.9.1-ki-rc3.jar lucene-highlighter-2.9.1-ki-rc3.jar lucene-memory-2.9.1-ki-rc3.jar lucene-misc-2.9.1-ki-rc3.jar lucene-queries-2.9.1-ki-rc3.jar lucene-snowball-2.9.1-ki-rc3.jar lucene-spatial-2.9.1-ki-rc3.jar lucene-spellchecker-2.9.1-ki-rc3.jar org.codehaus.woodstox-wstx-asl-3.2.7.jar serializer-2.7.1.jar slf4j-api-1.5.5.jar slf4j-log4j12-1.5.5.jar solr-commons-csv-1.4.0-ki-rc1.jar solr-core-1.4.0-ki-rc1.jar solr-solrj-1.4.0-ki-rc1.jar stax-1.2.0.jar stax-api-1.0.jar stax-utils-20040917.jar woodstox-wstx-asl-3.2.7.jar xalan-2.7.1.jar xercesImpl-2.9.1.jar xml-apis-1.3.04.jar xpp3-1.1.3.4.O.jar Sorry for dumping the info at you... hope it helps though. Ian. 2009/11/13 Bertie Shen bertie.s...@gmail.com: Hey, I am interested in using LocalSolr to do Local/Geo/Spatial/Distance search. But the wiki of LocalSolr (http://wiki.apache.org/solr/LocalSolr) points to pretty old documentation. Is there a better document I can refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed the Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I downloaded a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and noticed that the namespace is com.pjaol.search. blah blah, while the LocalLucene package is in the Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with the lucene-spatial-3.0-dev.jar I build from the Lucene codebase directly.
After I restart tomcat, I could not load the solr admin page. The error is as follows. It looks like Solr is still looking for the old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at
Re: Question about the message Indexing failed. Rolled back all changes.
The process initially completes with:

    <str name="Full Dump Started">2009-11-13 09:40:46</str>
    <str name="">Indexing completed. Added/Updated: 20 documents. Deleted 0 documents.</str>

...but then it fails with:

    <str name="Full Dump Started">2009-11-13 09:40:46</str>
    <str name="">Indexing failed. Rolled back all changes.</str>
    <str name="Committed">2009-11-13 09:41:10</str>
    <str name="Optimized">2009-11-13 09:41:10</str>
    <str name="Rolledback">2009-11-13 09:41:10</str>

I think it may have something to do with this, which I found by using the DataImport.jsp:

    (Thread.java:636) Caused by: java.sql.SQLException: Illegal value for setFetchSize().
    at com.mysql.jdbc.Statement.setFetchSize(Statement.java:1864)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:242)
    ... 28 more</str>

-- View this message in context: http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26340360.html Sent from the Solr - User mailing list archive at Nabble.com.
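For what it's worth, the "Illegal value for setFetchSize()" error from the MySQL driver is commonly worked around by setting batchSize="-1" on the DIH data source: JdbcDataSource translates -1 into Integer.MIN_VALUE, which is the value the MySQL driver expects to enable row streaming. A hypothetical data-config.xml fragment (driver, URL, and credentials are placeholders):

```xml
<!-- batchSize="-1" makes JdbcDataSource pass Integer.MIN_VALUE to
     setFetchSize(), telling the MySQL driver to stream rows instead
     of rejecting the fetch size. Connection details are illustrative. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="solr"
            password="..."
            batchSize="-1"/>
```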
Re: how to search against multiple attributes in the index
great. thanks. that was helpful

Avlesh Singh wrote: you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter - // AND between two fields solrQuery.setQuery("+field1:foo +field2:bar"); // OR between two fields solrQuery.setQuery("field1:foo field2:bar"); Cheers Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26340776.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: scanning folders recursively / Tika
Peter - if you want, download the code from Lucene in Action 1 or 2; it has index traversal and indexing. The 2nd edition uses Tika. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Gabriel zarato...@gmx.net To: solr-user@lucene.apache.org Sent: Fri, November 13, 2009 10:26:48 AM Subject: scanning folders recursively / Tika Hello. I am at work with Tika 0.5 and want to scan a folder system of about 10GB. Is there a comfortable way to scan folders recursively with an existing class, or do I have to write it myself? Any tips for best practice? Greetings, Peter
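Absent the book code, here is a minimal sketch of the recursive traversal part using only the JDK; the Tika hand-off is left as a placeholder comment, since the exact parsing setup isn't shown in this thread, and the class name is made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: walk a directory tree and collect every regular file; each file
// found is where you would hand off to a Tika parser for text extraction.
public class FolderScanner {

    public static List<File> scan(File dir) {
        List<File> files = new ArrayList<File>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return files; // not a directory, or an I/O error
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                files.addAll(scan(entry)); // recurse into subfolders
            } else {
                files.add(entry);
                // here: hand 'entry' to a Tika parser and index the
                // extracted text into Solr
            }
        }
        return files;
    }

    public static void main(String[] args) {
        for (File f : scan(new File("."))) {
            System.out.println(f.getPath());
        }
    }
}
```

For a 10GB tree this keeps memory low as long as you process files as you find them rather than accumulating extracted text.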
Re: Customizing Field Score (Multivalued Field)
On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr stephen.dun...@gmail.com wrote: On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter hossman_luc...@fucit.org wrote: oh man, so you were parsing the stored field values of every matching doc at query time? ouch. Assuming i'm understanding your goal, the conventional way to solve this type of problem is payloads ... you'll find lots of discussion on it in the various Lucene mailing lists, and if you look online Michael Busch has various slides that talk about using them. they let you say things like "in this document, at this position of field 'x' the word 'microsoft' is worth 37.4, but at this other position (or in this other document) 'microsoft' is only worth 17.2". The simplest way to use them in Solr (as i understand it) is to use something like the DelimitedPayloadTokenFilterFactory when indexing, and then write yourself a simple little custom QParser that generates a BoostingTermQuery on your field. should be a lot simpler to implement than the Query you are describing, and much faster. -Hoss Thanks. I finally got around to looking at this again today and was looking at a similar path, so I appreciate the confirmation.
-- Stephen Duncan Jr www.stephenduncanjr.com

For posterity, here's the rest of what I discovered trying to implement this: You'll need to write a PayloadSimilarity as described here: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ (here's my updated version due to deprecation of the method mentioned in that article):

    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length) {
        // can ignore length here, because we know it is encoded as 4 bytes
        return PayloadHelper.decodeFloat(payload, offset);
    }

You'll need to register that similarity in your Solr schema.xml (this was hard to figure out, as I didn't realize that the similarity has to be applied globally to the writer/searcher used generally, even though I only care about payloads on one field, so I wasted time trying to figure out how to plug the similarity into my query parser). You'll want to use the payloads type, or something based on it, that's in the example schema.xml. The latest and greatest query type to use is PayloadTermQuery. I use it in my custom query parser class, overriding getFieldQuery, checking for my field name, and then:

    return new PayloadTermQuery(new Term(field, queryText),
                                new AveragePayloadFunction());

Due to the global nature of the Similarity, I guess you'd have to modify it to look at the field name and base behavior on that if you wanted different kinds of payloads on different fields in one schema. Also, whereas in my original implementation I controlled the score completely (and therefore if I set a score of 0.8, the doc came back with a score of 0.8), in this technique the payload is just used as a boost/addition to the score, so my scores came out higher than before. Since they're still in the same relative order, that still satisfied my needs, but it did require updating my test cases. -- Stephen Duncan Jr www.stephenduncanjr.com
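For reference, registering a similarity in schema.xml looks like the following; the class name is a placeholder for wherever your PayloadSimilarity lives. It is a top-level element that applies globally, which is why per-field payload behavior has to be handled inside the class itself, as described above:

```xml
<!-- Hypothetical class name; goes at the top level of schema.xml and
     applies to the whole index, not to a single field. -->
<similarity class="com.example.PayloadSimilarity"/>
```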
Making search results more stable as index is updated
If documents are being added to and removed from an index (and commits are being issued) while a user is searching, then the experience of paging through search results using the obvious Solr mechanism (start=100&rows=10) may be disorienting for the user. For one example, by the time the user clicks next page for the first time, a document that they saw on page 1 may have been pushed onto page 2. (This may be especially pronounced if docs are being sorted by date.) I'm wondering what are the best options available for presenting a more stable set of search results to users in such cases. The obvious candidates to me are:

#1: Cache results in the user session of the web tier. (In particular, maybe just cache the uniqueKey of each matching document.) Pro: Simple. Con: May require capping the # of search results in order to make the initial query (which now has a Solr rows param much larger than the web page size) fast enough. For example, maybe it's only practical to cache the first 500 records.

#2: Create some kind of per-user results cache in Solr. (One simple implementation idea: you could make your Solr search handler take a userid parameter, and cache each user's last search in a special per-user results cache. You then also provide an API that says: give me records n through m of userid #1334's last search. For your subsequent queries, you consult the latter API rather than redoing your search. Because Lucene docids are unstable across commits and such, I think this means caching the uniqueKey of each matching document. This in turn means looking up the uniqueKey of each matching document at search time. It also means you can't use the existing Solr caches, but need to make a new one.) Pro: Maybe faster than #1?? (Saves on data transfer between Solr and the web tier, at least during the initial query.) Con: More complicated than #1.

#3: Use filter queries to attempt to make your subsequent queries (for page 2, page 3, etc.) return results consistent with your original query.
(One idea is to give each document a docAddedTimestamp field, which would have precision down to the millisecond or something. On your initial query, you could note the current time, T. Then for the subsequent queries you add a filter query for docAddedTimestamp <= T. Hopefully with a trie date field this would be fast. This should hopefully keep any docs newly added after T from showing up in the user's search results as they page through them. However, it won't necessarily protect you from docs that were *reindexed* (i.e. re-adding a doc with the same uniqueKey as an existing doc) or docs that were deleted.) Pro: Doesn't require a new cache, and no cap on # of search results. Con: Maybe doesn't provide total stability.

Any feedback on these options? Are there other ideas to consider? Thanks, Chris
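To make option #3 concrete, the request flow might look like this on the wire (the field name follows the idea above; the timestamp value and query are illustrative, assuming a trie-based date field):

```
Initial query -- note the current time T (say 2009-11-13T12:00:00Z):
  /select?q=foo&start=0&rows=10

Subsequent pages -- filter to docs indexed at or before T:
  /select?q=foo&start=10&rows=10&fq=docAddedTimestamp:[* TO 2009-11-13T12:00:00Z]
```

Because the fq is identical on every page request, it is also cached by Solr's filter cache across the user's paging session.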
Re: having solr generate and execute other related queries automatically
tpunder wrote: Maybe I misunderstand what you are trying to do (or the facet.query feature). If I did an initial query on my data-set that left me with the following questions: ... http://localhost:8983/solr/select/?q=*%3A*&start=0&rows=0&facet=on&facet.query=brand_id:1&facet.query=brand_id:2&facet.query=+%2Bbrand_id:5+%2Bcategory_id:4051 ... Thanks for the reply Tim. I can't provide you with an example as I don't have anything prototyped as yet; I am still trying to work things through in my head. The +20 queries would allow us to suggest other possibilities to users in a facet-like way (but not returning the exact same info as facets). With the technique you mention I would have to specify the list of query params for each facet.query. That would work for relatively simple queries. Unfortunately, the queries I was looking at doing would be fairly long (say hundreds of AND/OR statements). That said, I don't think Solr would be able to handle the query size I would end up with (at least not efficiently), because the resulting query would consist of thousands of AND/OR statements (isn't there a limit of sorts in Solr?). I think that my best bet would be to extend the SearchComponent and perform the additional query generation and execution in the extension. That approach should also allow me to have access to the facet values that the base query would generate (which would allow me to generate and execute the other queries). Thanks again. -- View this message in context: http://old.nabble.com/having-solr-generate-and-execute-other-related-queries-automatically-tp26327032p26343409.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Multicore solr.xml schemaName parameter not being recognized
: On the CoreAdmin wiki page. thanks FWIW: The only time the string schemaName appears on the CoreAdmin wiki page is when it mentions that solr.core.schemaName is a property that is available to cores by default. The documentation for core specifically says... The core tag accepts the following attributes: ... * schema - The schema file name for a given core. The default is ... So the documentation is correct. -Hoss
Re: Solr 1.3 query and index perf tank during optimize
Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM: Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some though, and adds many complications. Yes, in my use case 2 boxes isn't a great option. Another kind of option is to use the partial optimize feature: <optimize maxOptimizeSegments="5"/> Using this, you can optimize down to n segments and take a shorter hit each time. Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a while since I have to port custom code forward, including a query parser. Also, if optimizing is so painful, you might lower the merge factor to amortize that pain better. That's another way to slowly get there - if you lower the merge factor, as merging takes place, the new merge factor will be respected, and segments will merge down. A merge factor of 2 (the lowest) will make it so you only ever have 2 segments. Sometimes that works reasonably well - you could try 3-6 or something as well. Then when you do your partial optimizes (and eventually a full optimize perhaps), you won't have so far to go. So this will slow down indexing but speed up optimize somewhat? Unfortunately right now I lose docs I'm indexing, as well as slowing searching to a crawl. Ugh. I've got plenty of CPU horsepower. This is where having the ability to optimize on another filesystem would be useful. Would it perhaps make sense to set up a master/slave on the same machine? Then I suppose I can have an index being optimized that might not clobber the search. Would new indexed items still be dropped on the floor? Thanks, Jerry
Re: Stop solr without losing documents
: which documents have been updated before a successful commit. Now : stopping solr is as easy as kill -9. please don't kill -9 ... it's grossly overkill, and doesn't give your servlet container a fair chance to clean things up. A lot of work has been done to make Lucene indexes robust to hard terminations of the JVM (or physical machine), but there's no reason to go out of your way to try and stab it in the heart when you could just shut it down cleanly. that's not to say your approach isn't a good one -- if you only have one client sending updates/commits, then having it keep track of what was indexed prior to the last successful commit is a viable way to deal with what happens if solr stops responding (either because you shut it down, or because it crashed for some other reason). Alternately, you could take advantage of the enabled feature from your client (just have it test the enabled url every N updates or so) and when it sees that you have disabled the port, it can send one last commit and then stop sending updates until it sees the enabled URL work again -- as soon as you see the updates stop, you can safely shut down the port. -Hoss
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote: I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It's unclear if this case is caused by IO contention, or the OS cache of the hot parts of the index being lost by that extra IO activity. Of course the latter would lead to the former, but without that OS disk cache, the searches may be too slow even w/o the extra IO. Is there a way to configure things so that search and new data indexing get cached under the control of solr/lucene? Then we'd be less reliant on the OS behavior. Alternatively if there are OS params I can tweak (RHEL/Centos 5) to solve the problem, that's an option for me. Would you know if 1.4 is better behaved than 1.3? Thanks, Jerry
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote: I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It's unclear if this case is caused by IO contention, or the OS cache of the hot parts of the index being lost by that extra IO activity. Of course the latter would lead to the former, but without that OS disk cache, the searches may be too slow even w/o the extra IO. On linux there's the ionice command to try to throttle processes. Would it be possible and make sense to have a separate process for optimizing that had ionice set it to idle? Can the index be shared this way? Thanks, Jerry
Re: NPE when trying to view a specific document via Luke
: I'm seeing this stack trace when I try to view a specific document, e.g.
: /admin/luke?id=1 but luke appears to be working correctly when I just
: view /admin/luke. Does this look familiar to anyone? Our sysadmin just
: upgraded us to the 1.4 release, I'm not sure if this occurred before
: that.
:
: Thanks,
: Jake

FWIW: I was able to reproduce this using the example setup (i picked a doc id at random) suspecting it was a bug in docFreq when using multiple segments, i tried optimizing and still got an NPE, but then my entire computer crashed (unrelated) before i could look any deeper. I have to go out now, but i'll try to dig into this more when i get back ... given where it happens in the code, it seems like a potentially serious lucene bug (either that, or LukeRequestHandler is doing something it really shouldn't be, but i can't imagine how it could trigger an NPE that deep in the lucene code).

: java.lang.NullPointerException
:   at org.apache.lucene.index.TermBuffer.set(TermBuffer.java:95)
:   at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
:   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
:   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
:   at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
:   at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
:   at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
:   at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:248)
:   at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:124)
:   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
:   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
:   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
:   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
:   at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
:   at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
:   at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
:   at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
:   at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
:   at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
:   at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
:   at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
:   at java.lang.Thread.run(Thread.java:619)
:
: Date: Fri, 13 Nov 2009 02:19:54 GMT
: Server: Apache/2.2.3 (Red Hat)
: Cache-Control: no-cache, no-store
: Pragma: no-cache
: Expires: Sat, 01 Jan 2000 01:00:00 GMT
: Content-Type: text/html; charset=UTF-8
: Vary: Accept-Encoding,User-Agent
: Content-Encoding: gzip
: Content-Length: 1066
: Connection: close

-Hoss
Re: Request assistance with distributed search multi shard/core setup and configuration
Distributed search requires a list of shard names in the URL. That's all. Note that a distributed search does not use the data of the Solr instance you call. You can create an entry point for your distributed search by adding a new requestHandler element in solrconfig.xml. You would add the shard list parameter to the defaults list. Do not have it call the same requestHandler path - you'll get an infinite loop. On Tue, Nov 10, 2009 at 6:44 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hm, I don't follow. You don't need to create a custom (request) handler to make use of Solr's distributed search. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Turner, Robbin J robbin.j.tur...@boeing.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, November 10, 2009 6:41:32 PM Subject: RE: Request assistance with distributed search multi shard/core setup and configuration Thanks, I had already read through this url. I guess my request was: is there a way to set up something that is already part of solr itself to pass the URL[shard...] rather than having to create a custom handler. thanks -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, November 10, 2009 6:09 PM To: solr-user@lucene.apache.org Subject: Re: Request assistance with distributed search multi shard/core setup and configuration Right, that's http://wiki.apache.org/solr/DistributedSearch Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Turner, Robbin J To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 6:05:19 PM Subject: RE: Request assistance with distributed search multi shard/core setup and configuration I've already done the single Solr, that's why my request.
I read on some site that there is a way to setup the configuration so I can send a query to one solr instance and have it pass it on or distribute it across all the instances? Btw, thanks for the quick reply. RJ -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, November 10, 2009 6:02 PM To: solr-user@lucene.apache.org Subject: Re: Request assistance with distributed search multi shard/core setup and configuration RJ, You may want to take a simpler step - single Solr core (no solr.xml needed) per machine. Then distributed search really only requires that you specify shard URLs in the URL of the search requests. In practice/production you rarely benefit from distributed search against multiple cores on the same server anyway. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR From: Turner, Robbin J To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 5:58:52 PM Subject: Request assistance with distributed search multi shard/core setup and configuration I've been looking through all the documentation. I've set up a single solr instance, and one multicore instance. If someone would be willing to share some configuration examples and/or advise for setting up solr for distributing the search, I would really appreciate it. I've read that there is a way to do it, but most of the current documentation doesn't provide enough example on what to do with solr.xml, and the solrconfig.xml. Also, I'm using tomcat 6 for the servlet container. I deployed the solr 1.4.0 released yesterday. Thanks RJ -- Lance Norskog goks...@gmail.com
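Lance's suggestion might look like the following in solrconfig.xml (the handler name and shard hosts below are placeholders, not from the thread). The key point is that the entry point's own path must not appear in its shard list, or the handler will call itself in a loop:

```xml
<!-- Hypothetical entry point for distributed search (Solr 1.4).
     Shard hosts are placeholders; list your real shard URLs, without "http://". -->
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">shard1.example.com:8983/solr,shard2.example.com:8983/solr</str>
  </lst>
</requestHandler>
```

A query sent to /distrib then fans out to the listed shards and merges the results, while querying /select on any one shard still searches only that shard's local index.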
Re: NPE when trying to view a specific document via Luke
On Fri, Nov 13, 2009 at 5:41 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'm seeing this stack trace when I try to view a specific document, e.g. : /admin/luke?id=1 but luke appears to be working correctly when I just [...] : FWIW: I was able to reproduce this using the example setup (i picked a : doc id at random) suspecting it was a bug in docFreq

Probably just a null being passed in the text part of the term. I bet Luke expects all field values to be strings, but some are binary. -Yonik http://www.lucidimagination.com
Fwd: Lucene MMAP Usage with Solr
Folks, I am trying to get Lucene MMAP to work in Solr. I am assuming that when I configure MMAP the entire index will be loaded into RAM. Is that the right assumption? I have tried the following ways for using MMAP:

Option 1. Using the solr config below for MMAP configuration: -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory With this config, when I start solr with a 30G index, I expected that the RAM usage should go up, but it did not.

Option 2. By code change. I made the following code change: changed org.apache.solr.core.StandardDirectoryFactory to use MMapDirectory instead of FSDirectory. Code snippet pasted below.

Could you help me to understand if these are the right ways to use MMAP? Thanks much /ST.

Code snippet for Option 2:

package org.apache.solr.core;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

/**
 * Directory provider which mimics original Solr FSDirectory based behavior.
 */
public class StandardDirectoryFactory extends DirectoryFactory {

  public Directory open(String path) throws IOException {
    return MMapDirectory.open(new File(path));
  }
}
Re: any docs on solr.EdgeNGramFilterFactory?
Thanks for the link - there doesn't seem to be a fix version specified, so I guess this will not officially ship with lucene 2.9? -Peter On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote: Peter, here is a project that does this: http://issues.apache.org/jira/browse/LUCENE-1488 That's kind of interesting - in general can I build a custom tokenizer from existing tokenizers that treats different parts of the input differently based on the utf-8 range of the characters? E.g. use a porter stemmer for stretches of Latin text and n-gram or something else for CJK? -Peter On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yes, that's the n-gram one. I believe the existing CJK one in Lucene is really just an n-gram tokenizer, so no different than the normal n-gram tokenizer. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin peter.wola...@acquia.com To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 7:34:37 PM Subject: Re: any docs on solr.EdgeNGramFilterFactory? So, this is the normal N-gram one? NGramTokenizerFactory Digging deeper - there are actually CJK and Chinese tokenizers in the Solr codebase: http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html The CJK one uses the lucene CJKTokenizer http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html and there seems to be another one even that no one has wrapped into Solr: http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html So seems like the existing options are a little better than I thought, though it would be nice to have some docs on properly configuring these.
-Peter On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic wrote: Peter, For CJK and n-grams, I think you don't want the *Edge* n-grams, but just n-grams. Before you take the n-gram route, you may want to look at the smart Chinese analyzer in Lucene contrib (I think it works only for Simplified Chinese) and Sen (on java.net). I also spotted a Korean analyzer in the wild a few months back. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 4:06:52 PM Subject: any docs on solr.EdgeNGramFilterFactory? This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting to be added, or is there any other documentation in addition to the blog post? In particular, there was a thread last year about using an N-gram tokenizer to enable reasonable (if not ideal) searching of CJK text, so I'd be curious to know how people are configuring their schema (with this tokenizer?) for that use case. Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Robert Muir rcm...@gmail.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
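For reference, a minimal schema.xml sketch of the kind of EdgeNGramFilterFactory setup the blog post describes (the field type name, tokenizer choice, and gram sizes below are illustrative assumptions, not taken from the post):

```xml
<!-- Hypothetical autosuggest field type: edge n-grams at index time only,
     so a query prefix like "sol" matches the indexed grams of "solr". -->
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the asymmetry: only the index-time analyzer produces the grams; the query side leaves the user's prefix intact.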
Re: Resetting doc boosts
I'm not sure this is what you are looking for, but there is FieldNormModifier tool in Lucene. Koji -- http://www.rondhuit.com/en/ Avlesh Singh wrote: AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, Im trying to figure out if there is an easy way to basically reset all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem to be from just knowing how Lucene works that I would really need to reindex since its a attrib on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: any docs on solr.EdgeNGramFilterFactory?
ah, thanks, i'll tentatively set one in the future, but definitely not 2.9.x more just to show you the idea, you can do different things depending on different runs of writing systems in text. but it doesnt solve everything: you only know its Latin script, not english, so you can't safely automatically do anything like stemming. say your content is only chinese, english: the analyzer won't know your latin script text is english, versus say, french from the unicode, so it won't stem it. but that analyzer will lowercase it. it won't know if your ideographs are chinese or japanese, but it will use n-gram tokenization, you get the drift. in that impl, it puts the script code in the flags so downstream you could do something like stemming if you happen to know more than is evident from the unicode. On Fri, Nov 13, 2009 at 6:23 PM, Peter Wolanin peter.wola...@acquia.comwrote: Thanks for the link - there doesn't seem a be a fix version specified, so I guess this will not officially ship with lucene 2.9? -Peter On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote: Peter, here is a project that does this: http://issues.apache.org/jira/browse/LUCENE-1488 That's kind of interesting - in general can I build a custom tokenizer from existing tokenizers that treats different parts of the input differently based on the utf-8 range of the characters? E.g. use a porter stemmer for stretches of Latin text and n-gram or something else for CJK? -Peter On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yes, that's the n-gram one. I believe the existing CJK one in Lucene is really just an n-gram tokenizer, so no different than the normal n-gram tokenizer. 
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin peter.wola...@acquia.com To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 7:34:37 PM Subject: Re: any docs on solr.EdgeNGramFilterFactory? So, this is the normal N-gram one? NGramTokenizerFactory Digging deeper - there are actualy CJK and Chinese tokenizers in the Solr codebase: http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html The CJK one uses the lucene CJKTokenizer http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html and there seems to be another one even that no one has wrapped into Solr: http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html So seems like the existing options are a little better than I thought, though it would be nice to have some docs on properly configuring these. -Peter On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic wrote: Peter, For CJK and n-grams, I think you don't want the *Edge* n-grams, but just n-grams. Before you take the n-gram route, you may want to look at the smart Chinese analyzer in Lucene contrib (I think it works only for Simplified Chinese) and Sen (on java.net). I also spotted a Korean analyzer in the wild a few months back. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 4:06:52 PM Subject: any docs on solr.EdgeNGramFilterFactory? 
This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting to be added, or is there any other documentation in addition to the blog post? In particular, there was a thread last year about using an N-gram tokenizer to enable reasonable (if not ideal) searching of CJK text, so I'd be curious to know how people are configuring their schema (with this tokenizer?) for that use case. Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Robert Muir rcm...@gmail.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Robert Muir
Re: NPE when trying to view a specific document via Luke
: FWIW: I was able to reproduce this using the example setup (i picked a : doc id at random) suspecting it was a bug in docFreq : : Probably just a null being passed in the text part of the term. : I bet Luke expects all field values to be strings, but some are binary. I'm not sure i follow you ... i think you're saying that naive assumptions in the LukeRequestHandler could result in it asking for the docFreq of a term that has a null string value because some field types are binary, except that... 1) 1.3 didn't have this problem 2) LukeRequestHandler.getDocumentFieldsInfo didn't change from 1.3 to 1.4 I tried to reproduce this in 1.4 using an index/configs created with 1.3, but i got a *different* NPE when loading this url... http://localhost:8983/solr/admin/luke?id=SP2514N

SEVERE: java.lang.NullPointerException
at org.apache.solr.util.NumberUtils.SortableStr2int(NumberUtils.java:127)
at org.apache.solr.util.NumberUtils.SortableStr2float(NumberUtils.java:83)
at org.apache.solr.util.NumberUtils.SortableStr2floatStr(NumberUtils.java:89)
at org.apache.solr.schema.SortableFloatField.indexedToReadable(SortableFloatField.java:62)
at org.apache.solr.schema.SortableFloatField.toExternal(SortableFloatField.java:53)
at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:245)

...all three of these stack traces seem to suggest that some impl of Fieldable.stringValue in 2.9 is returning null in cases where it returned *something* else in the 2.4-dev jar used by Solr 1.3. That seems like it could have other impacts besides LukeRequestHandler. -Hoss
Re: NPE when trying to view a specific document via Luke
: I tried to reproduce this in 1.4 using an index/configs created with 1.3, : but i got a *different* NPE when loading this url... I should have tried a simpler test ... i get NPEs just trying to execute a simple search for *:* when i try to use the example index built in 1.3 (with the 1.3 configs) in 1.4. same (apparent) cause: code is attempting to deref a string returned by Fieldable.stringValue() which is null...

java.lang.NullPointerException
at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)

This really does smell like something in Lucene changed behavior drastically. I've been looking at diffs from java/tr...@691741 and java/tags/lucene_2_9_1 but nothing jumps out at me that would explain this. If nothing else, i'm opening a solr issue... -Hoss
StreamingUpdateSolrServer commit?
When does StreamingUpdateSolrServer commit? I know there's a threshold and thread pool as params but I don't see a commit timeout. Do I have to manage this myself?
Re: exclude some fields from copying dynamic fields | schema.xml
There is no direct way. Let's say you have a nocopy_s and you do not want a copy nocopy_str_s. This might work: declare nocopy_str_s as a field and make it not indexed and not stored. I don't know if this will work. It requires two overrides to work: 1) that declaring a field name that matches a wildcard will override the default wildcard rule, and 2) that stored=false indexed=false works. On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev vikrantv_shirbh...@yahoo.co.in wrote: Hi, we are using the following entry in schema.xml to make a copy of one type of dynamic field to another : copyField source=*_s dest=*_str_s / Is it possible to exclude some fields from copying. We are using Solr1.3 ~Vikrant -- View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
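A sketch of Lance's workaround in schema.xml terms (untested, as he says - it relies on the two overrides he lists: an explicit field declaration shadowing the wildcard, and indexed="false" stored="false" being honored; the field type is an assumption):

```xml
<!-- Existing dynamic fields and the catch-all copy from the thread -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_str_s" type="string" indexed="true" stored="true"/>
<copyField source="*_s" dest="*_str_s"/>

<!-- Hypothetical override: explicitly declare the one copy target you want
     suppressed, so its copied content is neither indexed nor stored -->
<field name="nocopy_str_s" type="string" indexed="false" stored="false"/>
```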
Re: Resetting doc boosts
This looks exactly like what I was needing ... this looks like it would be a great tool / addition to Solr web interface but it looks like it only takes (Directory d, Similarity s) (vs. subset collection of documents) ... Either way great find, thanks for your help ... - Jon On Nov 13, 2009, at 6:40 PM, Koji Sekiguchi wrote: I'm not sure this is what you are looking for, but there is FieldNormModifier tool in Lucene. Koji -- http://www.rondhuit.com/en/ Avlesh Singh wrote: AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, Im trying to figure out if there is an easy way to basically reset all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem to be from just knowing how Lucene works that I would really need to reindex since its a attrib on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
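FieldNormModifier lives in Lucene's contrib/misc and is driven from the command line over a whole index directory, which matches Jon's observation that it works on (Directory, Similarity) rather than a subset of documents. A hedged invocation sketch (jar names, classpath, and argument order may differ by Lucene version - verify against your distribution):

```shell
# Recompute the norms for the listed fields across the whole index using
# DefaultSimilarity -- effectively resetting any per-document boosts baked
# into those fields' norms. Paths and field names are placeholders.
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.misc.FieldNormModifier \
  /path/to/index org.apache.lucene.search.DefaultSimilarity title body
```

Run it only on a closed index (no IndexWriter open), and back the index up first, since it rewrites the norms files in place.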
Re: Making search results more stable as index is updated
This is one case where permanent caches are interesting. Another case is highlighting: in some cases highlighting takes a lot of work, and this work is not cached. It might be a cleaner architecture to have session-maintaining code in a separate front-end app, and leave Solr session-free. On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris rygu...@gmail.com wrote: If documents are being added to and removed from an index (and commits are being issued) while a user is searching, then the experience of paging through search results using the obvious solr mechanism (start=100&rows=10) may be disorienting for the user. For one example, by the time the user clicks next page for the first time, a document that they saw on page 1 may have been pushed onto page 2. (This may be especially pronounced if docs are being sorted by date.) I'm wondering what are the best options available for presenting a more stable set of search results to users in such cases. The obvious candidates to me are: #1: Cache results in the user session of the web tier. (In particular, maybe just cache the uniqueKey of each matching document.) Pro: Simple Con: May require capping the # of search results in order to make the initial query (which now has Solr numRows param >> web pageSize) fast enough. For example, maybe it's only practical to cache the first 500 records. #2: Create some kind of per-user results cache in Solr. (One simple implementation idea: You could make your Solr search handler take a userid parameter, and cache each user's last search in a special per-user results cache. You then also provide an API that says, give me records n through m of userid #1334's last search. For your subsequent queries, you consult the latter API rather than redoing your search. Because Lucene docids are unstable across commits and such, I think this means caching the uniqueKey of each matching document. This in turn means looking up the uniqueKey of each matching document at search time.
It also means you can't use the existing Solr caches, but need to make a new one.) Pro: Maybe faster than #1?? (Saves on data transfer between Solr and web tier, at least during the initial query.) Con: More complicated than #1. #3: Use filter queries to attempt to make your subsequent queries (for page 2, page 3, etc.) return results consistent with your original query. (One idea is to give each document a docAddedTimestamp field, which would have precision down to the millisecond or something. On your initial query, you could note the current time, T. Then for the subsequent queries you add a filter query for docAddedTimestamp <= T. Hopefully with a trie date field this would be fast. This should hopefully keep any docs newly added after T from showing up in the user's search results as they page through them. However, it won't necessarily protect you from docs that were *reindexed* (i.e. re-add a doc with the same uniqueKey as an existing doc) or docs that were deleted.) Pro: Doesn't require a new cache, and no cap on # of search results Con: Maybe doesn't provide total stability. Any feedback on these options? Are there other ideas to consider? Thanks, Chris -- Lance Norskog goks...@gmail.com
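Option #3 above can be sketched as a pair of requests (the field name follows the example in the thread; the host, query, and timestamp value are placeholders):

```shell
# First page: record the request time T (here, hypothetically, 2009-11-13T20:48:00Z)
curl 'http://localhost:8983/solr/select' -d q=foo -d start=0 -d rows=10

# Later pages: pin the result set to documents indexed at or before T, so
# newly added documents cannot shift items between pages as the user browses
curl 'http://localhost:8983/solr/select' -d q=foo -d start=10 -d rows=10 \
  --data-urlencode 'fq=docAddedTimestamp:[* TO 2009-11-13T20:48:00Z]'
```

As Chris notes, this guards against adds after T but not against reindexed or deleted documents.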
Re: StreamingUpdateSolrServer commit?
Unless I slept through it, you still need to explicitly commit, even with SUSS. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: erikea...@yahoo.com erikea...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Fri, November 13, 2009 9:43:53 PM Subject: StreamingUpdateSolrServer commit? When does StreamingUpdateSolrServer commit? I know there's a threshhold and thread pool as params but I don't see a commit timeout. Do I have to manage this myself?
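A minimal SolrJ sketch of that (the URL and queue/thread sizes are illustrative): StreamingUpdateSolrServer buffers and streams adds in the background, but nothing becomes searchable until you commit explicitly or the server's autoCommit fires.

```java
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingCommitExample {
    public static void main(String[] args) throws Exception {
        // queue up to 20 docs, stream with 4 background threads (illustrative values)
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);  // queued and streamed asynchronously

        // drain the background queue, then commit explicitly --
        // there is no commit timeout in SUSS itself
        server.blockUntilFinished();
        server.commit();
    }
}
```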
Re: Fwd: Lucene MMAP Usage with Solr
I thought that was the way to use it (but I've never had to use it myself) and that it means memory through the roof, yes. If you look at the Solr Admin statistics page, does it show you which Directory you are using? For example, on 1 Solr instance I'm looking at I see: readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/ Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: ST ST stst2...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, November 13, 2009 6:03:57 PM Subject: Fwd: Lucene MMAP Usage with Solr Folks, I am trying to get Lucene MMAP to work in solr. I am assuming that when I configure MMAP the entire index will be loaded into RAM. Is that the right assumption ? I have tried the following ways for using MMAP: Option 1. Using the solr config below for MMAP configuration -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory With this config, when I start solr with a 30G index, I expected that the RAM usage should go up, but it did not. Option 2. By Code Change I made the following code change : Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead of FSDirectory. Code snippet pasted below. Could you help me to understand if these are the right way to use MMAP? Thanks much /ST. Code SNippet for Option 2: package org.apache.solr.core; /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the License); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * *http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import java.io.File; import java.io.IOException; import org.apache.lucene.store.Directory; import org.apache.lucene.store.MMapDirectory; /** * Directory provider which mimics original Solr FSDirectory based behavior. * */ public class StandardDirectoryFactory extends DirectoryFactory { public Directory open(String path) throws IOException { return MMapDirectory.open(new File(path)); } }
Re: Stop solr without losing documents
So I think the question is really: If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits. I don't have the Solr source handy, but if I did, I'd look for Shutdown, Hook and finalize in the code. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Fri, November 13, 2009 4:09:00 PM Subject: Re: Stop solr without losing documents : which documents have been updated before a successful commit. Now : stopping solr is as easy as kill -9. please don't kill -9 ... it's grossly overkill, and doesn't give your servlet container a fair chance to clean things up. A lot of work has been done to make Lucene indexes robust to hard terminations of the JVM (or physical machine) but there's no reason to go out of your way to try and stab it in the heart when you could just shut it down cleanly. that's not to say your approach isn't a good one -- if you only have one client sending updates/commits then having it keep track of what was indexed prior to the last successful commit is a viable way to deal with what happens if solr stops responding (either because you shut it down, or because it crashed for some other reason). Alternately, you could take advantage of the enabled feature from your client (just have it test the enabled url every N updates or so) and when it sees that you have disabled the port it can send one last commit and then stop sending updates until it sees the enabled URL work again -- as soon as you see the updates stop, you can safely shut down the port. -Hoss
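If the single update client also controls the shutdown, the sequence described above can be made explicit (the URL uses the stock example port; adjust for your container and core path):

```shell
# 1. Stop the client from sending updates, then force a final commit so all
#    buffered documents are flushed to the index before the JVM goes away
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'

# 2. Only then shut the servlet container down cleanly (no kill -9)
```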
changes to highlighting config or syntax in 1.4?
I'm testing out the final release of Solr 1.4 as compared to the build I have been using from around June. I'm using the dismax handler for searches. I'm finding that highlighting is completely broken as compared to previously. Much more text is returned than it should for each string in <lst name="highlighting">, but the search words are never highlighted in that response. Setting usePhraseHighlighter=false makes no difference. Any pointers appreciated. -Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.3 query and index perf tank during optimize
Let's take a step back. Why do you need to optimize? You said: As long as I'm not optimizing, search and indexing times are satisfactory. :) You don't need to optimize just because you are continuously adding and deleting documents. On the contrary! Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jerome L Quinn jlqu...@us.ibm.com To: solr-user@lucene.apache.org Sent: Thu, November 12, 2009 6:30:42 PM Subject: Solr 1.3 query and index perf tank during optimize Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server. under IBM java 1.6. The index is sitting on a local 15K scsi disk. There's nothing else of substance running on the box. Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds. Can anyone offer me help with fixing the problem? Thanks, Jerry Quinn
Re: Solr 1.3 query and index perf tank during optimize
The 'maxSegments' feature is new with 1.4. I'm not sure that it will cause any less disk I/O during optimize. The 'mergeFactor=2' idea is not what you think: in this case the index is always mostly optimized, so you never need to run optimize. Indexing is always slower, because you amortize the optimize time into little continuous chunks during indexing. You never stop indexing. You should not lose documents. On Fri, Nov 13, 2009 at 1:07 PM, Jerome L Quinn jlqu...@us.ibm.com wrote: Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM: Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some though, and adds many complications. Yes, in my use case 2 boxes isn't a great option. Another kind of option is to use the partial optimize feature: <optimize maxOptimizeSegments="5"/> Using this, you can optimize down to n segments and take a shorter hit each time. Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a while since I have to port custom code forward, including a query parser. Also, if optimizing is so painful, you might lower the merge factor to amortize that pain better. That's another way to slowly get there - if you lower the merge factor, as merging takes place, the new merge factor will be respected, and segments will merge down. A merge factor of 2 (the lowest) will make it so you only ever have 2 segments. Sometimes that works reasonably well - you could try 3-6 or something as well. Then when you do your partial optimizes (and eventually a full optimize perhaps), you won't have so far to go. So this will slow down indexing but speed up optimize somewhat? Unfortunately right now I lose docs I'm indexing, as well as slowing searching to a crawl. Ugh. I've got plenty of CPU horsepower. This is where having the ability to optimize on another filesystem would be useful.
Would it perhaps make sense to set up a master/slave on the same machine? Then I suppose I can have an index being optimized that might not clobber the search. Would new indexed items still be dropped on the floor? Thanks, Jerry -- Lance Norskog goks...@gmail.com
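[For reference, the partial optimize discussed above is, in Solr 1.4, an XML message posted to the update handler. A minimal sketch; the core URL is an assumption, and note the 1.4 attribute is named maxSegments, per Lance's reply:

```xml
<!-- POST to http://localhost:8983/solr/update (URL assumed).      -->
<!-- Merges the index down to at most 5 segments instead of 1, so  -->
<!-- each optimize pass is a shorter I/O hit than a full optimize. -->
<optimize maxSegments="5" waitFlush="true" waitSearcher="true"/>
```

The merge-factor idea is the mergeFactor element in solrconfig.xml; lowering it, e.g. to 3, keeps the index continuously merged down at the cost of slower indexing.]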
Re: changes to highlighting config or syntax in 1.4?
Apparently one of my conf files was broken - odd that I didn't see any exceptions. Anyhow - excuse my haste, I don't see the problem now.

-Peter

On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin peter.wola...@acquia.com wrote:

> I'm testing out the final release of Solr 1.4 as compared to the build I
> have been using from around June. I'm using the dismax handler for
> searches. I'm finding that highlighting is completely broken compared to
> before: much more text than there should be is returned for each string in
> <lst name="highlighting">, but the search words are never highlighted in
> that response. Setting usePhraseHighlighter=false makes no difference.
> Any pointers appreciated.
>
> -Peter

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: Data import problem with child entity from different database
I am unable to get the file http://old.nabble.com/file/p26335171/dataimport.temp.xml

On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:

> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>> no obvious issues. you may post your entire data-config.xml
>
> Here it is, exactly as last attempt but with usernames etc. removed.
> Ignore the comments and the unused FileDataSource...
> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml
>
> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>> do w/o CachedSqlEntityProcessor first and then apply that later
>
> Yep, that was just a bit of a wild stab in the dark to see if it made any
> difference.
>
> Thanks, Andrew.
>
> --
> View this message in context:
> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Stop solr without losing documents
I would go with polling Solr to find what is not yet there. In production, it is better to assume that things will break, and have backstop janitors that fix them. And then test those janitors regularly.

On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

> So I think the question is really: if I stop the servlet container, does
> Solr issue a commit in the shutdown hook, in order to ensure all buffered
> docs are persisted to disk before the JVM exits? I don't have the Solr
> source handy, but if I did, I'd look for Shutdown, Hook and finalize in
> the code.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
> ----- Original Message ----
> From: Chris Hostetter hossman_luc...@fucit.org
> To: solr-user@lucene.apache.org
> Sent: Fri, November 13, 2009 4:09:00 PM
> Subject: Re: Stop solr without losing documents
>
>> : which documents have been updated before a successful commit. Now
>> : stopping solr is as easy as kill -9.
>>
>> Please don't kill -9 ... it's grossly overkill, and doesn't give your
>> servlet container a fair chance to clean things up. A lot of work has
>> been done to make Lucene indexes robust to hard terminations of the JVM
>> (or physical machine), but there's no reason to go out of your way to
>> try and stab it in the heart when you could just shut it down cleanly.
>>
>> That's not to say your approach isn't a good one -- if you only have one
>> client sending updates/commits, then having it keep track of what was
>> indexed prior to the last successful commit is a viable way to deal with
>> what happens if Solr stops responding (either because you shut it down,
>> or because it crashed for some other reason).
>>
>> Alternately, you could take advantage of the enabled feature from your
>> client (just have it test the enabled URL every N updates or so): when
>> it sees that you have disabled the port, it can send one last commit and
>> then stop sending updates until it sees the enabled URL work again. As
>> soon as you see the updates stop, you can safely shut down the port.
>>
>> -Hoss

--
Lance Norskog
goks...@gmail.com
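[The enabled-URL polling Hoss describes can be sketched client-side. A rough Python sketch; the base URL, the /admin/ping path, and the check interval are assumptions, and the probe parameter is injectable so the stop/continue logic can be exercised without a live Solr:

```python
# Client-side sketch of "test the enabled URL every N updates":
# probe the ping URL; if it stops answering 200, stop feeding updates
# so the caller can send one final <commit/> and wait.
import urllib.request
import urllib.error

PING_URL = "http://localhost:8983/solr/admin/ping"  # assumed enabled URL

def solr_enabled(ping_url=PING_URL, probe=None):
    """Return True if the enabled URL answers HTTP 200."""
    if probe is None:
        def probe(url):
            try:
                return urllib.request.urlopen(url, timeout=5).getcode()
            except urllib.error.URLError:
                return None
    return probe(ping_url) == 200

def feed(docs, every_n=100, probe=None):
    """Send docs to Solr, re-checking the enabled URL every `every_n` docs.

    Returns the docs that were NOT sent; a non-empty result means the
    caller should issue one final commit and retry the rest later.
    """
    pending = list(docs)
    sent = 0
    while pending:
        if sent % every_n == 0 and not solr_enabled(probe=probe):
            return pending  # stop feeding; caller commits and waits
        pending.pop(0)      # stand-in for the actual update request
        sent += 1
    return []
```

A non-empty return value is the signal to send one last commit and pause until the enabled URL answers again.]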
Re: javabin in .NET?
OK. Is there anyone trying it out? Where is this code? I can try to help.

On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote:

> I meant the standard IO libraries. They are different enough that the code
> has to be manually ported. There were some automated tools back when
> Microsoft introduced .Net, but IIRC they never really worked. Anyway it's
> not a big deal, it should be a straightforward job. Testing it thoroughly
> cross-platform is another thing though.
>
> 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
>
>> The javabin format does not have many dependencies. It may have 3-4
>> classes and that is it.
>>
>> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
>> mauricioschef...@gmail.com wrote:
>>
>>> Nope. It has to be manually ported. Not so much because of the language
>>> itself but because of differences in the libraries.
>>>
>>> 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
>>>
>>>> Is there any tool to directly port Java to .Net? Then we can extract
>>>> out the client part of the javabin code and convert it.
>>>>
>>>> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
>>>>
>>>>> Has anyone looked into using the javabin response format from .NET
>>>>> (instead of SolrJ)? It's mainly a curiosity. How much better could
>>>>> performance/bandwidth/throughput be? How difficult would it be to
>>>>> implement some .NET code (C#, I'd guess being the best choice) to
>>>>> handle this response format?
>>>>>
>>>>> Thanks, Erik

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Data import problem with child entity from different database
<dataConfig>
  <dataSource name="caffdubya" driver="org.postgresql.Driver"
              url="jdbc:postgresql://db1/cathdb_v3_3_0" user="USER" password="PASS"/>
  <dataSource name="sinatra" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@db2:1521:biomapwh" user="USER" password="PASS"/>

  <!-- The following path is on bsmcmp11's local disk for speed. -->
  <!-- The master copy (compressed) lives at /cath/data/current/pdb-XML-noatom -->
  <!-- For convenience, there's a script at bsmcmp11:/export/local/refresh_pdb to copy and unpack it. -->
  <dataSource name="filesystem" type="FileDataSource"
              basePath="/export/local/pdb-XML-noatom/" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="1"/>

  <document>
    <entity name="domain" dataSource="caffdubya" query="select * from domain_text">
      <!-- Subquery for related PubMed IDs (we could pull the actual text in later...) ... NOT WORKING! :-( -->
      <entity name="domain_pubmed_ids" dataSource="sinatra" onError="continue"
              query="select id as pdb_code, related_id as related_ids
                     from biomap_admin.uniprot_pdb_pubmed_for_solr
                     where id = '${domain.pdb_code}'"/>
    </entity>
    <!-- REMOVED MOST ENTITIES FOR TEST PURPOSES, RESTORE FROM PREVIOUS REVISION -->
  </document>
</dataConfig>

2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com:

> I am unable to get the file http://old.nabble.com/file/p26335171/dataimport.temp.xml
>
> On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
>
>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>> no obvious issues. you may post your entire data-config.xml
>>
>> Here it is, exactly as last attempt but with usernames etc. removed.
>> Ignore the comments and the unused FileDataSource...
>> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml
>>
>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>> do w/o CachedSqlEntityProcessor first and then apply that later
>>
>> Yep, that was just a bit of a wild stab in the dark to see if it made any
>> difference.
>>
>> Thanks, Andrew.
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com

--
Lance Norskog
goks...@gmail.com