Re: Reading timestamp for DIH
(10/11/24 6:05), Siddharth Powar wrote:

Hey, is it possible to read the timestamp that the DataImportHandler uses for a delta-import from a location other than conf/dataimport.properties? Thanks, Sid

No. There is an open issue for this problem: https://issues.apache.org/jira/browse/SOLR-1970

Koji
--
http://www.rondhuit.com/en/
Re: finding exact case insensitive matches on single and multiword values
Geert-Jan and Erick, thanks!

What I tried first is making it work with the string type; that works perfectly for all-lowercase values! What I don't understand is how and why I have to make the casing work at the client, since the casing differs in the database. Right now in the database I have these values for city:

Den Haag
Den HAAG
den haag
den haag

Using fq=city:(den\ haag) gives me 2 results. So it seems to me that because of the string type this casing issue cannot be resolved as long as I'm using this fieldtype?

Then to the solution of tweaking the fieldtype for me to work. I have this right now:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

But I find it difficult to test what the result of the filters is, and, as Erick already mentioned, the result may look correct but really isn't... Is there some tool where I can add and remove filters and quickly see what the output will be, without having to reload schema.xml and reimport?

--
View this message in context: http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2017851.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with DIH delta-import delete.
(10/11/17 20:18), Matti Oinas wrote:

Solr does not delete documents from the index although delta-import says it has deleted n documents. I'm using version 1.4.1. The schema looks like:

<fields>
  <field name="uuid" type="string" indexed="true" stored="true" required="true"/>
  <field name="type" type="int" indexed="true" stored="true" required="true"/>
  <field name="blog_id" type="int" indexed="true" stored="true"/>
  <field name="entry_id" type="int" indexed="false" stored="true"/>
  <field name="content" type="textgen" indexed="true" stored="true"/>
</fields>
<uniqueKey>uuid</uniqueKey>

Relevant fields from the database tables (blogs and entries both have):

  id        int(11)              NOT NULL  PRIMARY KEY  auto_increment
  modified  datetime             NULL
  status    tinyint(1) unsigned  NULL

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" .../>
  <document>
    <entity name="blog" pk="id"
            query="SELECT id,description,1 as type FROM blogs WHERE status=2"
            deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
            deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
            deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
            transformer="TemplateTransformer">
      <field column="uuid" name="uuid" template="blog-${blog.id}"/>
      <field column="id" name="blog_id"/>
      <field column="description" name="content"/>
      <field column="type" name="type"/>
    </entity>
    <entity name="entry" pk="id"
            query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
            deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
            deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
            deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
            transformer="HTMLStripTransformer,TemplateTransformer">
      <field column="uuid" name="uuid" template="entry-${entry.id}"/>
      <field column="id" name="entry_id"/>
      <field column="blog_id" name="blog_id"/>
      <field column="content" name="content" stripHTML="true"/>
      <field column="type" name="type"/>
    </entity>
  </document>
</dataConfig>

Full import and delta-import work without problems when it comes to adding new documents to the index, but when a blog is deleted (status set to 3 in the database), the Solr report after delta-import is something like "Indexing completed. Added/Updated: 0 documents. Deleted 81 documents." The problem is that the documents are still found in the Solr index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
2. delta-import =>
   <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.</str>
   <str name="Committed">2010-11-17 13:00:50</str>
   <str name="Optimized">2010-11-17 13:00:50</str>
   So Solr says it has deleted documents, and that the index is also optimized and committed after the operation.
3. Search: blog_id:26 still returns 1 document with type 1 (blog) and 80 documents with type 2 (entry).

Hi Matti,

Can you see something like "Completed DeletedRowKey for Entity" and then "Deleting document: ID-1" in your Solr log? (sample messages from my Solr log)

Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
:
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: OVEN-2
:

If you cannot find these messages, I think there is some incorrect setting (but I couldn't find an incorrect one in your data-config.xml...).

Koji
--
http://www.rondhuit.com/en/
Re: finding exact case insensitive matches on single and multiword values
Then to the solution of tweaking the fieldtype for me to work. I have this right now:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Additionally, you can add TrimFilterFactory to your analyzer chain. And instead of escaping white spaces you can use RawQParserPlugin:

fq={!raw f=city}den haag
RE: finding exact case insensitive matches on single and multiword values
ALL Solr queries are case-sensitive. The trick is in the analyzers. If you downcase everything at index time before you put it in the index, and downcase all queries at query time too, then you have a case-insensitive query. Not because the Solr search algorithms are case-insensitive, but because you've normalized all values to lowercase at both index and query time, so things will match. You can only do this kind of normalization through analyzers on a Solr text field, not a Solr string field; it's what the Solr text type is for. This wiki page, and this question in particular, will be helpful to you: http://wiki.apache.org/solr/SolrRelevancyCookbook#Relevancy_and_Case_Matching

From: PeterKerk [vettepa...@hotmail.com]
Sent: Saturday, December 04, 2010 6:24 AM
To: solr-user@lucene.apache.org
Subject: Re: finding exact case insensitive matches on single and multiword values
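The normalize-at-both-ends idea described above can be illustrated outside Solr. This is a toy Python sketch, not Solr code: it mimics KeywordTokenizerFactory plus LowerCaseFilterFactory by treating the whole field value as a single token and lowercasing it, at both "index" and "query" time.

```python
# Toy illustration of KeywordTokenizer + LowerCaseFilter behavior:
# the whole field value stays one token; only the case is normalized.
def analyze(value: str) -> str:
    return value.lower()

# "Index" the four differently-cased city values from the thread.
index = {}
for city in ["Den Haag", "Den HAAG", "den haag", "den haag"]:
    index[analyze(city)] = index.get(analyze(city), 0) + 1

# Because the query goes through the same analysis, any casing matches
# all four stored values.
print(index[analyze("DEN HAAG")])  # 4
```

The point is that case-insensitivity comes entirely from applying the same normalization on both sides, which is exactly what a Solr text field's analyzer chain does and a string field cannot.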
autocommit commented out -- what is the default?
Hi, if you comment out the block in solrconfig.xml

<!--
  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>60</maxTime>
  </autoCommit>
-->

does this mean that (a) commits never happen automatically or (b) some default autocommit is applied?
Re: autocommit commented out -- what is the default?
On Sat, Dec 4, 2010 at 10:36 AM, Brian Whitman br...@echonest.com wrote:

Hi, if you comment out the block in solrconfig.xml

<!--
  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>60</maxTime>
  </autoCommit>
-->

Does this mean that (a) commits never happen automatically or (b) some default autocommit is applied?

Commented out means they never happen automatically (i.e., no default). In general, commitWithin is a better strategy to use... bulk updates can use a large value (or no value, with an explicit commit at the end) for better indexing performance, while other updates can use a smaller value depending on how soon the update needs to be visible.

-Yonik
http://www.lucidimagination.com
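As a sketch of the commitWithin strategy Yonik describes (the field names and values here are hypothetical, not from the thread), an update request can carry its own visibility deadline instead of relying on autoCommit:

```xml
<!-- ask Solr to commit this add within 60 seconds, with no explicit commit -->
<add commitWithin="60000">
  <doc>
    <field name="id">example-1</field>
    <field name="content">some example text</field>
  </doc>
</add>
```

A bulk load would instead omit commitWithin and issue a single explicit <commit/> when it finishes.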
Re: Batch Update Fields
Synonyms eh? I have a synonym list like the following, so how do I identify the synonyms on a specific field? The only place the field is used is as a facet. Original field = country name:

AF = AFGHANISTAN
AX = ÅLAND ISLANDS
AL = ALBANIA
DZ = ALGERIA
AS = AMERICAN SAMOA
AD = ANDORRA
AO = ANGOLA
AI = ANGUILLA
AQ = ANTARCTICA
AG = ANTIGUA AND BARBUDA
AR = ARGENTINA
AM = ARMENIA
AW = ARUBA
AU = AUSTRALIA
AT = AUSTRIA
etc...

Any advice on that would be great and very much appreciated!

Adam

On Fri, Dec 3, 2010 at 3:55 PM, Erick Erickson erickerick...@gmail.com wrote:

That will certainly work. Another option, assuming the country codes are in their own field, would be to put the transformations into a synonym file that is only used on that field. That way you'd get this without having to do the pre-processing step on the raw data... That said, if your pre-processing is working for you, it may not be worth your while to do it differently.

Best,
Erick

On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:

First off... I know enough about Solr to be VERY dangerous, so please bear with me ;-) I am indexing the geonames database, which only provides country codes. I can facet the codes, but to the end user who may not know all 249 codes, it isn't really all that helpful. Therefore, I want to map the full country names to the country codes provided in the geonames db: http://download.geonames.org/export/dump/ I used a simple split function to chop the 850 MB txt file into manageable CSVs that I can import into Solr. Now that all 7 million+ documents are in there, I want to change the country codes to the actual country names. I would have liked to have done it in the index, but finding and replacing the strings in the CSVs seems to be working fine. After that I can just reindex the entire thing.

Adam

On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.com wrote:

Have you considered defining synonyms for your code-to-country conversion at index time (or query time, for that matter)? We may have an XY problem here. Could you state the high-level problem you're trying to solve? Maybe there's a better solution...

Best,
Erick

On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:

I wonder... I know that sed would work to find and replace the terms in all of the CSV files that I am indexing, but would it work to find and replace key terms in the index?

find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

That command would iterate through all the files in the data directory and replace the country code with the full country name. I may just back up the directory and try it. I have it running on CSV files right now and it's working wonderfully. For those of you interested, I am indexing the entire Geonames dataset (http://download.geonames.org/export/dump/ allCountries.zip), which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a Google globe. Thoughts?

Adam

On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote:

No, there's no equivalent to SQL UPDATE for all values in a column. You'll have to reindex all the documents.

On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:

OK, part 2 of my previous question... Is there a way to batch update field values based on certain criteria? For example, if thousands of documents have a field value of 'US', can I update all of them to 'United States' programmatically?

Adam
Re: Batch Update Fields
When you define your fieldType at index time. My idea was that you substitute these on the way into your index. You may need a specific field type just for your country conversion. Perhaps in a copyField if you need both the code and the full name.

Best,
Erick

On Sat, Dec 4, 2010 at 12:16 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: [...]
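Erick's synonym-file suggestion might look roughly like this (the file name, field names, and types below are assumptions, not taken from the thread):

```xml
<!-- country_synonyms.txt (hypothetical file), one mapping per line:
       AF => AFGHANISTAN
       AL => ALBANIA
-->
<fieldType name="country" class="solr.TextField">
  <analyzer type="index">
    <!-- keep each stored value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- replace the code with the full name at index time only -->
    <filter class="solr.SynonymFilterFactory" synonyms="country_synonyms.txt"
            ignoreCase="false" expand="false"/>
  </analyzer>
</fieldType>

<!-- keep the raw code too, if both are needed -->
<copyField source="country_code" dest="country"/>
```

One caveat: the synonym file parser splits multi-word right-hand sides ("ANTIGUA AND BARBUDA") into separate tokens, which would break faceting on the full name, so multi-word country names may need extra care or the pre-processing approach already in use.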
Re: How to make a client in JSP which will take output from Solr Server
Ok, I solved it by just opening the connection and then parsing the output from XML to the front page. Though it has some security issues...

- Kumar Anurag

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-make-a-client-in-JSP-which-will-take-output-from-Solr-Server-tp1519527p2019632.html
Sent from the Solr - User mailing list archive at Nabble.com.
FastVectorHighlighter ignoring fragmenter parameter . . .
Got the FVH to work in Solr 3.1 (or at least I presume I have, given that I can see multi-color highlighting in the output). But I am not able to get it to recognize the regex fragmenter: I get no change in output if I specify the fragmenter. In fact, I can even enter bogus names for the fragmenter and get no change in the output. Grateful for any suggestions. Settings and output below.

Christopher

*Query*

http://localhost:8983/solr/10k-Fragments/select?q=content%3Aliquidity&rows=100&fl=id%2Ccontent&qt=standard&hl.fl=content&hl.useFastVectorHighlighter=true&hl=true&hl.fragmentsBuilder=colored&hl.fragmenter=regex

*Response* (abbreviated)

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">47</int>
    <lst name="params">
      <str name="fl">id,content</str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <str name="q">content:liquidity</str>
      <str name="hl.fragmenter">regex1text</str>
      <str name="hl.fl">content</str>
      <str name="hl.fragmentsBuilder">colored</str>
      <str name="qt">standard</str>
      <str name="hl">true</str>
      <str name="rows">100</str>
    </lst>
  </lst>
  . . .
  <lst name="highlighting">
    <lst name="10K/1997-12-31/1998-04-01/1stBergenBancorp/0001005016/ManagementsDiscussionAndAnalysisOfFinancialConditionAndResultsOfOperations/LiquidityAndCapitalResource/paragraph/1/mh1261">
      <arr name="content">
        <str>&#4504; <b style="background:yellow">Liquidity</b> is a measure of a bank's ability to fund loans and withdrawals of deposits in a cost-ef</str>
      </arr>
    </lst>
  . . .

*Field listing in schema.xml*

<field name="content" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>

*Highlighter listing in solrconfig.xml*

<highlighting>
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">70</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
    </lst>
  </fragmenter>
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>
  <!-- Configure the standard encoder -->
  <encoder name="html" class="org.apache.solr.highlight.HtmlEncoder" default="true"/>
  <!-- Configure the standard fragListBuilder -->
  <fragListBuilder name="simple" class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>
  <!-- multi-colored tag FragmentsBuilder -->
  <fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder" default="true">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>
</highlighting>
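To see what the configured hl.regex.pattern actually matches, independent of Solr, here is a small Python sketch (an illustration only; the real RegexFragmenter applies this pattern together with hl.regex.slop to choose fragment boundaries around highlighted terms):

```python
import re

# hl.regex.pattern from the solrconfig.xml above: a run of hyphens, word
# characters, spaces, commas, slashes, newlines, and apostrophes, 20-200
# characters long.
pattern = re.compile(r"[-\w ,/\n\']{20,200}")

text = ("Liquidity is a measure of a bank's ability to fund loans "
        "and withdrawals of deposits in a cost-effective manner.")

# Every character except the trailing period is inside the character class,
# so the whole sentence minus the '.' comes back as one candidate span.
fragments = pattern.findall(text)
print(fragments[0])
```

Since periods are not in the character class, sentence boundaries naturally terminate candidate fragments, which is the behavior the fragmenter relies on.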
Solr Got Exceptions When schema.xml is Changed
Dear all, I am a new user of Solr and am just trying some basic samples. Solr starts correctly with Tomcat. However, when I put a new schema.xml under SolrHome/conf and start Tomcat again, I get the following two exceptions. Solr cannot start correctly unless I use the initial schema.xml that ships with Solr. Why can't I change the schema.xml? Thanks so much!

Bing

Dec 5, 2010 4:52:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:52)
    at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1146)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
-
SEVERE: Could not start SOLR.
Check solr/home property
org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField implemented using StrField
    at org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:157)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:508)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4405)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5037)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:812)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:787)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:570)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:891)
    at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:683)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:466)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1267)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:308)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:89)
    at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:328)
    at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:308)
    at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1043)
    at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:738)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
    at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1035)
    at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:289)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
    at org.apache.catalina.core.StandardService.startInternal(StandardService.java:442)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
    at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:674)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
Re: Solr Got Exceptions When schema.xml is Changed
The exception "QueryElevationComponent requires the schema to have a uniqueKeyField implemented using StrField" means you should use the type StrField ('string') for the field referenced by uniqueKey.
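A minimal sketch of what the exception is asking for (the field name "id" is hypothetical; the "string" type must map to solr.StrField in the schema):

```xml
<fields>
  <!-- the uniqueKey field must use the string (StrField) type -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```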
RE: autocommit commented out -- what is the default?
It means they never happen automatically; added documents won't be committed until you send a commit to Solr.

Jonathan

From: Brian Whitman [br...@echonest.com]
Sent: Saturday, December 04, 2010 10:36 AM
To: solr-user@lucene.apache.org
Subject: autocommit commented out -- what is the default?

Hi, if you comment out the block in solrconfig.xml

<!--
  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>60</maxTime>
  </autoCommit>
-->

Does this mean that (a) commits never happen automatically or (b) some default autocommit is applied?
Full text hit term highlighting
Anyone ever use Solr to present a view of a document with hit-terms highlighted within? Kind of like Google's cached copies (http://bit.ly/hgudWq)?
Re: Full text hit term highlighting
Set the fragment length to 0. This means highlight the entire text body, if you have stored the text body. Otherwise, you have to get the term vectors somehow and highlight the text yourself.

I investigated this problem a while back for PDFs. You can add a starting page and an OR list of search terms to the URL that loads a PDF into the in-browser version of the Adobe PDF reader. This allows you to load the PDF at the first occurrence of any of the search terms, with the terms highlighted. The search button takes you to the next of any of the terms.

On Sat, Dec 4, 2010 at 4:10 PM, Rich Cariens richcari...@gmail.com wrote: Anyone ever use Solr to present a view of a document with hit-terms highlighted within? Kind of like Google's cached copies (http://bit.ly/hgudWq)?

--
Lance Norskog
goks...@gmail.com
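Lance's first suggestion, as request parameters (a sketch; the field name "content" is an assumption carried over from the other highlighting threads, and hl.fragsize=0 is what disables fragmenting so the whole stored field comes back highlighted):

```
...&hl=true&hl.fl=content&hl.fragsize=0&hl.snippets=1
```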
Re: Question about Solr Fieldtypes, Chaining of Tokenizers
Could you expand on your example and show the output you want? FWIW, you could simply write a token filter that does the same thing as the WhitespaceTokenizer.

-Grant

On Dec 3, 2010, at 1:14 PM, Matthew Hall wrote:

Hey folks, I'm working with a fairly specific set of requirements for our corpus that needs a somewhat tricky text type for both indexing and searching. The chain currently looks like this:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="(.*?)(\p{Punct}*)$" replacement="$1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" "/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>

Now you will notice that I'm trying to add a second tokenizer to this chain at the very end; this is due to the final replacement of punctuation with whitespace. At that point I'd like to further break up these tokens into smaller tokens. The reason for this is that we have a corpus of mixed normal English words and scientific terms. For example, you could expect a string like "The symposium of TgThe(RX3fg+and) gene studies" to be added to the index, and parts of those phrases to be searched on. We want to be able to remove the stopwords in the mostly English parts of these types of statements, which the whitespace tokenizer, followed by removing trailing punctuation, followed by the stop filter takes care of. We do not want to remove references to genetic information contained in allele symbols and the like. Sadly, as far as I can tell, you cannot chain tokenizers in schema.xml, so does anyone have some suggestions on how this could be accomplished?

Oh, and let me add that the WordDelimiterFilter comes really close to what I want, but since we are unwilling to promote our Solr version to trunk (we are on 1.4.x atm), the inability to turn off the automatic phrase queries makes it a no-go. We need to be able to make searches on left/right match right/left. My searches through the old material on this subject aren't really showing me much except some advice on using the copyField attribute. But my understanding is that this will simply take your original input to the field and then analyze it in two different ways depending on the field definitions. It would be very nice if it were copying the already-analyzed version of the text... but that's not what it's doing, right?

Thanks for any advice on this matter.

Matt

--
Grant Ingersoll
http://www.lucidimagination.com
Re: Question about Solr Fieldtypes, Chaining of Tokenizers
On Fri, Dec 3, 2010 at 1:14 PM, Matthew Hall mh...@informatics.jax.org wrote: Oh, and let me add that the WordDelimiterFilter comes really close to what I want, but since we are unwilling to promote our Solr version to trunk (we are on 1.4.x atm), the inability to turn off the automatic phrase queries makes it a no-go. We need to be able to make searches on left/right match right/left.

If this is the case, it doesn't matter what your analysis does; it won't work. Your only workaround, if you cannot upgrade, is to use PositionFilter at query time... but then you cannot use phrase queries at all.
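The PositionFilter workaround might look like this in the query-time analyzer (a sketch under the assumption that WordDelimiterFilter is doing the token splitting; it collapses all token positions, which is what stops the query parser from building automatic phrase queries, at the cost of losing phrase queries entirely):

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"/>
  <!-- sets every token's position increment to 0, so split tokens are
       treated as alternatives (boolean OR) instead of a phrase -->
  <filter class="solr.PositionFilterFactory"/>
</analyzer>
```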
Re: Exceptions in Embedded Solr
Any help on this?

On Thu, Dec 2, 2010 at 7:51 PM, Tharindu Mathew mcclou...@gmail.com wrote:

Hi everyone, I suddenly get the exception below when using Embedded Solr. If I delete the Solr index it goes back to normal, but it obviously has to start indexing from scratch. Any idea what the cause of this is?

java.lang.RuntimeException: java.io.FileNotFoundException: /home/evanthika/WSO2/CARBON/GREG/3.6.0/23-11-2010/normal/wso2greg-3.6.0/solr/data/index/segments_2 (No such file or directory)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.wso2.carbon.registry.indexing.solr.SolrClient.init(SolrClient.java:103)
    at org.wso2.carbon.registry.indexing.solr.SolrClient.getInstance(SolrClient.java:115)
    ... 44 more
Caused by: java.io.FileNotFoundException: /home/evanthika/WSO2/CARBON/GREG/3.6.0/23-11-2010/normal/wso2greg-3.6.0/solr/data/index/segments_2 (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.init(RandomAccessFile.java:212)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:78)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:108)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:94)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:236)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:72)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
    at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
    ... 48 more

[2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.SolrCore} - REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@58f24b6 (null) has a reference count of 1
[2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.SolrCore} - REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@654dbbf6 (null) has a reference count of 1
[2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.CoreContainer} - CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
[2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.CoreContainer} - CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

--
Regards,
Tharindu
Re: How to make a client in JSP which will take output from Solr Server
On Sun, Dec 5, 2010 at 1:51 AM, Anurag anurag.it.jo...@gmail.com wrote: Ok, I solved it by just opening the connection and then parsing the output from XML to the front page. Though it has some security issues...

See AJAX Solr: http://evolvingweb.github.com/ajax-solr/

Regards,
Gora