Re: Newbie Question, can I store structured sub elements?
I know I can make them multiValued, but that doesn't let me see that a showing happens at a particular time on a particular channel, just that a film shows on a range of channels at a range of times. Starting to think I will have to either store a formatted string that combines them, or keep it flat just for indexing, retrieve ids and use them to get the data out of the RDBMS.

On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:

You could change starttime and channelname to multiValued="true" and use these fields to store all the values for those fields. showing.movie_id and showing.id probably aren't needed in a solr record.

On 8/24/11 7:53 AM, Zac Tolley wrote:

I have a very simple scenario in which I have films and showings; each film has multiple showings at set times on set channels, so I have:

Movie
- id
- title
- description
- duration

Showing
- id
- movie_id
- starttime
- channelname

I want to know: can I store this in Solr so that I keep this structure? I did try to do an initial import with the DIH using this config:

<entity name="movie" query="SELECT * from movies">
  <field column="ID" name="id"/>
  <field column="TITLE" name="title"/>
  <field column="DESCRIPTION" name="description"/>
  <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
    <field column="ID" name="id"/>
    <field column="STARTTIME" name="starttime"/>
    <field column="CHANNELNAME" name="channelname"/>
  </entity>
</entity>

I was hoping, for each movie, to get a sub entity with the showing, like:

<doc>
  <str name="title">...</str>
  <showing>
    <str name="channelname">...

but instead all the fields are flattened down to the top level. I know this must be easy, what am I missing...?
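For anyone following the thread, the "formatted string that combines them" workaround can be sketched in plain Java: each showing is packed into one value of a multiValued field so that channel and start time stay paired. The separator, field layout, and class names below are illustrative assumptions, not from the original posts.

```java
import java.util.ArrayList;
import java.util.List;

public class ShowingCodec {
    private static final String SEP = "|";

    // Pack one showing as "channelname|starttime" for a single multiValued field,
    // so the pairing survives Solr's flat document model.
    public static String encode(String channelName, String startTime) {
        return channelName + SEP + startTime;
    }

    // Split a stored value back into [channelname, starttime] at query/display time.
    public static String[] decode(String stored) {
        return stored.split("\\" + SEP, 2);
    }

    public static void main(String[] args) {
        List<String> showings = new ArrayList<String>();
        showings.add(encode("BBC One", "2011-08-25T20:00:00Z"));
        showings.add(encode("ITV", "2011-08-26T21:30:00Z"));
        for (String s : showings) {
            String[] parts = decode(s);
            System.out.println(parts[0] + " at " + parts[1]);
        }
    }
}
```

The trade-off is that the packed field is awkward to facet or range-query on, which is why the alternative in the thread is keeping the index flat and fetching details from the RDBMS by id.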
Re: can i create filters of score range
well, when i said client.. i meant querying through solr.NET (in a way this can be seen as posting it through a web browser url). so coming back to the issue.. even if i am sorting by _docid_ i need to do paging (2 million docs in the result). how is it doing that internally? when sorted by docid, don't we have the deep paging issue (getting all the previous pages into memory to get the next page)? so what is the main difference we gain by sorting on lucene docids rather than normal fields?

On 23 August 2011 22:41, Erick Erickson erickerick...@gmail.com wrote:

Did you try exactly what Chris suggested? Appending sort=_docid_ asc to the query? When you say client I assume you're talking SolrJ, and I'm pretty sure that SolrQuery.setSortField is what you want. I suppose you could also set this as the default in your query handler.

Best
Erick

On Tue, Aug 23, 2011 at 4:43 AM, jame vaalet jamevaa...@gmail.com wrote:

okay, so this is something i was looking for.. the default order of result docs in lucene/solr.. and you are right, since i don't care about the order in which i get the docs, ideally i shouldn't ask solr to do any sorting on its raw result list... though i understand your point, how do i do it as a solr client? by default, if i am not mentioning the sort parameter in the query URL, solr will try to sort with respect to the score it calculated.. how do i prevent even this sorting? do we have any setting in solr for this?

On 23 August 2011 03:29, Chris Hostetter hossman_luc...@fucit.org wrote:

: before going into lucene doc id, i have got a creationDate datetime field in
: my index which i can use as a page definition using a filter query..
: i have learned exposing the lucene docid won't be a clever idea, as it's again
: relative to the index instance.. whereas my index date field will be unique
: ..and i can definitely create ranges with that..

i think you misunderstood me: i'm *not* suggesting you do any filtering on the internal lucene doc id.
I am suggesting that you forget all about trying to filter to work around the issues with deep paging, and simply *sort* on _docid_ asc, which should make all inherent issues with deep paging go away (as far as i know). At no point will the internal lucene doc ids be exposed to your client code; it's just an instruction to Solr/Lucene that it doesn't really need to do any sorting, it can just return the Nth-Mth docs as collected.

: i have got one more doubt .. if i use a filter query each time will it result
: in memory problems like those we see in deep paging issues..

it could, i'm not sure. that's why i said...

: I'm not sure if this would really gain you much though -- yes this would
: work around some of the memory issues inherent in deep paging but it
: would still require a lot of rescoring of documents again and again.

-Hoss

--
-JAME
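To make the suggestion above concrete, here is a minimal sketch of what the paging requests look like when sorted by _docid_. The parameter names (q, sort, start, rows) are real Solr ones; the base URL and page size are assumptions. Each page still passes an incremented start, but with _docid_ order Solr skips scoring/sorting work.

```java
public class DocidPager {
    // Build the request for one page of a large, order-insensitive result set.
    static String pageQuery(String baseUrl, int page, int rows) {
        int start = page * rows;
        return baseUrl + "/select?q=*:*&sort=_docid_+asc"
                + "&start=" + start + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // First three page requests against a hypothetical local Solr.
        for (int page = 0; page < 3; page++) {
            System.out.println(pageQuery("http://localhost:8983/solr", page, 1000));
        }
    }
}
```

With SolrJ you would express the same thing via SolrQuery.setSortField("_docid_", ORDER.asc) plus setStart/setRows, as Erick notes.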
Re: Preserve XML hierarchy
Hi Michael, Thanks for your help! I am using Apache Solr 3.2 on Windows. I am trying to apply the 2 patches ( https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#issue-tabs XMLCharFilter patch ), but I have no idea how to do that. What do I need to open the Solr project? Or which is the .jar file that I need to open? I saw that there are 2 files (woodstox and stax2); what do I have to do with those files? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Preserve-XML-hierarchy-tp3166690p3283275.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spatial Search problems
Thanx David, Just one more question: am I able to do spatial search with Solr 1.4? And with Lucene 2.9? What's your recommendation? Javier -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial-Search-problems-tp3277945p3283285.html Sent from the Solr - User mailing list archive at Nabble.com.
Grouping and performing statistics per group
Hi All, I want to group by a certain field and perform statistics per group on a certain field of my choice. For example, if I have the next documents in my collection:

<doc>
  <child-id>12353</child-id>
  <weight>65</weight>
  <gender>male</gender>
</doc>
<doc>
  <child-id>12353</child-id>
  <weight>63</weight>
  <gender>male</gender>
</doc>
<doc>
  <child-id>12353</child-id>
  <weight>49</weight>
  <gender>male</gender>
</doc>

now I want to group by gender and, let's say for the sake of the example, compute an average statistic on the weight. Is that possible? I would appreciate it if anyone can also elaborate on performing such actions using SolrJ... Thanks.
Re: Grouping and performing statistics per group
Hi, Is it possible that the Luke handler can be used for this? I used something like: http://localhost:8080/solr/admin/luke?fl=fieldName&numTerms=1 to get an estimate of the range of values a field can have. Hope you find this information useful. Sowmya. On Thu, Aug 25, 2011 at 10:58 AM, Omri Cohen omri...@gmail.com wrote: Hi All, I want to group-by certain field and perform statistics per group on a certain field of my choice. For example, if I have the next documents in my collection: doc child-id 12353 /child-id weight 65 /weight gender male /gender /doc doc child-id 12353 /child-id weight 63 /weight gender male /gender /doc doc child-id 12353 /child-id weight 49 /weight gender male /gender /doc now I want to group by gender, and let say for the sake of the example, that I want to average statistic on the weight. Is that possible? I would appreciate if anyone can also elaborate on performing such actions using SolrJ ... Thanks. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Grouping and performing statistics per group
Thanks, but I actually need something more deterministic and more accurate. Does anyone know if there is an already existing feature for that? thanks again
Upload doc and pdf in Solr 3.3.0
Good Morning, I have to set up a Solr system to search in documents like PDF and DOC. My Solr system is running in the meantime, but I can't find a tutorial that tells me what I have to do to put the files into the system. I hope you can help me a bit to pull that off in a simple way. And please excuse my bad English. -- View this message in context: http://lucene.472066.n3.nabble.com/Upload-doc-and-pdf-in-Solr-3-3-0-tp3283224p3283224.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr in a windows shared hosting environment
Hi, Is it possible to install Solr in a windows (IIS 7 or IIS 7.5) shared hosting environment? If yes, where can I find instructions how to do that? Thank you!
Re: Upload doc and pdf in Solr 3.3.0
http://wiki.apache.org/solr/ExtractingRequestHandler may help. Regards, Jayendra On Thu, Aug 25, 2011 at 3:24 AM, Moinsn felix.wieg...@googlemail.com wrote: Good Morning, I have to set up a Solr System to seek in documents like pdf and doc. My Solr System is running in the meantime, but i cant find a tutorial that tells me what i have to do to put the files in the system. I hope you can help me a bit to bring that off on a simple way. And please excuse my bad english. -- View this message in context: http://lucene.472066.n3.nabble.com/Upload-doc-and-pdf-in-Solr-3-3-0-tp3283224p3283224.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to anchor solr searches?
I don't think Solr conforms to ACID-type behaviours for its queries. This is not to say your use case is not important, just that it's not Solr's focus. I think it's an interesting question, but the solution is probably going to involve rolling your own. Something like returning 1 user docs and caching these in an application cache; pagination occurs in this cache rather than as a solr query with the start param incremented. Maybe offer a refresh data link which repopulates the cache from solr.

cheers lee c

On 25 August 2011 01:01, arian487 akarb...@tagged.com wrote:

If I'm searching for users based on last login time, and I search once, then go to the second page with a new offset, I could potentially see the same users on page 2 if the index has changed. What is the best way to anchor it so I avoid this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3282576.html
Sent from the Solr - User mailing list archive at Nabble.com.
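Lee's roll-your-own approach might be sketched like this: fetch the matching ids once, cache that snapshot, and paginate within it so index updates between page views can't shuffle users across pages. The Solr fetch itself is omitted and all names below are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AnchoredPage {
    // Return one page of a previously-cached (anchored) id list.
    // The cached list never changes, so page boundaries stay stable.
    static <T> List<T> page(List<T> cachedIds, int page, int pageSize) {
        int from = page * pageSize;
        if (from >= cachedIds.size()) return Collections.emptyList();
        int to = Math.min(from + pageSize, cachedIds.size());
        return cachedIds.subList(from, to);
    }

    public static void main(String[] args) {
        // Pretend these ids came back from one Solr query at "anchor" time.
        List<String> ids = Arrays.asList("u1", "u2", "u3", "u4", "u5");
        System.out.println(page(ids, 0, 2)); // [u1, u2]
        System.out.println(page(ids, 2, 2)); // [u5]
    }
}
```

A "refresh" action would simply re-run the Solr query and replace the cached list, as Lee suggests.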
Re: Upload doc and pdf in Solr 3.3.0
Thanks. Now it works. -- View this message in context: http://lucene.472066.n3.nabble.com/Upload-doc-and-pdf-in-Solr-3-3-0-tp3283224p3283760.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Grouping and performing statistics per group
Hi Omri, I think you can achieve that with grouping and the Solr StatsComponent (http://wiki.apache.org/solr/StatsComponent). In order to compute statistics on groups you must set the option group.truncate=true. An example query:

q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight

Let me know if this fits your use case. Martijn

On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote:

Thanks, but I actually need something more deterministic and more accurate.. Anyone knows if there is an already existing feature for that? thanks again

--
Met vriendelijke groet,
Martijn van Groningen
Re: can you help on this?
Can you please specify:

1. Solr version
2. Platform
3. JDK version
4. Under what conditions does this error happen? Is it reproducible?

On Wed, Aug 24, 2011 at 6:53 PM, abhijit bashetti abhijitbashe...@gmail.com wrote:

SEVERE: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
    at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:166)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:273)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:209)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:503)
    at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
    at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:77)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:82)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:66)
    at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:53)
    at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:198)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:176)
    at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:354)
    at org.apache.lucene.search.Searcher.createNormalizedWeight(Searcher.java:168)
    at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:661)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:879)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Thread.java:619)

--
Regards,
Shalin Shekhar Mangar.
Re: Grouping and performing statistics per group
Or, if you don't care about grouped results, you can also add the following option: stats.facet=gender

On 25 August 2011 14:40, Martijn v Groningen martijn.v.gronin...@gmail.com wrote:

Hi Omri, I think you can achieve that with grouping and the Solr StatsComponent (http://wiki.apache.org/solr/StatsComponent). In order to compute statistics on groups you must set the option group.truncate=true. An example query:

q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight

Let me know if this fits your use case. Martijn

On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote:

Thanks, but I actually need something more deterministic and more accurate.. Anyone knows if there is an already existing feature for that? thanks again

--
Met vriendelijke groet,
Martijn van Groningen
Re: Grouping and performing statistics per group
Hi, thanks for your reply, but it doesn't work. I am getting the plain stats results at the end of the response, but no statistics per group. thanks anyway
Re: How to copy and extract information from a multi-line text before the tokenizer
You could consider writing your own UpdateHandler. It allows you to get access to the underlying SolrInputDocument, and freely modify the fields before it even gets to the analysis chain defined in your schema. So you can get your AllData out of the doc, split it apart as many ways as you want and put fields back into the SolrInputDocument. You can even remove the AllData field if you want and not even define it in your schema.

A programmer had a problem. He tried to solve it with regular expressions. Now he has two problems :).

Best
Erick

On Tue, Aug 23, 2011 at 6:28 AM, Michael Kliewe m...@mail.de wrote:

Hello all, I have a custom schema with a few fields, and I would like to create a new field in the schema that has only one special line of another field indexed. Let's use this example: field AllData (TextField) has for example this data:

Title: exampleTitle of the book
Author: Example Author
Date: 01.01.1980

Each line is separated by a line break. I now need a new field named OnlyAuthor which only has the Author information in it, so I can search and facet on specific Author information. I added this to my schema:

<fieldType name="authorField" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="OnlyAuthor" type="authorField" indexed="true" stored="true"/>
<copyField source="AllData" dest="OnlyAuthor"/>

But this is not working; the new OnlyAuthor field contains all the data, because the regex didn't match.
But I need Example Author in that field (I think) to be able to search and facet only on author information. I don't know where the problem is; perhaps one of you can give me a hint, or a totally different method to achieve my goal of extracting a single line from this multi-line text. Kind regards and thanks for any help, Michael
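A quick way to debug the charFilter regex is outside Solr with plain java.util.regex, which (as far as I can tell) uses the same pattern semantics as PatternReplaceCharFilterFactory. The sketch below suggests the likely culprit: without the embedded (?s) DOTALL flag, '.' does not cross line breaks, so a pattern like ^.*\nAuthor: (.*?)\n.*$ can never match the whole multi-line value. Class and method names here are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthorExtract {
    // Extract the Author line from a multi-line value; (?s) makes '.' match
    // newlines, which the pattern quoted in the thread is missing.
    public static String extractAuthor(String allData) {
        Pattern p = Pattern.compile("(?s)^.*\\nAuthor: (.*?)\\n.*$");
        Matcher m = p.matcher(allData);
        return m.matches() ? m.group(1) : allData;
    }

    public static void main(String[] args) {
        String allData = "Title: exampleTitle of the book\n"
                + "Author: Example Author\n"
                + "Date: 01.01.1980";
        System.out.println(extractAuthor(allData)); // Example Author
    }
}
```

Inside the schema that would mean prefixing the charFilter pattern with (?s); Erick's UpdateHandler suggestion sidesteps the regex entirely by doing this kind of split in Java before analysis.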
Re: Query parameter changes from solr 1.4 to 3.3
Have you looked at the CHANGES.txt file? That is supposed to be updated with all these kinds of notes... Best Erick On Tue, Aug 23, 2011 at 7:11 AM, Samarendra Pratap samarz...@gmail.com wrote: Hi, We are upgrading solr 1.4 (with collapsing patch solr-236) to solr 3.3. I was looking for the required changes in query parameters (or parameter names) if any. One thing I know for sure is that collapse and its sub-options are now known by group, but didn't find anything else. Can someone point me to some document or webpage for this? Or if there aren't any other changes can someone confirm that? -- Regards, Samar
Re: Spellcheck Phrases
Please start a new thread for this question, see: http://people.apache.org/~hossman/#threadhijack

When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

Best
Erick

On Tue, Aug 23, 2011 at 11:47 AM, Herman Kiefus herm...@angieslist.com wrote:

The angle that I am trying here is to create a dictionary from indexed terms that contain only correctly spelled words. We are doing this by having the field from which the dictionary is created use a type that employs solr.KeepWordFilterFactory, which in turn uses a text file of known correctly spelled words (including their respective derivations, for example: lead, leads, leading, etc.). This is working great for us, with the exception being those fields in our schema that contain proper names. I can't seem to get (unfiltered) terms from those fields along with (correctly spelled) terms from other fields into the single field upon which the dictionary is built.

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com]
Sent: Thursday, June 02, 2011 11:40 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Actually, someone just pointed out to me that a patch like this is unnecessary. The code works as-is if configured like this:

<float name="thresholdTokenFrequency">.01</float> (correct)

instead of this:

<str name="thresholdTokenFrequency">.01</str> (incorrect)

I tested this and it seems to work. I'm still trying to figure out whether using this parameter actually improves the quality of our spell suggestions, now that I know how to use it properly. Sorry about the misinformation earlier.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Dyer, James
Sent: Wednesday, June 01, 2011 3:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Tanner, I just entered SOLR-2571 to fix the float-parsing bug that breaks thresholdTokenFrequency. It's just a 1-line code fix, so I also included a patch that should cleanly apply to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

This parameter appears absent from the wiki. And as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set as the minimum percentage of documents in which a term has to occur in order for it to appear in the spelling dictionary. For instance in the config below, a term would have to occur in at least 1% of the documents for it to be part of the spelling dictionary. This might be a good setting for long fields, but for the short fields in my application I was thinking of setting this to something like 1/1000 of 1% ...

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">Spelling_Dictionary</str>
    <str name="fieldType">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="thresholdTokenFrequency">.01</str>
  </lst>
</searchComponent>

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Tanner Postert [mailto:tanner.post...@gmail.com]
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

are there any updates on this? any third party apps that can make this work as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James james.d...@ingrambook.com wrote:

Tanner, Currently Solr will only make suggestions for words that are not in the dictionary, unless you specify spellcheck.onlyMorePopular=true.
However, if you do that, then it will try to improve every word in your query, even the ones that are spelled correctly (so while it might change brake to break, it might also change leg to log.) You might be able to alleviate some of the pain by setting thresholdTokenFrequency so as to remove misspelled and rarely-used words from your dictionary, although I personally haven't been able to get this parameter to work. It also doesn't seem to be documented on the wiki, but it is in the 1.4.1 source code, in class IndexBasedSpellChecker. It's also mentioned in Smiley & Pugh's book. I tried setting it like this, but got a ClassCastException on the float value:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spelling</str>
  <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">Spelling_Dictionary</str>
    <str name="fieldType">text_spelling</str>
    <str name="buildOnOptimize">true</str>
    <str name="thresholdTokenFrequency">.001</str>
Re: Grouping and performing statistics per group
If you take this query from the wiki:

http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price&stats.field=popularity&stats.twopass=true&rows=0&indent=true&stats.facet=inStock

In this case you get stats about the popularity per inStock value (true / false). Replacing these values with weight and gender respectively, and running the query on your index, should give the result you want.

On 25 August 2011 14:47, Omri Cohen omri...@gmail.com wrote:

Hi, thanks for your reply.. it doesn't work. I am getting the plain stats results at the end of the response, but no statistics per group.. thanks anyway

--
Met vriendelijke groet,
Martijn van Groningen
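For completeness, the stats-per-group request for the weight/gender case can be assembled programmatically; with SolrJ you would set the same parameters on a SolrQuery. The parameter names (stats, stats.field, stats.facet) are the real StatsComponent ones; the host and core below are assumptions.

```java
public class StatsQuery {
    // Build the request that returns weight stats faceted by gender,
    // as discussed in the thread (rows=0 since only the stats are wanted).
    static String statsByGender(String baseUrl) {
        return baseUrl + "/select?q=*:*&rows=0"
                + "&stats=true&stats.field=weight&stats.facet=gender";
    }

    public static void main(String[] args) {
        System.out.println(statsByGender("http://localhost:8983/solr"));
    }
}
```

In SolrJ terms that is roughly query.set("stats", true); query.set("stats.field", "weight"); query.set("stats.facet", "gender"); with the stats read back from the QueryResponse.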
RE: Best way to anchor solr searches?
I don't think it has to be quite so bleak as that, depending upon the number of queries done over a given timeframe and the size of the result sets. Solr does cache the identifiers of documents returned by search results. See http://wiki.apache.org/solr/SolrCaching, paying particular attention to queryResultCache and queryResultWindowSize. Not a guarantee, of course, but if the number of people searching is limited and the result set sizes are manageable, the cache can probably accommodate their business need.

JRJ

-Original Message-
From: lee carroll [mailto:lee.a.carr...@googlemail.com]
Sent: Thursday, August 25, 2011 6:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Best way to anchor solr searches?

I don't think solr conforms to ACID type behaviours for its queries. This is not to say your use-case is not important just that its not SOLR's focus. I think its a interesting question but the solution is probably going to involve rolling your own. Something like returning 1 user docs and caching these in an application cache. pagination occurs in this cache rather than as a solr query with the start param incremented. Maybe offer a refresh data link with repopulates the cache from solr. cheers lee c

On 25 August 2011 01:01, arian487 akarb...@tagged.com wrote:

If I'm searching for users based on last login time, and I search once, then go to the second page with a new offset, I could potentially see the same users on page 2 if the index has changed. What is the best way to anchor it so I avoid this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3282576.html
Sent from the Solr - User mailing list archive at Nabble.com.
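For reference, the two settings mentioned above live in solrconfig.xml. The element names are the real Solr ones; the sizes are placeholders to tune for your query volume, not recommendations.

```xml
<!-- Caches ids of documents returned by recent searches, keyed on
     (query, sort, filters), so repeated page views can be served
     without re-executing the search. -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="128"/>

<!-- How many result ids to collect and cache per query; with a page
     size of 10, a window of 50 lets pages 1-5 hit the cache. -->
<queryResultWindowSize>50</queryResultWindowSize>
```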
RE: Newbie question, ant target for packaging source files from local copy?
Hi Sid, The current source packaging scheme aims to *avoid* including local changes :), so yes, there is no support currently for what you want to do. Prior to https://issues.apache.org/jira/browse/LUCENE-2973, the source packaging scheme used the current sources rather than pulling from Subversion. If you check out trunk revision 1083212 or earlier, or branch_3x revision 1083234 or earlier, you can see how it used to be done. If you want to resurrect the previous source packaging scheme as a new Ant target (maybe named package-local-src-tgz?), make a new JIRA issue and post a patch, and I'll help you get it committed (assuming nobody objects). If you haven't seen the Solr Wiki HowToContribute page http://wiki.apache.org/solr/HowToContribute, it may be of use to you for this.

Steve

-Original Message-
From: syyang [mailto:syyan...@gmail.com]
Sent: Wednesday, August 24, 2011 10:07 PM
To: solr-user@lucene.apache.org
Subject: Newbie question, ant target for packaging source files from local copy?

Hi all, I am trying to package source files containing local changes. While running ant dist creates a war file containing the local changes, running ant package-src-tgz exports files straight from the svn repository and does not pick up any of the local changes. Is there an ant target that I can use to package a local copy of the source files? Or are we expected to just write our own?

Thanks,
-Sid

--
View this message in context: http://lucene.472066.n3.nabble.com/Newbie-question-ant-target-for-packaging-source-files-from-local-copy-tp3282787p3282787.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process
Yes, but you can always employ a singleton to open and maintain a DB connection.

Best
Erick

On Tue, Aug 23, 2011 at 9:16 PM, samuele.mattiuzzo samum...@gmail.com wrote:

those documents are unrelated to the database. the db i have just stores countries - regions - cities, and it's used to do a refinement on a specific solr field. example: solrField thetext with content Mary comes from London; the update handler polls the database for europe - great britain - london and writes those values to the correct fields. isn't an update handler relative to a single document? at least, that's what i understood...

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3279765.html
Sent from the Solr - User mailing list archive at Nabble.com.
SolrServer instances
Hi All, I am using SolrJ (3.1) and Tomcat 6.x. I want to open the Solr server connection once (20 concurrent) and reuse it across the whole site, something like the connection pool we use for the DB (i.e. Apache DBCP). There is a way to use a static method, but I want a better solution from you people. I read one thread where Ahmet suggests using something like this:

String serverPath = "http://localhost:8983/solr";
HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());
URL url = new URL(serverPath);
CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);

But how do I use this instance across all classes? Please suggest.

regards
Jonty
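One common answer is the initialization-on-demand holder idiom: every class in the webapp gets the same lazily-created instance, and CommonsHttpSolrServer is documented as thread-safe, so one instance can serve all requests. The SolrJ-specific lines are left as comments because they need the solrj and commons-httpclient jars on the classpath; the holder mechanics are shown with a plain Object as a stand-in.

```java
public class SolrServerHolder {
    private SolrServerHolder() {} // no instances; access via getInstance()

    // The JVM guarantees Holder is initialized exactly once, on first use,
    // with no explicit synchronization needed.
    private static class Holder {
        // In the real class this field would be built roughly as in the thread:
        //   HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());
        //   client.getHttpConnectionManager().getParams().setMaxTotalConnections(20);
        //   INSTANCE = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"), client);
        static final Object INSTANCE = new Object();
    }

    public static Object getInstance() {
        return Holder.INSTANCE; // same instance for every caller
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // true
    }
}
```

Callers anywhere in the application then use SolrServerHolder.getInstance() instead of constructing their own server, which is effectively the "static method" approach but with lazy, thread-safe construction.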
Re: Problem using stop words
Hmmm, I'd expect you to have an error in your log file if you haven't removed the default field type named string. If you have removed it from your schema, this should work... But I'd change the name anyway, it'll cause endless confusion. And don't forget to re-index *everything*, preferably after removing the data/index directory (the whole directory, not just the contents of index). Take a look at the admin/analysis page after you've made your changes, and click the verbose checkbox to see exactly what happens at each step of the analysis chain. That'll tell you whether you're on the right track. The definition you're using should do what you've indicated you want. Best Erick On Wed, Aug 24, 2011 at 3:43 AM, _snake_ lucas.mig...@gmail.com wrote: Thanks everybody for your help!! I change the stopwords file, and I only use one word per line, without start / ending spaces, and without comments. I change it to UTF-8. I am using the TermsComponent to suggest words to the user (JQuery UI Autocomplete). So, the stopwords are still showed here... Do I have to change the name of the fieldtype string? I think the problem is that TermsComponent doesn't use the stopwords file. Is there another way to suggest words to the user? Thanks! The Solr Analysis shows the following when I search the word 'a' (that is an stopword) in a field that have all the content. 
Query Analyzer: a

The content of the schema file is:

<fieldtype name="string" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="spanish_stop2.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="spanish_stop2.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

The solrconfig.xml file:

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-using-stop-words-tp3274598p3280291.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr in a windows shared hosting environment
Yes, but since Solr is written in Java to run in a JEE container, you would host Solr in a web application server, either Jetty (which comes packaged), or something else (say, Tomcat or WebSphere or something like that). As a result, you aren't going to find anything that says how to run Solr under IIS because it doesn't run under IIS. It doesn't need IIS, though it can certainly coexist alongside IIS. If you want the requests to go thru IIS you might need a plug-in in IIS to handle that (IBM's WebSphere has such a plugin). If you don't need the requests to go thru IIS, then that isn't an issue. Hope that helps. JRJ -Original Message- From: Devora [mailto:devora...@gmail.com] Sent: Thursday, August 25, 2011 5:15 AM To: solr-user@lucene.apache.org Subject: Solr in a windows shared hosting environment Hi, Is it possible to install Solr in a windows (IIS 7 or IIS 7.5) shared hosting environment? If yes, where can I find instructions how to do that? Thank you!
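For reference, the packaged Jetty route needs nothing more than a Java runtime on the box; a minimal sketch of starting the bundled example (the directory name depends on the Solr version you download):

```
cd apache-solr-3.x/example
java -jar start.jar
```

Solr then answers on http://localhost:8983/solr/ by default; IIS, if present, is not involved at all.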
RE: How to copy and extract information from a multi-line text before the tokenizer
A programmer had a problem. He tried to solve it with regular expressions. Now he has two problems :). A. That just isn't fair... 8^) (I can't think of very many things that have allowed me to perform more magic over my career than regular expressions, starting with SNOBOL. Uh oh: I just dated myself. 8^) ).

JRJ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, August 25, 2011 7:54 AM
To: solr-user@lucene.apache.org
Subject: Re: How to copy and extract information from a multi-line text before the tokenizer

You could consider writing your own UpdateHandler. It allows you to get access to the underlying SolrInputDocument, and freely modify the fields before the document even gets to the analysis chain defined in your schema. So you can get your AllData out of the doc, split it apart as many ways as you want, and put fields back in the SolrInputDocument. You can even remove the AllData field if you want and not even define it in your schema.

A programmer had a problem. He tried to solve it with regular expressions. Now he has two problems :).

Best
Erick

On Tue, Aug 23, 2011 at 6:28 AM, Michael Kliewe m...@mail.de wrote:
Hello all, I have a custom schema with a few fields, and I would like to create a new field in the schema that has only one particular line of another field indexed. Let's use this example: field AllData (TextField) has, for example, this data:

Title: exampleTitle of the book
Author: Example Author
Date: 01.01.1980

Each line is separated by a line break. I now need a new field named OnlyAuthor which has only the Author information in it, so I can search and facet on specific Author information.
I added this to my schema:

<fieldType name="authorField" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="OnlyAuthor" type="authorField" indexed="true" stored="true"/>
<copyField source="AllData" dest="OnlyAuthor"/>

But this is not working: the new OnlyAuthor field contains all the data, because the regex didn't match. But I need "Example Author" in that field (I think) to be able to search and facet only author information. I don't know where the problem is; perhaps someone can give me a hint, or a totally different method to achieve my goal of extracting a single line from this multi-line text.

Kind regards and thanks for any help
Michael
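In case it helps to see the failure mode outside Solr, here is a small Python sketch; Java's regex engine (which Solr uses) also treats `.` as not matching newlines by default, so the behavior is analogous. The pattern cannot match once more than one line follows the Author line, because the trailing `.*$` cannot cross the remaining line breaks. The extra Publisher line below is hypothetical, added to trigger the failure.

```python
import re

# A document where more than one line follows the "Author:" line.
doc = ("Title: exampleTitle of the book\n"
       "Author: Example Author\n"
       "Date: 01.01.1980\n"
       "Publisher: Example House\n")  # extra line is hypothetical

pat = r'^.*\nAuthor: (.*?)\n.*$'

# By default '.' does not match newlines, so '.*$' cannot swallow the
# remaining lines and the pattern fails to match; input comes back unchanged:
print(repr(re.sub(pat, r'\1', doc)))

# In dot-all mode the leading and trailing '.*' may span lines,
# leaving only the captured author:
print(re.sub(pat, r'\1', doc, flags=re.DOTALL))  # Example Author
```

If this is indeed the cause, prefixing the pattern with an embedded flag, i.e. pattern="(?s)^.*\nAuthor: (.*?)\n.*$", would turn on dot-all mode in Java as well; I have not verified this against the poster's exact Solr version.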
Re: Spatial Search problems
Um, you might try googling LocalSolr or LocalLucene -- dead projects, but an option if you insist on using an old Solr/Lucene. Of course, if all you need is a bounding-box filter, then a pair of lat/lon range queries is sufficient.

~ David Smiley

On Aug 25, 2011, at 4:01 AM, Javier Heras wrote:
Thanx David. Just one more question: am I able to do spatial search with Solr 1.4? And with Lucene 2.9? What's your recommendation? Javier

--
View this message in context: http://lucene.472066.n3.nabble.com/Spatial-Search-problems-tp3277945p3283285.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Return only the fields where there are results
So, there are 2 documents with results, but Solr shows me fields without the word "alquileres" -- why? Because that's the way Solr is built <G>. It wouldn't be very useful for Solr to only return the fields with matches. Consider indexing articles with a title and contents: if your search matched on contents but not the title, would you really want the returned document from Solr to NOT have the title?

You can control what fields are returned for the result set by specifying the fl parameter, either in the defaults for your config or on the URL if you only want specific fields back.

And a general comment: it is highly unusual to have all those string fields. String fields are unanalyzed, meaning the entire field is a single term, no matter how many words are in it. This is rarely useful for anything except, say, part numbers or SKUs or IDs. For general text that you want to search on, this is almost never correct. You might want to spend some time on the admin/analysis page to get a better sense of this...

Best
Erick

On Wed, Aug 24, 2011 at 9:35 AM, _snake_ lucas.mig...@gmail.com wrote:
Hi, I am using Apache Solr 3.2 and I want to know if it's possible to return only the fields that have the term that I am searching. In the Solr admin page, if I put the word "alquileres" and then click the Search button, the following response is shown:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">438</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <arr name="Accion"><str/><str/><str/><str/><str/><str/><str/><str/><str/></arr>
      <arr name="Acciones"><str/><str/><str/><str/><str/><str/></arr>
      <arr name="Actividad"><str/><str/><str/><str/><str/><str/><str/><str/></arr>
      <arr name="Actividades"><str/><str/><str/></arr>
      <arr name="Beneficiarios"><str/><str/><str/><str/></arr>
      <arr name="Condicion"><str/><str/></arr>
      <arr name="Creacion"><str/></arr>
      <arr name="Cuantia"><str/><str/><str/></arr>
      <arr name="Excepcion">...
The HTTP request is: ...select/?q=alquileres&version=2.2&start=0&rows=10&indent=on

So, there are 2 documents with results, but Solr shows me fields without the word "alquileres" -- why?

My schema.xml file is:

<?xml version="1.0"?>
<schema name="example core zero" version="1.1">
  <types>
    <fieldtype name="string" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="spanish_stop2.txt" enablePositionIncrements="true"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="spanish_stop2.txt" enablePositionIncrements="true"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>
    <fieldtype name="string2" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
  </types>
  <fields>
    <field name="RutaFichero" type="string2" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="TitlePV" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="TextPV" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="TitlePart" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="TextPart" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="TextSubPart" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="Accion" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="Actividad" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="Objeto" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="Objetivo" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="LineasDe" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="Linea" type="string" indexed="true" stored="true" multiValued="true"/>
    <field ...
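Following Erick's point about the fl parameter, a request like the following (field names taken from the poster's schema; pick whichever ones you actually want back) restricts the response to just the listed fields:

```
...select/?q=alquileres&fl=RutaFichero,TitlePV,TextPV&rows=10&indent=on
```

Note this selects whole fields per document; it does not limit output to only the fields that matched, which, as Erick explains, Solr does not do.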
Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process
Since I'm fairly new to Solr, can you please give some guidelines or provide an example I can look at for starters? I already thought about a singleton implementation, but I'm not sure where to put it and how to start coding it.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3283901.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Where the heck do you put maxAnalyzedChars?
Thanks. That seemed to do it. I was thrown by the section of documentation that said "This parameter makes sense for Highlighter only" and tried to put it in the various highlighter elements.

On Wed, Aug 24, 2011 at 6:52 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
(11/08/25 5:29), Daniel Skiles wrote:
I have a very large field in my index that I need to highlight. Where in the config file do I set maxAnalyzedChars in order to make this work? Has anyone successfully done this?

Placing it in your requestHandler should work. For example:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="hl.maxAnalyzedChars">1000</int>
  </lst>
</requestHandler>

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
Re: not equals query in solr
http://wiki.apache.org/solr/SolrQuerySyntax has answers for you.
-Simon

On Thu, Aug 25, 2011 at 1:04 AM, Ranveer Kumar ranveer.s...@gmail.com wrote:
any help...

On Wed, Aug 24, 2011 at 12:58 PM, Ranveer Kumar ranveer.s...@gmail.com wrote:
Hi, is it the right way to do it: q=(state:[* TO *] AND city:[* TO *]) regards Ranveer

On Wed, Aug 24, 2011 at 12:54 PM, Ranveer Kumar ranveer.s...@gmail.com wrote:
Hi All, how do I do a negative query in Solr? The criteria: I have state and city fields, and I want to keep only those documents where state and city are not blank. I tried a negative query on state but it's not working. Or suggest a better way to do this. regards Ranveer
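For the archives, the wiki page boils down to this for the "not blank" case: Lucene has no direct "not equals", so you express "has any value" with an open-ended range, negated if needed. A sketch using the thread's field names:

```
q=*:*&fq=state:[* TO *]&fq=city:[* TO *]    (both fields have a value)
q=*:* -state:[* TO *]                       (state has no indexed value)
```

Note that a purely negative clause needs the *:* alongside it to have something to subtract from.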
Re: csv responsewriter and numfound
I've added a case with a patch here: https://issues.apache.org/jira/browse/SOLR-2731

On Wed, Aug 24, 2011 at 11:59 AM, Jon Hoffman j...@foursquare.com wrote:
I took a look at the source and agree that it would be a bit hairy to bubble up header settings from the response writers. Alternatively, and I'll admit that this is a somewhat hacky proposal, an optional parameter csv.numfound=true could be added to the request, which would cause the first line of the response to be the numfound. It would have no impact on existing behavior, and those who are interested in that value can simply read off the first line before sending to their usual CSV parser. It's a trivial change to the code and I can create a JIRA ticket and submit the patch. This is my first interaction with this forum, so let me know if the dev list is a more appropriate place to propose changes. - Jon

On Wed, Aug 24, 2011 at 10:47 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
Good idea. However, response writers can't control HTTP response headers currently... only the content type returned. Erik

On Aug 24, 2011, at 8:52, Jon Hoffman j...@foursquare.com wrote:
What about the HTTP response header?

Great question. But how would that get returned in the response? It is a drag that the header is lost when results are written in CSV, but there really isn't an obvious spot for that information to be returned. I guess a comment would be one option. -Yonik
http://www.lucidimagination.com
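Jon's "read off the first line" idea would look roughly like this on the client side. The first-line format here (a bare count) is an assumption for illustration; the actual format is whatever the patch in SOLR-2731 ends up emitting.

```python
import csv
import io

# Hypothetical response body if csv.numfound=true prepended the count;
# the exact first-line format is an assumption, not the committed patch.
raw = "2\nid,title\n1,First\n2,Second\n"

# Peel off the first line before handing the rest to a normal CSV parser.
first_line, _, rest = raw.partition("\n")
num_found = int(first_line)
rows = list(csv.DictReader(io.StringIO(rest)))

print(num_found)         # 2
print(rows[0]["title"])  # First
```

Existing clients that don't send csv.numfound=true would see no change, which is the point of the proposal.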
Paging over mutlivalued field results?
Hi, Is it possible to construct a query in Solr where the paged results are matching multivalued fields and not documents? thanks, Darren
Highlight on alternateField
Hi there, I am trying to utilize the highlighting alternateField and can't get highlights on the results from the targeted fields. Is this expected behavior, or am I understanding alternateField wrong?

schema.xml:

<field name="description" type="text" indexed="true" stored="false" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="description_highlighting" type="string" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<copyField source="description" dest="description_highlighting" maxChars="2500"/>

solrconfig.xml:

<str name="f.description.hl.alternateField">description_highlighting</str>
<str name="f.description.hl.alternateFieldLen">100</str>
RE: Text Analysis and copyField
It had crossed my mind but for now we have a 'DictionarySource' field whose type utilizes the KeepWordFilterFactory that uses a text file containing all correctly spelled words (thanks to scrabble), location/last/first names (courtesy of the US census bureau) and a few other adds (month/day) names. A file this large does not seem to have a material impact on indexing. What we're seeing now (we also have a field 'TermsMisspelled' that utilizes the same text file with StopFilterFactory) is almost pure misspellings and some contractions (can't, won't, don't, etc.). Thank you everyone for your help here, this is a truly fine community. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, August 24, 2011 1:00 PM To: solr-user@lucene.apache.org Subject: Re: Text Analysis and copyField Have you considered having two dictionaries and using ajax to query them both and intermingling the results in your suggestions? It'd be some work, but I think it might accomplish what you want. Best Erick On Tue, Aug 23, 2011 at 1:48 PM, Herman Kiefus herm...@angieslist.com wrote: To close, I found this article from Hoss: http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td 3122408.html Since I cannot use one copyField directive to copy from another copyField's dest[ination], I cannot achieve what I desire: some terms that are subject to KeepWordFilterFactory and some that are not. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, August 22, 2011 1:16 PM To: solr-user@lucene.apache.org Subject: Re: Text Analysis and copyField I suspect that the things going into TermsDictionary are from fields other than CorrectlySpelledTerms. In other words I don't think that anything is getting into TermsDictionary from CorrectlySpelledTerms... Be careful to remove the index between schema changes, just to be sure that you're not seeing old data. 
Best
Erick

On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus herm...@angieslist.com wrote:
That's what I thought, but my experiments show differently. In actuality: I have a number of fields that are of type text (the default as it is packaged). I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in index-time analysis, using a file of terms which are known to be correctly spelled. I have a type 'textDictionary' that has no index-time analysis. I have the fields:

<field name="CorrectlySpelledTerms" type="textCorrectlySpelled" indexed="false" stored="false" multiValued="true"/>
<field name="TermsDictionary" type="textDictionary" indexed="true" stored="false" multiValued="true"/>

I want 'TermsDictionary' to contain only those terms from some fields that are correctly spelled, plus the terms from a couple of other fields (CompanyName and ContactName) as-is. I use several copyField directives as follows:

<copyField source="Field1" dest="CorrectlySpelledTerms"/>
<copyField source="Field2" dest="CorrectlySpelledTerms"/>
<copyField source="Field3" dest="CorrectlySpelledTerms"/>
<copyField source="Name" dest="TermsDictionary"/>
<copyField source="Contact" dest="TermsDictionary"/>
<copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>

If I query 'Field1' for a term that I know is misspelled (electical) it yields results. If I query 'TermsDictionary' for the same term it yields no results. It would seem by these results that 'TermsDictionary' only contains those terms with misspellings stripped as a result of the text analysis on the field 'CorrectlySpelledTerms'. Asked another way, I think you can see what I'm getting at: a source for the spellchecker that contains only correctly spelled terms plus proper names; should I have gone about this in a different way?
-Original Message- From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] Sent: Monday, August 22, 2011 9:30 AM To: solr-user@lucene.apache.org Subject: Re: Text Analysis and copyField On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus herm...@angieslist.com wrote: Is my thinking correct? I have a field 'F1' of type 'T1' whose index time analysis employs the StopFilterFactory. I also have a field 'F2' of type 'T2' whose index time analysis does NOT employ the StopFilterFactory. There is a copyField directive source=F1 dest=F2 F2 will not contain any stop words because they were filtered out as F1 was populated. No, F2 will contain stop words. Copy fields does not process input through a chain, it sends the original content to each field and therefore analysis is totally independent. -- Stephen Duncan Jr www.stephenduncanjr.com
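The distinction can be shown with a minimal schema sketch (field names taken from the thread). Each copyField receives the *raw* source content, so no analysis chain is ever "forwarded"; and per the thread's conclusion, a copyField whose source is itself only a copyField destination copies nothing.

```xml
<!-- CorrectlySpelledTerms receives Field1's raw text; the KeepWord
     filtering happens only inside that field's own analysis. -->
<copyField source="Field1" dest="CorrectlySpelledTerms"/>

<!-- TermsDictionary gets Field1's original, unfiltered text: -->
<copyField source="Field1" dest="TermsDictionary"/>

<!-- This directive does NOT forward the KeepWord-filtered output;
     CorrectlySpelledTerms exists only as a copy destination, so this
     copies nothing. -->
<copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
```

In other words, the only way to get "filtered" terms into TermsDictionary is to put the filtering into TermsDictionary's own analysis chain.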
RE: Best way to anchor solr searches?
Thanks for the replies. I did look at caching, but our commit time is 90 seconds. It's definitely possible for someone to make a search, change the page, and have wonky results. How about getting it to autowarm the x most recent searches in the queryResultCache? That can hopefully reduce the issues, though even then the search can be out of date. An application cache per user would for sure solve such issues, but I'd like to avoid this if possible. Definitely an interesting problem...
--
View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3284674.html
Sent from the Solr - User mailing list archive at Nabble.com.
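If autowarming recent searches is the route taken, the knob lives on the queryResultCache in solrconfig.xml; a sketch with illustrative sizes (the numbers are assumptions, not recommendations). autowarmCount controls how many of the most recently used cache entries are re-executed against the new searcher after a commit:

```xml
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="128"/>
```

Note this warms the *cache*, not any particular user's paging position, so it reduces but does not eliminate the "results shifted between pages" problem described above.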
Re: Field type change / copy field
What version of Solr are you using? Because 3.2 (and I believe 3.1) and later have faceting and ranges on numeric values, so there would be no need to use two fields, and you could then avoid the format thing entirely.

Best
Erick

On Wed, Aug 24, 2011 at 5:53 AM, Oliver Schihin oliver.schi...@unibas.ch wrote:
Hello list. My documents come with a field holding a date, always a year: <year>2008</year>. In the schema, this content is taken into a field 'year' as an integer, and it will be searchable. Through a copyField instruction I copy the year to a 'facet_year' field -- you guessed it, to use it for faceting and make range queries possible. Its field type is of the class 'solr.TrieDateField', which requires canonical date representation. Is there a way in Solr to extend the simple year to <facet_year>2008-01-01T00:00:00Z</facet_year>? Or do I have to solve the problem in preprocessing, before posting?
Thanks
Oliver
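If the preprocessing route is taken after all, the transformation is tiny; a hedged sketch (any client language works, Python shown; the function name is made up):

```python
def year_to_solr_date(year):
    """Map a bare year such as '2008' onto the canonical Solr/ISO-8601
    date that TrieDateField expects, anchored to January 1st."""
    return "%04d-01-01T00:00:00Z" % int(year)

print(year_to_solr_date("2008"))  # 2008-01-01T00:00:00Z
```

As Erick notes, on Solr 3.1+ faceting and ranges work directly on the numeric year field, making this step unnecessary.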
Re: Query vs Filter Query Usage
The pitfalls of filter queries are also their strength: the results will be cached and re-used if possible. This will take some memory, of course; depending upon how big your index is, it could be quite a lot. Yet another time/space tradeoff. But yeah, use filter queries until you have OOMs, then get more memory <G>...

Best
Erick

On Wed, Aug 24, 2011 at 8:07 PM, Joshua Harness jkharnes...@gmail.com wrote:
Shawn - Thanks for your reply. Given that my application is mainly used for faceted search, would the following type of query make sense, or are there other pitfalls to consider?

q=*:*&fq=someField:someValue&fq=anotherField:anotherValue

Thanks! Josh

On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey s...@elyograg.org wrote:
On 8/24/2011 2:02 PM, Joshua Harness wrote:
I've done some basic query performance testing on my SOLR instance, which allows users to search via a faceted search interface. As such, document relevancy is less important to me since I am performing exact-match searching. Comparing filter queries with a plain query has yielded remarkable performance. However, I'm suspicious of statements like 'always use filter queries since they are so much faster'. In my experience, things are never so straightforward. Can anybody provide further guidance? What are the pitfalls of relying heavily on filter queries? When would one want to use plain vanilla SOLR queries as opposed to filter queries?

Completely separate from any performance consideration, the key to their usage lies in their name: they are filters. They are particularly useful in a faceted situation, because you can have more than one of them, and the overall result is the intersection (AND) of them all. When someone tells the interface to restrict their search by a facet, you can simply add a filter query with the field:value relating to that facet and reissue the query. If they decide to remove that restriction, you just have to remove the filter query.
You don't have to try and combine the various pieces in the query, which means you'll have much less hassle with parentheses. If you need a union (OR) operation with your filters, you'll have to use more complex construction within a single filter query, or not use them at all. Thanks, Shawn
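To Shawn's last point, the intersection vs. union distinction looks like this on the URL (field names are the ones from Josh's example):

```
q=*:*&fq=someField:someValue&fq=anotherField:anotherValue    (AND: separate fq params intersect)
q=*:*&fq=someField:(someValue OR anotherValue)               (OR: the union must live inside one fq)
```

Each distinct fq string is cached independently in the filterCache, which is why repeating common filters verbatim across requests pays off.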
Re: Newbie Question, can I store structured sub elements?
nope, it's not easy. Solr docs are flat, flat, flat, with the tiny exception that multiValued fields are returned as lists. However, you can count on multi-valued fields being returned in the order they were added, so it might work out for you to treat these as parallel arrays in Solr documents.

Best
Erick

On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote:
I know I can have multi value on them, but that doesn't let me see that a showing instance happens at a particular time on a particular channel, just that it shows on a range of channels at a range of times. Starting to think I will have to either store a formatted string that combines them, or keep it flat just for indexing, retrieve ids and use them to get data out of the RDBMS.

On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:
You could change starttime and channelname to multiValued=true and use these fields to store all the values for those fields. showing.movie_id and showing.id probably aren't needed in a solr record.

On 8/24/11 7:53 AM, Zac Tolley wrote:
I have a scenario in which I have a film and showings; each film has multiple showings at set times on set channels, so I have:

Movie - id, title, description, duration
Showing - id, movie_id, starttime, channelname

I want to know: can I store this in Solr so that I keep this structure? I did try to do an initial import with the DIH using this config:

<entity name="movie" query="SELECT * from movies">
  <field column="ID" name="id"/>
  <field column="TITLE" name="title"/>
  <field column="DESCRIPTION" name="description"/>
  <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
    <field column="ID" name="id"/>
    <field column="STARTTIME" name="starttime"/>
    <field column="CHANNELNAME" name="channelname"/>
  </entity>
</entity>

I was hoping, for each movie, to get a sub-entity with the showing like:

<doc>
  <str name="title">...</str>
  <showing>
    <str name="channelname">...

but instead all the fields are flattened down to the top level. I know this must be easy; what am I missing...?
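Erick's parallel-arrays idea can be sketched client-side: assuming the showing fields come back in insertion order, position i of each multi-valued field belongs to the same showing. The values below are made up for illustration.

```python
# A movie document with showings flattened into two parallel
# multi-valued fields, as suggested in the thread (values hypothetical).
doc = {
    "title": "Example Movie",
    "starttime": ["2011-08-25T20:00:00Z", "2011-08-26T18:30:00Z"],
    "channelname": ["Channel A", "Channel B"],
}

# Solr preserves the order in which multi-valued entries were added,
# so zipping the lists reassembles the (time, channel) pairs:
showings = list(zip(doc["starttime"], doc["channelname"]))
print(showings[0])  # ('2011-08-25T20:00:00Z', 'Channel A')
```

The DIH config would need to add the showing values to the two fields in the same relative order for this pairing to hold.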
RE: Solr in a windows shared hosting environment
Thank you! Since it's shared hosting, how do I install java? -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Thursday, August 25, 2011 4:34 PM To: solr-user@lucene.apache.org Subject: RE: Solr in a windows shared hosting environment Yes, but since Solr is written in Java to run in a JEE container, you would host Solr in a web application server, either Jetty (which comes packaged), or something else (say, Tomcat or WebSphere or something like that). As a result, you aren't going to find anything that says how to run Solr under IIS because it doesn't run under IIS. It doesn't need IIS, though it can certainly coexist alongside IIS. If you want the requests to go thru IIS you might need a plug-in in IIS to handle that (IBM's WebSphere has such a plugin). If you don't need the requests to go thru IIS, then that isn't an issue. Hope that helps. JRJ -Original Message- From: Devora [mailto:devora...@gmail.com] Sent: Thursday, August 25, 2011 5:15 AM To: solr-user@lucene.apache.org Subject: Solr in a windows shared hosting environment Hi, Is it possible to install Solr in a windows (IIS 7 or IIS 7.5) shared hosting environment? If yes, where can I find instructions how to do that? Thank you!
Re: Preserve XML hierarchy
Jars aren't where it's at. You apply patches to *source* code, then compile. Here's a good place to start understanding this process: http://wiki.apache.org/solr/HowToContribute -- see "getting the code" and "working with patches". I *strongly* advise you to get the code, compile it, and run it first before applying the patch, just to eliminate an extra variable...

Best
Erick

On Thu, Aug 25, 2011 at 3:56 AM, _snake_ lucas.mig...@gmail.com wrote:
Hi Michael, thanks for your help! I am using Apache Solr 3.2 on Windows. I am trying to apply the 2 patches ( https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#issue-tabs XMLCharFilter Patch ), but I have no idea how to do that. What do I need to open the Solr project? Or which is the .jar file that I need to open? I saw that there are 2 files (woodstox and stax2); what do I have to do with those files? Thanks!
--
View this message in context: http://lucene.472066.n3.nabble.com/Preserve-XML-hierarchy-tp3166690p3283275.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie Question, can I store structured sub elements?
have come to that conclusion, so had to choose between multiple fields with multiple values or a field with delimited text; gone for the former

On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson erickerick...@gmail.com wrote:
nope, it's not easy. Solr docs are flat, flat, flat, with the tiny exception that multiValued fields are returned as lists. However, you can count on multi-valued fields being returned in the order they were added, so it might work out for you to treat these as parallel arrays in Solr documents.

Best
Erick

On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote:
I know I can have multi value on them, but that doesn't let me see that a showing instance happens at a particular time on a particular channel, just that it shows on a range of channels at a range of times. Starting to think I will have to either store a formatted string that combines them, or keep it flat just for indexing, retrieve ids and use them to get data out of the RDBMS.

On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:
You could change starttime and channelname to multiValued=true and use these fields to store all the values for those fields. showing.movie_id and showing.id probably aren't needed in a solr record.

On 8/24/11 7:53 AM, Zac Tolley wrote:
I have a scenario in which I have a film and showings; each film has multiple showings at set times on set channels, so I have:

Movie - id, title, description, duration
Showing - id, movie_id, starttime, channelname

I want to know: can I store this in Solr so that I keep this structure? I did try to do an initial import with the DIH using this config:

<entity name="movie" query="SELECT * from movies">
  <field column="ID" name="id"/>
  <field column="TITLE" name="title"/>
  <field column="DESCRIPTION" name="description"/>
  <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
    <field column="ID" name="id"/>
    <field column="STARTTIME" name="starttime"/>
    <field column="CHANNELNAME" name="channelname"/>
  </entity>
</entity>

I was hoping, for each movie, to get a sub-entity with the showing like:

<doc>
  <str name="title">...</str>
  <showing>
    <str name="channelname">...

but instead all the fields are flattened down to the top level. I know this must be easy; what am I missing...?
Re: Newbie Question, can I store structured sub elements?
Delimited text is the baby form of lists. Text can be made very very structured (think XML, ontologies...). I think the crux is your search needs. For example, with Lucene, I made a search for formulæ (including sub-terms) by converting the OpenMath-encoded terms into rows of tokens and querying with SpanQueries. Quite structured to my taste. What you don't have is the freedom of joins which brings a very flexible query mechanism almost independent of the schema... but this often can be circumvented by the flat solr and lucene storage whose performance is really amazing. paul Le 25 août 2011 à 21:07, Zac Tolley a écrit : have come to that conclusion so had to choose between multiple fields with multiple vales or a field with delimited text, gone for the former On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson erickerick...@gmail.comwrote: nope, it's not easy. Solr docs are flat, flat, flat with the tiny exception that multiValued fields are returned as a lists. However, you can count on multi-valued fields being returned in the order they were added, so it might work out for you to treat these as parallel arrays in Solr documents. Best Erick On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote: I know I can have multi value on them but that doesn't let me see that a showing instance happens at a particular time on a particular channel, just that it shows on a range of channels at a range of times Starting to think I will have to either store a formatted string that combines them or keep it flat just for indexing, retrieve ids and use them to get data out of the RDBMS On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote: You could change starttime and channelname to multiValued=true and use these fields to store all the values for those fields. showing.movie_id and showing.id probably isn't needed in a solr record. 
On 8/24/11 7:53 AM, Zac Tolley wrote:
I have a scenario in which I have a film and showings; each film has multiple showings at set times on set channels, so I have:

Movie - id, title, description, duration
Showing - id, movie_id, starttime, channelname

I want to know: can I store this in Solr so that I keep this structure? I did try to do an initial import with the DIH using this config:

<entity name="movie" query="SELECT * from movies">
  <field column="ID" name="id"/>
  <field column="TITLE" name="title"/>
  <field column="DESCRIPTION" name="description"/>
  <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
    <field column="ID" name="id"/>
    <field column="STARTTIME" name="starttime"/>
    <field column="CHANNELNAME" name="channelname"/>
  </entity>
</entity>

I was hoping, for each movie, to get a sub-entity with the showing like:

<doc>
  <str name="title">...</str>
  <showing>
    <str name="channelname">...

but instead all the fields are flattened down to the top level. I know this must be easy; what am I missing...?
Re: Solr in a windows shared hosting environment
That's not a question we can answer in this group - you need to take it up with your hosting provider - they may already have it available. On Thu, Aug 25, 2011 at 2:59 PM, Devora devora...@gmail.com wrote: Thank you! Since it's shared hosting, how do I install java? -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Thursday, August 25, 2011 4:34 PM To: solr-user@lucene.apache.org Subject: RE: Solr in a windows shared hosting environment Yes, but since Solr is written in Java to run in a JEE container, you would host Solr in a web application server, either Jetty (which comes packaged), or something else (say, Tomcat or WebSphere or something like that). As a result, you aren't going to find anything that says how to run Solr under IIS because it doesn't run under IIS. It doesn't need IIS, though it can certainly coexist alongside IIS. If you want the requests to go thru IIS you might need a plug-in in IIS to handle that (IBM's WebSphere has such a plugin). If you don't need the requests to go thru IIS, then that isn't an issue. Hope that helps. JRJ -Original Message- From: Devora [mailto:devora...@gmail.com] Sent: Thursday, August 25, 2011 5:15 AM To: solr-user@lucene.apache.org Subject: Solr in a windows shared hosting environment Hi, Is it possible to install Solr in a windows (IIS 7 or IIS 7.5) shared hosting environment? If yes, where can I find instructions how to do that? Thank you!
Re: Newbie Question, can I store structured sub elements?
My search is very simple, mainly on titles, actors, show times and channels. Having multiple lists of values is probably better for that, and as the order is kept the same it's relatively simple to map the response back onto pojos for my presentation layer. On Thu, Aug 25, 2011 at 8:18 PM, Paul Libbrecht p...@hoplahup.net wrote: Delimited text is the baby form of lists. Text can be made very very structured (think XML, ontologies...). I think the crux is your search needs. For example, with Lucene, I made a search for formulæ (including sub-terms) by converting the OpenMath-encoded terms into rows of tokens and querying with SpanQueries. Quite structured to my taste. What you don't have is the freedom of joins, which brings a very flexible query mechanism almost independent of the schema... but this often can be circumvented by the flat Solr and Lucene storage, whose performance is really amazing. paul On 25 Aug 2011, at 21:07, Zac Tolley wrote: I have come to that conclusion, so had to choose between multiple fields with multiple values or a field with delimited text; gone for the former. On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson erickerick...@gmail.com wrote: nope, it's not easy. Solr docs are flat, flat, flat with the tiny exception that multiValued fields are returned as lists. However, you can count on multi-valued fields being returned in the order they were added, so it might work out for you to treat these as parallel arrays in Solr documents.
Best Erick On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote: I know I can have multi value on them, but that doesn't let me see that a showing instance happens at a particular time on a particular channel, just that it shows on a range of channels at a range of times. Starting to think I will have to either store a formatted string that combines them, or keep it flat just for indexing, retrieve ids and use them to get data out of the RDBMS. On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote: You could change starttime and channelname to multiValued=true and use these fields to store all the values for those fields. showing.movie_id and showing.id probably isn't needed in a solr record. On 8/24/11 7:53 AM, Zac Tolley wrote: I have a scenario in which I have a film and showings; each film has multiple showings at set times on set channels, so I have: Movie - id, title, description, duration. Showing - id, movie_id, starttime, channelname. I want to know: can I store this in Solr so that I keep this structure? I did try to do an initial import with the DIH using this config:

<entity name="movie" query="SELECT * from movies">
  <field column="ID" name="id"/>
  <field column="TITLE" name="title"/>
  <field column="DESCRIPTION" name="description"/>
  <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
    <field column="ID" name="id"/>
    <field column="STARTTIME" name="starttime"/>
    <field column="CHANNELNAME" name="channelname"/>
  </entity>
</entity>

I was hoping, for each movie, to get a sub entity with the showing like:

<doc>
  <str name="title">...</str>
  <showing>
    <str name="channelname">...

but instead all the fields are flattened down to the top level. I know this must be easy, what am I missing... ?
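The parallel-arrays approach suggested in this thread can be sketched client-side: if `starttime` and `channelname` are both stored multiValued fields, Solr returns their values in insertion order, so the client can zip them back into showing objects. A minimal sketch, assuming that ordering guarantee; the `Showing` class and method names here are illustrative, not from any real schema or API:

```java
import java.util.ArrayList;
import java.util.List;

public class ShowingMapper {

    // Simple value object for one showing, rebuilt from the parallel lists.
    public static class Showing {
        public final String starttime;
        public final String channelname;

        public Showing(String starttime, String channelname) {
            this.starttime = starttime;
            this.channelname = channelname;
        }
    }

    // Zip the two multiValued fields back into (starttime, channel) pairs.
    // Relies on Solr returning multiValued field values in the order they
    // were added at index time, as Erick notes above.
    public static List<Showing> zip(List<String> starttimes, List<String> channels) {
        if (starttimes.size() != channels.size()) {
            throw new IllegalStateException("parallel fields out of sync");
        }
        List<Showing> showings = new ArrayList<>();
        for (int i = 0; i < starttimes.size(); i++) {
            showings.add(new Showing(starttimes.get(i), channels.get(i)));
        }
        return showings;
    }
}
```

The size check matters: if one field ever gets a value without its partner, the parallel-array assumption silently breaks, so failing fast is safer than mispairing.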
Re: Query vs Filter Query Usage
Erick - Thanks for the insight. Does the filter cache just cache the internal document ids of the result set (as opposed to the documents)? If so, am I correct in the following math: 10,000,000 document index; internal document id is a 32 bit unsigned int; max memory used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits, or about 38 MB. Of course, I realize there's some additional overhead if we're dealing with Integer objects as opposed to primitives -- and I'm way off if the internal document id is implemented as a long. Also, does SOLR fail gracefully when an OOM occurs (e.g. the cache fails but the query still succeeds)? Thanks! Josh On Thu, Aug 25, 2011 at 2:55 PM, Erick Erickson erickerick...@gmail.com wrote: The pitfalls of filter queries are also their strength. The results will be cached and re-used if possible. This will take some memory, of course. Depending upon how big your index is, this could be quite a lot. Yet another time/space tradeoff. But yeah, use filter queries until you have OOMs, then get more memory G... Best Erick On Wed, Aug 24, 2011 at 8:07 PM, Joshua Harness jkharnes...@gmail.com wrote: Shawn - Thanks for your reply. Given that my application is mainly used as faceted search, would the following types of queries make sense, or are there other pitfalls to consider? q=*:*&fq=someField:someValue&fq=anotherField:anotherValue Thanks! Josh On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey s...@elyograg.org wrote: On 8/24/2011 2:02 PM, Joshua Harness wrote: I've done some basic query performance testing on my SOLR instance, which allows users to search via a faceted search interface. As such, document relevancy is less important to me since I am performing exact match searching. Comparing using filter queries with a plain query has yielded remarkable performance. However, I'm suspicious of statements like 'always use filter queries since they are so much faster'.
In my experience, things are never so straightforward. Can anybody provide any further guidance? What are the pitfalls of relying heavily on filter queries? When would one want to use plain vanilla SOLR queries as opposed to filter queries? Completely separate from any performance consideration, the key to their usage lies in their name: They are filters. They are particularly useful in a faceted situation, because you can have more than one of them, and the overall result is the intersection (AND) of them all. When someone tells the interface to restrict their search by a facet, you can simply add a filter query with the field:value relating to that facet and reissue the query. If they decide to remove that restriction, you just have to remove the filter query. You don't have to try and combine the various pieces in the query, which means you'll have much less hassle with parentheses. If you need a union (OR) operation with your filters, you'll have to use more complex construction within a single filter query, or not use them at all. Thanks, Shawn
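Shawn's point that multiple fq clauses intersect can be illustrated with plain sets: think of each cached filter as a set of matching doc ids, and the final result as the intersection of the main query's ids with every filter's ids. A hedged sketch of the idea only — real Solr uses its own DocSet implementations, not `HashSet`, and the doc ids below are made up:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterIntersection {

    // Intersect the main query result with each cached filter set,
    // mirroring how multiple fq clauses AND together.
    public static Set<Integer> apply(Set<Integer> queryDocs, List<Set<Integer>> filters) {
        Set<Integer> result = new HashSet<>(queryDocs);
        for (Set<Integer> filter : filters) {
            result.retainAll(filter); // set intersection
        }
        return result;
    }
}
```

Removing a faceting restriction then just means dropping one set from the list and intersecting again — which is exactly why the cached filters pay off across queries.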
Re: Newbie Question, can I store structured sub elements?
Whether multi-valued or token-streams, the question is search, not (de)serialization: that's opaque to Solr which will take and give it to you as needed. paul Le 25 août 2011 à 21:24, Zac Tolley a écrit : My search is very simple, mainly on titles, actors, show times and channels. Having multiple lists of values is probably better for that, and as the order is kept the same its relatively simple to map the response back onto pojos for my presentation layer.
Solr Implementations
First, I would like to apologize if this is a repeat question but can't seem to get the right answer anywhere. - What happens to pending documents when the server dies abruptly? I understand that when the server shuts down gracefully, it will commit the pending documents and close the IndexWriter. For the case where the server just crashes, I am assuming that the pending documents are lost but would it also corrupt the index files? If so, when the server comes back online what is the state? I would think that a full re-indexing is in order. - What are the dangers of having n-number of ReadOnly Solr instances pointing to the same data directory? (Shared by a SAN)? Will there be issues with locking? This is a scenario with replication. The Read-Only instances are pointing to the same data directory on a SAN. Thank you very much. Z
RE: Solr in a windows shared hosting environment
You visit the Sun (oops, I mean Oracle -- old habits die hard) web site, download it and install it, or, being shared, you ask your provider to do it for you. They might decline, of course, in which case you host another web server using a hosting provider who does support Java (or, at least another machine, if they don't want to have a Java app container alongside IIS.) http://java.com/en/download/index.jsp -Original Message- From: Devora [mailto:devora...@gmail.com] Sent: Thursday, August 25, 2011 2:00 PM To: solr-user@lucene.apache.org Subject: RE: Solr in a windows shared hosting environment Thank you! Since it's shared hosting, how do I install java? -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Thursday, August 25, 2011 4:34 PM To: solr-user@lucene.apache.org Subject: RE: Solr in a windows shared hosting environment Yes, but since Solr is written in Java to run in a JEE container, you would host Solr in a web application server, either Jetty (which comes packaged), or something else (say, Tomcat or WebSphere or something like that). As a result, you aren't going to find anything that says how to run Solr under IIS because it doesn't run under IIS. It doesn't need IIS, though it can certainly coexist alongside IIS. If you want the requests to go thru IIS you might need a plug-in in IIS to handle that (IBM's WebSphere has such a plugin). If you don't need the requests to go thru IIS, then that isn't an issue. Hope that helps. JRJ -Original Message- From: Devora [mailto:devora...@gmail.com] Sent: Thursday, August 25, 2011 5:15 AM To: solr-user@lucene.apache.org Subject: Solr in a windows shared hosting environment Hi, Is it possible to install Solr in a windows (IIS 7 or IIS 7.5) shared hosting environment? If yes, where can I find instructions how to do that? Thank you!
RE: Query vs Filter Query Usage
10,000,000 document index Internal Document id is 32 bit unsigned int Max Memory Used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits or 38 MB I think it depends on where exactly the result set was generated. I believe the result set will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2MB. -Michael
Re: Query vs Filter Query Usage
On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote: 10,000,000 document index Internal Document id is 32 bit unsigned int Max Memory Used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits or 38 MB I think it depends on where exactly the result set was generated. I believe the result set will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2MB. Right - and Solr switches between the implementation depending on set size... so if the number of documents in the set were 100, then it would only take up 400 bytes. -Yonik http://www.lucidimagination.com
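The two estimates in this thread differ because of representation: a bitset costs one bit per document in the index regardless of how many documents matched, while a small result set stored as a sorted int array costs four bytes per hit. A quick back-of-envelope check using the constants from the thread — this mirrors the discussion, not Solr's actual classes:

```java
public class DocSetMemory {

    // Bitset representation: one bit per document in the index,
    // regardless of how many documents are in the result set.
    public static long bitSetBytes(long maxDoc) {
        return maxDoc / 8;
    }

    // Small-set representation: four bytes (one int doc id) per hit.
    public static long intArrayBytes(long numHits) {
        return numHits * 4;
    }
}
```

For a 10,000,000-doc index, `bitSetBytes(10000000)` gives 1,250,000 bytes — the "about 1.2MB" Michael quotes — while a 100-doc result as ints is only 400 bytes, matching Yonik's figure. The crossover is at maxDoc/32 hits, which is why switching implementations by set size pays off.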
solr UIMA exception
Hi, I have followed this solr UIMA config, using AlchemyAPIAnnotator and OpenCalaisAnnotator. https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt http://wiki.apache.org/solr/SolrUIMA so, I got the AlchemyAPI key and OpenCalais key. and I can successfully hit http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities?apikey=my_alchemy_keyurl=www.cnn.com but somehow, i got the following exception when I run *java -jar post.jar entity.xml* - this is entity.xml

<add>
<doc>
  <field name="id">12345</field>
  <field name="text">Senator Dick Durbin (D-IL) Chicago , March 3,2007.</field>
  <field name="title">Entity Extraction</field>
</doc>
</add>

- any suggestion, really appreciated..please..

Aug 25, 2011 5:11:50 PM WhitespaceTokenizer process
INFO: Whitespace tokenizer starts processing
Aug 25, 2011 5:11:50 PM WhitespaceTokenizer process
INFO: Whitespace tokenizer finished processing
Aug 25, 2011 5:11:54 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at org.apache.uima.alchemy.annotator.AbstractAlchemyAnnotator.process(AbstractAlchemyAnnotator.java:138)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:151)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.uima.alchemy.digester.exception.ResultDigestingException: org.apache.uima.alchemy.annotator.exception.AlchemyCallFailedException: ERROR
at org.apache.uima.alchemy.annotator.AbstractAlchemyAnnotator.process(AbstractAlchemyAnnotator.java:133)
... 35 more
Caused by:
Re: Query vs Filter Query Usage
The point of filter queries is that they are applied very early in the searching algorithm, and thus cut the amount of work later on. Some complex queries take a lot of time and so this pre-trimming helps a lot. On Thu, Aug 25, 2011 at 2:37 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote: 10,000,000 document index Internal Document id is 32 bit unsigned int Max Memory Used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits or 38 MB I think it depends on where exactly the result set was generated. I believe the result set will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2MB. Right - and Solr switches between the implementation depending on set size... so if the number of documents in the set were 100, then it would only take up 400 bytes. -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
Is it possible to do a partial update of a doc's fields in the index?
New to Solr and Lucene. We're indexing text, pdf, html docs located on local Unix file systems, and need the ability to search for file owner, group, and other Linux file metadata, in addition to the file contents. It would be great if we could use nutch to index everything, and then crawl through the file system again with a 10 line shell script that passed the missing metadata to solr, and updated the existing docs. But <add><doc> deletes all the old fields even if they're not present in the new document. If partial updates aren't possible, what would be the best way to accomplish what we need? Do we want to modify the source code for each of the different doc format parsers to add support for this metadata? Thanks,
Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process
Put it anywhere you want G. Here's a good place to start: http://www.javapractices.com/topic/TopicAction.do?Id=46 where the distributePresents method is the one you have that returns the connection. Here's a sample class that doesn't do much...

public enum MyEnum {
  INSTANCE;

  private String _tester = "";

  public void doStuff(String stuff) {
    if (_tester.length() == 0) {
      _tester = stuff;
      System.out.println("In initialization");
    }
    System.out.println("Tester is " + _tester + " Stuff is " + stuff);
  }
}

you can imagine doStuff as getConnection with logic to initialize the connection where the string _tester is defined. Now you can call it from anywhere the enum is available like this:

public class MyMain {
  public static void main(String[] args) {
    MyEnum.INSTANCE.doStuff("first time");
    MyEnum.INSTANCE.doStuff("second time");
    MyEnum.INSTANCE.doStuff("third time");
  }
}

Best Erick On Thu, Aug 25, 2011 at 9:43 AM, samuele.mattiuzzo samum...@gmail.com wrote: since i'm barely new to solr, can you please give some guidelines or provide an example i can look at for starters? i already thought about a singleton implementation, but i'm not sure where i have to put it and how should i start coding it -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3283901.html Sent from the Solr - User mailing list archive at Nabble.com.
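Applied to the original question, the same enum-singleton idiom can hold one shared, lazily created resource. A sketch under stated assumptions: `SharedResource` and its members are hypothetical names, and a plain `Object` plus an init counter stand in for the real `java.sql.Connection` so the once-only initialization is observable without a database:

```java
public enum SharedResource {
    INSTANCE;

    private Object connection;  // would be java.sql.Connection in the real indexer
    private int initCount = 0;  // visible stand-in for "connected exactly once"

    // Lazily create the resource on first use; every later call reuses it.
    // synchronized keeps the lazy init safe across indexing threads.
    public synchronized Object getConnection() {
        if (connection == null) {
            connection = new Object(); // DriverManager.getConnection(...) in real code
            initCount++;
        }
        return connection;
    }

    public synchronized int getInitCount() {
        return initCount;
    }
}
```

The enum gives you the JVM-wide single instance for free; the only design choice left is where to close the connection when indexing finishes.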
Re: Paging over mutlivalued field results?
Hmm, I don't quite understand what you want. An example or two would help. Best Erick On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni dar...@ontrenet.com wrote: Hi, Is it possible to construct a query in Solr where the paged results are matching multivalued fields and not documents? thanks, Darren
Re: Highlight on alternateField
(11/08/26 2:32), Val Minyaylo wrote: Hi there, I am trying to utilize highlighting alternateField and can't get highlights on the results from targeted fields. Is this expected behavior or am I understanding alternateFields wrong? Yes, it is expected behavior. solrconfig.xml:

<str name="f.description.hl.alternateField">description_highlighting</str>
<str name="f.description.hl.alternateFieldLen">100</str>

With hl=on&hl.fl=description parameters, you can get the first 100 chars of the (raw) stored description_highlighting field value if highlighter cannot generate snippets on description field for some reason. koji -- Check out Query Log Visualizer for Apache Solr http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
Re: Query parameter changes from solr 1.4 to 3.3
On Tue, Aug 23, 2011 at 7:11 AM, Samarendra Pratap samarz...@gmail.com wrote: We are upgrading solr 1.4 (with collapsing patch solr-236) to solr 3.3. I was looking for the required changes in query parameters (or parameter names) if any. There should be very few (but check CHANGES.txt as Erick pointed out). We try to keep the main HTTP APIs very stable, even across major versions. One thing I know for sure is that collapse and its sub-options are now known by group, but didn't find anything else. Field collapsing/grouping was never in any 1.4 release. -Yonik http://www.lucidimagination.com
RE: how to deal with URLDatasource which needs authorization?
Well, let me explain in detail about the problem... I have a website www.blablabla.com on which users can have profiles, with any kind of information. And each user has an id, something like user_xyz. So www.blablabla.com/user_xyz shows the user profile, and www.blablabla.com/solr/index/user_xyz shows an xml file holding all of the static information about the user. Solr uses www.blablabla.com/solr/index/user_xyz to index the data. Currently www.blablabla.com/solr/index/user_xyz is accessible by everyone, both users and non-users of the site... I would like to put some kind of security thing which only allows solr to access www.blablabla.com/solr/index/user_xyz, preventing both users and non-users from accessing it. So that link will be a 'solr only' link. Are there any other options than restricting ip address for access to this link? Or is that the only option? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-deal-with-URLDatasource-which-needs-authorization-tp3280515p3285579.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to do a partial update of a doc's fields in the index?
Got an answer from the excellent support folks at LucidWorks: Currently Lucene/Solr can't do field-level updating. So whenever a new document is indexed, if it has the same unique identifier (in this case id) field, then the new document replaces the older document. There is an open JIRA issue in Lucene to add field-level updates, but it is a very major, long-term task. Thanks, On Thu, Aug 25, 2011 at 6:45 PM, Goran Pocina gpoc...@gmail.com wrote: New to Solr and Lucene. We're indexing text, pdf, html docs located on local Unix file systems, and need the ability to search for file owner, group, and other Linux file metadata, in addition to the file contents. It would be great if we could use nutch to index everything, and then crawl through the file system again with a 10 line shell script that passed the missing metadata to solr, and updated the existing docs. But <add><doc> deletes all the old fields even if they're not present in the new document. If partial updates aren't possible, what would be the best way to accomplish what we need? Do we want to modify the source code for each of the different doc format parsers to add support for this metadata? Thanks,
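Since field-level updates aren't available, the usual workaround is read-modify-write: fetch the stored fields of the existing document, overlay the new metadata, and re-add the whole document. A sketch of just the merge step — plain maps stand in for a real `SolrInputDocument`, and this is only safe when every field you care about is stored:

```java
import java.util.HashMap;
import java.util.Map;

public class PartialUpdate {

    // Overlay new metadata fields on the previously stored fields.
    // The merged map is what gets re-added, replacing the old document
    // (same unique id), so nothing stored is lost.
    public static Map<String, Object> merge(Map<String, Object> stored,
                                            Map<String, Object> updates) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(updates); // updated fields win; untouched fields survive
        return merged;
    }
}
```

For Goran's case, the 10-line shell script idea still works: it just has to send the owner/group metadata to a client that does this fetch-merge-readd cycle rather than a bare add.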
Re: Paging over mutlivalued field results?
Hi Erick, Sure thing. I have a document schema where I put the sentences of that document in a multivalued field sentences. I search that field in a query but get back the document results, naturally. I then need to further find which exact sentences matched the query (for each document result) and then do my own paging, since I am only returning pages of sentences and not the whole document. (i.e. I don't want to page the document results). Does this make sense? Or is there a better way Solr can accommodate this? Much appreciated. Darren On 08/25/2011 07:24 PM, Erick Erickson wrote: Hmm, I don't quite understand what you want. An example or two would help. Best Erick On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni dar...@ontrenet.com wrote: Hi, Is it possible to construct a query in Solr where the paged results are matching multivalued fields and not documents? thanks, Darren
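One client-side workaround for this case: retrieve the stored multivalued `sentences` values for each hit, keep only the sentences that match, pool them, and page that flat list yourself. A sketch under stated assumptions — naive case-insensitive substring matching stands in for real analysis or highlighting, and the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentencePager {

    // Keep only the sentences containing the query term (case-insensitive).
    // A real implementation would reuse the query's analysis chain or
    // highlighter output instead of substring matching.
    public static List<String> matching(List<String> sentences, String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (String sentence : sentences) {
            if (sentence.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(sentence);
            }
        }
        return hits;
    }

    // Return one zero-indexed page of the matched sentences.
    public static List<String> page(List<String> hits, int page, int pageSize) {
        int from = Math.min(page * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        return hits.subList(from, to);
    }
}
```

The cost is re-fetching stored sentences for every document on the current page, so it only scales if document hit counts per page stay small.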
missing field in schema browser on solr admin
hi all... i have added a new field to the index... but now when i check solr admin, i see some interesting stuff... i can see the field in the schema and also the db config file, but there is nothing about the field in the schema browser... in addition i can't make a search in that field... all of the config files seem correct but still no change... any ideas, or anyone who has ever had a similar problem? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/missing-field-in-schema-browser-on-solr-admin-tp3285739p3285739.html Sent from the Solr - User mailing list archive at Nabble.com.
making a Solr search that returns documents where every field in the document is matched
Hi everybody, I've just started using solr. Not sure if this is too specific of a problem, but here goes. My situation is I have a semi long query and then I'm searching on very short documents. The basic issue is I want this query to return documents where every word/token in the document is matched. I also have a synonyms file. The way I've been going about this is using the dismax parser plugin's minimum match and indexing each document's length, and then doing max_tokens_document # of queries, where each query is mm=x, doc_length=x, q=query_string. Example: dog, dogs, puppy, and canine are synonyms. query=dog dog cat love puppy canine water not doc1=cat love dog matches doc2=cat hate water doesn't match doc3=hot dog contests rule matches, but I don't want it to. doc3 happens because dog, dog, puppy, canine all match, so that's four matches, and doc3's token_length is also four. Four words in my query are matching to the same word in the doc, and each of them counts toward the minimum match count. Thanks, Henry
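Henry's false match can be reproduced by counting matched query tokens the way his mm scheme does: every query token that hits any document token counts, so four query tokens can all land on the single document token "dog" and reach mm=4 for a four-token document. A sketch of that counting only — the tiny hard-coded synonym group and class name are illustrative, not how Solr's synonym filter is configured:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinMatchDemo {

    // Map each token to a canonical form; "dog", "dogs", "puppy", "canine"
    // are one synonym group, as in the example above.
    private static String canonical(String token) {
        Set<String> dogGroup = new HashSet<>(Arrays.asList("dog", "dogs", "puppy", "canine"));
        return dogGroup.contains(token) ? "dog" : token;
    }

    // Count query tokens that match some document token. This is the
    // overcount Henry describes: duplicate and synonymous query tokens
    // each count, even when they all hit the same document token.
    public static int matchedQueryTokens(List<String> query, List<String> doc) {
        Set<String> docCanonical = new HashSet<>();
        for (String token : doc) {
            docCanonical.add(canonical(token));
        }
        int matched = 0;
        for (String queryToken : query) {
            if (docCanonical.contains(canonical(queryToken))) {
                matched++;
            }
        }
        return matched;
    }
}
```

For doc3 ("hot dog contests rule"), the count is 4 — equal to the document's token length — even though only one distinct document token matched, which is exactly why the mm=doc_length trick lets it through.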
Re: SolrServer instances
Dear all, please help, I am stuck here as I have not much experience.. thanks On Thu, Aug 25, 2011 at 6:51 PM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I am using SolrJ (3.1) and Tomcat 6.x. I want to open the solr server once (20 concurrent) and reuse this across all the site. Or something like a connection pool like we are using for the DB (i.e. Apache DBCP). There is a way to use a static method, which is a way, but I want a better solution from you people. I read one thread where Ahmet suggested using something like this:

String serverPath = "http://localhost:8983/solr";
HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());
URL url = new URL(serverPath);
CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);

But how to use an instance of this across all classes? Please suggest. regards Jonty
FileDataSource baseDir to be solr.data.dir
hi we have multiple environments where we run solr and use DIH to index a promotions.xml file. here is a snippet from the dih config file:

<entity name="f" processor="FileListEntityProcessor" baseDir="/sites/" fileName="promotions.xml"

how do i set base dir to be solr.data.dir? on each server solr.data.dir is different. I use a multi core solr instance -- View this message in context: http://lucene.472066.n3.nabble.com/FileDataSource-baseDir-to-be-solr-data-dir-tp3285872p3285872.html Sent from the Solr - User mailing list archive at Nabble.com.