Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
As far as I know, Maven is a build/management tool for Java projects, quite similar to Ant, right? No, I'm not using it, so I think I don't need to worry about those pom files. But I'm still not able to figure out the classpath/jar error I mentioned in my previous mails. Shall I try getting those jar files, specifically the solr-solrj jar that contains the CommonsHttpSolrServer class files? If yes, can you tell me where on the web to get those jar files? Has anyone ever faced similar problems? Please help me fix these silly issues. Thanks, Ahmed. On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.com wrote: Can anyone help me selecting the proper pom.xml file out of the bunch of *-pom.xml.templates available. Ahmed, are you using Maven? If not, then you do not need these pom files. If you are using Maven, then you need to add a dependency to solrj. http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e -- Regards, Shalin Shekhar Mangar.
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
the Solr distro contains all the jar files. you can take either the latest release (1.3) or a nightly On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet ahmed.bas...@gmail.com wrote: As far as I know, Maven is a build/mgmt tool for java projects quite similar to Ant, right? No I'm not using this , then I think I don't need to worry about those pom files. But I'm still not able to figure out the error with classpath/jar files I mentioned in my previous mails. Shall I try getting those jar files, specifically that solr-solrj jar that contains commons-http-solr-server class files? If yes then can you tell me where to get those jar files from, on the web? Has anyone ever faced similar problems? Please help me fixing these silly issues? Thanks, Ahmed. On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.com wrote: Can anyone help me selecting the proper pom.xml file out of the bunch of *-pom.xml.templates available. Ahmed, are you using Maven? If not, then you do not need these pom files. If you are using Maven, then you need to add a dependency to solrj. http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Re: half width katakana
If you use CharFilter, you should use a CharStream-aware Tokenizer to correct term offsets. There are two CharStreamAware*Tokenizers in trunk/Solr 1.4. Probably you want to use CharStreamAwareCJKTokenizer(Factory). Koji Ashish P wrote: After this should I be using the same cjkAnalyzer, or use charFilter? Thanks, Ashish Koji Sekiguchi-2 wrote: Ashish P wrote: I want to convert half-width katakana to full-width katakana. I tried using the cjk analyzer but it is not working. Does cjkAnalyzer do it, or is there any other way? CharFilter, which comes with trunk/Solr 1.4, covers just this type of problem. If you are using Solr 1.3, try the patch attached below: https://issues.apache.org/jira/browse/SOLR-822 Koji
Re: highlighting html content
Hi Matt, On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com wrote: I've been toying with setting custom pre/post delimiters and then removing them in the client, but I thought I'd ask the list before I go too far with that idea :) this is what I do. I define the custom highlight delimiters as [solr:hl] and [/solr:hl], and then do a string replace with <em class="highlight"> and </em> on the search results. It is simple to implement, and effective. Best regards - Christian
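Christian's client-side replace step is a one-liner; here is a minimal sketch in plain Java (not a Solr API — the class and method names are made up for illustration):

```java
class HighlightMarkup {
    // Client-side post-processing: swap the neutral Solr highlight
    // delimiters for real HTML emphasis tags.
    static String toHtml(String fragment) {
        return fragment.replace("[solr:hl]", "<em class=\"highlight\">")
                       .replace("[/solr:hl]", "</em>");
    }
}
```

The point of the neutral delimiters is that they cannot be confused with markup already present in the stored field.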
Getting incorrect value while trying to extract content from xlsx
Hi, I was trying to extract content from an xlsx file for indexing. However, I am getting a Julian date value for a cell with a date format, and '1.0' in place of '100%'. I want to retain the values as they appear in the xlsx file. A solution would be appreciated. Thanks, Koushik CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
Re: half width katakana
Koji-san, Using CharStreamAwareCJKTokenizerFactory gives me the following error: SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to org.apache.solr.analysis.CharStream Maybe the Reader is being typecast to the subclass. Thanks, Ashish Koji Sekiguchi-2 wrote: If you use CharFilter, you should use CharStream aware Tokenizer to correct terms offsets. There are two CharStreamAware*Tokenizer in trunk/Solr 1.4. Probably you want to use CharStreamAwareCJKTokenizer(Factory). Koji Ashish P wrote: After this should I be using same cjkAnalyzer or use charFilter?? Thanks, Ashish Koji Sekiguchi-2 wrote: Ashish P wrote: I want to convert half width katakana to full width katakana. I tried using cjk analyzer but not working. Does cjkAnalyzer do it or is there any other way?? CharFilter which comes with trunk/Solr 1.4 just covers this type of problem. If you are using Solr 1.3, try the patch attached below: https://issues.apache.org/jira/browse/SOLR-822 Koji -- View this message in context: http://www.nabble.com/half-width-katakana-tp23270186p23272475.html Sent from the Solr - User mailing list archive at Nabble.com.
Multiple Facet Dates
Hey there, I needed multiple date facet functionality, for example to show the latest results from the last day, last week, and last month. I wanted to do it with just one query. The date facet part of solrconfig.xml would look like:

<str name="facet.date">date_field</str>
<str name="facet.date.start">NOW/DAY-1DAY</str>
<str name="facet.date.start">NOW/DAY-7DAY</str>
<str name="facet.date.start">NOW/DAY-30DAY</str>
<str name="facet.date.end">NOW/DAY+1DAY</str>
<str name="facet.date.end">NOW/DAY+1DAY</str>
<str name="facet.date.end">NOW/DAY+1DAY</str>
<str name="facet.date.gap">+2DAY</str>
<str name="facet.date.gap">+8DAY</str>
<str name="facet.date.gap">+31DAY</str>

What I have done to get it working is make some changes to getFacetDateCounts() in SimpleFacets.java. Instead of getting the start, end, and gap params as single Strings, I get them as arrays of Strings, so I have three arrays. The first position of each holds the first start, the first end, and the first gap; likewise for the second and third (in my example). Once I have them, I do exactly what the function did before, but for every position of the arrays. The resulting output looks like this:

<lst name="facet_dates">
  <lst name="date_field">
    <int name="2009-04-27T00:00:00Z">21</int>
    <str name="gap">+2DAY</str>
    <date name="end">2009-04-29T00:00:00Z</date>
    <int name="2009-04-21T00:00:00Z">86</int>
    <str name="gap">+8DAY</str>
    <date name="end">2009-04-29T00:00:00Z</date>
    <int name="2009-03-29T00:00:00Z">316</int>
    <str name="gap">+31DAY</str>
    <date name="end">2009-04-29T00:00:00Z</date>
  </lst>
</lst>

I am doing it just for testing. This works for me, but the output might be confusing to parse for other cases (say, when you need to repeat the gap to cover the whole range). Does anyone think this functionality would be good to have? If so, I could post what I have and redo it properly if someone points me in the right direction.
Thanks in advance
Re: half width katakana
The exception is expected if you use a CharStream aware Tokenizer without CharFilters. Please see example/solr/conf/schema.xml for the setting of CharFilter and CharStreamAware*Tokenizer:

<!-- charFilter + CharStream aware WhitespaceTokenizer -->
<!--
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
-->

Thank you, Koji Ashish P wrote: Koji san, Using CharStreamAwareCJKTokenizerFactory is giving me following error, SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to org.apache.solr.analysis.CharStream May be you are typecasting Reader to subclass. Thanks, Ashish Koji Sekiguchi-2 wrote: If you use CharFilter, you should use CharStream aware Tokenizer to correct terms offsets. There are two CharStreamAware*Tokenizer in trunk/Solr 1.4. Probably you want to use CharStreamAwareCJKTokenizer(Factory). Koji Ashish P wrote: After this should I be using same cjkAnalyzer or use charFilter?? Thanks, Ashish Koji Sekiguchi-2 wrote: Ashish P wrote: I want to convert half width katakana to full width katakana. I tried using cjk analyzer but not working. Does cjkAnalyzer do it or is there any other way?? CharFilter which comes with trunk/Solr 1.4 just covers this type of problem. If you are using Solr 1.3, try the patch attached below: https://issues.apache.org/jira/browse/SOLR-822 Koji
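As an aside, outside of Solr the same half-width to full-width katakana folding can be reproduced with Unicode NFKC normalization in plain Java, which is handy for checking what a mapping CharFilter should produce (a sketch; the class name is made up):

```java
import java.text.Normalizer;

class KatakanaWidth {
    // NFKC compatibility normalization maps half-width katakana
    // (U+FF61..U+FF9F) to their full-width equivalents.
    static String toFullWidth(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }
}
```

Note that NFKC also folds other compatibility characters (full-width ASCII digits, ligatures, etc.), so it is broader than a katakana-only mapping file.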
RE: OutofMemory on Highlightling
Is it possible to read only maxAnalyzedChars from the stored field instead of reading the complete field into memory? For instance, in my case, is it possible to read only the first 50K characters instead of the complete 1 MB of stored text? That would help minimize the memory usage (though it would still take 50K * 500 * 2 = 50 MB for 500 results). I would really appreciate some feedback on this issue... Thanks, Siddharth -Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Friday, April 24, 2009 10:46 AM To: solr-user@lucene.apache.org Subject: RE: OutofMemory on Highlightling I am not sure whether lazy loading should help solve this problem. I have set enableLazyFieldLoading to true but it is not helping. I went through the code and observed that DefaultSolrHighlighter.doHighlighting reads all the documents and the fields for highlighting (in my case, the 1 MB stored field is read for all documents). Also, I am confused over the following code in the SolrIndexSearcher.doc() method: if(!enableLazyFieldLoading || fields == null) { d = searcher.getIndexReader().document(i); } else { d = searcher.getIndexReader().document(i, new SetNonLazyFieldSelector(fields)); } Are we setting the fields as NonLazy even if lazy loading is enabled?
Thanks, Siddharth -Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Wednesday, April 22, 2009 11:12 AM To: solr-user@lucene.apache.org Subject: RE: OutofMemory on Highlightling Here is the stack trace:

SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.<init>(String.java:444)
at org.apache.lucene.store.IndexInput.readString(IndexInput.java:125)
at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892)
at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:277)
at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176)
at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457)
at org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java:482)
at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:253)
at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

-Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Wednesday, April 22, 2009 9:29 AM To: solr-user@lucene.apache.org Subject: RE: OutofMemory on Highlightling I tried disabling the documentCache but still the same issue. <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/> -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Monday, April 20, 2009 4:38 PM To: solr-user@lucene.apache.org Subject: Re: OutofMemory on Highlightling Gargate, Siddharth wrote: Anybody facing the same issue? Following is my configuration ... <field name="content" type="text" indexed="true" stored="false" multiValued="true"/> <field name="teaser" type="text" indexed="false" stored="true"/>
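One workaround consistent with Siddharth's 50K idea is to truncate the large field on the client before indexing its stored copy, so the highlighter never has to load the full 1 MB. A hypothetical helper, not part of Solr:

```java
class TeaserTruncate {
    // Keep only the first maxChars characters of the text to be stored,
    // cutting at a word boundary so the teaser does not end mid-word.
    static String truncate(String text, int maxChars) {
        if (text == null || text.length() <= maxChars) return text;
        int cut = text.lastIndexOf(' ', maxChars);
        return text.substring(0, cut > 0 ? cut : maxChars);
    }
}
```

The trade-off is that matches beyond the truncation point can never be highlighted, which is exactly the behavior maxAnalyzedChars already implies.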
Re: Getting incorrect value while trying to extract content from xlsx
Koushik, You didn't say much about how you are doing the extraction. Note that Solr doesn't do any extraction from spreadsheets, even though it has a component (known as Solr Cell) to provide that interface. The actual extraction is done by a tool called Tika, or more precisely, POI, both of which are separate Apache projects. Asking there may get you to the solution faster. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Koushik Mitra koushik_mi...@infosys.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, April 28, 2009 4:17:00 AM Subject: Getting incorrect value while trying to extract content from xlsx HI, I was trying to extract content from an xlsx file for indexing. However, I am getting julian date value for a cell with date format and '1.0' in place of '100%'. I want to retain the value as present in that xlsx file. Solution appreciated. Thanks, Koushik
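For what it's worth, the two raw values Koushik reported are recoverable client-side once you know Excel's conventions: dates are stored as serial day counts and percentages as plain fractions. A sketch (it assumes the usual 1899-12-30 epoch, which holds for serials from March 1900 onward; this is not a Tika or POI API):

```java
import java.time.LocalDate;

class ExcelValues {
    // Excel serial dates count days from the 1899-12-30 epoch
    // (the 1900 leap-year quirk only affects Jan-Feb 1900 serials).
    static LocalDate fromSerial(long serial) {
        return LocalDate.of(1899, 12, 30).plusDays(serial);
    }

    // A percent-formatted cell stores 1.0 for 100%, 0.5 for 50%, etc.
    static String toPercent(double raw) {
        return Math.round(raw * 100) + "%";
    }
}
```

If you are already depending on POI, its DateUtil class provides an equivalent serial-to-date conversion.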
Re: Solr Performance bottleneck
On Mon, Apr 27, 2009 at 10:27 PM, Jon Bodner jbod...@blackboard.com wrote: Trying to point multiple Solrs on multiple boxes at a single shared directory is almost certainly doomed to failure; the read-only Solrs won't know when the read/write Solr instance has updated the index. I'm solving the same problem while working with an index stored in a data grid. I've created a data-grid listener which watches for segments.gen file changes and forces Solr to refresh its structures when it receives this event. You can do the same with a file-system index: write some code that watches the segments.gen file for changes and kicks Solr when a change is detected. It would be great to add such a mechanism to Solr, I mean some abstracted (via an interface) way to implement sources of index-refresh events. There is also code in SolrCore which checks for index existence by looking at the file system, and it would be better to abstract that code too. WDYT? I can provide patches. -- Andrew Klochkov
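The polling variant of what Andrey describes can be sketched as a small generic watcher; the version source and the callback are placeholders (in practice, something like the index file's lastModified time or the index version, and a searcher reopen):

```java
import java.util.function.LongSupplier;

class IndexChangeWatcher {
    // Polls an externally updated value (e.g. a segments file timestamp
    // or an index version) and fires a callback when it changes.
    private final LongSupplier versionSource;
    private final Runnable onChange;
    private long lastSeen;

    IndexChangeWatcher(LongSupplier versionSource, Runnable onChange) {
        this.versionSource = versionSource;
        this.onChange = onChange;
        this.lastSeen = versionSource.getAsLong();
    }

    // Returns true (after running the callback) if the source
    // changed since the previous poll.
    boolean poll() {
        long current = versionSource.getAsLong();
        if (current != lastSeen) {
            lastSeen = current;
            onChange.run();
            return true;
        }
        return false;
    }
}
```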
Re: how to reset the index in solr
On Apr 24, 2009, at 1:54 AM, sagi4 wrote: Can I get the rake task for clearing the index of Solr? I mean rake index:rebuild. It would be very helpful, and would also avoid deleting by id manually. How do you currently build your index? Making a Rake task to perform Solr operations is generally pretty trivial. In Ruby (after gem install solr-ruby): require 'solr' solr = Solr::Connection.new("http://localhost:8983/solr") solr.optimize # for example Erik
Re: Term highlighting with MoreLikeThisHandler?
Yes... at least I think so. The highlighting works correctly for me on another request handler... see below the request handler for my MoreLikeThisHandler query. Thanks for your help... Eric

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="fl">score,id,timestamp,type,textualId,subject,url,server</str>
    <str name="echoParams">explicit</str>
    <str name="mlt.match.include">true</str>
    <str name="mlt.interestingTerms">list</str>
    <str name="mlt.fl">subject,requirements,productName,justification,operation_exact</str>
    <int name="mlt.minwl">2</int>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">2</int>
    <str name="hl">true</str>
    <str name="hl.snippets">1</str>
    <!-- for subject and textualID fields, we want no fragmenting, just highlighting -->
    <str name="f.textualId.hl.fragsize">0</str>
    <str name="f.subject.hl.fragsize">0</str>
    <str name="f.requirements.hl.fragmenter">regex</str> <!-- defined below -->
    <str name="f.justification.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

On Mon, Apr 27, 2009 at 11:30 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Eric, Have you tried using MLT with parameters described on http://wiki.apache.org/solr/HighlightingParameters ? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Eric Sabourin eric.sabourin2...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, April 27, 2009 10:31:38 AM Subject: Term highlighting with MoreLikeThisHandler? I submit a query to the MoreLikeThisHandler to find documents similar to a specified document. This works and I've configured my request handler to also return the interesting terms. Is it possible to have MLT return to me highlight snippets in the similar documents it returns? I mean generate hl snippets of the interesting terms? If so how? Thanks... Eric -- Eric Sent from Halifax, NS, Canada
Re: highlighting html content
Hi Christian, I decided to do something very similar. How do you handle cases where the highlighting ends up inside of html/xml tags, though? For ?q=jackson I'm getting stuff like this:

<entry type="song" author="Michael <em>Jackson</em>">Bad by Michael <em>Jackson</em></entry>

I wrote a regular expression to take care of the html/xml problem (highlighting inside of the tag). I'd be interested in seeing your and others' approaches to this, even if it's a regular expression. Matt On Tue, Apr 28, 2009 at 3:21 AM, Christian Vogler christian.vog...@gmail.com wrote: Hi Matt, On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com wrote: I've been toying with setting custom pre/post delimiters and then removing them in the client, but I thought I'd ask the list before I go to far with that idea :) this is what I do. I define the custom highlight delimiters as [solr:hl] and [/solr:hl], and then do a string replace with <em class="highlight"> and </em> on the search results. It is simple to implement, and effective. Best regards - Christian
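Since Matt asked for concrete regex approaches: with the custom-delimiter trick, one option is to strip any delimiters that fall between < and > before converting the rest to em tags. A sketch (the delimiter names follow Christian's; the class is hypothetical and assumes tags never nest inside attribute values):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightCleaner {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Remove highlight delimiters that landed inside tag markup
    // (e.g. attribute values), keeping only matches in text content.
    static String stripInsideTags(String highlighted) {
        Matcher m = TAG.matcher(highlighted);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = m.group()
                .replace("[solr:hl]", "")
                .replace("[/solr:hl]", "");
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

After this pass, the delimiters remaining in text content can safely be replaced with real em tags.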
Re: Getting incorrect value while trying to extract content from xlsx
How are you indexing it? A sample of the CSV file would be helpful. Note that while the CSV update handler is very convenient and very fast, it also doesn't have much in the way of data massaging/transformation - so it might require you pre-format the data for Solr ingestion, or have a programmatic indexer that does this. Erik On Apr 28, 2009, at 4:17 AM, Koushik Mitra wrote: HI, I was trying to extract content from an xlsx file for indexing. However, I am getting julian date value for a cell with date format and '1.0' in place of '100%'. I want to retain the value as present in that xlsx file. Solution appreciated. Thanks, Koushik
Re: Solr Performance bottleneck
Hi, You should probably just look at the index version number to figure out if the index changed. If you are looking at segments.gen, you are looking at a file that may not exist in Lucene in the future. Use the IndexReader API instead. By refreshes, do you mean reopening a new Searcher? Does commit + a postCommit event not work for you? By kicks Solr, I hope you don't mean a Solr/container restart! :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrey Klochkov akloch...@griddynamics.com To: solr-user@lucene.apache.org Sent: Tuesday, April 28, 2009 4:57:54 AM Subject: Re: Solr Performance bottleneck On Mon, Apr 27, 2009 at 10:27 PM, Jon Bodner wrote: Trying to point multiple Solrs on multiple boxes at a single shared directory is almost certainly doomed to failure; the read-only Solrs won't know when the read/write Solr instance has updated the index. I'm solving the same problem while working with index stored in data-grid and I've just created a data-grid listener which looks for segments.gen file changes and forces Solr to refresh its structures after receiving this event. You can do the same job with file system index - write some code which looks at segments.gen file changes and kicks solr when a change is detected. It would be great to add such a mechanism to Solr, I mean some abstracted (via an interface) way to implement index refresh events sources. Also there's code in SolrCore which checks index existence by looking into file system and it would be better to abstract that code too. WDYT? I can provide patches. -- Andrew Klochkov
Re: DataImportHandler Questions-Load data in parallel and temp tables
Amit, You might want to take a look at LuSql[1] and see if it may be appropriate for the issues you have. Thanks, Glen [1] http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/4/27 Amit Nithian anith...@gmail.com: All, I have a few questions regarding the DataImportHandler. We have some pretty gnarly SQL queries to load our indices, and our current loader implementation is extremely fragile. I am looking to migrate over to the DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices so that my index loader and main search engine are separated. Currently, unless I am missing something, the data gathering from the entity and the data processing (i.e. conversion to a Solr document) are done sequentially, and I was looking to make this execute in parallel so that I can have multiple threads processing different parts of the result set and loading documents into Solr. Secondly, I need to create temporary tables to store the results of a few queries and use them later for inner joins, and I was wondering how best to go about this. I am thinking of adding support in DIH for the following: 1) Temporary tables (maybe call them temporary entities?) -- specific only to SQL, unless it can be generalized to other sources. 2) Parallel support, including some mechanism to get the number of records (whether it be a count or MAX(custom_id)-MIN(custom_id)). 3) Support in DIH or Solr to post documents to a remote index (i.e. create a new UpdateHandler instead of DirectUpdateHandler2). If any of these exist or anyone else is working on this (OR you have better suggestions), please let me know. Thanks! Amit
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
Thank you very much. Now its working fine, fixed those minor classpath issues. Thanks, Ahmed. 2009/4/28 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com the Solr distro contains all the jar files. you can take either the latest release (1.3) or a nightly On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet ahmed.bas...@gmail.com wrote: As far as I know, Maven is a build/mgmt tool for java projects quite similar to Ant, right? No I'm not using this , then I think I don't need to worry about those pom files. But I'm still not able to figure out the error with classpath/jar files I mentioned in my previous mails. Shall I try getting those jar files, specifically that solr-solrj jar that contains commons-http-solr-server class files? If yes then can you tell me where to get those jar files from, on the web? Has anyone ever faced similar problems? Please help me fixing these silly issues? Thanks, Ahmed. On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.com wrote: Can anyone help me selecting the proper pom.xml file out of the bunch of *-pom.xml.templates available. Ahmed, are you using Maven? If not, then you do not need these pom files. If you are using Maven, then you need to add a dependency to solrj. http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Re: Unique Identifiers
On Apr 28, 2009, at 9:49 AM, ahammad wrote: Is it possible for Solr to assign a unique number to every document? Solr has a UUIDField that can be used for this. But... For example, let's say that I am indexing from several databases with different data structures. The first one has a unique field called artID, and the second database has a unique field called SRNum. If I want to have an interface that allows me to search both of those data sources, it makes it easier to have a single field per document that is common to both datasources... maybe something like uniqueDocID or something like that. That field does not exist in the DB. Is it possible for Solr to create that field and assign a number while it's indexing? I recommend an aggregate unique key field, using maybe this scheme: the table name, a hyphen, and the primary key value. Erik
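Erik's aggregate-key scheme is a one-line concatenation done in your indexing code, not inside Solr; a sketch (the helper name is illustrative):

```java
class UniqueKey {
    // "table-primaryKey" stays unique across databases as long as table
    // names are distinct and keys are unique within their own table.
    static String uniqueDocId(String table, Object primaryKey) {
        return table + "-" + primaryKey;
    }
}
```

So a row with artID 42 in an articles table would get the id "articles-42", and a row with the same numeric key in another source cannot collide with it.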
Re: Snapinstaller on slave solr server | Can not connect to solr server issue
To add to that: this issue was coming from the commit script called internally by snapinstaller. The commit script creates the Solr URL to do the commit as shown below: curl_url=http://${solr_hostname}:${solr_port}/${webapp_name}/update commit script logs: 2009/04/28 18:48:21 started by root 2009/04/28 18:48:21 command: /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.PUFFIN.CO.UK/bin/commit 2009/04/28 18:48:21 commit request to Solr at http://delpearsondm:8080/apache-solr-1.3.0/update failed: 2009/04/28 18:48:21 <html><head><title>Apache Tomcat/6.0.18 - Error report</title></head><body><h1>HTTP Status 400 - Missing solr core name in path</h1><HR size="1" noshade="noshade"><p>type Status report</p><p>message <u>Missing solr core name in path</u></p><p>description <u>The request sent by the client was syntactically incorrect (Missing solr core name in path).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.18</h3></body></html> 2009/04/28 18:48:21 failed (elapsed time: 0 sec) The Solr server set up at our end contains multiple cores, and thus needs a URL like: http://servername:8080/apache-solr-1.3.0/CORE_WWW.ABCD.COM/update The core name is not getting appended by the commit script. Please let me know whether I need to change the commit script to accommodate the core name in the URL it forms, or whether there is some alternate way to achieve the same without modifying the script.
Thanks, Payal payalsharma wrote: Hi All, I m facing an issue while running snapinstaller script on the Slave server, scripts installs the latest snapshot , but creates issue while making connectivity to the solr server , logs for the same from snapinstaller.log : 2009/04/28 18:48:03 command: /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin/snapinstaller -u webuser 2009/04/28 18:48:16 installing snapshot /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/data/snapshot.20090428180619 2009/04/28 18:48:21 notifing Solr to open a new Searcher 2009/04/28 18:48:21 failed to connect to Solr server 2009/04/28 18:48:21 snapshot installed but Solr server has not open a new Searcher 2009/04/28 18:48:21 failed (elapsed time: 18 sec) I ensured that slave solr server was in running state before calling ... snappuller and snapinstaller scripts. As a result of this issue Slave server's Collection was not displaying the indexes of latest installed snapshot, As a temporary solution, I restarted the Slave server and Collection got refreshed. Can anybody let me know the probable reason of this behavior. -- View this message in context: http://www.nabble.com/Snapinstaller-on-slave-solr-server-%7C-Can-not-connect-to-solr-server-issue-tp23278187p23279140.html Sent from the Solr - User mailing list archive at Nabble.com.
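For clarity, the multicore URL Payal needs differs from the one the stock script builds only by the core path segment; the shape is illustrated below in Java (a hypothetical helper just to show the URL layout; the actual commit script is shell):

```java
class CommitUrl {
    // Single-core scripts build http://host:port/webapp/update;
    // with multicore, the core name must precede /update.
    static String updateUrl(String host, int port, String webapp, String core) {
        return "http://" + host + ":" + port + "/" + webapp + "/" + core + "/update";
    }
}
```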
Re: Snapinstaller on slave solr server | Can not connect to solr server issue
To add to that : This issue was coming because of the commit script called internally by snapinstaller . Commit script creates the solr url to do the comit as shown below: curl_url=http://${solr_hostname}:${solr_port}/${webapp_name}/update commitscript logs: 2009/04/28 18:48:21 started by root 2009/04/28 18:48:21 command: /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin/commit 2009/04/28 18:48:21 commit request to Solr at http://servername:port/apache-solr-1.3.0/update failed: 2009/04/28 18:48:21 htmlheadtitleApache Tomcat/6.0.18 - Error report/title/headbodyh1HTTP Status 400 - Missing solr core name in path/h1HR size=1 noshade=noshadeptype Status report/ppmessage uMissing solr core name in path/u/ppdescription uThe request sent by the client was syntactically incorrect (Missing solr core name in path)./u/pHR size=1 noshade=noshadeh3Apache Tomcat/6.0.18/h3/body/html 2009/04/28 18:48:21 failed (elapsed time: 0 sec) Solr server set at our end contains multi cores, thus forms the URL like : http://servername:8080/apache-solr-1.3.0/CORE_WWW.ABCD.COM/update The Core name is not getting appended in the commit script. Please let me know whether I need to change the commit script to accomodate the core name in URL formed, or there is some alternate way to achieve the same without modifying the script. 
Thanks, Payal payalsharma wrote: [...] -- View this message in context: http://www.nabble.com/Snapinstaller-on-slave-solr-server-%7C-Can-not-connect-to-solr-server-issue-tp23278187p23279184.html Sent from the Solr - User mailing list archive at Nabble.com.
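Not the actual shell commit script — just a sketch, in Java, of the URL difference Payal describes: with a multicore setup, the core name has to sit between the webapp name and /update. The method and variable names here are illustrative, not taken from the script.

```java
public class CommitUrl {
    // Single-core scripts build http://host:port/webapp/update; with
    // multicore, the core name must be inserted before /update.
    static String commitUrl(String host, String port, String webapp, String core) {
        if (core == null || core.isEmpty()) {
            return "http://" + host + ":" + port + "/" + webapp + "/update";
        }
        return "http://" + host + ":" + port + "/" + webapp + "/" + core + "/update";
    }

    public static void main(String[] args) {
        // Mirrors the example URLs from the thread.
        System.out.println(commitUrl("servername", "8080", "apache-solr-1.3.0", null));
        System.out.println(commitUrl("servername", "8080", "apache-solr-1.3.0",
                "CORE_WWW.ABCD.COM"));
    }
}
```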
Re: newbie question about indexing RSS feeds with SOLR
Just an FYI: I've never tried, but there seems to be an RSS feed sample in DIH: http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476 Koji Tom H wrote: Hi, I've just downloaded solr and got it working, it seems pretty cool. I have a project which needs to maintain an index of articles that were published on the web via rss feed. Basically I need to watch some rss feeds, and search and index the items to be searched. Additionally, I need to run jobs based on particular keywords or events during parsing. Is this something that I can do with SOLR? Are there any related projects using SOLR that are better suited to indexing specific xml types like RSS? I had a look at the project enormo which appears to be a property lettings and sales listing aggregator. But I can see that they must have solved some of the problems I am thinking of, such as scheduled indexing of remote resources, and writing a parser to get data fields from some other sites' templates. Any advice would be welcome... Many Thanks, Tom
Re: Can we provide context dependent faceted navigation from SOLR search results
Thanh Doan wrote: Assuming a solr search returns 10 listing items as below: 1) 4 digital cameras 2) 4 LCD televisions 3) 2 clothing items If we navigate to /electronics we want solr to show us facets specific to the 8 electronics items (e.g. brand, price). If we navigate to /electronics/cameras we want solr to show us facets specific to the 4 camera items (e.g. mega-pixels, screen-size, brand, price). If we navigate to /electronics/televisions we want to see different facets and their counts specific to TV items. If we navigate to /clothing we want to obtain totally different facets and their counts. I am not sure if we can think of this as a Hierarchical Facet Navigation system or not. From the UI perspective, we can think of /electronics/cameras as a hierarchical classification. There is a patch for Hierarchical Facet Navigation: https://issues.apache.org/jira/browse/SOLR-64 But how about electronics/cameras/canon vs electronics/canon/cameras? In this case both navigations should show the same result set no matter which facet is selected first. The patch supports a document having multiple hierarchical facet fields, for example:

<add>
<doc>
  <field name="name">Canon Brand-new Digital Camera</field>
  <field name="cat">electronics/cameras/canon</field>
  <field name="cat">electronics/canon/cameras</field>
</doc>
</add>

Koji My question is, with the current solr implementation, can we provide context dependent faceted navigation from SOLR search results? Thank you. Thanh Doan
Re: spellcheck.collate causes StringIndexOutOfBoundsException during startup.
I see you are using a firstSearcher/newSearcher event listener on startup, and that causes the problem. If you don't need them, comment them out in solrconfig.xml. Koji Eric Sabourin wrote: I'm using SOLR 1.3.0 (from download, not a nightly build) with apache-tomcat-5.5.27 on Windows XP. When I add <str name="spellcheck.collate">true</str> to my requestHandler in my solrconfig.xml, I get the StringIndexOutOfBoundsException stacktrace below on startup. Removing the element, or setting it to false, causes the exception to no longer occur on startup. Any help is appreciated. Let me know if additional information is required. Eric The exception (from logs): Apr 24, 2009 12:17:53 PM org.apache.solr.servlet.SolrUpdateServlet init INFO: SolrUpdateServlet.init() done Apr 24, 2009 12:17:53 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: -5 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:800) at java.lang.StringBuilder.replace(StringBuilder.java:272) at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:232) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:149) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1228) at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50) at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1034) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at java.util.concurrent.FutureTask.run(FutureTask.java:123) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at
java.lang.Thread.run(Thread.java:595) Apr 24, 2009 12:17:53 PM org.apache.solr.core.SolrCore execute Having the following does not cause the exception:

<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
<str name="spellcheck.dictionary">default</str>
<!-- comment out collate... causes java.lang.StringIndexOutOfBoundsException on startup? -->
<!-- <str name="spellcheck.collate">true</str> -->

With the following the exception occurs on startup:

<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
<str name="spellcheck.dictionary">default</str>
<!-- comment out collate... causes java.lang.StringIndexOutOfBoundsException on startup? -->
<str name="spellcheck.collate">true</str>
Re: Can we provide context dependent faceted navigation from SOLR search results
Wow, this looks great. Thanks for this Koji! Matt On Tue, Apr 28, 2009 at 12:13 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: [...]
Re: fail to create or find snapshoot
I think this is a bug. I looked at the class SnapShooter, and its constructor looks like this:

public SnapShooter(SolrCore core) { solrCore = core; }

This leaves the variable snapDir null, and the variable is never initialized elsewhere; later, in SnapShooter.createSnapshot, the line

snapShotDir = new File(snapDir, directoryName);

is equivalent to

snapShotDir = new File(directoryName);

because snapDir is null, and therefore the snapshot is created in the directory where the application was launched. A line should be added to the constructor like this:

public SnapShooter(SolrCore core) { solrCore = core; snapDir = core.getDataDir(); }

This is not a problem during development, but it is when you want to deploy the application to different environments and schedule snapshots for backup. Can somebody take a look at this problem? Thanks, Jianhan On Mon, Apr 27, 2009 at 12:02 PM, Jian Han Guo jian...@gmail.com wrote: Actually, I found the snapshot in the directory where solr was launched. Is this done on purpose? Shouldn't it be in the data directory? Thanks, Jianhan On Mon, Apr 27, 2009 at 11:43 AM, Jian Han Guo jian...@gmail.com wrote: Hi, According to Solr's wiki page http://wiki.apache.org/solr/SolrReplication, if I send the following request to master, a snapshoot will be created: http://master_host:port/solr/replication?command=snapshoot But after I did it, nothing seemed to happen. I got this response back:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
</response>

and I checked the data directory; no snapshoot was created. I am not sure what to expect after making the request, and where to find the snapshoot files (and what they are). Thanks, Jianhan
Unable to import data from database
I am using MS SQL server and want to index a table. I set up my data-config like this:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="25000" autoCommit="true"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost:1433;databaseName=MYDB" user="" password=""/>
  <document name="products">
    <entity name="item" query="select TOP 50 * from items">
      <field column="item_id" name="id" />
      <field column="itemname" name="name" />
      <field column="itemavgbucost" name="price" />
      <field column="categoryname" name="cat" />
      <field column="itemdesc" name="features" />
    </entity>
  </document>
</dataConfig>

I am unable to load data from the database. I always receive 0 documents fetched:

<lst name="statusMessages">
  <str name="Time Elapsed">0:0:12.989</str>
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2009-04-28 14:37:49</str>
</lst>

The query runs in SQL Server query manager and retrieves records. The funny thing is, even if I purposefully write a wrong query with non-existing tables I get the same response. What am I doing wrong? How can I tell whether a query fails or succeeds, or if solr is running the query in the first place? Any help is appreciated. Best, -Ci -- View this message in context: http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23283852.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Performance bottleneck
On Tue, Apr 28, 2009 at 3:18 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, You should probably just look at the index version number to figure out if the name changed. If you are looking at segments.gen, you are looking at a file that may not exist in Lucene in the future. Use the IndexReader API instead. Yeah, I use IndexReader.isCurrent() to determine if I should refresh Solr after catching a data grid event. But I have to create that event listener somehow, and here I have no other way but to hardcode this index file name. So when some node of the cluster performs a commit, other nodes which listen for segments.gen changes receive the event and refresh their Solr instances by calling SolrServer.commit(). By refreshes do you mean reopening a new Searcher? Does commit + post commit event not work for you? Currently I use the following code to refresh cores: new EmbeddedSolrServer(cores, coreName).commit() By kicks Solr I hope you don't mean a Solr/container restart! :) :) No, I mean the same refresh code, i.e. calling SolrServer.commit(). -- Andrew Klochkov
Re: Unable to import data from database
Did you define all the fields that you used in schema.xml? Ci-man wrote: [...] -- View this message in context: http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23284381.html Sent from the Solr - User mailing list archive at Nabble.com.
Multiple Queries
Hi, I have been trying to solve a performance issue: I have an index of hotels with their ids and another index of reviews. Now, when someone queries for a location, the current process gets all the hotels for that location. Then, for each hotel-id from all the hotel documents, it calls the review index to fetch the reviews associated with that particular hotel, and it repeats this for all the hotels. This process slows down the request significantly. I need to accumulate reviews according to corresponding hotel-ids, so I can't just fetch all the reviews for all the hotel ids and show them. Now, I was thinking about fetching all the reviews for all the hotel-ids and then parsing all those reviews in one go and creating a map with hotel-id as key and list of reviews as values. Can anyone comment on whether this procedure would be better or worse, or if there's a better way of doing this? --Ankush Goyal
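Ankush's bulk-fetch idea — fetch every review once, then bucket them by hotel id in memory — can be sketched in plain Java. This is only the client-side grouping step; the Solr fetch itself is omitted, and the (hotelId, reviewText) pair representation is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReviewGrouper {
    // Group a flat list of {hotelId, reviewText} pairs into a map keyed by
    // hotel id, so one bulk fetch replaces one query per hotel.
    static Map<String, List<String>> groupByHotel(List<String[]> reviews) {
        Map<String, List<String>> byHotel = new HashMap<>();
        for (String[] r : reviews) {
            byHotel.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        return byHotel;
    }

    public static void main(String[] args) {
        // Pretend these came back from a single bulk query of the review index.
        List<String[]> fetched = Arrays.asList(
                new String[]{"h1", "great location"},
                new String[]{"h2", "ok"},
                new String[]{"h1", "noisy at night"});
        System.out.println(groupByHotel(fetched));
    }
}
```

The map build is a single linear pass, so its cost is negligible next to the round trips it eliminates.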
Re: DataImportHandler Questions-Load data in parallel and temp tables
I do remember LuSQL and a discussion regarding the performance implications of using it compared to the DIH. My only reason to stick with DIH is that we may have other data sources for document loading in the near term that may make LuSQL too specific for our needs. Regarding the bug to write to the index in a separate thread: while helpful, it doesn't address my use case, which is as follows: 1) Write a loader application using EmbeddedSolr + SolrJ + DIH (create a bogus local request with path='/dataimport') so that the DIH code is invoked 2) Instead of using the DirectUpdateHandler2 update handler, write a custom update handler to take a solr document and POST it to a remote Solr server. I could queue documents here and POST in bulk, but those are details. 3) Possibly multi-thread the DIH so that multiple threads can process different database segments, construct and POST solr documents. - For example, thread 1 processes IDs 1-100, thread 2, 101-200, thread 3, 201-... - If the Solr Server is multithreaded in writing to the index, that's great and helps in performance. #3 is possible depending on performance tests. #1 and #2 I believe I need because I want my loader separated from the master server for development, deployment and just general separation of concerns. Thanks Amit On Tue, Apr 28, 2009 at 6:03 AM, Glen Newton glen.new...@gmail.com wrote: Amit, You might want to take a look at LuSql[1] and see if it may be appropriate for the issues you have. thanks, Glen [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/4/27 Amit Nithian anith...@gmail.com: All, I have a few questions regarding the data import handler. We have some pretty gnarly SQL queries to load our indices and our current loader implementation is extremely fragile. I am looking to migrate over to the DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices so that my index loader and main search engine are separated.
Currently, unless I am missing something, the data gathering from the entity and the data processing (i.e. conversion to a Solr Document) are done sequentially, and I was looking to make this execute in parallel so that I can have multiple threads processing different parts of the resultset and loading documents into Solr. Secondly, I need to create temporary tables to store the results of a few queries and use them later for inner joins, and was wondering how best to go about this. I am thinking of adding support in DIH for the following: 1) Temporary tables (maybe call them temporary entities?) -- Specific only to SQL, unless it can be generalized to other sources. 2) Parallel support - including some mechanism to get the number of records (whether it be a count or MAX(custom_id)-MIN(custom_id)) 3) Support in DIH or Solr to post documents to a remote index (i.e. create a new UpdateHandler instead of DirectUpdateHandler2). If any of these exist or anyone else is working on this (OR you have better suggestions), please let me know. Thanks! Amit -- -
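The ID-range split Amit describes (thread 1 processes IDs 1-100, thread 2 gets 101-200, ...) boils down to a small chunking routine. A sketch, independent of any DIH code; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitioner {
    // Split the inclusive id range [minId, maxId] into contiguous chunks,
    // one per worker thread. minId/maxId would come from
    // MIN(custom_id)/MAX(custom_id) as mentioned in the thread.
    static List<long[]> partition(long minId, long maxId, int threads) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long chunk = (total + threads - 1) / threads; // ceiling division
        for (long start = minId; start <= maxId; start += chunk) {
            ranges.add(new long[]{start, Math.min(start + chunk - 1, maxId)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Splitting 250 ids across 3 threads gives 1-84, 85-168, 169-250.
        for (long[] r : partition(1, 250, 3)) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

Each worker would then run its own query with a WHERE clause bounded by its range and push the resulting documents independently.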
RE: facet with group by (or field collapsing)
I began a similar thread under the subject Distinct terms in facet field. One thing I noticed though is that your fields seem to have a lot of controlled values, or lack free text. Are you sure SOLR is what you should be using? Perhaps a traditional RDB would be better and then you would have GROUP BY and aggregate functions at your disposal... HTH, Tim -Original Message- From: Qingdi [mailto:liuqin...@yahoo.com] Sent: Tuesday, April 28, 2009 1:07 PM To: solr-user@lucene.apache.org Subject: facet with group by (or field collapsing) Hi, Is it possible to group the search result on certain field and then do facet counting? For example, the index is defined with the following fields: Kid_Id, Family_Id, Age, School, Favorite_Sports (MultiValue Field) We want to query with Age between 10 yrs to 12 yrs and School in (School_A, School_B), and do faceting on Favorite_Sports. But instead of showing the count of kids for each sport, we want to show the count of Families. Each family can have multiple kids. How to group the search result on Family_Id, and then do faceting on Favorite_Sports? Appreciate your help. Qingdi -- View this message in context: http://www.nabble.com/facet-with-group-by-%28or-field-collapsing%29-tp23285038p23285038.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple Queries
Have you considered indexing the reviews along with the hotels right in the hotel index? That way you would fetch the reviews right along with the hotels... Really, this is another way of saying flatten your data G... Your idea of holding all the hotel reviews in memory is also viable, depending upon how many there are. You'd pay some startup costs, but that's what caching is all about. Given your current index structure, have you tried collecting the hotel IDs and submitting a query to your review index that just ORs together all the IDs, then parsing that, rather than calling your review index for one hotel ID at a time? Best Erick On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal ankush.go...@orbitz.com wrote: Hi, I have been trying to solve a performance issue: I have an index of hotels with their ids and another index of reviews. Now, when someone queries for a location, the current process gets all the hotels for that location. And, then corresponding to each hotel-id from all the hotel documents, it calls the review index to fetch reviews associated with that particular hotel and so on it repeats for all the hotels. This process slows down the request significantly. I need to accumulate reviews according to corresponding hotel-ids, so I can't just fetch all the reviews for all the hotel ids and show them. Now, I was thinking about fetching all the reviews for all the hotel-ids and then parse all those reviews in one go and create a map with hotel-id as key and list of reviews as values. Can anyone comment on whether this procedure would be better or worse, or if there's better way of doing this? --Ankush Goyal
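Erick's suggestion to OR all the hotel IDs into a single query against the review index could be sketched like this; the field name hotel_id is an assumption on my part, not something from the thread.

```java
import java.util.Arrays;
import java.util.List;

public class IdQueryBuilder {
    // Build one Lucene-style query string ORing all hotel ids, e.g.
    // hotel_id:(h1 OR h2 OR h3), so the review index is hit once instead
    // of once per hotel.
    static String orQuery(String field, List<String> ids) {
        return field + ":(" + String.join(" OR ", ids) + ")";
    }

    public static void main(String[] args) {
        System.out.println(orQuery("hotel_id", Arrays.asList("h1", "h2", "h3")));
    }
}
```

With very large result sets the id list may need batching to stay under query-size limits, but one batched query still beats one query per hotel.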
Re: Can we provide context dependent faceted navigation from SOLR search results
After posting this question I found this discussion: http://www.nabble.com/Hierarchical-Facets--to7135353.html. So what I did was adapt the schema with 3 fields (cat, subcat, subsubcat) and hardcode the hierarchical logic in the UI layer to present a hierarchical taxonomy to the users. The users still see something similar to this page: http://www.overstock.com/Electronics/Digital-Cameras/Canon,/brand,/813/cat.html But I have to say that hardcoding the hierarchical logic in the UI layer is messy. It looks like Koji's patch will be a much better solution. Thanks Koji! Thanh On Tue, Apr 28, 2009 at 11:27 AM, Matt Mitchell goodie...@gmail.com wrote: [...] -- Regards, Thanh Doan 713-884-0576 http://datamatter.blogspot.com/
Re: MacOS Failed to initialize DataSource:db+ DataimportHandler ???
That didn't work either. All my libraries are at /Applications/tomcat/webapps/solr/WEB-INF/lib, and so is apache-solr-dataimporthandler-1.3.0.jar. However I did create a new /lib directory under my solr home at /Applications/solr and copied the jar to that location as well. But no difference. Here is my entry for the dataimporthandler in solrconfig.xml (path: /Applications/solr/conf):

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/Applications/solr/conf/data-config.xml</str>
  </lst>
</requestHandler>

Noble Paul നോബിള് नोब्ळ् wrote: apparently you do not have the driver in the path. drop your driver jar into ${solr.home}/lib On Tue, Apr 28, 2009 at 4:42 AM, gateway0 reiterwo...@yahoo.de wrote: Hi, sure: message Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - org.apache.solr.common.SolrException: FATAL: Could not create importer.
DataImporter config invalid at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311) at org.apache.solr.core.SolrCore.init(SolrCore.java:480) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627) at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:516) at 
org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at org.apache.catalina.startup.Catalina.start(Catalina.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: mydb Processing Document # at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308) at org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273) at org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228) at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:98) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) ... 31 more Caused by: org.apache.solr.common.SolrException: Could not load driver: com.mysql.jdbc.Driver at org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112) at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:65) at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306) ... 35 more Caused by: java.lang.ClassNotFoundException: Unable to load com.mysql.jdbc.Driver or org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587) at
RE: facet with group by (or field collapsing)
Hi Tim, Thanks for your reply. The index structure in my original post is just an example. We do have many free-text fields with different analyzers. I checked your post Distinct terms in facet field, but I think the issues we are trying to address are different: yours is to get distinct terms in the facet field, but what I want is to count the distinct values of a non-facet field. Since the facet results are much smaller than the query result, you could get all the facets and count them yourself. But in my case, if I count myself, I have to fetch all the query results and then count distinct family_id values for each facet value. Thanks. Qingdi Harsch, Timothy J. (ARC-SC)[LOCKHEED MARTIN SPACE OPNS] wrote: [...] -- View this message in context: http://www.nabble.com/facet-with-group-by-%28or-field-collapsing%29-tp23285038p23287434.html Sent from the Solr - User mailing list archive at Nabble.com.
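Since Solr 1.x has no built-in group-by faceting, the counting Qingdi describes would have to happen client-side. A sketch under that assumption: collect the family ids seen for each facet value and count the sets. The {familyId, sport} pair input is a simplification of the kid documents from the example schema.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FamilyFacetCounter {
    // Count distinct Family_Id values per Favorite_Sports value. Each input
    // pair is {familyId, sport}; a kid with several favorite sports
    // contributes one pair per sport.
    static Map<String, Integer> distinctFamilies(String[][] familySportPairs) {
        Map<String, Set<String>> families = new HashMap<>();
        for (String[] p : familySportPairs) {
            families.computeIfAbsent(p[1], k -> new HashSet<>()).add(p[0]);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : families.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] kids = {
            {"F1", "soccer"}, {"F1", "soccer"}, // two kids, same family
            {"F2", "soccer"}, {"F2", "tennis"}
        };
        // soccer is liked by kids from 2 families, tennis by 1
        System.out.println(distinctFamilies(kids));
    }
}
```

The drawback Qingdi points out still applies: this needs the full result set on the client, which is exactly what server-side field collapsing would avoid.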
Re: WordDelimiterFilterFactory removes words when options set to 0
: In trying to understand the various options for : WordDelimiterFilterFactory, I tried setting all options to 0. This seems : to prevent a number of words from being output at all. In particular : can't and 99dxl don't get output, nor do any words containing hyphens. : Is this correct behavior? For the record: there are other options you haven't set... splitOnNumerics defaults to 1; preserveOriginal defaults to 0 ... i'm guessing if you set splitOnNumerics=0 you'd see a lot more tokens come through, and if you set preserveOriginal=1 you'd definitely see a lot more tokens come through by default.

: <fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
:   <analyzer>
:     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
:     <filter class="solr.WordDelimiterFilterFactory"
:       splitOnCaseChange="0"
:       generateWordParts="0"
:       generateNumberParts="0"
:       catenateWords="0"
:       catenateNumbers="0"
:       catenateAll="0"
:     />
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldtype>

-Hoss
Re: half width katakana
: The exception is expected if you use a CharStream aware Tokenizer without
: CharFilters.

Koji: i thought all of the casts had been eliminated and replaced with a call to CharReader.get(Reader) ?

: Please see example/solr/conf/schema.xml for the setting of CharFilter and
: CharStreamAware*Tokenizer:
:
: Using CharStreamAwareCJKTokenizerFactory is giving me the following error,
: SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to
: org.apache.solr.analysis.CharStream
:
: Maybe you are typecasting Reader to a subclass.

-Hoss
RE: fl parameter
: Anyone able to help with the question below?

dealing with fl is a delicate dance in Solr right now ... complicated by both FieldSelector logic and distributed search (where both DocList and SolrDocumentList objects need to be dealt with). I looked at this recently and even I can't remember what does what at the moment ... i think you can do what you want just by writing a QueryResponseWriter, but it might also be possible to do it as a SearchComponent that prunes any SolrDocumentList objects and actualizes any DocList objects using just the fields you want. The way to be sure is to look for all uses of CommonParams.FL in the code base.

: Yonik, I couldn't find the issues you speak of, can you point me in the right direction?

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

-Hoss
Re: half width katakana
Chris Hostetter wrote:
: The exception is expected if you use a CharStream aware Tokenizer without
: CharFilters.

Koji: i thought all of the casts had been eliminated and replaced with a call to CharReader.get(Reader) ?

Yeah, right. After r758137, the ClassCastException should be eliminated. http://svn.apache.org/viewvc?view=rev&revision=758137 And then the CharReader.get(Reader) idiom was added, as hoss suggested: http://svn.apache.org/viewvc?view=rev&revision=758161 Ashish, what revision/nightly version did you use when you got the ClassCastException? Koji
field type for serialized code?
Hi, I'm attempting to serialize a simple ruby object into a solr.StrField - but it seems that what I'm getting back is munged up a bit, in that I can't de-serialize it. Is there a field type for doing this type of thing? Thanks, Matt
Re: Multiple Queries
Ankush, It seems that unless reviews are changing constantly, why not do what Erick was saying in flattening your data by storing reviews with the hotel index: re-index your hotels, storing the top two reviews. I guess I am suggesting computing the top two reviews for each hotel offline and storing them somewhere. You could store the top two reviews in an RDBMS and let whatever front end you have retrieve the top two from the RDBMS after receiving results from Solr, based on your unique ID. HTH Amit

On Tue, Apr 28, 2009 at 3:14 PM, Ankush Goyal ankush.go...@orbitz.com wrote: Hi Erick, Thanks for the response! ... The solution I was talking about was the same as your last solution: get reviews for only the required hotel-ids and then parse them in one go to make a hash-map; I guess I didn't explain correctly :) As far as putting reviews inside the hotel index is concerned, we thought about that solution, but we also need to sort the reviews and (let's say) show the top 2 of maybe 50 reviews for a hotel, so we couldn't put reviews inside the hotel doc itself. Now, this again poses another question for the solution we talked about: it seems like getting reviews for the required hotel-ids and then making a hash-map keyed by hotel-id can improve performance, but then we also need to sort all the reviews for each hotel using a field/score in the review-doc itself, which seems like it would lower performance drastically. Any ideas on a better solution? Thanks! -Ankush

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 28, 2009 4:05 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Queries

Have you considered indexing the reviews along with the hotels right in the hotel index? That way you would fetch the reviews right along with the hotels... Really, this is another way of saying flatten your data G... Your idea of holding all the hotel reviews in memory is also viable, depending upon how many there are.
you'd pay some startup costs, but that's what caching is all about. Given your current index structure, have you tried collecting the hotel IDs, and submitting a query to your review index that just ORs together all the IDs and then parsing that, rather than calling your review index for one hotel ID at a time? Best Erick On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal ankush.go...@orbitz.com wrote: Hi, I have been trying to solve a performance issue: I have an index of hotels with their ids and another index of reviews. Now, when someone queries for a location, the current process gets all the hotels for that location. Then, for each hotel-id from all the hotel documents, it calls the review index to fetch the reviews associated with that particular hotel, and it repeats this for all the hotels. This process slows down the request significantly. I need to accumulate reviews according to corresponding hotel-ids, so I can't just fetch all the reviews for all the hotel ids and show them. Now, I was thinking about fetching all the reviews for all the hotel-ids, then parsing all those reviews in one go, and creating a map with hotel-id as key and list of reviews as values. Can anyone comment on whether this procedure would be better or worse, or if there's a better way of doing this? --Ankush Goyal
Re: how to reset the index in solr
Thank you Erik.. Should I write the below code in the rake task /lib/tasks/solr.rake? I am a newbie to Ruby.

Erik Hatcher wrote: On Apr 24, 2009, at 1:54 AM, sagi4 wrote: Can I get the rake task for clearing the index of solr, I mean rake index::rebuild? It would be very helpful, and also to avoid deleting ids manually. How do you currently build your index? But making a Rake task to perform Solr operations is generally pretty trivial. In Ruby (after gem install solr-ruby):

require 'solr'
solr = Solr::Connection.new('http://localhost:8983/solr')
solr.optimize # for example

Erik
Re: Multiple Queries
Ankush, Your approach works. Fire an 'in'-style query on the review index for all hotel ids you care about. Create a map of hotel to its reviews. Cheers Avlesh On Wed, Apr 29, 2009 at 8:09 AM, Amit Nithian anith...@gmail.com wrote: Ankush, It seems that unless reviews are changing constantly, why not do what Erick was saying in flattening your data by storing reviews with the hotel index but re-index your hotels storing the top two reviews. I guess I am suggesting computing the top two reviews for each hotel offline and store them somewhere. You could store the top two reviews in an RDBMS and let whatever front end you have retrieve the top two from the RDBMS after receiving results from Solr based on your unique ID. HTH Amit On Tue, Apr 28, 2009 at 3:14 PM, Ankush Goyal ankush.go...@orbitz.com wrote: Hi Erick, Thanks for response!...the solution I was talking about was same as your last solution to get reviews for only required hotel-ids and then parsing them in one go to make a hash-map, I guess I didn't explain correctly :) As far as putting reviews inside the hotel index is concerned, we thought about that solution, but we also need to sort the reviews and (let's say) show top 2 of maybe 50 reviews for a hotel, so we couldn't put reviews inside hotel doc itself. Now, this again poses another question for the solution we talked about-, as it seems like getting reviews for required hotel-ids and then making a hash-map corresponding to hotel-ids can improve the performance, but then we also need to sort all the reviews for each hotel using a field/ score in the review-doc itself, which seems like would lower down the performance drastically. Any ideas on a better solution? Thanks! -Ankush -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 28, 2009 4:05 PM To: solr-user@lucene.apache.org Subject: Re: Multiple Queries Have you considered indexing the reviews along with the hotels right in the hotel index?
That way you would fetch the reviews right along with the hotels... Really, this is another way of saying flatten your data G... Your idea of holding all the hotel reviews in memory is also viable, depending upon how many there are. you'd pay some startup costs, but that's what caching is all about. Given your current index structure, have you tried collecting the hotel IDs, and submitting a query to your review index that just ORs together all the IDs and then parsing that rather than calling your review index for one hotel ID at a time? Best Erick On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal ankush.go...@orbitz.com wrote: Hi, I have been trying to solve a performance issue: I have an index of hotels with their ids and another index of reviews. Now, when someone queries for a location, the current process gets all the hotels for that location. And, then corresponding to each hotel-id from all the hotel documents, it calls the review index to fetch reviews associated with that particular hotel and so on it repeats for all the hotels. This process slows down the request significantly. I need to accumulate reviews according to corresponding hotel-ids, so I can't just fetch all the reviews for all the hotel ids and show them. Now, I was thinking about fetching all the reviews for all the hotel-ids and then parse all those reviews in one go and create a map with hotel-id as key and list of reviews as values. Can anyone comment on whether this procedure would be better or worse, or if there's better way of doing this? --Ankush Goyal
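The approach Erick and Avlesh describe — one OR'ed query against the review index, then grouping and sorting client-side — can be sketched roughly like this (a minimal Ruby sketch; the `hotel_id` and `rating` field names and the flat hash-per-document shape are assumptions, not from the thread):

```ruby
# Group review docs by hotel id and keep only the top N per hotel,
# so a single bulk query (e.g. hotel_id:(h1 OR h2 OR ...)) replaces
# one review-index query per hotel.
def top_reviews_by_hotel(review_docs, limit = 2)
  by_hotel = review_docs.group_by { |doc| doc['hotel_id'] }
  by_hotel.each do |hotel_id, reviews|
    # Sort best-first on the review's own rating field, keep the top N.
    by_hotel[hotel_id] = reviews.sort_by { |r| -r['rating'] }.first(limit)
  end
end
```

The per-hotel sort here is cheap compared to the round trips it replaces, since each hotel's review list is small relative to the whole result set.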
Re: DataImportHandler Questions-Load data in parallel and temp tables
writing to a remote Solr through SolrJ is in the cards. I may even take it up after 1.4 release. For now your best bet is to override the class SolrWriter and override the corresponding methods for add/delete. On Wed, Apr 29, 2009 at 2:06 AM, Amit Nithian anith...@gmail.com wrote: I do remember LuSQL and a discussion regarding the performance implications of using it compared to the DIH. My only reason to stick with DIH is that we may have other data sources for document loading in the near term that may make LuSQL too specific for our needs. Regarding the bug to write to the index in a separate thread, while helpful, doesn't address my use case which is as follows: 1) Write a loader application using EmbeddedSolr + SolrJ + DIH (create a bogus local request with path='/dataimport') so that the DIH code is invoked 2) Instead of using the DirectUpdateHandler2 update handler, write a custom update handler to take a solr document and POST to a remote Solr server. I could queue documents here and POST in bulk but that's details.. 3) Possibly multi-thread the DIH so that multiple threads can process different database segments, construct and POST solr documents. - For example, thread 1 processes IDs 1-100, thread 2, 101-200, thread 3, 201-... - If the Solr Server is multithreaded in writing to the index, that's great and helps in performance. #3 is possible depending on performance tests. #1 and #2 I believe I need because I want my loader separated from the master server for development, deployment and just general separation of concerns. Thanks Amit On Tue, Apr 28, 2009 at 6:03 AM, Glen Newton glen.new...@gmail.com wrote: Amit, You might want to take a look at LuSql[1] and see if it may be appropriate for the issues you have. thanks, Glen [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/4/27 Amit Nithian anith...@gmail.com: All, I have a few questions regarding the data import handler.
We have some pretty gnarly SQL queries to load our indices and our current loader implementation is extremely fragile. I am looking to migrate over to the DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices so that my index loader and main search engine are separated. Currently, unless I am missing something, the data gathering from the entity and the data processing (i.e. conversion to a Solr Document) is done sequentially, and I am looking to make this execute in parallel so that I can have multiple threads processing different parts of the resultset and loading documents into Solr. Secondly, I need to create temporary tables to store the results of a few queries and use them later for inner joins, and was wondering how best to go about this. I am thinking to add support in DIH for the following: 1) Temporary tables (maybe call it temporary entities)? --Specific only to SQL though unless it can be generalized to other sources. 2) Parallel support - Including some mechanism to get the number of records (whether it be count or the MAX(custom_id)-MIN(custom_id)) 3) Support in DIH or Solr to post documents to a remote index (i.e. create a new UpdateHandler instead of DirectUpdateHandler2). If any of these exist or anyone else is working on this (OR you have better suggestions), please let me know. Thanks! Amit -- --Noble Paul
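The ID-range partitioning behind point 2 can be sketched like this (illustrative Ruby, not part of DIH; it assumes a roughly contiguous numeric id column between MIN(custom_id) and MAX(custom_id)):

```ruby
# Split the inclusive range [min_id, max_id] into n contiguous chunks,
# one per loader thread, e.g. 1-100, 101-200, 201-300.
def id_chunks(min_id, max_id, n)
  per_chunk = ((max_id - min_id + 1).to_f / n).ceil
  (min_id..max_id).each_slice(per_chunk).map { |s| [s.first, s.last] }
end
```

Each [lo, hi] pair would then become a `WHERE custom_id BETWEEN lo AND hi` clause in one thread's query; sparse id columns would give uneven chunks, which is why Amit also mentions count-based sizing.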
Re: how to reset the index in solr
I need a function (through solr-ruby) for Ruby that will allow us to clear everything. regards, Sg..

Geetha wrote: Thank you Erik.. Should I write the below code in rake task /lib/tasks/solr.rake? I am newbie to ruby. Erik Hatcher wrote: On Apr 24, 2009, at 1:54 AM, sagi4 wrote: Can i get the rake task for clearing the index of solr, I mean rake index::rebuild, It would be very helpful and also to avoid the delete id by manually. How do you currently build your index? But making a Rake task to perform Solr operations is generally pretty trivial. In Ruby (after gem install solr-ruby): require 'solr' solr = Solr::Connection.new('http://localhost:8983/solr') solr.optimize # for example Erik

-- Best Regards, Geetha S | System and Software Engineer
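With solr-ruby, a clear-everything helper is a small wrapper around `delete_by_query` (a minimal sketch along the lines of Erik's snippet; hooking it into a Rake task in /lib/tasks/solr.rake is then just a `task` block that calls it):

```ruby
# require 'solr'   # gem install solr-ruby

# Delete every document (the query *:* matches all docs), then commit
# so the empty index becomes visible to searchers.
def clear_index(solr)
  solr.delete_by_query('*:*')
  solr.commit
end

# Typical use:
#   solr = Solr::Connection.new('http://localhost:8983/solr')
#   clear_index(solr)
```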
Re: field type for serialized code?
Is the serialized data a UTF-8 string? On Wed, Apr 29, 2009 at 6:42 AM, Matt Mitchell goodie...@gmail.com wrote: Hi, I'm attempting to serialize a simple ruby object into a solr.StrField - but it seems that what I'm getting back is munged up a bit, in that I can't de-serialize it. Is there a field type for doing this type of thing? Thanks, Matt -- --Noble Paul
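One common workaround — not confirmed in this thread — is to Base64-encode the marshalled bytes before storing them, since `Marshal.dump` produces arbitrary binary rather than UTF-8 text and a solr.StrField can mangle it:

```ruby
require 'base64'

# Marshal output is binary; Base64 keeps the stored field value as
# plain ASCII so it round-trips through Solr unchanged.
def to_solr_str(obj)
  Base64.strict_encode64(Marshal.dump(obj))
end

# Note: only Marshal.load data you trust, since loading can
# instantiate arbitrary objects.
def from_solr_str(str)
  Marshal.load(Base64.strict_decode64(str))
end
```

The cost is roughly a 4/3 size increase in the stored value, which is usually acceptable for small objects like the one Matt describes.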