RE: score from two cores
Please correct me if I am doing something wrong. I really appreciate your help! I have a core for metadata (XML files) and a core for PDF documents. Sometimes I need to search them separately, sometimes I need to search both of them together. Each item has the same key in both cores, which relates the two. For example, the XML files look like the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<List>
  <Item>
    <Key>rmaaac.pdf</Key>
    <TI>something</TI>
    <UI>rmaaac</UI>
  </Item>
  <Item>
    ...
  </Item>
</List>

I index the rmaaac.pdf file with the same Key and UI fields in another core. Here is an example response after I index rmaaac.pdf:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">collectionid: RM</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="UI">rm</str>
      <str name="Key">rm.pdf</str>
      <str name="metadata_content">something</str>
    </doc>
  </result>
</response>

The result information displayed to the user comes from the metadata, not from the PDF files. If I search for a term in the documents, then in order to display the results I have to get the Keys from the document core and redo the search against the metadata core, and then the score is different. Please give me some suggestions!

Thanks so much,
Xiaohui

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, December 03, 2010 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: score from two cores

Uhhm, what are you trying to do? What do you want to do with the scores from two cores?

Best
Erick

On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

I have multiple cores. How can I deal with score? Thanks so much for help!
Xiaohui
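One way to sidestep comparing scores across the two cores is to treat the document-core search as the authoritative ranking: collect the Keys in score order from the PDF core, then look up the display fields from the metadata core while keeping that order. A minimal sketch of the merge step, assuming the field names `Key`, `UI`, and `metadata_content` from the examples above (the sample data here is made up for illustration):

```python
def merge_results(pdf_hits, metadata_docs):
    """Keep the ranking from the PDF-core search, but display metadata fields.

    pdf_hits:      list of dicts from the document core, in score order,
                   each with at least a 'Key' field (and optionally 'score').
    metadata_docs: list of dicts from the metadata core, each with a 'Key'.
    """
    by_key = {doc["Key"]: doc for doc in metadata_docs}
    merged = []
    for hit in pdf_hits:
        meta = by_key.get(hit["Key"])
        if meta is not None:
            # Carry the PDF-core score along so the display order stays stable
            merged.append({**meta, "score": hit.get("score")})
    return merged

# Example with made-up data shaped like the responses above
pdf_hits = [{"Key": "rmaaac.pdf", "score": 1.4}, {"Key": "rm.pdf", "score": 0.9}]
metadata = [{"Key": "rm.pdf", "UI": "rm", "metadata_content": "something"},
            {"Key": "rmaaac.pdf", "UI": "rmaaac", "metadata_content": "other"}]
print(merge_results(pdf_hits, metadata))
```

This way the metadata-core scores are never used for ranking, so the mismatch between the two cores' scores no longer matters for display.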
RE: how to set maxFieldLength to unlimited
Does anyone know how to index a very large PDF file (more than 100 MB)?

Thanks so much,
Xiaohui

-----Original Message-----
From: Ma, Xiaohui (NIH/NLM/LHC) [C]
Sent: Tuesday, November 30, 2010 4:22 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: how to set maxFieldLength to unlimited

I set maxFieldLength to 2147483647, restarted Tomcat, and re-indexed the PDF files. I also commented out the one in the mainIndex section. Unfortunately the files are still truncated when the file size is more than 20 MB. Any suggestions? I really appreciate your help!

Xiaohui

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimited

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647. Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the mainIndex section.

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

I need to index and search some PDF files which are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance,
Xiaohui
RE: how to set maxFieldLength to unlimited
Thanks so much for your reply, Jan. I just found that I cannot index PDF files larger than 20 MB. I use curl to index them, and I don't get any error either. Do you have any suggestions for indexing PDF files of more than 20 MB?

Thanks,
Xiaohui

-----Original Message-----
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
Sent: Wednesday, December 01, 2010 11:30 AM
To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; solr-user-...@lucene.apache.org
Subject: RE: how to set maxFieldLength to unlimited

You just can't set it to unlimited. What you could do is ignore the positions and put in a filter that sets the position increment to 0 for every token but the first (meaning the field length will be just 1, with all tokens stacked on the first position). You could also break per page, so that each page starts at a new position.

Jan

-----Original Message-----
From: ext Ma, Xiaohui (NIH/NLM/LHC) [C] [mailto:xiao...@mail.nlm.nih.gov]
Sent: Tuesday, 30 November 2010 19:49
To: solr-user@lucene.apache.org; 'solr-user-i...@lucene.apache.org'; 'solr-user-...@lucene.apache.org'
Subject: how to set maxFieldLength to unlimited

I need to index and search some PDF files which are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance,
Xiaohui
RE: how to set maxFieldLength to unlimited
Thanks so much, Jan. I use curl to index the PDF files; is there another way to do it? I changed the positionIncrement to 0, but I didn't get that to work either.

Thanks,
Xiaohui

-----Original Message-----
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
Sent: Wednesday, December 01, 2010 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimited

I don't know about upload limitations, but there certainly are some in the default settings; this could explain the limit of 20 MB. Which upload mechanism on the Solr side do you use? I guess this is not a Lucene problem but rather the HTTP layer of Solr. If you manage to stream your PDF and start parsing it on the stream, you should then go for the filter that sets the positionIncrement to 0, as mentioned. What we did once for PDF files: we parsed them beforehand into plain text and indexed that (but we were using Lucene directly) with a StreamReader.

Regards,
Jan

On 01.12.2010 at 18:13, ext Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

Thanks so much for your reply, Jan. I just found that I cannot index PDF files larger than 20 MB. I use curl to index them, and I don't get any error either. Do you have any suggestions for indexing PDF files of more than 20 MB?

Thanks,
Xiaohui

-----Original Message-----
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
Sent: Wednesday, December 01, 2010 11:30 AM
To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; solr-user-...@lucene.apache.org
Subject: RE: how to set maxFieldLength to unlimited

You just can't set it to unlimited. What you could do is ignore the positions and put in a filter that sets the position increment to 0 for every token but the first (meaning the field length will be just 1, with all tokens stacked on the first position). You could also break per page, so that each page starts at a new position.

Jan

-----Original Message-----
From: ext Ma, Xiaohui (NIH/NLM/LHC) [C] [mailto:xiao...@mail.nlm.nih.gov]
Sent: Tuesday, 30 November 2010 19:49
To: solr-user@lucene.apache.org; 'solr-user-i...@lucene.apache.org'; 'solr-user-...@lucene.apache.org'
Subject: how to set maxFieldLength to unlimited

I need to index and search some PDF files which are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance,
Xiaohui
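If the 20 MB cutoff really is in the HTTP layer rather than in the analysis chain, one place worth checking is the multipart upload limit in solrconfig.xml, which caps the size of files POSTed to handlers such as /update/extract. A sketch of the relevant setting (the value shown is just an illustrative ceiling, and whether this is the actual cause of the 20 MB limit is an assumption; the servlet container, e.g. Tomcat's maxPostSize, can impose its own limit too):

<!-- solrconfig.xml: inside <requestDispatcher> -->
<requestDispatcher handleSelect="true">
  <!-- multipartUploadLimitInKB caps multipart file uploads (e.g. PDFs sent
       with curl -F to /update/extract); raise it above your largest file -->
  <requestParsers enableRemoteStreaming="false"
                  multipartUploadLimitInKB="2097151" />
</requestDispatcher>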
how to set maxFieldLength to unlimited
I need to index and search some PDF files which are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance,
Xiaohui
RE: how to set maxFieldLength to unlimited
Thanks so much for your help!

Xiaohui

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimited

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647. Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the mainIndex section.

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

I need to index and search some PDF files which are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance,
Xiaohui
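Erick's advice maps to two places in the Solr 1.4 solrconfig.xml: the maxFieldLength under <indexDefaults> and a second one under <mainIndex>, which silently overrides the first. A sketch of the fix (raise the default, comment out the override):

<!-- solrconfig.xml -->
<indexDefaults>
  <!-- effectively "unlimited": Integer.MAX_VALUE tokens per field -->
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>

<mainIndex>
  <!-- the common gotcha: a second maxFieldLength here overrides the one
       above, so comment it out instead of leaving the 10000 default -->
  <!-- <maxFieldLength>10000</maxFieldLength> -->
</mainIndex>

Note that maxFieldLength counts tokens, not bytes, so a large PDF can still be truncated at the default of 10000 tokens long before any HTTP upload limit is reached.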
RE: how to set maxFieldLength to unlimited
I set maxFieldLength to 2147483647, restarted Tomcat, and re-indexed the PDF files. I also commented out the one in the mainIndex section. Unfortunately the files are still truncated when the file size is more than 20 MB. Any suggestions? I really appreciate your help!

Xiaohui

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimited

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647. Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the mainIndex section.

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

I need to index and search some PDF files which are very big (around 1000 pages each). How can I set maxFieldLength to unlimited? Thanks so much for your help in advance,
Xiaohui
RE: how to deal with virtual collection in solr?
Thanks so much for your help, Jan Høydahl. Have a great weekend!

Xiaohui

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Friday, September 03, 2010 3:46 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

You did not supply your actual query. Try adding a q=foobar parameter; also, you don't need a & before shards since you have the ?.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1. sep. 2010, at 20.14, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

Thank you, Jan. Unfortunately I got the following exception when I use
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
    at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
    at org.apache.solr.search.QParser.getQuery(QParser.java:131)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Tuesday, August 31, 2010 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

If you have multiple cores defined in your solr.xml, you need to issue your queries to one of the cores. Below it seems as if you are lacking the core name. Try instead:
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

And as Lance pointed out, make sure your XML files conform to the Solr XML format (http://wiki.apache.org/solr/UpdateXmlMessages).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

Thank you, Jan Høydahl. I used
http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
and got the error "Missing solr core name in path". I have aapublic and aaprivate cores.

I also got an error if I used
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
This time it was a java.lang.NullPointerException.

My collections are XML files. Please let me know if I can use the following way you suggested:

curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F fi...@myfile.xml

Thanks so much as always!
Xiaohui

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Friday, August 27, 2010 7:42 AM
To: solr-user@lucene.apache.org
Subject: Re: how
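The NullPointerException in QueryParser.parse above is what Solr 1.4 throws when the q parameter is missing entirely, which is why Jan suggests adding one. A quick sketch of assembling the distributed query with an explicit q (host and core names taken from the messages above; "foobar" is just a placeholder query):

```python
from urllib.parse import urlencode

params = {
    # The missing piece: without q, the query parser receives a null string
    "q": "foobar",
    "shards": "localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic",
}
url = "http://localhost:8983/solr/aapublic/select?" + urlencode(params)
print(url)
```

Note that the request still goes to one concrete core (aapublic here); the shards parameter then fans the search out to both cores and merges the results.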
RE: how to deal with virtual collection in solr?
Thank you, Jan. Unfortunately I got the following exception when I use
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
    at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
    at org.apache.solr.search.QParser.getQuery(QParser.java:131)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Tuesday, August 31, 2010 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

If you have multiple cores defined in your solr.xml, you need to issue your queries to one of the cores. Below it seems as if you are lacking the core name. Try instead:
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

And as Lance pointed out, make sure your XML files conform to the Solr XML format (http://wiki.apache.org/solr/UpdateXmlMessages).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

Thank you, Jan Høydahl. I used
http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
and got the error "Missing solr core name in path". I have aapublic and aaprivate cores.

I also got an error if I used
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
This time it was a java.lang.NullPointerException.

My collections are XML files. Please let me know if I can use the following way you suggested:

curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F fi...@myfile.xml

Thanks so much as always!
Xiaohui

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Friday, August 27, 2010 7:42 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please use this style:
shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

However, since the schema is the same, I'd opt for one index with a collections field as the filter. You can add that field to your schema and then inject it as metadata on the ExtractingRequestHandler call:

curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F fi...@myfile.pdf

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
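Jan's single-index suggestion needs only a stored, indexed string field in schema.xml plus a filter query at search time. A sketch (the field name collection is an assumption; match it to whatever literal.collection value you pass at index time):

<!-- schema.xml: one field to tag which virtual collection a doc belongs to -->
<field name="collection" type="string" indexed="true" stored="true" />

At query time, restrict the search to one collection with a filter query, e.g. appending fq=collection:aaprivate to the select URL. Filter queries do not affect scoring and are cached independently, so this also avoids the cross-core score-comparison problem entirely.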
questions about synonyms
Hello, I have a couple of questions about synonyms.

1. I got a very big text file of synonyms. How can I use it? Do I need to index this text file first?
2. Is there a way to highlight synonyms in the search results?
3. Does anyone use WordNet with Solr?

Thanks so much in advance,
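For question 1: the synonyms file is not indexed on its own; it is referenced from an analyzer in schema.xml and applied while fields are analyzed. A sketch of a field type wired to a large synonyms file with index-time expansion (the file name and the expand/ignoreCase settings are illustrative choices, not requirements):

<!-- schema.xml: apply synonyms.txt while indexing -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" indexes every synonym of each matched term -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Index-time expansion also bears on question 2: because the synonyms become real tokens in the index, a match on a synonym can be highlighted at the position of the original word in the stored text. For question 3, WordNet data is commonly converted into the flat synonyms.txt format before being used this way.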
RE: how to deal with virtual collection in solr?
Thanks so much, I really appreciate your help! Have a great weekend!

Xiaohui

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Friday, August 27, 2010 7:42 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please use this style:
shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

However, since the schema is the same, I'd opt for one index with a collections field as the filter. You can add that field to your schema and then inject it as metadata on the ExtractingRequestHandler call:

curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F fi...@myfile.pdf

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 26. aug. 2010, at 20.41, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

Thanks so much for your help! I will try it.

-----Original Message-----
From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com]
Sent: Thursday, August 26, 2010 2:36 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

I don't know about the shards, etc., but I recently encountered that exception while indexing PDFs as well. The way I resolved it was to upgrade to a nightly build of Solr (you can find them at https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/). The problem is that the version of Tika that 1.4.1 uses is very old, and it uses an old version of PDFBox to do its parsing. (You might be able to fix the problem just by replacing the Tika jars; however, I don't know if there have been any API changes, so I can't really recommend that.) We didn't upgrade to trunk for that functionality, but it was nice that it started working. (The PDFs we'll be indexing won't be of later versions, but a test file was.)

On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

Thanks so much for your help, Jan Høydahl! I made multiple cores (aa public, aa private, bb public and bb private). I know how to query them individually. Please tell me if I can do a combination through the shards parameter now. If yes: I tried to append shards=aapub,bbpub after the query string, but unfortunately it didn't work.

Actually all of the content is the same. I don't have a collection field in the XML files. Please tell me how I can set a collection field in the schema and simply search a collection through a filter.

I used curl to index the PDF files. I use Solr 1.4.1. I got the following error when I index PDFs with version 1.5 and 1.6:

HTTP ERROR: 500

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202
RE: how to deal with virtual collection in solr?
Thank you, Jan Høydahl. I used
http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
and got the error "Missing solr core name in path". I have aapublic and aaprivate cores.

I also got an error if I used
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
This time it was a java.lang.NullPointerException.

My collections are XML files. Please let me know if I can use the following way you suggested:

curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F fi...@myfile.xml

Thanks so much as always!
Xiaohui

-----Original Message-----
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Friday, August 27, 2010 7:42 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please use this style:
shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

However, since the schema is the same, I'd opt for one index with a collections field as the filter. You can add that field to your schema and then inject it as metadata on the ExtractingRequestHandler call:

curl "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true" -F fi...@myfile.pdf

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 26. aug. 2010, at 20.41, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

Thanks so much for your help! I will try it.

-----Original Message-----
From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com]
Sent: Thursday, August 26, 2010 2:36 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

I don't know about the shards, etc., but I recently encountered that exception while indexing PDFs as well. The way I resolved it was to upgrade to a nightly build of Solr (you can find them at https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/). The problem is that the version of Tika that 1.4.1 uses is very old, and it uses an old version of PDFBox to do its parsing. (You might be able to fix the problem just by replacing the Tika jars; however, I don't know if there have been any API changes, so I can't really recommend that.) We didn't upgrade to trunk for that functionality, but it was nice that it started working. (The PDFs we'll be indexing won't be of later versions, but a test file was.)

On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote:

Thanks so much for your help, Jan Høydahl! I made multiple cores (aa public, aa private, bb public and bb private). I know how to query them individually. Please tell me if I can do a combination through the shards parameter now. If yes: I tried to append shards=aapub,bbpub after the query string, but unfortunately it didn't work.

Actually all of the content is the same. I don't have a collection field in the XML files. Please tell me how I can set a collection field in the schema and simply search a collection through a filter.

I used curl to index the PDF files. I use Solr 1.4.1. I got the following error when I index PDFs with version 1.5 and 1.6:

HTTP ERROR: 500

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle
RE: how to deal with virtual collection in solr?
Thanks so much for your help, Jan Høydahl! I made multiple cores (aa public, aa private, bb public and bb private). I know how to query them individually. Please tell me if I can do a combination through the shards parameter now. If yes: I tried to append shards=aapub,bbpub to the query string, but unfortunately it didn't work. Actually the content is all the same. I don't have a collection field in the xml files. Please tell me how I can set a collection field in the schema and simply search a collection through a filter. I used curl to index the pdf files. I use Solr 1.4.1. I got the following error when I index pdfs of version 1.5 and 1.6.
*
HTTP ERROR: 500
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
 ... 22 more
Caused by: java.lang.NullPointerException
 at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
 at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
 at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
 at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
 at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
 at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
 at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
 ... 24 more
RequestURI=/solr/lhcpdf/update/extract
***
-Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Wednesday, August 25, 2010 4:34 PM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr?
RE: how to deal with virtual collection in solr?
Thanks so much for your help! I will try it. -Original Message- From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com] Sent: Thursday, August 26, 2010 2:36 PM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr? I don't know about the shards, etc. However, I recently encountered that exception while indexing pdfs as well. The way I resolved it was to upgrade to a nightly build of Solr. (You can find them at https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/.) The problem is that the version of Tika that 1.4.1 uses is very old, and it uses an old version of PDFBox to do its parsing. (You might be able to fix the problem just by replacing the Tika jars; however, I don't know if there have been any API changes, so I can't really suggest that.) We didn't upgrade to trunk for that functionality, but it was nice that it started working. (The PDFs we'll be indexing won't be of later versions, but a test file was.) On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: Thanks so much for your help, Jan Høydahl! I made multiple cores (aa public, aa private, bb public and bb private). ...
how to deal with virtual collection in solr?
Hello, I just started to investigate Solr several weeks ago. Our current project uses the Verity search engine, which is a commercial product, and the company is out of business. I am trying to evaluate whether Solr can meet our requirements. I have the following questions.
1. Currently we use Verity and have more than 20 collections; each collection has an index for public items and an index for private items. So there are virtual collections which point to each collection, and a virtual collection which points to all. For example, we have AA and BB collections. AA virtual collection -- (AA index for public items and AA index for private items). BB virtual collection -- (BB index for public items and BB index for private items). All virtual collection -- (AA index for public items and AA index for private items, BB index for public items and BB index for private items). Would you please tell me what I should do for this if I use Solr?
2. Our project has files in different formats that I need to index, for example xml files, pdf files and text files. Is it possible for Solr to return search results from all of them?
3. I got an error when I index pdf files which are version 1.5 or 1.6. Would you please tell me if there is a patch to fix it?
Thanks so much in advance,
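One way the public/private split described in question 1 is often modeled in Solr is with marker fields on a single schema instead of separate indexes. This is only a sketch; the field names collection and access and their values are assumptions:

```xml
<!-- Hypothetical schema.xml fragment: model Verity-style virtual collections with two marker fields -->
<field name="collection" type="string" indexed="true" stored="true"/> <!-- e.g. AA or BB -->
<field name="access" type="string" indexed="true" stored="true"/>     <!-- public or private -->
```

The "AA virtual collection" then becomes a filter query fq=collection:AA, a public-only search adds fq=access:public, and the "All" virtual collection is simply the same query with no collection filter.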
RE: how to deal with virtual collection in solr?
Thank you for letting me know. Does Autonomy still support the Verity search engine? -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Wednesday, August 25, 2010 3:41 PM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr? On Aug 25, 2010, at 12:18 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I just started to investigate Solr several weeks ago. Our current project uses Verity search engine which is commercial product and the company is out of business. Verity is not out of business. They were acquired by Autonomy. wunder -- Walter Underwood
RE: ANNOUNCE: Stump Hoss @ Lucene Revolution
Hello, I just started to investigate Solr several weeks ago. ... -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, August 23, 2010 4:50 PM To: solr-user@lucene.apache.org Subject: ANNOUNCE: Stump Hoss @ Lucene Revolution Hey everybody, As you (hopefully) have heard by now, Lucid Imagination is sponsoring a Lucene/Solr conference in Boston about 6 weeks from now. We've got a lot of really great speakers lined up to give some really interesting technical talks, so I offered to do something a little bit different. I'm going to be in the hot seat for a Stump The Chump style session, where I'll be answering Solr questions live and unrehearsed...
http://bit.ly/stump-hoss The goal is to really make me sweat and work hard to think of creative solutions to non-trivial problems on the spot -- like when I answer questions on the solr-user mailing list, except in a crowded room with hundreds of people staring at me and laughing. But in order to be a success, we need your questions/problems/challenges! If you had a tough situation with Solr that you managed to solve with a creative solution (or haven't solved yet) and are interested to see what type of solution I might come up with under pressure, please email a description of your problem to st...@lucenerevolution.org -- More details online... http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter Even if you won't be able to make it to Boston, please send in any challenging problems you would be interested to see me tackle under the gun. The session will be recorded, and the video will be posted online shortly after the conference has ended. And if you can make it to Boston: all the more fun to watch live and in person (and maybe answer follow-up questions). In any case, it should be a very interesting session: folks will either get to learn a lot, or laugh at me a lot, or both. (win/win/win) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
multiple values
Hello, I can only display one author, the last one; it looks like the others get overwritten. In the xml, I have more than one <Author>name</Author> inside <AuthorList></AuthorList>. In data_config.xml, I put the field <field column="Author" xpath="/PublishedArticles/Article/AuthorList/Author" />. In schema.xml, I put <field name="Author" type="text" indexed="true" stored="true" multiValued="true"/>. Please let me know if I did something wrong, or how I can display it in jsp. I really appreciate your help!
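The usual requirements for the repeated-authors case above are exactly the two declarations quoted in the message: a multiValued field in schema.xml and an xpath in data-config.xml that selects the repeating element. A sketch, under the assumption that the XML layout matches the paths shown above:

```xml
<!-- schema.xml: Author must be multiValued so repeated <Author> elements accumulate -->
<field name="Author" type="text" indexed="true" stored="true" multiValued="true"/>

<!-- data-config.xml: the xpath points at the repeating element, yielding one value per match -->
<field column="Author" xpath="/PublishedArticles/Article/AuthorList/Author"/>
```

With a multiValued field the response contains an <arr> of <str> values rather than a single value, so the display side (the jsp in this case) has to iterate over all returned values instead of reading just one.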
different pdf version issue
I have a problem indexing pdf files which are pdf version 1.5 or 1.6. There is no problem at all indexing pdf files of version 1.4. Here is the error I got: HTTP ERROR: 500 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@44ffb2 Has anyone had this problem when indexing pdf files? Thanks so much in advance, Xiaohui
index pdf files
I wrote a simple java program to import a pdf file. I can get a result when I search *:* from the admin page, but I get nothing if I search for a word. I wonder if I did something wrong or missed a setting. Here is part of the result I get from the *:* search:
*
<doc>
  <arr name="attr_Author"><str>Hristovski D</str></arr>
  <arr name="attr_Content-Type"><str>application/pdf</str></arr>
  <arr name="attr_Keywords"><str>microarray analysis, literature-based discovery, semantic predications, natural language processing</str></arr>
  <arr name="attr_Last-Modified"><str>Thu Aug 12 10:58:37 EDT 2010</str></arr>
  <arr name="attr_content"><str>Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej Kastrin,2...</str></arr>
</doc>
*
Please help me out if anyone has experience with pdf files. I really appreciate it! Thanks so much,
RE: index pdf files
Thanks so much. I didn't know how to make any changes in schema.xml for pdf files; I used the solr default schema.xml. Please tell me what I need to do in schema.xml. The simple java program I use is below. I also attached that pdf file. I really appreciate your help!
*
import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class importPDF {
    public static void main(String[] args) {
        try {
            String fileName = "pub2009001.pdf";
            String solrId = "pub2009001.pdf";
            indexFilesSolrCell(fileName, solrId);
        } catch (Exception ex) {
            System.out.println(ex.toString());
        }
    }

    // Posts the pdf to the ExtractingRequestHandler and commits.
    public static void indexFilesSolrCell(String fileName, String solrId)
            throws IOException, SolrServerException {
        String urlString = "http://lhcinternal.nlm.nih.gov:8989/solr/lhcpdf";
        SolrServer solr = new CommonsHttpSolrServer(urlString);
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(fileName));
        up.setParam("literal.id", solrId);
        up.setParam("uprefix", "attr_");
        up.setParam("fmap.content", "attr_content");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up);
    }
}
*
-Original Message- From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] Sent: Thursday, August 12, 2010 11:45 AM To: solr-user@lucene.apache.org Subject: Re: index pdf files To help you we need the description of your fields in your schema.xml and the query that you do when you search only a single word. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: I wrote a simple java program to import a pdf file. I can get a result when I do search *:* from admin page. I get nothing if I search a word. I wonder if I did something wrong or miss set something. ...
RE: index pdf files
Does anyone know if I need to define fields in schema.xml for indexing pdf files? If I do, please tell me how. For xml files I defined fields in schema.xml and created a data-configuration file using xpath. Would you please tell me whether I need to do the same for pdf files, and how? Thanks so much for your help as always! -Original Message- From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] Sent: Thursday, August 12, 2010 11:45 AM To: solr-user@lucene.apache.org Subject: Re: index pdf files To help you we need the description of your fields in your schema.xml and the query that you do when you search only a single word. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: I wrote a simple java program to import a pdf file. ...
RE: index pdf files
Thanks so much for your help! I defined a dynamic field in schema.xml as follows: <dynamicField name="metadata_*" type="string" indexed="true" stored="true" multiValued="false"/> But I wonder what I should put for <uniqueKey></uniqueKey>. I really appreciate your help! -Original Message- From: Stefan Moises [mailto:moi...@shoptimax.de] Sent: Thursday, August 12, 2010 1:58 PM To: solr-user@lucene.apache.org Subject: Re: index pdf files Maybe this helps: http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 Cheers, Stefan Am 12.08.2010 19:45, schrieb Ma, Xiaohui (NIH/NLM/LHC) [C]: Does anyone know if I need define fields in schema.xml for indexing pdf files? ...
--
*** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
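For the <uniqueKey> question above, a common arrangement is a required string id field used as the key, supplied at index time through the literal.id parameter that appears elsewhere in this thread. A sketch:

```xml
<!-- Hypothetical schema.xml fragment: a required string id used as the unique key -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```

Each /update/extract request then passes literal.id with some unique value (the pdf file name, for example), so re-indexing the same file replaces the existing document instead of creating a duplicate.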
RE: index pdf files
Thanks so much. I got it working now. I really appreciate your help! Xiaohui -Original Message- From: Stefan Moises [mailto:moi...@shoptimax.de] Sent: Thursday, August 12, 2010 1:58 PM To: solr-user@lucene.apache.org Subject: Re: index pdf files Maybe this helps: http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 Cheers, Stefan ...
RE: index pdf files
I got the following error when I index some pdf files. I wonder if anyone has seen this issue before and how to fix it. Thanks so much in advance!
***
HTTP ERROR: 500
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@44ffb2
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@44ffb2
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
***
-Original Message- From: Stefan Moises [mailto:moi...@shoptimax.de] Sent: Thursday, August 12, 2010 1:58 PM To: solr-user@lucene.apache.org Subject: Re: index pdf files Maybe this helps: http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 Cheers, Stefan ...
RE: PDF file
Thanks so much for your help! I got a "Remote Streaming is disabled" error. Would you please tell me if I missed something? Thanks, -Original Message- From: Jayendra Patil [mailto:jayendra.patil@gmail.com] Sent: Tuesday, August 10, 2010 8:51 PM To: solr-user@lucene.apache.org Subject: Re: PDF file Try ... curl "http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=Full_Path_of_File/pub2009001.pdf&literal.id=777045&commit=true" stream.file - specify full path literal.extra params - specify any extra params if needed Regards, Jayendra On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: Thanks so much for your help! I tried to index a pdf file and got the following. The command I used is curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F fi...@pub2009001.pdf Did I do something wrong? Do I need to modify anything in schema.xml or another configuration file?
[xiao...@lhcinternal lhc]$ curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F fi...@pub2009001.pdf
HTTP ERROR: 404
NOT_FOUND
RequestURI=/solr/lhc/update/extract
***
-Original Message- From: Sharp, Jonathan [mailto:jsh...@coh.org] Sent: Tuesday, August 10, 2010 4:37 PM To: solr-user@lucene.apache.org Subject: RE: PDF file Xiaohui, You need to add the following jars to the lib subdirectory of the solr config directory on your server. (path inside the solr 1.4.1 download) /dist/apache-solr-cell-1.4.1.jar plus all the jars in /contrib/extraction/lib HTH -Jon From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov] Sent: Tuesday, August 10, 2010 11:57 AM To: 'solr-user@lucene.apache.org' Subject: RE: PDF file Does anyone have any experience with PDF files? I really appreciate your help! Thanks so much in advance. -Original Message- From: Ma, Xiaohui (NIH/NLM/LHC) [C] Sent: Tuesday, August 10, 2010 10:37 AM To: 'solr-user@lucene.apache.org' Subject: PDF file I have a lot of pdf files. I am trying to import pdf files to solr and index them. I added ExtractingRequestHandler to solrconfig.xml. Please tell me if I need to download some jar files. In the Solr 1.4 Enterprise Search Server book, the following command is used to import mccm.pdf: curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F fi...@mccm.pdf Please tell me if there is a way to import pdf files from a directory. Thanks so much for your help!
- SECURITY/CONFIDENTIALITY WARNING: This message and any attachments are intended solely for the individual or entity to which they are addressed. This communication may contain information that is privileged, confidential, or exempt from disclosure under applicable law (e.g., personal health information, research data, financial information). Because this e-mail has been sent without encryption, individuals other than the intended recipient may be able to view the information, forward it to others or tamper with the information without the knowledge or consent of the sender. If you are not the intended recipient, or the employee or person responsible for delivering the message to the intended recipient, any dissemination, distribution or copying of the communication is strictly prohibited. If you received the communication in error, please notify the sender immediately by replying to this message and deleting the message and any accompanying files from your system. If, due to the security risks, you do not wish to receive further communications via e-mail, please reply to this message and inform the sender that you do not wish to receive further e-mail from the sender. -
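The "Remote Streaming is disabled" error mentioned in this thread is controlled in solrconfig.xml: the stream.file and stream.url parameters only work when remote streaming is switched on in the request dispatcher. A sketch of the relevant setting (the upload limit value here is just an example):

```xml
<!-- solrconfig.xml: inside <requestDispatcher>, allow stream.file/stream.url parameters -->
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
```

Note that enabling this lets any client ask the server to read arbitrary local files via stream.file, so it is normally left off on servers exposed beyond a trusted network.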
RE: PDF file
Thanks, I knew how to enable Streaming. But I got another error, ERROR:unknown field 'metadata_trapped'. Does anyone know how to match up with SolrCell metadata? I found the following in schema.xml. I don't know how to make changes for PDF. !-- Common metadata fields, named specifically to match up with SolrCell metadata when parsing rich documents such as Word, PDF. Some fields are multiValued only because Tika currently may return multiple values for them. -- I really appreciate your help! Thanks, -Original Message- From: Ma, Xiaohui (NIH/NLM/LHC) [C] Sent: Wednesday, August 11, 2010 10:36 AM To: solr-user@lucene.apache.org Cc: 'jayendra.patil@gmail.com' Subject: RE: PDF file Thanks so much for your help! I got Remote Streaming is disabled error. Would you please tell me if I miss something? Thanks, -Original Message- From: Jayendra Patil [mailto:jayendra.patil@gmail.com] Sent: Tuesday, August 10, 2010 8:51 PM To: solr-user@lucene.apache.org Subject: Re: PDF file Try ... curl http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file= Full_Path_of_File/pub2009001.pdfliteral.id=777045commit=true stream.file - specify full path literal.extra params - specify any extra params if needed Regards, Jayendra On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: Thanks so much for your help! I tried to index a pdf file and got the following. The command I used is curl ' http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=textmap.stream_name=idcommit=true' -F fi...@pub2009001.pdf Did I do something wrong? Do I need modify anything in schema.xml or other configuration file? 
[xiao...@lhcinternal lhc]$ curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@pub2009001.pdf
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/><title>Error 404</title></head><body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre><p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p></body></html>
*** -Original Message- From: Sharp, Jonathan [mailto:jsh...@coh.org] Sent: Tuesday, August 10, 2010 4:37 PM To: solr-user@lucene.apache.org Subject: RE: PDF file Xiaohui, You need to add the following jars to the lib subdirectory of the solr config directory on your server (paths inside the solr 1.4.1 download): /dist/apache-solr-cell-1.4.1.jar plus all the jars in /contrib/extraction/lib HTH -Jon From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov] Sent: Tuesday, August 10, 2010 11:57 AM To: 'solr-user@lucene.apache.org' Subject: RE: PDF file Does anyone have any experience with PDF files? I really appreciate your help! Thanks so much in advance. -Original Message- From: Ma, Xiaohui (NIH/NLM/LHC) [C] Sent: Tuesday, August 10, 2010 10:37 AM To: 'solr-user@lucene.apache.org' Subject: PDF file I have a lot of pdf files. I am trying to import the pdf files into solr and index them. I added ExtractingRequestHandler to solrconfig.xml. Please tell me if I need to download some jar files. In the Solr 1.4 Enterprise Search Server book, the following command is used to import mccm.pdf: curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@mccm.pdf Please tell me if there is a way to import pdf files from a directory. Thanks so much for your help!
SECURITY/CONFIDENTIALITY WARNING: This message and any attachments are intended solely for the individual or entity to which they are addressed. This communication may contain information that is privileged, confidential, or exempt from disclosure under applicable law (e.g., personal health information, research data, financial information). Because this e-mail has been sent without encryption, individuals other than the intended recipient may be able to view the information, forward it to others or tamper with the information without the knowledge or consent of the sender. If you are not the intended recipient, or the employee or person responsible for delivering the message to the intended recipient, any dissemination, distribution or copying of the communication is strictly prohibited. If you received the communication in error, please notify the sender immediately by replying to this message and deleting the message and any accompanying files from your system. If, due to the security risks, you do not wish to receive further communications via e-mail, please reply to this message and inform the sender that you do not wish to receive further e-mail from the sender.
RE: hl.usePhraseHighlighter
Thanks so much for your help! It works. I really appreciate it. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, August 09, 2010 6:05 PM To: solr-user@lucene.apache.org Subject: RE: hl.usePhraseHighlighter I used the text type and found the following in schema.xml. I don't know which ones I should remove. *** You should remove <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> from both the index-time and query-time analyzers.
PDF file
I have a lot of pdf files. I am trying to import the pdf files into solr and index them. I added ExtractingRequestHandler to solrconfig.xml. Please tell me if I need to download some jar files. In the Solr 1.4 Enterprise Search Server book, the following command is used to import mccm.pdf: curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@mccm.pdf Please tell me if there is a way to import pdf files from a directory. Thanks so much for your help!
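On the directory question: the extract handler takes one file per request, so a directory import is just a shell loop over the PDFs. A minimal sketch, assuming a tutorial-default Solr URL, a `/data/pdfs` source directory, and the filename as the document id (`literal.id`) — all of which you would adjust:

```shell
#!/bin/sh
# Build the extract request for one PDF. Using the filename as
# literal.id is an assumption -- adapt it to your schema's key field.
SOLR_EXTRACT_URL="http://localhost:8983/solr/update/extract"

build_extract_cmd() {
    pdf="$1"
    id=$(basename "$pdf")
    printf 'curl "%s?literal.id=%s&commit=false" -F "file=@%s"\n' \
        "$SOLR_EXTRACT_URL" "$id" "$pdf"
}

# Print one curl command per PDF; pipe the output to `sh` to actually
# send the requests, then issue a single commit at the end.
for f in /data/pdfs/*.pdf; do
    build_extract_cmd "$f"
done
printf 'curl "http://localhost:8983/solr/update?commit=true"\n'
```

Committing once at the end rather than passing commit=true on every file is noticeably faster for large batches.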
RE: PDF file
Does anyone have any experience with PDF files? I really appreciate your help! Thanks so much in advance. -Original Message- From: Ma, Xiaohui (NIH/NLM/LHC) [C] Sent: Tuesday, August 10, 2010 10:37 AM To: 'solr-user@lucene.apache.org' Subject: PDF file I have a lot of pdf files. I am trying to import the pdf files into solr and index them. I added ExtractingRequestHandler to solrconfig.xml. Please tell me if I need to download some jar files. In the Solr 1.4 Enterprise Search Server book, the following command is used to import mccm.pdf: curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@mccm.pdf Please tell me if there is a way to import pdf files from a directory. Thanks so much for your help!
RE: PDF file
Thanks so much for your help! I tried to index a pdf file and got the following. The command I used is curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@pub2009001.pdf Did I do something wrong? Do I need to modify anything in schema.xml or another configuration file?
[xiao...@lhcinternal lhc]$ curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@pub2009001.pdf
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/><title>Error 404</title></head><body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre><p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p></body></html>
*** -Original Message- From: Sharp, Jonathan [mailto:jsh...@coh.org] Sent: Tuesday, August 10, 2010 4:37 PM To: solr-user@lucene.apache.org Subject: RE: PDF file Xiaohui, You need to add the following jars to the lib subdirectory of the solr config directory on your server (paths inside the solr 1.4.1 download): /dist/apache-solr-cell-1.4.1.jar plus all the jars in /contrib/extraction/lib HTH -Jon From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov] Sent: Tuesday, August 10, 2010 11:57 AM To: 'solr-user@lucene.apache.org' Subject: RE: PDF file Does anyone have any experience with PDF files? I really appreciate your help! Thanks so much in advance. -Original Message- From: Ma, Xiaohui (NIH/NLM/LHC) [C] Sent: Tuesday, August 10, 2010 10:37 AM To: 'solr-user@lucene.apache.org' Subject: PDF file I have a lot of pdf files. I am trying to import the pdf files into solr and index them. I added ExtractingRequestHandler to solrconfig.xml. Please tell me if I need to download some jar files. In the Solr 1.4 Enterprise Search Server book, the following command is used to import mccm.pdf: curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F file=@mccm.pdf Please tell me if there is a way to import pdf files from a directory. Thanks so much for your help!
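A 404 on /update/extract usually means the Solr Cell jars Jon lists are missing, or the handler was never registered in solrconfig.xml. A sketch of the registration for Solr 1.4 (the default field mapping shown is an assumption, not something from this thread):

```xml
<!-- Requires apache-solr-cell-1.4.1.jar and the jars from
     contrib/extraction/lib on the core's lib path. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- assumption: route extracted body text into the "text" field -->
    <str name="map.content">text</str>
  </lst>
</requestHandler>
```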
hl.usePhraseHighlighter
I am trying to do exact matching. For example, I want only "study" highlighted when I search for "study", not other forms (studies, studied, and so on). I didn't find any function for this in SolrQuery. I added the following in solrconfig.xml: <str name="hl.usePhraseHighlighter">true</str>. Unfortunately I didn't get it to work. Please help me out. Thanks so much, Xiaohui
RE: hl.usePhraseHighlighter
Thanks so much for your help! I used the text type and found the following in schema.xml. I don't know which ones I should remove.
***
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
***
-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, August 09, 2010 4:32 PM To: solr-user@lucene.apache.org Subject: Re: hl.usePhraseHighlighter I am trying to do exact matching. For example, I want only "study" highlighted when I search for "study", not other forms (studies, studied, and so on). This has nothing to do with highlighting and its parameters. You need to remove the stem filter factory (porter, snowball) from your analyzer chain. Restarting Solr and re-indexing are also necessary.
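Concretely, Ahmet's advice means deleting the EnglishPorterFilterFactory line from both analyzers. A sketch of the resulting field type (the name text_exact is invented for illustration; you can equally edit the existing text type in place, then restart Solr and re-index):

```xml
<!-- Same chains as the stock "text" type, minus the stemmer. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```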
how to highlight string in jsp
Hello, I am trying to display the highlighted string in a different color on a JSP page. I use the following in a servlet: query.setHighlight(true).setHighlightSnippets(1); query.setParam("hl.fl", "Abstract"); I wonder how I can display it in the JSP. Thanks in advance. xm
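For the coloring itself, Solr can wrap each highlighted term in arbitrary markup via the hl.simple.pre / hl.simple.post parameters, so the JSP only has to print the snippet string. A sketch that builds such a request URL — the host, core, and span markup are assumptions; the field name Abstract comes from the question. In SolrJ the same pair is exposed as query.setHighlightSimplePre(...) / query.setHighlightSimplePost(...):

```shell
#!/bin/sh
# Build a select URL whose highlight snippets arrive already wrapped
# in a colored <span>, so the JSP can emit the snippet verbatim.
build_hl_url() {
    q="$1"
    # The pre/post values below are the URL-encoded forms of
    # <span style="color:red"> and </span>.
    printf 'http://localhost:8983/solr/select?q=%s&hl=true&hl.fl=Abstract&hl.snippets=1&hl.simple.pre=%s&hl.simple.post=%s\n' \
        "$q" '%3Cspan%20style%3D%22color%3Ared%22%3E' '%3C%2Fspan%3E'
}

build_hl_url study   # fetch with: curl "$(build_hl_url study)"
```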
display solr result in JSP
I am new to solr. I just got the example xml files indexed and searchable by following the solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
RE: display solr result in JSP
Thanks so much for your reply. I don't have much experience with JSP. I found a tag library and am trying to use <xsltlib:apply xml="<%= url.getContent().toString() %>" xsl="/xsl/result.xsl"/>. Unfortunately I didn't get it to work. Would you please give me more information? I really appreciate your help! Thanks, Xiaohui -Original Message- From: Ranveer [mailto:ranveer.s...@gmail.com] Sent: Wednesday, July 28, 2010 11:27 AM To: solr-user@lucene.apache.org Subject: Re: display solr result in JSP Hi, it is very simple to display values in a jsp. If you are using solrj, simply store the value in a bean from your java class and display it. You can do the same thing in a servlet too: get the solr server response and return it in a bean, or display it directly (in the servlet). Hope you will be able to do it. regards Ranveer On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I am new to solr. I just got the example xml files indexed and searchable by following the solr tutorial. I wonder how I can get the search results displayed in a JSP. I really appreciate any suggestions you can give. Thanks so much, Xiaohui
new to solr
Hello, I am new to solr. I followed the solr online tutorial to get the example working. The search result is xml. I wonder if there is a way to show the result in a nicer form. I saw there is an example.xsl in the conf/xslt directory, but I really don't know how to use it. Does anyone have some ideas for me? I really appreciate it! Thanks, Xiaohui
RE: new to solr
Thanks so much for your reply! Please tell me what example.xsl in conf/xslt is for. Please let me know where the search result is located; I can use php or .net to display the result on the web. Is it created on the fly? Thanks, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:37 AM To: solr-user@lucene.apache.org Subject: Re: new to solr Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Hello, I am new to solr. Welcome! I followed the solr online tutorial to get the example working. The search result is xml. I wonder if there is a way to show the result in a nicer form. I saw there is an example.xsl in the conf/xslt directory, but I really don't know how to use it. Does anyone have some ideas for me? I really appreciate it! Are you asking how to display results for people to see? A nicely formatted website? Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website: php/java/.net/ruby/etc. ryan
RE: new to solr
Thanks very much, Ryan. I really appreciate it. I will take a look at both. Best regards, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:56 AM To: solr-user@lucene.apache.org Subject: Re: new to solr example.xsl is an example of using XSLT to format results. Check: http://wiki.apache.org/solr/XsltResponseWriter For php, check: http://wiki.apache.org/solr/SolPHP ryan Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Thanks so much for your reply! Please tell me what example.xsl in conf/xslt is for. Please let me know where the search result is located; I can use php or .net to display the result on the web. Is it created on the fly? Thanks, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:37 AM To: solr-user@lucene.apache.org Subject: Re: new to solr Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Hello, I am new to solr. Welcome! I followed the solr online tutorial to get the example working. The search result is xml. I wonder if there is a way to show the result in a nicer form. I saw there is an example.xsl in the conf/xslt directory, but I really don't know how to use it. Are you asking how to display results for people to see? A nicely formatted website? Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website: php/java/.net/ruby/etc. ryan
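For later readers, the XsltResponseWriter page Ryan links boils down to two request parameters: wt=xslt selects the writer and tr names a stylesheet in conf/xslt. A sketch using the tutorial-default host and port (assumptions):

```shell
#!/bin/sh
# Render search results through conf/xslt/example.xsl on the server side.
build_xslt_url() {
    printf 'http://localhost:8983/solr/select?q=%s&wt=xslt&tr=example.xsl\n' "$1"
}

build_xslt_url solr   # fetch with: curl "$(build_xslt_url solr)"
```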