RE: score from two cores

2010-12-03 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Please correct me if I am doing something wrong. I really appreciate your help!

I have a core for metadata (xml files) and a core for pdf documents. Sometimes 
I need to search them separately, and sometimes I need to search both of them 
together. Each item has the same key in both cores, which relates them.

For example, the xml files look like the following:
<?xml version="1.0" encoding="ISO-8859-1"?>
<List>
<Item>
<Key>rmaaac.pdf</Key>
<TI>something</TI>
<UI>rmaaac</UI>
</Item>
<Item>
   .
</List>

I index the rmaaac.pdf file with the same Key and UI fields in another core. Here is 
an example after I index rmaaac.pdf:
  <?xml version="1.0" encoding="UTF-8" ?> 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int> 
  <int name="QTime">3</int> 
  <lst name="params">
  <str name="indent">on</str> 
  <str name="start">0</str> 
  <str name="q">collectionid: RM</str> 
  <str name="rows">10</str> 
  <str name="version">2.2</str> 
  </lst>
  </lst>
  <result name="response" numFound="1" start="0">
  <doc>
<str name="UI">rm</str> 
<str name="Key">rm.pdf</str> 
<str name="metadata_content">something</str>
  </doc>
  </result>

The result information which is displayed to the user comes from the metadata, not 
from the pdf files. If I search for a term in the documents, then in order to display 
search results to the user, I have to get the Keys from the documents and redo the 
search against the metadata. The score is then different.

Please give me some suggestions!

Thanks so much,
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 03, 2010 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: score from two cores

Uhhm, what are you trying to do? What do you want to do with the scores from
two cores?

Best
Erick

On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I have multiple cores. How can I deal with score?

 Thanks so much for help!
 Xiaohui



RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone know how to index a very large pdf file (more than 100MB)?

Thanks so much,
Xiaohui 
-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C] 
Sent: Tuesday, November 30, 2010 4:22 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: how to set maxFieldLength to unlimitd

I set maxFieldLength to 2147483647, restarted tomcat and re-indexed the pdf files. 
I also commented out the one in the mainIndex section. Unfortunately the files are 
still truncated if the file size is more than 20MB.

Any suggestions? I really appreciate your help!
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
,
it appears you can just comment out the one in the mainIndex section.
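
A minimal sketch of that change in solrconfig.xml (the surrounding defaults may
differ from this illustration; 2147483647 is simply Integer.MAX_VALUE):

<indexDefaults>
  ...
  <maxFieldLength>2147483647</maxFieldLength>
  ...
</indexDefaults>

<mainIndex>
  <!-- comment out the copy here so it does not override indexDefaults -->
  <!-- <maxFieldLength>10000</maxFieldLength> -->
  ...
</mainIndex>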

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I need to index and search some pdf files which are very big (around 1000
 pages each). How can I set maxFieldLength to unlimited?

 Thanks so much for your help in advance,
 Xiaohui



RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply, Jan. I just found that I cannot index pdf files with 
a file size of more than 20MB.

I use curl to index them and didn't get any error either. Do you have any suggestions 
for indexing pdf files larger than 20MB?

Thanks,
Xiaohui 

-Original Message-
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com] 
Sent: Wednesday, December 01, 2010 11:30 AM
To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; 
solr-user-...@lucene.apache.org
Subject: RE: how to set maxFieldLength to unlimitd

You just can't set it to unlimited. What you could do is ignore the positions and 
put in a filter that sets the position increment for all but the first token to 0 
(meaning the field length will be just 1, with all tokens stacked on the first 
position).
You could also break per page, so you put each page on a new position.
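
A minimal sketch of such a filter against the Lucene TokenFilter API (the class name
is illustrative, and a matching TokenFilterFactory would still be needed to hook it
into the schema's analyzer chain):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Stacks every token after the first onto the same position, so the indexed
// field length stays at 1 regardless of how many tokens the text produces.
public final class StackPositionsFilter extends TokenFilter {
  private final PositionIncrementAttribute posIncr =
      addAttribute(PositionIncrementAttribute.class);
  private boolean first = true;

  public StackPositionsFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (first) {
      first = false;                    // leave the first token where it is
    } else {
      posIncr.setPositionIncrement(0);  // stack later tokens on the first position
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    first = true;
  }
}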

Jan

-Original Message-
From: ext Ma, Xiaohui (NIH/NLM/LHC) [C] [mailto:xiao...@mail.nlm.nih.gov]
Sent: Tuesday, 30 November 2010 19:49
To: solr-user@lucene.apache.org; 'solr-user-i...@lucene.apache.org'; 
'solr-user-...@lucene.apache.org'
Subject: how to set maxFieldLength to unlimitd

I need to index and search some pdf files which are very big (around 1000 pages 
each). How can I set maxFieldLength to unlimited?

Thanks so much for your help in advance,
Xiaohui


RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much, Jan. I use curl to index the pdf files. Is there another way to do it?

I changed the positionIncrement to 0, but I didn't get it to work either.

Thanks,
Xiaohui 

-Original Message-
From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com] 
Sent: Wednesday, December 01, 2010 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd

I don't know about upload limitations, but there are certainly some in the default 
settings; this could explain the limit of 20MB. Which upload mechanism on the solr 
side do you use? I guess this is not a lucene problem but rather the http layer of 
solr.
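
If it is indeed the http layer, one setting worth checking is the multipart upload
cap in the requestDispatcher section of solrconfig.xml (the value below is purely
illustrative, not a shipped default):

<requestDispatcher handleSelect="true">
  <!-- multipartUploadLimitInKB caps the size of uploaded files; raise it well
       above the largest PDF, e.g. 204800 KB = 200 MB -->
  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="204800" />
</requestDispatcher>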

If you manage to stream your PDF and start parsing it on the stream, you should 
then go for the filter that sets the positionIncrement to 0, as mentioned.

What we did once for PDF files was parse them beforehand into plain text and index 
that (but we were using lucene directly) with a streamReader.


Regards, Jan

On 01.12.2010 at 18:13, ext Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 Thanks so much for your replay, Jan. I just found I cannot index pdf  
 files with the file size more than 20MB.

 I use curl index them, didn't get any error either. Do you have any  
 suggestions to index pdf files with more than 20MB?

 Thanks,
 Xiaohui

 -Original Message-
 From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
 Sent: Wednesday, December 01, 2010 11:30 AM
 To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; 
 solr-user-...@lucene.apache.org
 Subject: RE: how to set maxFieldLength to unlimitd

 You just can't set it to unlimited. What you could do, is ignoring  
 the positions and put a filter in, that sets the token for all but  
 the first token to 0 (means the field length will be just 1, all  
 tokens stacked on the first position)
 You could also break per page, so you put each page on a new  
 position.

 Jan

 -Original Message-
 From: ext Ma, Xiaohui (NIH/NLM/LHC) [C]  
 [mailto:xiao...@mail.nlm.nih.gov]
 Sent: Tuesday, 30 November 2010 19:49
 To: solr-user@lucene.apache.org; 'solr-user- 
 i...@lucene.apache.org'; 'solr-user-...@lucene.apache.org'
 Subject: how to set maxFieldLength to unlimitd

 I need to index and search some pdf files which are very big (around  
 1000 pages each). How can I set maxFieldLength to unlimited?

 Thanks so much for your help in advance,
 Xiaohui


how to set maxFieldLength to unlimitd

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I need to index and search some pdf files which are very big (around 1000 pages 
each). How can I set maxFieldLength to unlimited?

Thanks so much for your help in advance,
Xiaohui


RE: how to set maxFieldLength to unlimitd

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help!
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
,
it appears you can just comment out the one in the mainIndex section.

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I need to index and search some pdf files which are very big (around 1000
 pages each). How can I set maxFieldLength to unlimited?

 Thanks so much for your help in advance,
 Xiaohui



RE: how to set maxFieldLength to unlimitd

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I set maxFieldLength to 2147483647, restarted tomcat and re-indexed the pdf files. 
I also commented out the one in the mainIndex section. Unfortunately the files are 
still truncated if the file size is more than 20MB.

Any suggestions? I really appreciate your help!
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
,
it appears you can just comment out the one in the mainIndex section.

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I need to index and search some pdf files which are very big (around 1000
 pages each). How can I set maxFieldLength to unlimited?

 Thanks so much for your help in advance,
 Xiaohui



RE: how to deal with virtual collection in solr?

2010-09-03 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help, Jan Høydahl.

Have a great weekend!
Xiaohui

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
Sent: Friday, September 03, 2010 3:46 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

You did not supply your actual query. Try adding a q=foobar parameter; also, you 
don't need a & before shards since you already have the ?.
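
Put together, the corrected request would look something like this (foobar is just a
placeholder query term):

http://localhost:8983/solr/aapublic/select?q=foobar&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic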
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1. sep. 2010, at 20.14, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 Thank you, Jan. Unfortunately I got following exception when I use 
 http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
  .

 *
 Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NullPointerException
at java.io.StringReader.init(StringReader.java:33)
at 
 org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
at 
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
at org.apache.solr.search.QParser.getQuery(QParser.java:131)
at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 *

 -Original Message-
 From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
 Sent: Tuesday, August 31, 2010 2:15 PM
 To: solr-user@lucene.apache.org
 Subject: Re: how to deal with virtual collection in solr?

 Hi,

 If you have multiple cores defined in your solr.xml you need to issue your 
 queries to one of the cores. Below it seems as if you are lacking core name. 
 Try instead:

   
 http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

 And as Lance pointed out, make sure your XML files conform to the Solr XML 
 format (http://wiki.apache.org/solr/UpdateXmlMessages).

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Training in Europe - www.solrtraining.com

 On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 Thank you, Jan Høydahl.

 I used 
 http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
  I got a error Missing solr core name in path. I have aapublic and 
 aaprivate cores. I also got a error if I used 
 http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
  I got a null exception java.lang.NullPointerException.

 My collections are xml files. Please let me if I can use the following way 
 you suggested.
 curl 
 "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true"
  -F fi...@myfile.xml

 Thanks so much as always!
 Xiaohui


 -Original Message-
 From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
 Sent: Friday, August 27, 2010 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: how

RE: how to deal with virtual collection in solr?

2010-09-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thank you, Jan. Unfortunately I got the following exception when I use 
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
 . 

*
Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
at org.apache.solr.search.QParser.getQuery(QParser.java:131)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
*

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Tuesday, August 31, 2010 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

If you have multiple cores defined in your solr.xml, you need to issue your 
queries to one of the cores. Below it seems as if you are lacking the core name. 
Try this instead:


http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

And as Lance pointed out, make sure your XML files conform to the Solr XML 
format (http://wiki.apache.org/solr/UpdateXmlMessages).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 Thank you, Jan Høydahl. 
 
 I used 
 http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
  I got a error Missing solr core name in path. I have aapublic and 
 aaprivate cores. I also got a error if I used 
 http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
  I got a null exception java.lang.NullPointerException. 
 
 My collections are xml files. Please let me if I can use the following way 
 you suggested.
 curl 
 "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true"
  -F fi...@myfile.xml
 
 Thanks so much as always!
 Xiaohui 
 
 
 -Original Message-
 From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
 Sent: Friday, August 27, 2010 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: how to deal with virtual collection in solr?
 
 Hi,
 
 Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please 
 use this style:
 shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
 
 
 However, since schema is the same, I'd opt for one index with a collections 
 field as the filter.
 
 You can add that field to your schema, and then inject it as metadata on the 
 ExtractingRequestHandler call:
 
 curl 
 "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true"
  -F fi...@myfile.pdf
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

questions about synonyms

2010-08-31 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello, 

I have a couple of questions about synonyms.

1. I got a very big text file of synonyms. How can I use it? Do I need to index 
this text file first? (See the sketch after these questions.)

2. Is there a way to highlight synonyms in search results?

3. Does anyone use WordNet with solr?
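
For question 1, the usual route is not to index the synonyms file itself but to
reference it from a SynonymFilterFactory in a field type's analyzer, with the file
copied into the core's conf directory. A minimal sketch (the field type name is
illustrative):

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt is the flat synonym file, placed in the core's conf/ directory -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>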


Thanks so much in advance, 


RE: how to deal with virtual collection in solr?

2010-08-27 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much, I really appreciate your help!
Have a great weekend!
Xiaohui 

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Friday, August 27, 2010 7:42 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please 
use this style:
shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/


However, since schema is the same, I'd opt for one index with a collections 
field as the filter.

You can add that field to your schema, and then inject it as metadata on the 
ExtractingRequestHandler call:

curl 
"http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true"
 -F fi...@myfile.pdf
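
The schema side of that approach is just a single-valued field, and at query time the
collection is selected with a filter query. A minimal sketch (field name and value
follow the example above; the core path and q value are placeholders):

<!-- schema.xml -->
<field name="collection" type="string" indexed="true" stored="true"/>

http://localhost:8983/solr/select?q=foobar&fq=collection:aaprivate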

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 26. aug. 2010, at 20.41, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 Thanks so much for your help! I will try it.
 
 
 -Original Message-
 From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com] 
 Sent: Thursday, August 26, 2010 2:36 PM
 To: solr-user@lucene.apache.org
 Subject: Re: how to deal with virtual collection in solr?
 
 I don't know about the shards, etc.
 
 However I recently encountered that exception while indexing pdfs as well.
 The way that I resolved it was to upgrade to a nightly build of Solr. (You
 can find them https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/).
 
 The problem is that the version of Tika that 1.4.1 using is a very old
 version of Tika, which uses a old version of PDFBox to do its parsing.  (You
 might be able to fix the problem just by replacing the Tika jars...however I
 don't know if there have been any API changes so I can't really suggest
 that.)
 
 We didn't upgrade to trunk in order for that functionality, but it was nice
 that it started working. (The PDFs we'll be indexing won't be of later
 versions, but a test file was).
 
 On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
 xiao...@mail.nlm.nih.gov wrote:
 
 Thanks so much for your help, Jan Høydahl!
 
 I made multiple cores (aa public, aa private, bb public and bb private). I
 knew how to query them individually. Please tell me if I can do a
 combinations through shards parameter now. If yes, I tried to append
 shards=aapub,bbpub after query string. Unfortunately it didn't work.
 
 Actually all of content is the same. I don't have collection field in xml
 files. Please tell me how I can set a collection field in schema and
 simply search collection through filter.
 
 I used curl to index pdf files. I use Solr 1.4.1. I got the following error
 when I index pdf with version 1.5 and 1.6.
 
 *
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 500 </title>
 </head>
 <body><h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException:
 Unexpected RuntimeException from
 org.apache.tika.parser.pdf.pdfpar...@134ae32
 
 org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.pdf.pdfpar...@134ae32
   at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
   at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202

RE: how to deal with virtual collection in solr?

2010-08-27 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thank you, Jan Høydahl. 

I used 
http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
 I got an error, "Missing solr core name in path". I have aapublic and aaprivate 
cores. I also got an error if I used 
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
 I got a null exception, java.lang.NullPointerException. 

My collections are xml files. Please let me know if I can use the following way you 
suggested.
curl 
"http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true"
 -F fi...@myfile.xml

Thanks so much as always!
Xiaohui 


-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Friday, August 27, 2010 7:42 AM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please 
use this style:
shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/


However, since schema is the same, I'd opt for one index with a collections 
field as the filter.

You can add that field to your schema, and then inject it as metadata on the 
ExtractingRequestHandler call:

curl 
"http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true"
 -F fi...@myfile.pdf

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 26. aug. 2010, at 20.41, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 Thanks so much for your help! I will try it.
 
 
 -Original Message-
 From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com] 
 Sent: Thursday, August 26, 2010 2:36 PM
 To: solr-user@lucene.apache.org
 Subject: Re: how to deal with virtual collection in solr?
 
 I don't know about the shards, etc.
 
 However I recently encountered that exception while indexing pdfs as well.
 The way that I resolved it was to upgrade to a nightly build of Solr. (You
 can find them https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/).
 
 The problem is that the version of Tika that 1.4.1 using is a very old
 version of Tika, which uses a old version of PDFBox to do its parsing.  (You
 might be able to fix the problem just by replacing the Tika jars...however I
 don't know if there have been any API changes so I can't really suggest
 that.)
 
 We didn't upgrade to trunk in order for that functionality, but it was nice
 that it started working. (The PDFs we'll be indexing won't be of later
 versions, but a test file was).
 
 On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
 xiao...@mail.nlm.nih.gov wrote:
 
 Thanks so much for your help, Jan Høydahl!
 
 I made multiple cores (aa public, aa private, bb public and bb private). I
 knew how to query them individually. Please tell me if I can do a
 combinations through shards parameter now. If yes, I tried to append
 shards=aapub,bbpub after query string. Unfortunately it didn't work.
 
 Actually all of content is the same. I don't have collection field in xml
 files. Please tell me how I can set a collection field in schema and
 simply search collection through filter.
 
 I used curl to index pdf files. I use Solr 1.4.1. I got the following error
 when I index pdf with version 1.5 and 1.6.
 
 *
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 500 </title>
 </head>
 <body><h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException:
 Unexpected RuntimeException from
 org.apache.tika.parser.pdf.pdfpar...@134ae32
 
 org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.pdf.pdfpar...@134ae32
   at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
   at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle

RE: how to deal with virtual collection in solr?

2010-08-26 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help, Jan Høydahl!

I made multiple cores (aa public, aa private, bb public and bb private). I know 
how to query them individually. Please tell me if I can do a combination through 
the shards parameter now. If yes: I tried to append shards=aapub,bbpub after the 
query string, but unfortunately it didn't work.

Actually all of the content is the same. I don't have a collection field in the xml 
files. Please tell me how I can set a collection field in the schema and simply 
restrict the search to a collection through a filter.

I used curl to index the pdf files. I use Solr 1.4.1. I got the following error 
when I index pdfs with version 1.5 and 1.6.

*
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException: 
Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32

org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: 
Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException 
from org.apache.tika.parser.pdf.pdfpar...@134ae32
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
... 22 more
Caused by: java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
at 
org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
... 24 more
</pre>
<p>RequestURI=/solr/lhcpdf/update/extract</p><p><i><small><a 
href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>

<br/>
***


-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Wednesday, August 25, 2010 4:34 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr? 

 1. Currently we use Verity and have more than 20 collections, each collection 
 has a index for public items and a index for private items. So there are 
 virtual collections which point to each collection and a virtual collection 
 which points to all. For example, we have AA and BB collections.
 
 AA 

RE: how to deal with virtual collection in solr?

2010-08-26 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help! I will try it.


-Original Message-
From: Thomas Joiner [mailto:thomas.b.joi...@gmail.com] 
Sent: Thursday, August 26, 2010 2:36 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

I don't know about the shards, etc.

However I recently encountered that exception while indexing pdfs as well.
 The way that I resolved it was to upgrade to a nightly build of Solr. (You
can find them https://hudson.apache.org/hudson/view/Solr/job/Solr-trunk/).

The problem is that the version of Tika that 1.4.1 uses is a very old
version of Tika, which uses an old version of PDFBox to do its parsing.  (You
might be able to fix the problem just by replacing the Tika jars...however I
don't know if there have been any API changes so I can't really suggest
that.)

We didn't upgrade to trunk in order for that functionality, but it was nice
that it started working. (The PDFs we'll be indexing won't be of later
versions, but a test file was).

On Thu, Aug 26, 2010 at 1:27 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 Thanks so much for your help, Jan Høydahl!

 I made multiple cores (aa public, aa private, bb public and bb private). I
 knew how to query them individually. Please tell me if I can do a
 combinations through shards parameter now. If yes, I tried to append
 shards=aapub,bbpub after query string. Unfortunately it didn't work.

 Actually all of content is the same. I don't have collection field in xml
 files. Please tell me how I can set a collection field in schema and
 simply search collection through filter.

 I used curl to index pdf files. I use Solr 1.4.1. I got the following error
 when I index pdf with version 1.5 and 1.6.

 *
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 500 </title>
 </head>
 <body><h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException:
 Unexpected RuntimeException from
 org.apache.tika.parser.pdf.pdfpar...@134ae32

 org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from
 org.apache.tika.parser.pdf.pdfpar...@134ae32
at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 Caused by: org.apache.tika.exception.TikaException: Unexpected
 RuntimeException from org.apache.tika.parser.pdf.pdfpar...@134ae32
at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
... 22 more
 Caused by: java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
at
 org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
at
 org.pdfbox.util.PDFTextStripper.writeText

how to deal with virtual collection in solr?

2010-08-25 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I just started to investigate Solr several weeks ago. Our current project uses the 
Verity search engine, which is a commercial product, and the company is out of 
business. I am trying to evaluate whether Solr can meet our requirements. I have the 
following questions.

1. Currently we use Verity and have more than 20 collections; each collection 
has an index for public items and an index for private items. So there are 
virtual collections which point to each collection and a virtual collection 
which points to all. For example, we have AA and BB collections.

AA virtual collection -- (AA index for public items and AA index for private 
items).
BB virtual collection -- (BB index for public items and BB index for private 
items).
All virtual collection -- (AA index for public items and AA index for private 
items, BB index for public items and BB index for private items).

Would you please tell me what I should do for this if I use Solr?

2. Our project has files in different formats that I need to index, for example 
xml files, pdf files and text files. Is it possible for Solr to return search 
results from all of them?

3. I got an error when I index pdf files which are version 1.5 or 1.6. Would you 
please tell me if there is a patch to fix it?

Thanks so much in advance,


RE: how to deal with virtual collection in solr?

2010-08-25 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thank you for letting me know. Does Autonomy still support Verity search 
engine? 


-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Wednesday, August 25, 2010 3:41 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr? 

On Aug 25, 2010, at 12:18 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 I just started to investigate Solr several weeks ago. Our current project 
 uses Verity search engine which is commercial product and the company is out 
 of business. 


Verity is not out of business. They were acquired by Autonomy.

wunder
--
Walter Underwood





RE: ANNOUNCE: Stump Hoss @ Lucene Revolution

2010-08-24 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I just started to investigate Solr several weeks ago. Our current project uses the 
Verity search engine, which is a commercial product, and the company is out of 
business. I am trying to evaluate whether Solr can meet our requirements. I have the 
following questions.

1. Currently we use Verity and have more than 20 collections; each collection 
has an index for public items and an index for private items. So there are 
virtual collections which point to each collection and a virtual collection 
which points to all. For example, we have AA and BB collections.

AA virtual collection -- (AA index for public items and AA index for private 
items).
BB virtual collection -- (BB index for public items and BB index for private 
items).
All virtual collection -- (AA index for public items and AA index for private 
items, BB index for public items and BB index for private items).

Would you please tell me what I should do for this if I use Solr?

2. Our project has files in different formats that I need to index, for example 
xml files, pdf files and text files. Is it possible for Solr to return search 
results from all of them?

3. I got an error when I index pdf files which are version 1.5 or 1.6. Would you 
please tell me if there is a patch to fix it?

Thanks so much in advance,


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Monday, August 23, 2010 4:50 PM
To: solr-user@lucene.apache.org
Subject: ANNOUNCE: Stump Hoss @ Lucene Revolution


Hey everybody,

As you (hopefully) have heard by now, Lucid Imagination is sponsoring a 
Lucene/Solr conference in Boston about 6 weeks from now.  We've got a lot 
of really great speakers lined up to give some really interesting 
technical talks, so I offered to do something a little bit different.

I'm going to be in the hot seat for a Stump The Chump style session, 
where I'll be answering Solr questions live and unrehearsed...

http://bit.ly/stump-hoss

The goal is to really make me sweat and work hard to think of creative 
solutions to non-trivial problems on the spot -- like when I answer 
questions on the solr-user mailing list, except in a crowded room with 
hundreds of people staring at me and laughing.

But in order to be a success, we need your questions/problems/challenges!

If you had a tough situation with Solr that you managed to solve with a 
creative solution (or haven't solved yet) and are interesting to see what 
type of solution I might come up with under pressure, please email a 
description of your problem to st...@lucenerevolution.org -- More details 
online...

http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

Even if you won't be able to make it to Boston, please send in any 
challenging problems you would be interested to see me tackle under the 
gun.  The session will be recorded, and the video will be posted online 
shortly after the conference has ended.  And if you can make it to Boston: 
all the more fun to watch live and in person (and maybe answer follow up 
questions)

In any case, it should be a very interesting session: folks will either 
get to learn a lot, or laugh at me a lot, or both.  (win/win/win)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



multiple values

2010-08-18 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I can only display one author, which is the last one. It looks like it overwrites the others.

In the xml, I have more than one <author>name</author> inside 
<authorlist></authorlist>. 

In data_config.xml, I put <field column="Author" 
xpath="/PublishedArticles/Article/AuthorList/Author" />. 

In schema.xml, I put <field name="Author" type="text" indexed="true" 
stored="true" multiValued="true"/>. 

Please let me know if I did something wrong, or how I can display it in jsp.
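
If the values do make it into the index, displaying all of them is a matter of
iterating over the multi-valued field rather than reading a single value. With the
SolrJ client that looks roughly like this (variable names are illustrative):

SolrDocument doc = ...;  // one result from QueryResponse.getResults()
Collection<Object> authors = doc.getFieldValues("Author");
if (authors != null) {
  for (Object author : authors) {
    out.println(author);  // in a JSP, 'out' is the implicit JspWriter
  }
}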

I really appreciate your help!


different pdf version issue

2010-08-13 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I have a problem indexing pdf files which are pdf version 1.5 or 1.6. There is 
no problem at all for me to index pdf files with version 1.4.

Here is the error I got:
<h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException: 
Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@44ffb2

Does anyone have this problem when indexing pdf files? 

Thanks so much in advance,
Xiaohui 


index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I wrote a simple java program to import a pdf file. I can get a result when I 
do a *:* search from the admin page. I get nothing if I search for a word. I wonder 
if I did something wrong or missed setting something. 

Here is part of the result I get when doing a *:* search:
*
<doc>
<arr name="attr_Author">
  <str>Hristovski D</str> 
  </arr>
<arr name="attr_Content-Type">
  <str>application/pdf</str> 
  </arr>
<arr name="attr_Keywords">
  <str>microarray analysis, literature-based discovery, semantic predications, 
natural language processing</str> 
  </arr>
<arr name="attr_Last-Modified">
  <str>Thu Aug 12 10:58:37 EDT 2010</str> 
  </arr>
<arr name="attr_content">
  <str>Combining Semantic Relations and DNA Microarray Data for Novel 
Hypotheses Generation Combining Semantic Relations and DNA Microarray Data for 
Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej 
Kastrin,2...
*
Please help me out if anyone has experience with pdf files. I really appreciate 
it!

Thanks so much,



RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much. I didn't know what changes to make in schema.xml for pdf files; 
I used the solr default schema.xml. Please tell me what I need to do in 
schema.xml.

The simple java program I use is the following. I also attached the pdf file. I 
really appreciate your help!
*
import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class importPDF {
  public static void main(String[] args) {
    try {
      String fileName = "pub2009001.pdf";
      String solrId = "pub2009001.pdf";

      indexFilesSolrCell(fileName, solrId);

    } catch (Exception ex) {
      System.out.println(ex.toString());
    }
  }

  // Posts the file to the ExtractingRequestHandler (Solr Cell) and commits.
  public static void indexFilesSolrCell(String fileName, String solrId)
      throws IOException, SolrServerException {
    String urlString = "http://lhcinternal.nlm.nih.gov:8989/solr/lhcpdf";
    SolrServer solr = new CommonsHttpSolrServer(urlString);

    ContentStreamUpdateRequest up
        = new ContentStreamUpdateRequest("/update/extract");

    up.addFile(new File(fileName));

    up.setParam("literal.id", solrId);
    up.setParam("uprefix", "attr_");
    up.setParam("fmap.content", "attr_content");

    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    solr.request(up);
  }
}


-Original Message-
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] 
Sent: Thursday, August 12, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files

To help you we need the description of your fields in your schema.xml and
the query that you do when you search only a single word.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov

 I wrote a simple java program to import a pdf file. I can get a result when
 I do search *:* from admin page. I get nothing if I search a word. I wonder
 if I did something wrong or miss set something.

 Here is part of result I get when do *:* search:
 *
 - doc
 - arr name=attr_Author
  strHristovski D/str
  /arr
 - arr name=attr_Content-Type
  strapplication/pdf/str
  /arr
 - arr name=attr_Keywords
  strmicroarray analysis, literature-based discovery, semantic
 predications, natural language processing/str
  /arr
 - arr name=attr_Last-Modified
  strThu Aug 12 10:58:37 EDT 2010/str
  /arr
 - arr name=attr_content
  strCombining Semantic Relations and DNA Microarray Data for Novel
 Hypotheses Generation Combining Semantic Relations and DNA Microarray Data
 for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej
 Kastrin,2...
 *
 Please help me out if anyone has experience with pdf files. I really
 appreciate it!

 Thanks so much,




RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone know if I need to define fields in schema.xml for indexing pdf files? 
If I do, please tell me how. 

I defined fields in schema.xml and created a data-configuration file using 
xpath for the xml files. Would you please tell me if I need to do the same for pdf 
files, and how?
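
For the ExtractingRequestHandler output specifically, since the java program posted 
earlier in this thread passes uprefix=attr_, one minimal schema-side sketch is a 
catch-all dynamic field for those generated names (the field type name is 
illustrative and should match one defined in your schema):

<dynamicField name="attr_*" type="text" indexed="true" stored="true" multiValued="true"/>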

Thanks so much for your help as always!

-Original Message-
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] 
Sent: Thursday, August 12, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files

To help you we need the description of your fields in your schema.xml and
the query that you do when you search only a single word.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov

 I wrote a simple java program to import a pdf file. I can get a result when
 I do search *:* from admin page. I get nothing if I search a word. I wonder
 if I did something wrong or miss set something.

 Here is part of result I get when do *:* search:
 *
 - doc
 - arr name=attr_Author
  strHristovski D/str
  /arr
 - arr name=attr_Content-Type
  strapplication/pdf/str
  /arr
 - arr name=attr_Keywords
  strmicroarray analysis, literature-based discovery, semantic
 predications, natural language processing/str
  /arr
 - arr name=attr_Last-Modified
  strThu Aug 12 10:58:37 EDT 2010/str
  /arr
 - arr name=attr_content
  strCombining Semantic Relations and DNA Microarray Data for Novel
 Hypotheses Generation Combining Semantic Relations and DNA Microarray Data
 for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej
 Kastrin,2...
 *
 Please help me out if anyone has experience with pdf files. I really
 appreciate it!

 Thanks so much,




RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help! I defined a dynamic field in schema.xml as 
follows:
<dynamicField name="metadata_*" type="string" indexed="true" stored="true" 
multiValued="false"/>

But I wonder what I should put for <uniqueKey></uniqueKey>.
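
One common minimal setup, assuming the unique key should match the literal.id value 
sent by the indexing program earlier in this thread, is a plain string id field:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
...
<uniqueKey>id</uniqueKey>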

I really appreciate your help!

-Original Message-
From: Stefan Moises [mailto:moi...@shoptimax.de] 
Sent: Thursday, August 12, 2010 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files

Maybe this helps: 
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

Cheers,
Stefan

On 12.08.2010 19:45, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
 Does anyone know if I need define fields in schema.xml for indexing pdf 
 files? If I need, please tell me how I can do it.

 I defined fields in schema.xml and created data-configuration file by using 
 xpath for xml files. Would you please tell me if I need do it for pdf files 
 and how I can do?

 Thanks so much for your help as always!

 -Original Message-
 From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com]
 Sent: Thursday, August 12, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: index pdf files

 To help you we need the description of your fields in your schema.xml and
 the query that you do when you search only a single word.

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42


 2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C]xiao...@mail.nlm.nih.gov


 I wrote a simple java program to import a pdf file. I can get a result when
 I do search *:* from admin page. I get nothing if I search a word. I wonder
 if I did something wrong or miss set something.

 Here is part of result I get when do *:* search:
 *
 -doc
 -arr name=attr_Author
   strHristovski D/str
   /arr
 -arr name=attr_Content-Type
   strapplication/pdf/str
   /arr
 -arr name=attr_Keywords
   strmicroarray analysis, literature-based discovery, semantic
 predications, natural language processing/str
   /arr
 -arr name=attr_Last-Modified
   strThu Aug 12 10:58:37 EDT 2010/str
   /arr
 -arr name=attr_content
   strCombining Semantic Relations and DNA Microarray Data for Novel
 Hypotheses Generation Combining Semantic Relations and DNA Microarray Data
 for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej
 Kastrin,2...
 *
 Please help me out if anyone has experience with pdf files. I really
 appreciate it!

 Thanks so much,


  


-- 
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much. I got it to work now. I really appreciate your help!
Xiaohui 

-Original Message-
From: Stefan Moises [mailto:moi...@shoptimax.de] 
Sent: Thursday, August 12, 2010 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files

Maybe this helps: 
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

Cheers,
Stefan

On 12.08.2010 19:45, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
 Does anyone know if I need define fields in schema.xml for indexing pdf 
 files? If I need, please tell me how I can do it.

 I defined fields in schema.xml and created data-configuration file by using 
 xpath for xml files. Would you please tell me if I need do it for pdf files 
 and how I can do?

 Thanks so much for your help as always!

 -Original Message-
 From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com]
 Sent: Thursday, August 12, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: index pdf files

 To help you we need the description of your fields in your schema.xml and
 the query that you do when you search only a single word.

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42


 2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C]xiao...@mail.nlm.nih.gov


 I wrote a simple java program to import a pdf file. I can get a result when
 I do search *:* from admin page. I get nothing if I search a word. I wonder
 if I did something wrong or miss set something.

 Here is part of result I get when do *:* search:
 *
 -doc
 -arr name=attr_Author
   strHristovski D/str
   /arr
 -arr name=attr_Content-Type
   strapplication/pdf/str
   /arr
 -arr name=attr_Keywords
   strmicroarray analysis, literature-based discovery, semantic
 predications, natural language processing/str
   /arr
 -arr name=attr_Last-Modified
   strThu Aug 12 10:58:37 EDT 2010/str
   /arr
 -arr name=attr_content
   strCombining Semantic Relations and DNA Microarray Data for Novel
 Hypotheses Generation Combining Semantic Relations and DNA Microarray Data
 for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej
 Kastrin,2...
 *
 Please help me out if anyone has experience with pdf files. I really
 appreciate it!

 Thanks so much,


  


-- 
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



RE: index pdf files

2010-08-12 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I got the following error when I index some pdf files. I wonder if anyone has had 
this issue before and how to fix it. Thanks so much in advance!

***
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>org.apache.tika.exception.TikaException: 
Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@44ffb2

org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: 
Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@44ffb2
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException 
from org.apache.tika.parser.pdf.pdfpar...@44ffb2
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
***

-Original Message-
From: Stefan Moises [mailto:moi...@shoptimax.de] 
Sent: Thursday, August 12, 2010 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files

Maybe this helps: 
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

Cheers,
Stefan

Am 12.08.2010 19:45, schrieb Ma, Xiaohui (NIH/NLM/LHC) [C]:
 Does anyone know if I need to define fields in schema.xml for indexing pdf
 files? If I do, please tell me how I can do it.

 I defined fields in schema.xml and created a data-configuration file using
 xpath for xml files. Would you please tell me if I need to do the same for pdf files
 and, if so, how?

 Thanks so much for your help as always!

 -Original Message-
 From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com]
 Sent: Thursday, August 12, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: index pdf files

 To help you we need the description of your fields in your schema.xml and
 the query that you do when you search only a single word.

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42


 2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C]xiao...@mail.nlm.nih.gov


 I wrote a simple java program to import a pdf file. I can get a result when
 I do a *:* search from the admin page. I get nothing if I search for a word. I wonder
 if I did something wrong or missed a setting.

 Here is part of the result I get when I do a *:* search:
 *
 <doc>
 <arr name="attr_Author">
   <str>Hristovski D</str>
   </arr>
 <arr name="attr_Content-Type">
   <str>application/pdf</str>
   </arr>
 <arr name="attr_Keywords">
   <str>microarray analysis, literature-based discovery, semantic
 predications, natural language processing</str>
   </arr>
 <arr name="attr_Last-Modified">
   <str>Thu Aug 12 10:58:37 EDT 2010</str>
   </arr>
 <arr name="attr_content">

RE: PDF file

2010-08-11 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help! I got a Remote Streaming is disabled error. Would 
you please tell me if I missed something?

Thanks, 

-Original Message-
From: Jayendra Patil [mailto:jayendra.patil@gmail.com] 
Sent: Tuesday, August 10, 2010 8:51 PM
To: solr-user@lucene.apache.org
Subject: Re: PDF file

Try ...

curl 
http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=
Full_Path_of_File/pub2009001.pdf&literal.id=777045&commit=true

stream.file - specify full path
literal.extra params - specify any extra params if needed

Regards,
Jayendra

On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 Thanks so much for your help! I tried to index a pdf file and got the
 following. The command I used is

 curl '
 http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@pub2009001.pdf

 Did I do something wrong? Do I need to modify anything in schema.xml or another
 configuration file?

 
 [xiao...@lhcinternal lhc]$ curl '
 http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@pub2009001.pdf
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 404 </title>
 </head>
 <body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
 <p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a href=
 "http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>

 </body>
 </html>
 ***

 -Original Message-
 From: Sharp, Jonathan [mailto:jsh...@coh.org]
 Sent: Tuesday, August 10, 2010 4:37 PM
 To: solr-user@lucene.apache.org
 Subject: RE: PDF file

 Xiaohui,

 You need to add the following jars to the lib subdirectory of the solr
 config directory on your server.

 (path inside the solr 1.4.1 download)

 /dist/apache-solr-cell-1.4.1.jar
 plus all the jars in
 /contrib/extraction/lib

 HTH

 -Jon
 
 From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov]
 Sent: Tuesday, August 10, 2010 11:57 AM
 To: 'solr-user@lucene.apache.org'
 Subject: RE: PDF file

 Does anyone have any experience with PDF files? I really appreciate your
 help!
 Thanks so much in advance.

 -Original Message-
 From: Ma, Xiaohui (NIH/NLM/LHC) [C]
 Sent: Tuesday, August 10, 2010 10:37 AM
 To: 'solr-user@lucene.apache.org'
 Subject: PDF file

 I have a lot of pdf files. I am trying to import pdf files to solr and
 index them. I added ExtractingRequestHandler to solrconfig.xml.

 Please tell me if I need to download some jar files.

 In the Solr 1.4 Enterprise Search Server book, the following command is used to
 import mccm.pdf.

 curl '
 http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@mccm.pdf

 Please tell me if there is a way to import pdf files from a directory.

 Thanks so much for your help!







RE: PDF file

2010-08-11 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks, I know how to enable remote streaming now. But I got another error: ERROR: unknown 
field 'metadata_trapped'. 

Does anyone know how to match up the fields with the SolrCell metadata? I found the following 
comment in schema.xml, but I don't know what changes to make for PDF.

<!-- Common metadata fields, named specifically to match up with
 SolrCell metadata when parsing rich documents such as Word, PDF.
 Some fields are multiValued only because Tika currently may return
 multiple values for them. -->

I really appreciate your help!
Thanks,
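
For reference, a minimal SolrJ sketch of the same extract call (Solr 1.4 API; the
core URL, file name and literal.id value below are placeholders). The uprefix
parameter tells the ExtractingRequestHandler to prefix any Tika metadata field that
is not defined in schema.xml (such as metadata_trapped) with attr_, so it lands in
the example schema's attr_* dynamic field instead of causing an unknown field error;
the same effect can be had by adding &uprefix=attr_ to the curl URL.

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractOnePdf {
  public static void main(String[] args) throws Exception {
    // placeholder core URL and file name, adjust to your setup
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8989/solr/lhc");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("pub2009001.pdf"));      // the PDF sent to Solr Cell
    req.setParam("literal.id", "pub2009001.pdf"); // value for the unique key field
    req.setParam("uprefix", "attr_");             // unknown metadata -> attr_* dynamic field
    req.setParam("fmap.content", "text");         // extracted body text -> "text" field
    server.request(req);
    server.commit();
  }
}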

-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C] 
Sent: Wednesday, August 11, 2010 10:36 AM
To: solr-user@lucene.apache.org
Cc: 'jayendra.patil@gmail.com'
Subject: RE: PDF file

Thanks so much for your help! I got a Remote Streaming is disabled error. Would 
you please tell me if I missed something?

Thanks, 

-Original Message-
From: Jayendra Patil [mailto:jayendra.patil@gmail.com] 
Sent: Tuesday, August 10, 2010 8:51 PM
To: solr-user@lucene.apache.org
Subject: Re: PDF file

Try ...

curl 
http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=
Full_Path_of_File/pub2009001.pdf&literal.id=777045&commit=true

stream.file - specify full path
literal.extra params - specify any extra params if needed

Regards,
Jayendra

On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 Thanks so much for your help! I tried to index a pdf file and got the
 following. The command I used is

 curl '
 http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@pub2009001.pdf

 Did I do something wrong? Do I need to modify anything in schema.xml or another
 configuration file?

 
 [xiao...@lhcinternal lhc]$ curl '
 http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@pub2009001.pdf
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 404 </title>
 </head>
 <body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
 <p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a href=
 "http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>
 <br/>

 </body>
 </html>
 ***

 -Original Message-
 From: Sharp, Jonathan [mailto:jsh...@coh.org]
 Sent: Tuesday, August 10, 2010 4:37 PM
 To: solr-user@lucene.apache.org
 Subject: RE: PDF file

 Xiaohui,

 You need to add the following jars to the lib subdirectory of the solr
 config directory on your server.

 (path inside the solr 1.4.1 download)

 /dist/apache-solr-cell-1.4.1.jar
 plus all the jars in
 /contrib/extraction/lib

 HTH

 -Jon
 
 From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov]
 Sent: Tuesday, August 10, 2010 11:57 AM
 To: 'solr-user@lucene.apache.org'
 Subject: RE: PDF file

 Does anyone have any experience with PDF files? I really appreciate your
 help!
 Thanks so much in advance.

 -Original Message-
 From: Ma, Xiaohui (NIH/NLM/LHC) [C]
 Sent: Tuesday, August 10, 2010 10:37 AM
 To: 'solr-user@lucene.apache.org'
 Subject: PDF file

 I have a lot of pdf files. I am trying to import pdf files to solr and
 index them. I added ExtractingRequestHandler to solrconfig.xml.

 Please tell me if I need to download some jar files.

 In the Solr 1.4 Enterprise Search Server book, the following command is used to
 import mccm.pdf.

 curl '
 http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@mccm.pdf

 Please tell me if there is a way to import pdf files from a directory.

 Thanks so much for your help!




RE: hl.usePhraseHighlighter

2010-08-10 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help! It works. I really appreciate it.

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, August 09, 2010 6:05 PM
To: solr-user@lucene.apache.org
Subject: RE: hl.usePhraseHighlighter

 I used text type and found the following in schema.xml. I
 don't know which ones I should remove. 
 ***

You should remove <filter class="solr.EnglishPorterFilterFactory" 
protected="protwords.txt"/> from both index and query time.


  


PDF file

2010-08-10 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I have a lot of pdf files. I am trying to import pdf files to solr and index 
them. I added ExtractingRequestHandler to solrconfig.xml. 

Please tell me if I need to download some jar files.

In the Solr 1.4 Enterprise Search Server book, the following command is used to import 
mccm.pdf.

curl 
'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@mccm.pdf

Please tell me if there is a way to import pdf files from a directory.

Thanks so much for your help!
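
One way to handle a whole directory is a short SolrJ program; the following is a
rough sketch only (the Solr URL, using the file name as the unique key value, and
committing once at the end are all assumptions) that loops over a folder and posts
each PDF to /update/extract:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexPdfDirectory {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    File[] files = new File(args[0]).listFiles();   // directory passed on the command line
    if (files == null) {
      return;                                       // not a directory
    }
    for (File pdf : files) {
      if (!pdf.getName().toLowerCase().endsWith(".pdf")) {
        continue;                                   // skip anything that is not a pdf
      }
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      req.addFile(pdf);
      req.setParam("literal.id", pdf.getName());    // file name as the unique key value
      req.setParam("fmap.content", "text");         // extracted text -> "text" field
      server.request(req);
    }
    server.commit();                                // single commit after all files
  }
}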



RE: PDF file

2010-08-10 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone have any experience with PDF files? I really appreciate your help!
Thanks so much in advance.

-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C] 
Sent: Tuesday, August 10, 2010 10:37 AM
To: 'solr-user@lucene.apache.org'
Subject: PDF file

I have a lot of pdf files. I am trying to import pdf files to solr and index 
them. I added ExtractingRequestHandler to solrconfig.xml. 

Please tell me if I need to download some jar files.

In the Solr 1.4 Enterprise Search Server book, the following command is used to import 
mccm.pdf.

curl 
'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@mccm.pdf

Please tell me if there is a way to import pdf files from a directory.

Thanks so much for your help!



RE: PDF file

2010-08-10 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help! I tried to index a pdf file and got the 
following. The command I used is 

curl 
'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@pub2009001.pdf

Did I do something wrong? Do I need to modify anything in schema.xml or another 
configuration file?


[xiao...@lhcinternal lhc]$ curl 
'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@pub2009001.pdf
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
<p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a 
href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

</body>
</html>
***

-Original Message-
From: Sharp, Jonathan [mailto:jsh...@coh.org] 
Sent: Tuesday, August 10, 2010 4:37 PM
To: solr-user@lucene.apache.org
Subject: RE: PDF file

Xiaohui,

You need to add the following jars to the lib subdirectory of the solr config 
directory on your server. 

(path inside the solr 1.4.1 download)

/dist/apache-solr-cell-1.4.1.jar
plus all the jars in 
/contrib/extraction/lib

HTH 

-Jon

From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov]
Sent: Tuesday, August 10, 2010 11:57 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: PDF file

Does anyone have any experience with PDF files? I really appreciate your help!
Thanks so much in advance.

-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C]
Sent: Tuesday, August 10, 2010 10:37 AM
To: 'solr-user@lucene.apache.org'
Subject: PDF file

I have a lot of pdf files. I am trying to import pdf files to solr and index 
them. I added ExtractingRequestHandler to solrconfig.xml.

Please tell me if I need to download some jar files.

In the Solr 1.4 Enterprise Search Server book, the following command is used to import 
mccm.pdf.

curl 
'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true'
 -F fi...@mccm.pdf

Please tell me if there is a way to import pdf files from a directory.

Thanks so much for your help!






hl.usePhraseHighlighter

2010-08-09 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I am trying to do an exact match. For example, I want only study highlighted 
if I search for study, not its variants (studies, studied, and so on).

I didn't find any function for it in SolrQuery. I added the following in 
solrconfig.xml:
<str name="hl.usePhraseHighlighter">true</str>

Unfortunately I didn't get it to work. 

Please help me out.

Thanks so much,
Xiaohui 


RE: hl.usePhraseHighlighter

2010-08-09 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help!

I used text type and found the following in schema.xml. I don't know which ones 
I should remove. 
***
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
***

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, August 09, 2010 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: hl.usePhraseHighlighter


 I am trying to do an exact match. For
 example, I want only study highlighted if I search for
 study, not its variants (studies, studied, and so on).

This has nothing to do with highlighting and its parameters. 
You need to remove the stem filter factory (Porter, Snowball) from your analyzer 
chain. Restarting Solr and re-indexing are also necessary.


  


how to highlight string in jsp

2010-08-02 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I am trying to display the highlighted string in a different color on a JSP. I use the 
following in a servlet.

query.setHighlight(true).setHighlightSnippets(1);
query.setParam("hl.fl", "Abstract");

I wonder how I can display it in a JSP.

Thanks in advance.
xm
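
A rough SolrJ sketch of how the snippets come back and can be handed to a JSP. The
unique key field name (UI) and the highlighted field (Abstract) are assumptions;
getHighlighting() returns a map keyed by the unique key, then by field name:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class HighlightExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("study");
    query.setHighlight(true).setHighlightSnippets(1);
    query.addHighlightField("Abstract");            // same effect as hl.fl=Abstract
    QueryResponse rsp = server.query(query);
    Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
    for (SolrDocument doc : rsp.getResults()) {
      String key = (String) doc.getFieldValue("UI");       // assumed unique key field
      Map<String, List<String>> perDoc = hl.get(key);
      List<String> snippets = (perDoc == null) ? null : perDoc.get("Abstract");
      if (snippets != null && !snippets.isEmpty()) {
        // by default each match is wrapped in <em>...</em>; print the snippet (or put it
        // in request scope for the JSP) without HTML-escaping so the markup survives
        System.out.println(key + ": " + snippets.get(0));
      }
    }
  }
}

In the JSP, the <em> tags can then be styled with CSS to get the different color.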



display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I am new to Solr. I just got the example xml files indexed and searchable by following the Solr 
tutorial. I wonder how I can get the search results displayed in a JSP. I really 
appreciate any suggestions you can give.

Thanks so much,
Xiaohui


RE: display solr result in JSP

2010-07-28 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply. I don't have much experience with JSP. I found a tag 
library, and am trying to use <xsltlib:apply xml="<%= 
url.getContent().toString() %>" xsl="/xsl/result.xsl"/>. Unfortunately I 
didn't get it to work. 

Would you please give me more information? I really appreciate your help!

Thanks,
Xiaohui 

-Original Message-
From: Ranveer [mailto:ranveer.s...@gmail.com] 
Sent: Wednesday, July 28, 2010 11:27 AM
To: solr-user@lucene.apache.org
Subject: Re: display solr result in JSP

Hi,

It is very simple to display values in a JSP. If you are using SolrJ, then simply 
store the values in a bean from your Java class and display them.
You can do the same thing in a servlet too: get the Solr server response and 
return it in a bean, or display it directly (in the servlet).
Hope you will be able to do it.

regards
Ranveer
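
As a rough illustration of that suggestion (the servlet class, the results.jsp path,
and the q and docs names are all made up): a servlet can run the query with SolrJ, put
the SolrDocumentList into request scope as the bean, and forward to a JSP that loops
over it, e.g. with JSTL's c:forEach, printing each document's stored fields.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class SearchServlet extends HttpServlet {
  protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    try {
      CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      // assumes the search term arrives as a "q" request parameter
      QueryResponse rsp = server.query(new SolrQuery(request.getParameter("q")));
      SolrDocumentList docs = rsp.getResults();
      request.setAttribute("docs", docs);           // the "bean" the JSP reads
      request.getRequestDispatcher("/results.jsp").forward(request, response);
    } catch (Exception e) {
      throw new ServletException(e);
    }
  }
}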

On Wednesday 28 July 2010 08:11 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
 I am new to Solr. I just got the example xml files indexed and searchable by following 
 the Solr tutorial. I wonder how I can get the search results displayed in a JSP. I 
 really appreciate any suggestions you can give.

 Thanks so much,
 Xiaohui





new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I am new to solr. I followed solr online tutorial to get the example
work. The search result is xml. I wonder if there is a way to show
result in a form. I saw there is example.xsl in conf/xslt directory. I
really don't know how to do it. Anyone has some ideas for me. I really
appreciate it!

Thanks,
Xiaohui 


RE: new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply! Please tell me what example.xsl is for in
conf/xslt.

Please let me know where the search result is located. I can use php or
.net to display the result on the web. Is it created on the fly?

Thanks,
Xiaohui 

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 14, 2008 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: new to solr

Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
 Hello,
 
 I am new to solr. 

Welcome!

 I followed solr online tutorial to get the example
 work. The search result is xml. I wonder if there is a way to show
 result in a form. I saw there is example.xsl in conf/xslt directory. I
 really don't know how to do it. Anyone has some ideas for me. I really
 appreciate it!
 

Are you asking how to display results for people to see?  A nicely 
formatted website?

Solr (a database) does not aim to solve the display side... but there 
are lots of clients to help integrate with your website. 
php/java/.net/ruby/etc

ryan





RE: new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks very much, Ryan. I really appreciate it. I will take a look on
both.

Best regards,
Xiaohui 

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 14, 2008 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: new to solr

the example.xsl is an example using XSLT to format results.  Check:
http://wiki.apache.org/solr/XsltResponseWriter

For php, check:
http://wiki.apache.org/solr/SolPHP

ryan



Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
 Thanks so much for your reply! Please tell me what example.xsl is for
in
 conf/xslt.
 
 Please let me know where the search result is located. I can use php
or
 .net to display the result on the web. Is it created on the fly?
 
 Thanks,
 Xiaohui 
 
 -Original Message-
 From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
 Sent: Monday, January 14, 2008 11:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: new to solr
 
 Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
 Hello,

 I am new to solr. 
 
 Welcome!
 
 I followed solr online tutorial to get the example
 work. The search result is xml. I wonder if there is a way to show
 result in a form. I saw there is example.xsl in conf/xslt directory.
I
 really don't know how to do it. Anyone has some ideas for me. I
really
 appreciate it!

 
 Are you asking how to display results for people to see?  A nicely 
 formatted website?
 
 Solr (a database) does not aim to solve the display side... but there 
 are lots of clients to help integrate with your website. 
 php/java/.net/ruby/etc
 
 ryan