Re: indexing pdf files using post tool

2016-03-19 Thread Francisco Andrés Fernández
Vidya, I'm not sure I understand correctly, but I think the best way is to parse your text using a routine outside Solr. You might need to map the different parts of your document using your domain knowledge and use that routine to produce an XML document for example, with correspon
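A minimal sketch of such an outside-Solr routine, in Python. It assumes (hypothetically) that the text extracted from the PDF begins with "key: value" lines for name, id and title, and that everything else is body text; the helper names `split_fields` and `to_solr_add_xml` are made up for illustration. The output uses Solr's XML update format (`<add><doc><field name="...">`), and the field names would have to exist in your schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the extracted text has "key: value" lines for these fields.
# A real PDF will need its own mapping rules based on domain knowledge.
FIELD_LINE = re.compile(r"^(name|id|title)\s*:\s*(.+)$", re.IGNORECASE)


def split_fields(text):
    """Split extracted text into named fields plus a catch-all 'content'."""
    fields = {}
    body = []
    for line in text.splitlines():
        stripped = line.strip()
        match = FIELD_LINE.match(stripped)
        if match and match.group(1).lower() not in fields:
            # First occurrence of a known key becomes its own field.
            fields[match.group(1).lower()] = match.group(2).strip()
        elif stripped:
            # Every other non-empty line is treated as body text.
            body.append(stripped)
    fields["content"] = " ".join(body)
    return fields


def to_solr_add_xml(fields):
    """Serialize one document in Solr's XML <add><doc> update format."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


sample = (
    "Name: Jane Doe\n"
    "ID: 42\n"
    "Title: Annual Report\n"
    "\n"
    "First paragraph.\n"
    "Second paragraph."
)
fields = split_fields(sample)
solr_xml = to_solr_add_xml(fields)
print(solr_xml)
```

The resulting XML could then be sent to the collection's update handler, for example with the post tool, instead of posting the raw PDF.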

Re: indexing pdf files using post tool

2016-03-19 Thread Binoy Dalal
Take a look at the CloneFieldUpdateProcessorFactory here: http://www.solr-start.com/info/update-request-processors/ On Wed, 16 Mar 2016, 18:25 Binoy Dalal, wrote: > Like Francisco said, use a custom update processor to map the fields the > way you want and add it to your update chain. > > On Wed

Re: indexing pdf files using post tool

2016-03-18 Thread Binoy Dalal
Like Francisco said, use a custom update processor to map the fields the way you want and add it to your update chain. On Wed, 16 Mar 2016, 18:16 Francisco Andrés Fernández, wrote: > Vidya, I don't know if I'm understanding it very well but, I think that the > best way is to parse your text usin

Re: indexing pdf files using post tool

2016-03-18 Thread Jan Høydahl
Hi, you can look at the Apache Tika project or the PDFBox project to parse your files before sending them to Solr. Alternatively, if your processing is very simple, you can use the built-in Tika as you just did, and then deploy some UpdateRequestProcessors in order to modify the Tika output into whate

Re: indexing pdf files using post tool

2016-03-16 Thread vidya
Sorry for conveying it the wrong way. I want the data of one PDF file to be indexed into different fields of a Solr document according to the data in it, like name; id; title; content etc. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp42

Re: indexing pdf files using post tool

2016-03-15 Thread roshan agarwal
Yes Vidya, you just have to use a copy field. Roshan On Tue, Mar 15, 2016 at 3:07 PM, vidya wrote: > Hi > I got data into my content field. But I wanted to have different fields to > be > allocated for data in my file. How can I achieve this? > > > > -- > View this message in context: > http://luce

Re: indexing pdf files using post tool

2016-03-15 Thread Binoy Dalal
You should use copy fields. https://cwiki.apache.org/confluence/display/solr/Copying+Fields On Tue, 15 Mar 2016, 15:07 vidya, wrote: > Hi > I got data into my content field. But I wanted to have different fields to > be > allocated for data in my file. How can I achieve this? > > > > -- > View th

Re: indexing pdf files using post tool

2016-03-15 Thread vidya
Hi, I got data into my content field. But I wanted to have different fields to be allocated for data in my file. How can I achieve this? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811p4263840.html Sent from the Solr - User mailing

Re: indexing pdf files using post tool

2016-03-15 Thread Binoy Dalal
Do you have a "content" field defined in your schema? Is it stored? By default, the content from the docs uploaded through post should be mapped to a field called "content". On Tue, 15 Mar 2016, 12:47 vidya, wrote: > Hi > I am trying to index a PDF file by using the post tool on my Linux system, Whe