subject:"Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context"

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Allison, Timothy B.

> http - however, the big advantage of doing your indexing on a different machine is that the heavy lifting that Tika does in extracting text from documents, finding metadata etc. is not happening on the server. If the indexer crashes, it doesn't affect Solr either.
+1 for what can go wrong:

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Phil Scadden

From: ZiYuan [mailto:ziyu...@gmail.com] Sent: Tuesday, 20 June 2017 11:29 p.m. To: solr-user@lucene.apache.org Subject: Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context Dear Erick and Timothy, I also took a look at the Python clients (say, SolrClient

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Allison, Timothy B.

Yeah, Chris knows a thing or two about Tika. :) -Original Message- From: ZiYuan [mailto:ziyu...@gmail.com] Sent: Tuesday, June 20, 2017 8:00 AM To: solr-user@lucene.apache.org Subject: Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread ZiYuan

No intention of spamming but I also want to mention tika-python in the toolchain. Ziyuan On Tue, Jun 20, 2017 at 2:29 PM, ZiYuan wrote: > Dear Erick and Timothy, > > I also took a look at the Python clients (say, SolrClient and

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread ZiYuan

Dear Erick and Timothy, I also took a look at the Python clients (say, SolrClient and pysolr) because Python is my main programming language. I have an impression that 1. they send HTTP requests to the server according to the server APIs; 2. they are not official and thus possibly not up to date.

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan

Dear Erick and Timothy, yes I will parse from the client for all the benefits. I am just trying to figure out what is going on by indexing one or two PDF files first. Thank you both. Best regards, Ziyuan On Mon, Jun 19, 2017 at 6:17 PM, Erick Erickson wrote: > bq:

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erick Erickson

bq: Hope that there is no side effect of not mapping the PDF
Well, yes it will have that side effect. You can cure that with a copyField directive from content to _text_. But do really consider running this as a SolrJ program on the client. Tim knows in far more painful detail than I do what
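A minimal sketch of the copyField cure mentioned above, written as a Schema API command instead of a hand edit to the schema file. The field names `content` and `_text_` come from the thread; the collection name `pdfs` and the localhost URL in the note below are assumptions for illustration only.

```python
import json


def copy_field_command(source="content", dest="_text_"):
    """Build a Schema API command that copies `source` into `dest` at index time."""
    return {"add-copy-field": {"source": source, "dest": dest}}


# Serialize the command; this is the JSON body that would be sent to Solr.
schema_payload = json.dumps(copy_field_command())
print(schema_payload)
```

The JSON would be POSTed with `Content-type: application/json` to the collection's `/schema` endpoint, for example `http://localhost:8983/solr/pdfs/schema` (hypothetical host and collection). Documents indexed before the change would need to be reindexed to pick up the copy.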

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan

Hi Erick, Now it is clear. I have to update the request handler of /update/extract/ from "defaults":{"fmap.content":"_text_"} to "defaults":{"fmap.content":"content"} to fill the field. Hope that there is no side effect of not mapping the PDF content to _text_. Thank you for the hint. Best
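The handler change described above could also be made through the Config API. The following is only a sketch under assumptions: the handler name and the `fmap.content` mapping are taken from the message, while the collection name `pdfs` and the URL in the note below are hypothetical.

```python
import json


def extract_handler_command(content_field="content"):
    """Build a Config API command that maps Tika's extracted body text to `content_field`."""
    return {
        "update-requesthandler": {
            "name": "/update/extract",
            "class": "solr.extraction.ExtractingRequestHandler",
            # Only the mapping discussed in the thread is set here.
            "defaults": {"fmap.content": content_field},
        }
    }


config_payload = json.dumps(extract_handler_command())
print(config_payload)
```

The JSON would be POSTed to the collection's `/config` endpoint, for example `http://localhost:8983/solr/pdfs/config`. As far as I know this command replaces the handler's definition rather than merging into it, so any other defaults already present in the existing configuration would have to be repeated in `defaults`.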

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Allison, Timothy B.

Finally, and I mean it this time, I heartily second Erik's point about SolrJ and the need to keep your file processing outside of Solr's JVM, VM and M! -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Monday, June 19, 2017 6:56 AM To: solr-user@lucene.apache.org Subj

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erik Hatcher

Ziyuan - You may be interested in the example/files that ships with Solr too. It’s got schema and config and even UI for file indexing and searching. Check out the README.txt under example/files in your Solr install. Erik > On Jun 19, 2017, at 6:52 AM, ZiYuan

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan

Hi Erick, thanks very much for the explanations! Clarification for question 2: more specifically I cannot see the field content in the returned JSON, with the same definitions as in the post

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-18 Thread Erick Erickson

1> Yes, you can use your single definition. The author identifies the "text" field as a catch-all. Somewhere in the schema there'll be a copyField directive copying (perhaps) many different fields to the "text" field. That permits simple searches against a single field rather than, say, using

Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-17 Thread ZiYuan

Hi, I am new to Solr and I need to implement a full-text search of some PDF files. The indexing part works out of the box by using bin/post. I can see search results in the admin UI given some queries, though without the matched texts and the context. Now I am reading this post
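For the "matched texts and the context" part of the question, a rough sketch of a select request with highlighting switched on. `hl`, `hl.fl`, `hl.snippets` and `hl.fragsize` are standard Solr highlighting parameters; the collection name `pdfs`, the field name `content`, the base URL and the chosen values are assumptions for illustration.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, collection, term, field="content"):
    """Build a /select URL that asks Solr for highlighted snippets of `field`."""
    params = {
        "q": f"{field}:{term}",   # single-term query against the text field
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to take snippets from
        "hl.snippets": 3,         # up to three snippets per document
        "hl.fragsize": 200,       # approximate snippet length in characters
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = highlight_query_url("http://localhost:8983/solr", "pdfs", "tika")
print(url)
```

The snippets come back in a separate `highlighting` section of the response, keyed by document id. The highlighted field generally has to be stored, which is worth checking if no snippets are returned.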

13 matches
