RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Allison, Timothy B.
> http - however, the big advantage of doing your indexing on a different machine is that the heavy lifting that Tika does in extracting text from documents, finding metadata, etc. is not happening on the server. If the indexer crashes, it doesn't affect Solr either.

+1 for what can go wrong:
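
Tim and Phil's isolation argument can be pushed one step further on the client: run each extraction in a child process with a timeout, so a parser that hangs or crashes on one bad PDF costs only that document. A minimal sketch using only the Python standard library. The extractor command is an assumption (for example, the tika-app jar in --text mode), not something specified in this thread.

```python
import subprocess
from typing import Optional, Sequence


def extract_isolated(cmd: Sequence[str], timeout_s: float = 60.0) -> Optional[str]:
    """Run an extraction command in a child process and return its stdout.

    Returns None if the command times out, cannot be started, or exits with
    a non-zero status, so one bad file cannot hang or crash the indexing loop.
    `cmd` is an assumption, e.g. ["java", "-jar", "tika-app.jar", "--text", path].
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return None
    except OSError:
        # e.g. the extractor binary or jar is missing
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

A None result can be logged and the file skipped; whatever text does come back is then sent to Solr as an ordinary field value, so Solr's JVM never runs the parser.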

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Phil Scadden
From: ZiYuan [mailto:ziyu...@gmail.com]
Sent: Tuesday, 20 June 2017 11:29 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

Dear Erick and Timothy, I also took a look at the Python clients (say, SolrClient

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread Allison, Timothy B.
Yeah, Chris knows a thing or two about Tika. :)

-----Original Message-----
From: ZiYuan [mailto:ziyu...@gmail.com]
Sent: Tuesday, June 20, 2017 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread ZiYuan
No intention of spamming, but I also want to mention tika-python in the toolchain.

Ziyuan

On Tue, Jun 20, 2017 at 2:29 PM, ZiYuan wrote:
> Dear Erick and Timothy,
>
> I also took a look at the Python clients (say, SolrClient and
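
To make the tika-python step concrete: tika-python's parser.from_file(path) returns a dict whose "content" entry holds the extracted text (None if nothing was extracted) and whose "metadata" entry holds the document metadata. The sketch below turns such a dict into a Solr document. It takes the parsed dict as input, so it runs without a Tika server; the field names (id, content, and whatever you map metadata to) are hypothetical and must match your schema.

```python
from typing import Any, Dict, Mapping, Optional


def to_solr_doc(
    doc_id: str,
    parsed: Mapping[str, Any],
    meta_fields: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build a Solr document from a tika-python style result dict.

    `parsed` is expected to look like the return value of
    tika.parser.from_file(): {"content": str or None, "metadata": {...}}.
    `meta_fields` maps Tika metadata keys to Solr field names (hypothetical;
    adjust to your schema).
    """
    doc: Dict[str, Any] = {"id": doc_id}
    content = parsed.get("content")
    if content:
        # Tika output often starts and ends with runs of blank lines.
        doc["content"] = content.strip()
    metadata = parsed.get("metadata") or {}
    for tika_key, solr_field in (meta_fields or {}).items():
        value = metadata.get(tika_key)
        if value is not None:
            doc[solr_field] = value
    return doc
```

In a real run the input would come from parser.from_file(path), which needs Java and a Tika server that tika-python manages for you.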

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-20 Thread ZiYuan
Dear Erick and Timothy,

I also took a look at the Python clients (say, SolrClient and pysolr) because Python is my main programming language. My impression is that:
1. they send HTTP requests to the server according to the server APIs;
2. they are not official and thus possibly not up to date.
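
Ziyuan's first impression is right in spirit: these clients are wrappers around Solr's HTTP API. To show how thin that layer can be, here is a standard-library sketch that builds (but does not send) a JSON update request for a core. This is not pysolr's or SolrClient's code; it uses the JSON document format that Solr's /update handler accepts, and the base URL and core name are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_update_request(
    base_url: str, core: str, docs: List[Dict[str, Any]], commit: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a POST of JSON documents to /solr/<core>/update."""
    query = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    url = f"{base_url.rstrip('/')}/solr/{core}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is one more call, urllib.request.urlopen(req). Committing on every request is convenient for a one- or two-file experiment; for bulk indexing you would normally commit less often.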

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan
Dear Erick and Timothy,

Yes, I will parse on the client side for all the benefits. I am just trying to figure out what is going on by indexing one or two PDF files first. Thank you both.

Best regards,
Ziyuan

On Mon, Jun 19, 2017 at 6:17 PM, Erick Erickson wrote:
> bq:

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erick Erickson
bq: Hope that there is no side effect of not mapping the PDF

Well, yes, it will have that side effect. You can cure that with a copyField directive from content to _text_. But do seriously consider running this as a SolrJ program on the client. Tim knows in far more painful detail than I do what
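
With a managed schema, the copyField Erick suggests can be added through the Schema API instead of editing the schema file; the command add-copy-field with a source and dest is the API equivalent of <copyField source="content" dest="_text_"/>. A standard-library sketch that builds the request without sending it; the base URL and core name are placeholders.

```python
import json
import urllib.request


def build_copy_field_request(
    base_url: str, core: str, source: str, dest: str
) -> urllib.request.Request:
    """Build (not send) a Schema API request that adds a copyField rule.

    Requires a managed schema; the rule only affects documents indexed
    after the change, so existing documents need to be reindexed.
    """
    payload = {"add-copy-field": {"source": source, "dest": dest}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage: build_copy_field_request("http://localhost:8983", "pdfs", "content", "_text_"), then send it with urllib.request.urlopen.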

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan
Hi Erick,

Now it is clear. I have to change the defaults of the /update/extract request handler from "defaults":{"fmap.content":"_text_"} to "defaults":{"fmap.content":"content"} so that the content field gets filled. I hope there is no side effect of not mapping the PDF content to _text_. Thank you for the hint.

Best
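
An alternative to editing the handler defaults: values under "defaults" are overridable by request parameters, so fmap.content can also be passed per request to the ExtractingRequestHandler, alongside literal.id for the document's unique key. A standard-library sketch that builds such a request with the raw PDF bytes as the body; the base URL, core name, and document id are placeholders.

```python
import urllib.parse
import urllib.request


def build_extract_request(
    base_url: str,
    core: str,
    doc_id: str,
    pdf_bytes: bytes,
    content_field: str = "content",
) -> urllib.request.Request:
    """Build (not send) a request to /solr/<core>/update/extract.

    fmap.content here overrides the handler's default mapping for this
    request only, sending Tika's extracted body text to `content_field`.
    """
    params = {
        "literal.id": doc_id,           # unique key of the new document
        "fmap.content": content_field,  # field that receives the extracted text
        "commit": "true",
    }
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/solr/{core}/update/extract?{query}"
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},  # raw file as the request body
        method="POST",
    )
```

Note that this still runs Tika inside Solr's JVM, which is exactly what the rest of the thread advises against for anything beyond a quick experiment.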

RE: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Allison, Timothy B.
Finally, and I mean it this time, I heartily second Erik's point about SolrJ and the need to keep your file processing outside of Solr's JVM, VM and M!

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Monday, June 19, 2017 6:56 AM
To: solr-user@lucene.apache.org
Subj

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erik Hatcher
Ziyuan - You may also be interested in example/files, which ships with Solr. It's got a schema, config, and even a UI for file indexing and searching. Check out the README.txt under example/files in your Solr install.

Erik

> On Jun 19, 2017, at 6:52 AM, ZiYuan

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread ZiYuan
Hi Erick, thanks very much for the explanations!

Clarification for question 2: more specifically, I cannot see the content field in the returned JSON, with the same definitions as in the post

Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-18 Thread Erick Erickson
1> Yes, you can use your single definition. The author identifies the "text" field as a catch-all. Somewhere in the schema there'll be a copyField directive copying (perhaps) many different fields to the "text" field. That permits simple searches against a single field rather than, say, using
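
The trade-off Erick describes shows up directly in the query parameters. A sketch with hypothetical field names: with a catch-all field you name a single field (here through the standard df, default field, parameter); without one, you would typically list the fields for the edismax parser in qf.

```python
from typing import Dict, Sequence


def catch_all_query(terms: str, catch_all_field: str) -> Dict[str, str]:
    """Search one catch-all field that copyField rules have filled."""
    return {"q": terms, "df": catch_all_field}


def multi_field_query(terms: str, fields: Sequence[str]) -> Dict[str, str]:
    """Alternative without a catch-all: let edismax spread the query over fields."""
    return {"q": terms, "defType": "edismax", "qf": " ".join(fields)}
```

Either dict can be URL-encoded with urllib.parse.urlencode and appended to /solr/<core>/select. The catch-all keeps queries simple, while qf lets you weight individual fields (e.g. "title^2 content").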

Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-17 Thread ZiYuan
Hi,

I am new to Solr and I need to implement a full-text search over some PDF files. The indexing part works out of the box using bin/post. I can see search results in the admin UI for some queries, though without the matched text and its context. Now I am reading this post
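
The "matched text with context" asked for here is what Solr's highlighting returns, and it needs the text in a stored field, which is why the thread moves the extracted text into a content field. A sketch, assuming a stored, indexed field named content: build the select parameters (hl, hl.fl, hl.snippets, hl.fragsize) and read the fragments from the "highlighting" section of the JSON response, which is keyed by document id and then by field name.

```python
from typing import Any, Dict, List


def highlight_params(
    query: str, field: str = "content", snippets: int = 3, fragsize: int = 150
) -> Dict[str, str]:
    """Select parameters that ask for highlighted fragments from `field`."""
    return {
        "q": query,
        "df": field,                     # search the same field we highlight
        "fl": "id",                      # the fragments carry the text we need
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": str(snippets),    # max fragments per document
        "hl.fragsize": str(fragsize),    # approximate fragment length in characters
        "wt": "json",
    }


def snippets_by_doc(response: Dict[str, Any], field: str = "content") -> Dict[str, List[str]]:
    """Pull highlight fragments out of a parsed JSON select response."""
    highlighting = response.get("highlighting", {})
    return {doc_id: fields.get(field, []) for doc_id, fields in highlighting.items()}
```

A document can match without producing fragments for the chosen field, which is why the helper returns an empty list rather than failing in that case.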