Is this being indexed via the extracting update handler?  What does your
pipeline look like?  Is the Tika extractor in the pipeline?


Karl


On Thu, Nov 24, 2016 at 12:52 PM, Furkan KAMACI <[email protected]>
wrote:

> I've indexed a file into Solr via ManifoldCF whose content starts with:
>
> *1. Vivien Leigh and Marlon Brando in "A Streetcar Named Desire" directed
> by Elia Kazan, 1951*
>
> *2. Portrait of Marlon Brando for "A Streetcar Named Desire" directed by
> Elia Kazan, 1951*
>
> *3. Portrait of Marlon Brando for "A Streetcar Named Desire" directed by
> Elia Kazan, 1951*
>
> However, when I check Solr, I see this in the content field:
>
> * " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
> application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
> \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n  \n
> \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> directed by Elia Kazan \n"*
>
> There are two problems here.
>
> 1) There are unnecessary newline characters.
>
> 2) Metadata is prepended to the content field, which should not happen.
>
> So the problem may be in Solr or in ManifoldCF (related to
> Tika). When I index the same document to Solr via cURL, no newline
> characters or metadata are prepended.
>
> What would you suggest as a solution?
>
> Kind Regards,
> Furkan KAMACI
>
>
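
To confirm which of the two symptoms is present in a stored document, the
content value quoted above can be split into its metadata-like prefix and
the actual text. The following is only a diagnostic sketch in Python, not a
fix in ManifoldCF, Tika, or Solr. The function name split_content and the
METADATA_KEYS list are hypothetical; the keys are simply the ones visible in
the sample above and are not an exhaustive set.

```python
import re

# Keys copied from the sample content field quoted above. This list is an
# assumption for illustration; other documents may carry different keys.
METADATA_KEYS = (
    "stream_source_info",
    "stream_content_type",
    "stream_size",
    "stream_name",
    "Content-Type",
    "resourceName",
)


def split_content(raw):
    """Split a raw content value into (metadata dict, cleaned body text)."""
    metadata = {}
    body_lines = []
    for line in raw.split("\n"):
        stripped = line.strip()
        if not stripped:
            # Skip the whitespace-only lines that pad the field.
            continue
        key, _, value = stripped.partition(" ")
        # Only treat a line as metadata while no body text has been seen yet.
        if not body_lines and key in METADATA_KEYS:
            metadata[key] = value.strip()
        else:
            # Collapse internal runs of whitespace to a single space.
            body_lines.append(re.sub(r"\s+", " ", stripped))
    return metadata, " ".join(body_lines)
```

If the returned metadata dict is non-empty, the prefix was written into the
content field at index time, which points at the indexing pipeline rather
than at the original file.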
