I've indexed a file to Solr via ManifoldCF. Its content starts with:

*1. Vivien Leigh and Marlon Brando in "A Streetcar Named Desire" directed by Elia Kazan, 1951*
*2. Portrait of Marlon Brando for "A Streetcar Named Desire" directed by Elia Kazan, 1951*
*3. Portrait of Marlon Brando for "A Streetcar Named Desire" directed by Elia Kazan, 1951*

However, when I check Solr, the content field contains:

*" \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type application/rtf \nstream_size 13580 \nstream_name MARLON BRANDO.rtf \nContent-Type application/rtf \nresourceName MARLON BRANDO.rtf \n \n \n 1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\" directed by Elia Kazan \n"*

There are two problems here:

1) There are unnecessary newline characters.
2) Metadata is prepended to the content field, which should not happen.

So the problem could be in either Solr or ManifoldCF (related to Tika). When I index the same document to Solr via cURL, there are no newline characters and no metadata is prepended.

What would you suggest as a solution?

Kind Regards,
Furkan KAMACI
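
P.S. To make the two symptoms concrete, here is a minimal Python sketch of the cleanup I would otherwise have to apply on the client side. It is only a stopgap and does not address the root cause. The names `METADATA_KEYS` and `clean_content` are hypothetical and not part of Solr, ManifoldCF, or Tika, and the key list is an assumption taken only from the output shown above, so it may be incomplete.

```python
import re

# Assumption: metadata keys observed in the example output above.
# This list is illustrative and may not cover every key that can appear.
METADATA_KEYS = (
    "stream_source_info",
    "stream_content_type",
    "stream_size",
    "stream_name",
    "Content-Type",
    "resourceName",
)

# Matches a line that starts with one of the keys followed by whitespace.
_METADATA_LINE = re.compile(
    r"^(?:%s)\s" % "|".join(re.escape(key) for key in METADATA_KEYS)
)


def clean_content(raw: str) -> str:
    """Drop prepended metadata lines and collapse newlines and extra spaces."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines left over from the newlines
        if _METADATA_LINE.match(stripped):
            continue  # skip "key value" metadata lines
        kept.append(stripped)
    # Join the remaining lines and normalize runs of whitespace to one space.
    return " ".join(" ".join(kept).split())


if __name__ == "__main__":
    sample = (
        " \n \nstream_source_info MARLON BRANDO.rtf \n"
        "stream_content_type application/rtf \nstream_size 13580 \n"
        "stream_name MARLON BRANDO.rtf \nContent-Type application/rtf \n"
        "resourceName MARLON BRANDO.rtf \n \n \n"
        ' 1. Vivien Leigh and Marlon Brando in "A Streetcar Named Desire"'
        " directed by Elia Kazan \n"
    )
    print(clean_content(sample))
```

Note that a genuine content line that happens to start with one of these keys would also be dropped, which is one reason I would prefer a fix at the indexing stage.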
