Hi all,

I am currently trying to index different kinds of documents with Solr and Tika. Indexing works fine, but when Solr returns the content of a document it does not return just the plain text. The content also comes back with some metadata.
For instance, this is my request:

http://localhost:8983/solr/document/update/extract?extractOnly=true&stream.file=C:\TIKA\FileTest\Test.txt

The content of the Test.txt file is just "Test File".

As you can see below, the response from Solr contains plenty of extra information. I would like the answer to be something like this, without the noise, so it can be used for search:

<str name="Test.txt"> Test File </str>

This is the actual response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">135</int>
  </lst>
  <str name="Test.txt">
    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta name="stream_size" content="13"/>
        <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
        <meta name="X-Parsed-By" content="org.apache.tika.parser.txt.TXTParser"/>
        <meta name="stream_name" content="Test.txt"/>
        <meta name="stream_source_info" content="file:/C:/TIKA/FileTest/Test.txt"/>
        <meta name="Content-Encoding" content="ISO-8859-1"/>
        <meta name="Content-Type" content="text/plain; charset=ISO-8859-1"/>
        <title></title>
      </head>
      <body>
        <p>Test File</p>
      </body>
    </html>
  </str>
  <lst name="Test.txt_metadata">
    <arr name="stream_size">
      <str>13</str>
    </arr>
    <arr name="X-Parsed-By">
      <str>org.apache.tika.parser.DefaultParser</str>
      <str>org.apache.tika.parser.txt.TXTParser</str>
    </arr>
    <arr name="stream_name">
      <str>Test.txt</str>
    </arr>
    <arr name="stream_source_info">
      <str>file:/C:/TIKA/FileTest/Test.txt</str>
    </arr>
    <arr name="Content-Encoding">
      <str>ISO-8859-1</str>
    </arr>
    <arr name="Content-Type">
      <str>text/plain; charset=ISO-8859-1</str>
    </arr>
  </lst>
</response>

Can anyone shed some light on this?

Thanks a lot.
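
P.S. To make the goal concrete, this is roughly the client-side clean-up I am hoping to avoid. It is only a minimal sketch in Python, assuming the extracted content always arrives as an XHTML string like the one in the response above. The name body_text is something I made up for illustration; it is not part of Solr or Tika.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XHTML document in the extractOnly response.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def body_text(xhtml: str) -> str:
    """Return only the visible text inside <body>, with whitespace collapsed.

    Hypothetical helper: assumes the input is well-formed XHTML whose XML
    declaration says UTF-8, as in the response shown in this post.
    """
    # Strip leading whitespace so the XML declaration is at the very start,
    # and pass bytes so the declared encoding matches the actual input.
    root = ET.fromstring(xhtml.strip().encode("utf-8"))
    body = root.find(f"{XHTML_NS}body")
    if body is None:
        return ""
    # itertext() walks every text node under <body>; split/join collapses
    # the indentation and newlines between elements into single spaces.
    return " ".join(" ".join(body.itertext()).split())


# Trimmed-down version of the XHTML returned for Test.txt.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="stream_name" content="Test.txt"/>
    <title></title>
  </head>
  <body>
    <p>Test File</p>
  </body>
</html>"""

print(body_text(sample))  # prints: Test File
```

Doing this for every document feels like the wrong place to solve it, which is why I would prefer Solr to hand back the plain text directly.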