Hi, we have an Oracle database where we store RTF content in a CLOB column. We are now trying to index those records, but we only want the plain text, the same as Tika would extract. I tried to use the TikaEntityProcessor, but I'm getting the following error message:
ClassCastException: java.io.StringReader cannot be cast to java.io.InputStream

The configuration looks like this:

<dataSource name="f1" type="FieldReaderDataSource"/>
<entity name="SV_SOLVE_TXT" onError="continue" transformer="ClobTransformer"
        query="select SOLUTION_ID, SOLUTION_TXT SOLUTION_TXT from IT_SOLUTION where SOLUTION_ID = '${ts3_it_solution_text_search.SOLUTION_ID}'">
  <field name="text_4" column="SOLUTION_TXT" clob="true" />
  <entity name="tika_SOLUTION_TXT" onError="continue" processor="TikaEntityProcessor"
          url="${SV_SOLVE_TXT.text_4}" dataField="SV_SOLVE_TXT.text_4" dataSource="f1">
    <field name="text_1" column="text"/>
  </entity>
</entity>

Thx & Regards,
Torsten