Hi, if a fetched HTML page (using SimplePostTool: -Ddata=web) contains <script> and <style> tags inside the <body> tag (not in the <head> tag), the inner text of these tags (i.e. ECMAScript/JS code and CSS styles) remains part of the document text in the "content"/"_text_" field of the indexed documents.
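To make the expected behavior concrete, here is a minimal standalone Python sketch of the kind of extraction I am hoping for. It is not how Solr or Tika work internally, and the names VisibleTextExtractor, visible_text and sample are made up for illustration. It only shows text being collected while anything inside <script> or <style> is skipped:

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>.

    Hypothetical helper for illustration only; not part of Solr or Tika.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def visible_text(html):
    """Return the text of an HTML string without script/style content."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumed sample page: script and style sit inside <body>, as in my case.
sample = """<html><head><title>Demo</title></head>
<body><p>Hello world</p>
<script>window.q = []; function f(){ q.push(arguments); }</script>
<style>p { color: red; }</style>
<p>Goodbye</p></body></html>"""

print(visible_text(sample))
```

With this sample, the printed text should not contain "push(arguments)" or the CSS rule, which is the result I would like to get in the indexed field as well.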
So when I search in _text_ for "push(arguments)", for example, I get a result :(

Any idea how to remove this unwanted content?

Using: Solr 6.6.0

solrconfig.xml:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">plaintext</str>
  </lst>
</requestHandler>

Thanks in advance,
Daniel