I'm using this C# code to call the parser directly via its URL; it returns JSON:
var url = @"http://localhost:8983/solr/update/extract";
var client = new WebClient();
client.QueryString.Add("extractOnly", "true");
client.QueryString.Add("wt", "json");
var data = client.UploadFile(url, "input.txt");
var json = ASCIIEncoding.ASCII.GetString(data);

Sincerely,
Alex

-----Original Message-----
From: Nick Burch [mailto:[email protected]]
Sent: 16 August 2012 6:36 PM
To: [email protected]
Subject: Re: Return raw text from document

On Thu, 16 Aug 2012, Alexander Cougarman wrote:
> Is it possible to return just the raw text of the document extracted
> by Tika? In other words, we don't want it in XML or JSON, just the
> text in it.

Yes. Are you using the TikaApp jar, calling the Tika facade class, or
calling a parser directly?

Nick
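On the original question of getting only the text: a minimal sketch, assuming the endpoint above is Solr's ExtractingRequestHandler, which accepts an extractFormat=text parameter alongside extractOnly=true so the extracted body comes back as plain text rather than XHTML. The response is still wrapped in the wt envelope, so the sketch pulls the body out of the JSON. The class and method names (SolrExtract, GetExtractedText, ExtractPlainText) are hypothetical, the use of System.Text.Json assumes a recent .NET runtime, and the assumption that the body sits under a top-level key named after the uploaded file should be checked against the actual response.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.Json;

static class SolrExtract
{
    // Pulls the extracted body out of an extractOnly JSON response.
    // Assumption: the body is a string stored under a top-level key equal
    // to the uploaded stream name (e.g. "input.txt"). Returns null if not found.
    public static string GetExtractedText(string json, string streamName)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
                return null;

            JsonElement value;
            if (root.TryGetProperty(streamName, out value)
                && value.ValueKind == JsonValueKind.String)
            {
                return value.GetString();
            }
            return null;
        }
    }

    // Uploads a file and asks for plain-text extraction only.
    // solrUrl would be the extract handler URL used earlier in this message.
    public static string ExtractPlainText(string solrUrl, string path)
    {
        using (var client = new WebClient())
        {
            client.QueryString.Add("extractOnly", "true");
            client.QueryString.Add("extractFormat", "text"); // plain text, not XHTML
            client.QueryString.Add("wt", "json");

            byte[] data = client.UploadFile(solrUrl, path);
            // Decoded as UTF-8 on the assumption that Solr sends UTF-8 JSON;
            // ASCII decoding would replace non-ASCII characters with '?'.
            string json = Encoding.UTF8.GetString(data);
            return GetExtractedText(json, Path.GetFileName(path));
        }
    }
}
```

Usage would be along the lines of SolrExtract.ExtractPlainText(url, "input.txt"). If the result is null, inspecting the raw JSON should show which key the handler actually used for the extracted body.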
