A convenience method for getting a document's text in a single call would be
helpful.
---------------------------------------------------------------------------------------
Key: TIKA-20
URL: https://issues.apache.org/jira/browse/TIKA-20
Project: Tika
Issue Type: New Feature
Components: general
Affects Versions: 0.1-incubator
Reporter: Keith R. Bennett
Priority: Minor
Fix For: 0.1-incubator
A convenience method for getting a document's text in a single call would be
helpful.
This would address the common use case of wanting the string content but not
the document metadata.
Sample methods are below:
------------------------------------------------------------------
/**
 * Gets the full text (but not other properties) of the document
 * at the specified URL.
 *
 * @param documentUrl URL of the resource to parse
 * @param configUrl URL of the Tika configuration
 * @return the document's full text, or null if documentUrl is null
 */
public static String getStrContent(URL documentUrl, URL configUrl)
        throws LiusException, IOException {
    // Load the configuration, then delegate to the overload below.
    return getStrContent(documentUrl,
            LiusConfig.getInstance(configUrl));
}

/**
 * Gets the full text (but not other properties) of the document
 * at the specified URL.
 *
 * @param documentUrl URL of the resource to parse
 * @param config Tika configuration object
 * @return the document's full text, or null if documentUrl is null
 */
public static String getStrContent(URL documentUrl, LiusConfig config)
        throws LiusException, IOException {
    String fulltext = null;
    if (documentUrl != null) {
        // Pick a parser for this document and extract only its text.
        Parser parser = ParserFactory.getParser(documentUrl, config);
        fulltext = parser.getStrContent();
    }
    return fulltext;
}
=========================
This code assumes changes to the code base, not yet committed, that will let
us use URLs to specify input documents. (See TIKA-17.)
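
Since the sample methods above depend on uncommitted changes, here is a rough,
self-contained sketch of how a caller might use such a method and what the
null-URL behavior looks like. The Parser and ParserSource interfaces and the
TextExtractionSketch class are hypothetical stand-ins invented for this
illustration (ParserSource plays the role of ParserFactory plus its
configuration); they are not the actual Tika types.

```java
import java.net.URI;
import java.net.URL;

public class TextExtractionSketch {

    /** Hypothetical stand-in for the Parser type used in the proposal. */
    interface Parser {
        String getStrContent();
    }

    /** Hypothetical stand-in for ParserFactory.getParser(URL, LiusConfig). */
    interface ParserSource {
        Parser getParser(URL documentUrl);
    }

    /**
     * Mirrors the proposed convenience method: returns only the text,
     * or null when no document URL is given.
     */
    static String getStrContent(URL documentUrl, ParserSource source) {
        if (documentUrl == null) {
            return null;
        }
        return source.getParser(documentUrl).getStrContent();
    }

    public static void main(String[] args) throws Exception {
        // A fake parser source, so the sketch runs without any real parsing.
        ParserSource fake = url -> () -> "text of " + url;

        URL doc = URI.create("http://example.com/doc.txt").toURL();
        System.out.println(getStrContent(doc, fake));  // text of http://example.com/doc.txt
        System.out.println(getStrContent(null, fake)); // null
    }
}
```

Because a null URL yields null rather than an exception, callers would need to
check the result before using it.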