Thanks - I'll have a look at these options too.

I'm happy to send over a sample document, but wasn't sure whether attachments are allowed. The documents come from Lexis+, so they require user credentials to log in, but I could upload the file somewhere if that would help. Any ideas for a good location to do so?


On 29/12/2023 20:25, Dr Eberhard W Lisse wrote:
I would also look at https://pandoc.org perhaps which can
export a number of formats...

And for spreadsheets, https://github.com/jqnatividad/qsv is my
go-to weapon.  It can also read and write XLSX and others.

A sample document or two would always be helpful...

el

On 29/12/2023 21:01, CALUM POLWART wrote:
It sounded like he had looked at officer, but I would agree:

content <- officer::docx_summary(officer::read_docx("filename.docx"))

Would get the text content into an object called content.

That object is a data.frame so you can then manipulate it.
To be more specific, we might need an example of the DF
[...]
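As a rough illustration of "manipulating" that data.frame, here is a minimal sketch in base R. It uses a small hand-built data.frame as a stand-in for the real docx_summary() result, so it runs without a document; the column names doc_index, content_type and text follow what docx_summary() returns, but the example rows and the assumption that the keyword section begins with the word "Subject" are hypothetical.

```r
# Hypothetical stand-in for the data.frame returned by
# officer::docx_summary(officer::read_docx("filename.docx")).
content <- data.frame(
  doc_index    = 1:4,
  content_type = c("paragraph", "paragraph", "table cell", "paragraph"),
  text         = c("Title of article", "", "cell text",
                   "Subject: FAST FASHION (72%)"),
  stringsAsFactors = FALSE
)

# Keep only the non-empty paragraph rows.
is_para    <- content$content_type == "paragraph" & nzchar(content$text)
paragraphs <- content[is_para, ]

# Find the row(s) that start the 'Subject' section
# (assumption: the line begins with the word "Subject").
subject_rows <- paragraphs[grepl("^Subject", paragraphs$text), ]
print(subject_rows$text)
```

With a real Lexis+ export the row layout may well differ, which is why a sample of the actual data.frame would help.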
On Fri, Dec 29, 2023 at 10:14 AM Andy <phaedr...@gmail.com>
wrote:
[...]
I'd like to be able to accomplish the following:

(1) Append the title, the month, the author, the number of
words, and page number(s) to a spreadsheet

(2) Read each article and extract keywords (in the docs,
these are listed in 'Subject' section as a list of
keywords with a percentage showing the extent to which the
keyword features in the article (e.g., FAST FASHION (72%))
and to append the keyword and the % coverage to the same
row in the spreadsheet.  However, I want to ensure that
the keyword coverage meets the threshold of >= 50%; if
not, then pass onto the next article in the directory.
Rinse and repeat for the entire directory.
[...]
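Step (2) above could be sketched roughly as follows in base R. This is only an illustration under stated assumptions: parse_keywords() is a hypothetical helper name, and it assumes the 'Subject' text has already been extracted as one string in which keywords are separated by semicolons and each is followed by its coverage in the form "(72%)".

```r
# Hypothetical helper: split a 'Subject' string into keywords and their
# percentage coverage, keeping only those at or above the threshold.
parse_keywords <- function(subject_text, threshold = 50) {
  # Each match looks like "FAST FASHION (72%)".
  hits <- regmatches(
    subject_text,
    gregexpr("[^;()]+\\([0-9]+%\\)", subject_text)
  )[[1]]

  # Keyword = everything before the "(nn%)" part, trimmed.
  keyword  <- trimws(sub("\\([0-9]+%\\)$", "", hits))
  # Coverage = the number inside the parentheses.
  coverage <- as.numeric(sub("^.*\\(([0-9]+)%\\)$", "\\1", hits))

  result <- data.frame(keyword = keyword, coverage = coverage,
                       stringsAsFactors = FALSE)
  result[result$coverage >= threshold, , drop = FALSE]
}

# Made-up example string, not taken from a real article.
kw <- parse_keywords("FAST FASHION (72%); RETAIL (45%); SUSTAINABILITY (50%)")
print(kw)
```

An article for which the result has zero rows would be skipped. Applying this to every file in a directory (for example via list.files()) and writing the rows out with write.csv() would cover the "rinse and repeat" part.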

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.