Hi all!
I've written a function that uses a Python module called ftfy, which I
made available in R through reticulate.
The aim is to fix broken encoding. The function is, unfortunately, a scalar
function.
I attempted to vectorise the function by means of a for loop, but its
speed is a real concern once the datasets exceed 500 000 rows.

I've adapted the function to conditionally modify only broken text with
ifelse statements.
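
A minimal sketch of the same conditional idea, moved to the Python side so
that the whole vector is handled in one call instead of one call per row.
The helper name fix_texts and its fixer argument are hypothetical; the only
ftfy call assumed is ftfy.fix_text. It also assumes that only mojibake
matters, so pure-ASCII strings are skipped; ftfy can change ASCII text too
(for example HTML entities), and those changes would be missed here.

```python
from typing import Callable, Iterable, List, Optional


def fix_texts(texts: Iterable[Optional[str]],
              fixer: Optional[Callable[[str], str]] = None) -> List[Optional[str]]:
    """Fix a whole batch of strings, repairing each distinct suspect value once."""
    if fixer is None:
        # Imported lazily so the helper can be tried without ftfy installed.
        from ftfy import fix_text
        fixer = fix_text

    # A single string would otherwise be iterated character by character.
    if isinstance(texts, str):
        texts = [texts]

    cache = {}   # suspect string -> repaired string
    fixed = []
    for text in texts:
        # Assumption: missing values and pure-ASCII strings need no repair.
        if text is None or text.isascii():
            fixed.append(text)
            continue
        if text not in cache:
            cache[text] = fixer(text)
        fixed.append(cache[text])
    return fixed
```

If this were saved as, say, fix_texts.py (a hypothetical file name), it
could be loaded with reticulate::source_python() and called once on the
whole column. How much this helps depends on how many values are non-ASCII
and how many are repeated.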

I *really* want to speed up this function using Rcpp, but there are two
problems:

1. I tried researching how to call Python functions from R through C++
scripts, but none of my attempts have been successful
(https://gallery.rcpp.org/articles/rcpp-python/).

2. I'm having trouble modifying all elements of a StringVector using Rcpp.

Any advice would be highly appreciated!

Attached are the vectorised function script, which conditionally fixes
broken encoding, and my *flawed* Rcpp script.


-- 

*Casper Crause*

*Cell:     072 475 8969*
*Email: ccraus...@gmail.com <ccraus...@gmail.com>*

Attachment: fixing encoding.R

Attachment: Rcpp_fix_encoding.cpp

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
