Manel,

# I recommend the stringi package, which deliberately copies much
#    of the syntax of Hadley's wonderful stringr package.
# However, it offers more functionality and is often much faster.
# Since you are not doing anything complex, you could use base R
#    for everything, but I think the stringi syntax is cleaner, and it is all
#    C++ code (built on the ICU library) under the covers.

library(stringi)
set.seed(1)
len <- 10
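# Let's make a data frame with some data.
# The column name 'NIF/NIE' is problematic: it is not a syntactically
#    valid R name, so data.frame() converts it to 'NIF.NIE' unless you pass
#    check.names = FALSE, and you then have to quote it with backticks
#    everywhere (df_2$`NIF/NIE`). Use an underscore or period instead.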

df_2 <- data.frame(NIF_NIE = sample(c(NA, 'starts with alpha', '1234numerals'),
                                    len, replace = TRUE),
                   col2 = 1:len, stringsAsFactors = FALSE)
df_2 # see what it looks like
df_2$NIF_NIE[!is.na(df_2$NIF_NIE) &
    stri_detect_regex(df_2$NIF_NIE, '^[0-9]')] <- 'SLOPD'
df_2

# end of code
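Since nothing here strictly needs stringi, the same replacement can also be sketched in base R. This is a minimal sketch, not part of the code above: it relies on data.frame()'s default check.names = TRUE behaviour, and the names df_names, df_forced and df_3 are illustrative only. It also shows what happens to a column called 'NIF/NIE'.

```r
# Minimal base-R sketch (illustrative names: df_names, df_forced, df_3).
set.seed(1)
len <- 10

# data.frame() runs make.names() on column names by default,
# so 'NIF/NIE' silently becomes 'NIF.NIE'.
df_names <- data.frame('NIF/NIE' = c('A1', '2B'))
names(df_names)  # "NIF.NIE"

# check.names = FALSE keeps the original name, but it must then be
# quoted with backticks, e.g. df_forced$`NIF/NIE`.
df_forced <- data.frame('NIF/NIE' = c('A1', '2B'), check.names = FALSE)
names(df_forced)  # "NIF/NIE"

df_3 <- data.frame(NIF_NIE = sample(c(NA, 'starts with alpha', '1234numerals'),
                                    len, replace = TRUE),
                   col2 = 1:len, stringsAsFactors = FALSE)

# grepl() returns FALSE (not NA) for NA input, so no separate is.na() test
# is needed to keep NAs out of the logical index.
starts_with_digit <- grepl('^[0-9]', df_3$NIF_NIE)
df_3$NIF_NIE[starts_with_digit] <- 'SLOPD'
df_3
```

grepl() is used here instead of stri_detect_regex() because it already treats NA as a non-match; with stringi the explicit !is.na() guard is what keeps the index free of NAs.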

On my iMac with len set to 10^6, building the data frame takes less than a tenth of a second (the timing below does not include the replacement step):

> start.time <- Sys.time()
> len <- 1000000
> df_2 <- data.frame('NIF_NIE' = sample(c(NA, 'starts with alpha',
+     '1234numerals'), len, replace = TRUE),
+     col2 = 1:len, stringsAsFactors = FALSE)
> end.time <- Sys.time()
> time.taken <- end.time - start.time
> time.taken
Time difference of 0.08968401 secs
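The transcript above times only the construction of the data frame. A minimal sketch for timing the replacement step itself, comparing base R's grepl() with stringi::stri_detect_regex() when stringi happens to be installed, could look like this. The names df_big, x_base and x_stri are illustrative, and no timings are claimed because they depend on the machine.

```r
# Sketch: time only the replacement step (illustrative names, no claimed timings).
set.seed(1)
len <- 1e6
df_big <- data.frame(NIF_NIE = sample(c(NA, 'starts with alpha', '1234numerals'),
                                      len, replace = TRUE),
                     stringsAsFactors = FALSE)

# Base R: grepl() treats NA input as a non-match.
x_base <- df_big$NIF_NIE
t_base <- system.time(
  x_base[grepl('^[0-9]', x_base)] <- 'SLOPD'
)
print(t_base)

# stringi, only if it is installed: stri_detect_regex() returns NA for NA
# input, hence the explicit !is.na() guard.
if (requireNamespace('stringi', quietly = TRUE)) {
  x_stri <- df_big$NIF_NIE
  t_stri <- system.time(
    x_stri[!is.na(x_stri) & stringi::stri_detect_regex(x_stri, '^[0-9]')] <- 'SLOPD'
  )
  print(t_stri)
  # Both approaches should produce the same vector.
  stopifnot(identical(x_base, x_stri))
}
```

system.time() evaluates its argument in the calling environment, so the assignments inside it modify x_base and x_stri directly.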

R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: [email protected]







> On Feb 5, 2015, at 2:08 AM, Manel Amado Martí <[email protected]> wrote:
>
> I'm processing a table from a database. To do that, I load it into a data 
> frame and then process the data (normalizing some fields). I'm used to 
> programming in C, and some of R's facilities are not natural to me yet, so 
> please excuse me if this is a question for "dummies".
> During the processing, I want to replace a field's value depending on its 
> current content. For example, if the field starts with a digit instead of a 
> letter, I want to replace the entire field in that row with "SLOPD". I'm 
> sure there is another way to do this (maybe with some apply function), but 
> I can't figure out how.
> The code that I'm using now is:
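> library(stringr)
> for (i in 1:nrow(dataframe2)) {
>        if (!is.na(dataframe2[i, "NIF/NIE"])) {
>                # position of the first digit (NA if the field has none)
>                first_digit <- str_locate(dataframe2[i, "NIF/NIE"], "\\d")[1]
>                if (!is.na(first_digit) && first_digit == 1) {
>                        # Catalan message: "remove self-employed NIF: <row>"
>                        cat(sprintf("elimina NIF autònom: %i\n", i))
>                        dataframe2[i, "NIF/NIE"] <- "SLOPD"
>                }
>        }
> }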
>
> Thank you for your attention!
>
> Manel Amado i Martí
> Head of Domestic Trade Advisory
> [email protected]
> Tel. 93 745 12 63 · Fax 93 745 12 64
> Av. Francesc Macià, 35 · 08206 Sabadell
> P.O. Box 119 · www.cambrasabadell.org


_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
