Dear r-br members,


I would like to create a data frame from the output of an analysis saved as a *.txt file, as follows:


#Original file
https://www.dropbox.com/s/pncmjwl3camap6d/log.txt?dl=0

#Read the file
myfile<-read.table("log.txt", sep="\t", quote="", comment.char="")


#Partial structure of myfile
#
obj
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
416
Loaded: 0.062388 seconds
Region 82 Avg IOU: 0.254732, Class: 0.000000, Obj: 0.575008, No Obj: 0.417811, .5R: 0.000000, .75R: 0.000000,  count: 4 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496387, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.415856, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.263274, Class: 0.000000, Obj: 0.306391, No Obj: 0.418069, .5R: 0.000000, .75R: 0.000000,  count: 4 Region 94 Avg IOU: 0.435966, Class: 0.000000, Obj: 0.207774, No Obj: 0.496172, .5R: 0.000000, .75R: 0.000000,  count: 1 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413582, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.303235, Class: 0.000000, Obj: 0.424457, No Obj: 0.418686, .5R: 0.000000, .75R: 0.000000,  count: 4 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496352, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: 0.579218, Class: 0.000000, Obj: 0.502197, No Obj: 0.415232, .5R: 1.000000, .75R: 0.000000,  count: 1 Region 82 Avg IOU: 0.187162, Class: 0.000000, Obj: 0.501398, No Obj: 0.416089, .5R: 0.000000, .75R: 0.000000,  count: 5 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496362, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.414499, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.271427, Class: 0.000000, Obj: 0.481964, No Obj: 0.417647, .5R: 0.166667, .75R: 0.000000,  count: 6 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495838, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.415899, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.285605, Class: 0.000000, Obj: 0.469981, No Obj: 0.417026, .5R: 0.000000, .75R: 0.000000,  count: 3 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494833, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413943, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.300229, 
Class: 0.000000, Obj: 0.313481, No Obj: 0.416831, .5R: 0.000000, .75R: 0.000000,  count: 6 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495936, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413855, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.384617, Class: 0.000000, Obj: 0.398042, No Obj: 0.418052, .5R: 0.333333, .75R: 0.000000,  count: 3 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496205, .5R: -nan, .75R: -nan,  count: 0 Region 106 Avg IOU: 0.144387, Class: 0.000000, Obj: 0.349722, No Obj: 0.414624, .5R: 0.000000, .75R: 0.000000,  count: 1
1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images
Loaded: 0.000042 seconds
Region 82 Avg IOU: 0.308919, Class: 0.000000, Obj: 0.264983, No Obj: 0.418332, .5R: 0.250000, .75R: 0.000000,  count: 4 Region 94 Avg IOU: 0.204282, Class: 0.000000, Obj: 0.167168, No Obj: 0.495162, .5R: 0.000000, .75R: 0.000000,  count: 2 Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.415848, .5R: -nan, .75R: -nan,  count: 0 Region 82 Avg IOU: 0.274081, Class: 0.000000, Obj: 0.471111, No Obj: 0.418323, .5R: 0.000000, .75R: 0.000000,  count: 3 Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495826, .5R: -nan, .75R: -nan,  count: 0
...
55: 1025.803833, 1181.399658 avg, 0.000000 rate, 919.132681 seconds, 1320 images
Loaded: 0.000050 seconds
#

Now I want to build a data frame that leaves out most of this information. I know that each line I need sits directly above a line beginning with the expression "Loaded:", and that my lines of interest follow the structure "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images".

I need some rule (the unnecessary information starts with the expression "Region" and occurs every 24 lines) that lets me first isolate the relevant information, so that my processed output has 55 lines:

#
1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images
2: 799.555359, 799.253113 avg, 0.000000 rate, 672.519735 seconds, 48 images
...
55: 1025.803833, 1181.399658 avg, 0.000000 rate, 919.132681 seconds, 1320 images
#
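One possible way to express such a rule, as a minimal sketch: instead of counting lines or looking above "Loaded:", keep only the lines that start with an integer followed by a colon. The object names (log_lines, is_summary, summary_lines) are hypothetical, the sample vector below just copies a few lines from the excerpt above, and the sketch assumes the file is read with readLines() rather than read.table().

```r
# A few lines copied from the excerpt above, standing in for the real file.
# With the real data this would be: log_lines <- readLines("log.txt")
log_lines <- c(
  "Loaded: 0.062388 seconds",
  "Region 82 Avg IOU: 0.254732, Class: 0.000000, Obj: 0.575008, No Obj: 0.417811, .5R: 0.000000, .75R: 0.000000,  count: 4",
  "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images",
  "Loaded: 0.000042 seconds",
  "Region 82 Avg IOU: 0.308919, Class: 0.000000, Obj: 0.264983, No Obj: 0.418332, .5R: 0.250000, .75R: 0.000000,  count: 4",
  "2: 799.555359, 799.253113 avg, 0.000000 rate, 672.519735 seconds, 48 images",
  "Loaded: 0.000050 seconds"
)

# Assumption: only the iteration summaries start with "<integer>: ".
# Lines starting with "Region", "Loaded:" or "Class:" do not match.
is_summary <- grepl("^[0-9]+: ", log_lines)
summary_lines <- log_lines[is_summary]

print(summary_lines)
```

Matching on the shape of the line itself avoids depending on the "every 24 lines" spacing, which may not hold if the log is wrapped differently.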

and then, with some further manipulation to rearrange the isolated information, generate my final data frame, which would be:

#
iteration  total_loss   loss_error   rate      time        n_images
1          799.219543   799.219543   0.000000  654.661284  24
2          799.555359   799.253113   0.000000  672.519735  48
...
55         1025.803833  1181.399658  0.000000  919.132681  1320
#
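A rough sketch of how the isolated lines might be turned into that data frame, using only base R (regexec() and regmatches()). The names summary_lines, pattern, parts, mat and result are hypothetical, and the three input lines are the ones shown in this message. The sketch assumes every summary line has exactly this shape with plain numeric values; a line that does not match (for example one containing "nan") would need separate handling.

```r
# The three summary lines shown in this message
summary_lines <- c(
  "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images",
  "2: 799.555359, 799.253113 avg, 0.000000 rate, 672.519735 seconds, 48 images",
  "55: 1025.803833, 1181.399658 avg, 0.000000 rate, 919.132681 seconds, 1320 images"
)

# One capture group per target column; trailing whitespace is tolerated
pattern <- paste0(
  "^([0-9]+): ([0-9.]+), ([0-9.]+) avg, ([0-9.]+) rate, ",
  "([0-9.]+) seconds, ([0-9]+) images[[:space:]]*$"
)

# Each list element holds the full match followed by the six groups
parts <- regmatches(summary_lines, regexec(pattern, summary_lines))
mat <- do.call(rbind, parts)

# Column names follow the desired layout above
result <- data.frame(
  iteration  = as.integer(mat[, 2]),
  total_loss = as.numeric(mat[, 3]),
  loss_error = as.numeric(mat[, 4]),
  rate       = as.numeric(mat[, 5]),
  time       = as.numeric(mat[, 6]),
  n_images   = as.integer(mat[, 7])
)

str(result)
```

Converting each column explicitly keeps iteration and n_images as integers and the remaining columns as numeric, rather than leaving everything as character.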

Would anyone who works with table manipulation in R have any tips?

Thank you,

Alexandre

