Re: [Gluster-users] Geo-Replication - UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 78: surrogates not allowed
Hi Andreas,

recently I was faced with the same fault. I'm pretty sure you speak German, so a translation should not be necessary. I found the cause by tracing one of the processes that hold gsyncd.log open and looking backward from the error until I found some lgetxattr function calls. In the corresponding directory I found some filenames with 'special' characters. Renaming them fixed the problem. Below is 'my' history and solution for UnicodeEncodeError and UnicodeDecodeError. Hope it helps... btw, we are running GlusterFS 7.9 on Ubuntu 18.04.

best regards
Dietmar

Script for tracing geo-replication:

[ 07:35:09 ] - root@gl-master-05 ~/tmp/geo-rep $cat trace_gf.sh
#!/bin/bash
#
# Script for tracing geo-replication activity.
# The script requires a PID as its only argument.
# Intended for tracing the parent PID of the master process that holds gsyncd.log open.
# In this example: PID 13620.
#
#
#[ 16:19:24 ] - root@gl-master-05 /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $lsof gsyncd.log
#COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
#python3 13021 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#python3 13619 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#python3 13620 root    3w   REG    8,2  2905607 9572924 gsyncd.log
#[ 16:19:27 ] - root@gl-master-05 /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $
#
#gf_log="/var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log"
tr_out="/root/tmp/geo-rep/trace-$(date +"%H_%M_%S_%d_%m_%Y").out"
echo "tr_out : $tr_out"
#pid=`lsof "$gf_log" | grep -v COMMAND | head -1 | awk '{print $2}'`
PID=$1
echo "pid : $PID"

# Abort if the process to be traced is not running.
if ! ps -p "$PID" > /dev/null 2>&1
then
    echo "Pid $PID not running"
    exit 1
fi

# Start strace in the background. $! is the PID of this strace instance only;
# grepping the process list for "strace" would also match unrelated strace processes.
nohup strace -t -f -s 256 -o "$tr_out" -p"$PID" &
PID_STRACE=$!
echo "strace pid : $PID_STRACE"

while true
do
    filesize=$(ls -l "$tr_out" 2>/dev/null | awk '{print $5}')
    if [ "${filesize:-0}" -gt 10 ]
    then
        if ps -p "$PID" > /dev/null 2>&1
        then
            # Traced process is still alive: discard the trace and start a new one
            # so that the output file does not grow without bound.
            kill -9 "$PID_STRACE"
            sleep 1
            rm "$tr_out"
            nohup strace -t -f -s 256 -o "$tr_out" -p"$PID" &
            PID_STRACE=$!
            echo "strace pid : $PID_STRACE"
        else
            # Traced process has ended: keep the last trace file for analysis.
            echo "pid $PID is no longer running"
            exit
        fi
    fi
    if ! ps -p "$PID" > /dev/null 2>&1
    then
        echo "pid $PID is no longer running..."
        exit
    fi
    sleep 120
    echo "$(date) : $(ls -lh "$tr_out")"
done

--

Regarding approach 2 (see below):

For this error it is sufficient to trace the 'last' process. Here that is 1236, not 13021. 13021 is the 'mother' process; after an error the other two are killed and restarted with new PIDs. Result of my observations:

[ 13:00:04 ] - root@gl-master-05 ~/tmp/geo-rep/15 $lsof /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
python3  1235 root    3w   REG    8,2  2857996 9572924 /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
python3  1236 root    3w   REG    8,2  2857996 9572924 /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
python3 13021 root    3w   REG    8,2  2857996 9572924 /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1/gsyncd.log
[ 13:00:18 ] - root@gl-master-05 ~/tmp/geo-rep/15 $

[ 13:00:10 ] - root@gl-master-05 ~/tmp/geo-rep $strace -t -f -s 256 -o /root/tmp/geo-rep/gsyncd1.out -p1236

To keep the file from growing too large, you can repeatedly kill strace, delete the file, and restart strace. Bad luck, of course, if the error occurs at exactly that moment. The file quickly reaches 1 GB and more (in about 10 minutes, depending on activity) and contains many millions of lines...

Watch the geo-replication log. Killing the PID mentioned above is not necessary, though: the process ends on error, and the trace ends with it.

[ 12:32:04 ] - root@gl-master-05 /var/log/glusterfs/geo-replication/mvol1_gl-slave-01-int_svol1 $tail -f gsyncd.log
...
[2021-02-11 12:53:59.530649] I [master(worker /brick1/mvol1):1441:process] _GMaster: Batch Completed mode=xsync duration=178.4717 changelog_start=1613041474 changelog_end=1613041474 num_changelogs=1 stime=None entry_stime=None
[2021-02-11 12:53:59.639853] I [master(worker /brick1/mvol1):1681:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/xsync/XSYNC-CHANGELOG.1613041477
###
[2021-02-11 13:00:57.149347] E [syncdutils(worker /brick1/mvol1):339:log_raise_exception] : FAIL: Traceback (most recent call last):
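Editorial sketch: instead of working backward from the strace output, the filenames with 'special' characters mentioned above could also be located by scanning a directory tree for names that are not valid UTF-8. This is only an illustration, not part of the original mail or of gsyncd. The function names are hypothetical, and it assumes that the problematic names are exactly those whose raw bytes fail to decode as UTF-8.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_names(root):
    """Yield full paths (as bytes) of entries whose name is not valid UTF-8.

    Walking with a bytes path makes os.walk return raw bytes, so names are
    inspected exactly as stored on disk, without any implicit decoding.
    """
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


# Example call (the path is a placeholder, adjust it to the affected directory):
#   for path in find_non_utf8_names("/path/to/suspect/directory"):
#       print(path)
```

Printing the hits as bytes literals (for example b'gr\xfcn.txt') shows the offending byte directly, which helps when deciding how a name would have to be renamed or re-encoded.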
Re: [Gluster-users] Geo-Replication - UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 78: surrogates not allowed
Hi Dietmar,

thank you for your reply. I've also started to trace this down, and you are correct: the directory does contain filenames with 'special' characters (umlauts). Unfortunately, renaming them as a workaround is not an option. So the real question is why it fails on those characters, and how to fix that so it doesn't error out even when such filenames exist.

Kind regards,
Andreas

On 26.02.2021 at 14:16, Dietmar Putz wrote:

Hi Andreas, recently I was faced with the same fault. I found the cause by tracing one of the processes that hold gsyncd.log open and looking backward from the error until I found some lgetxattr function calls. In the corresponding directory I found some filenames with 'special' characters. Renaming them fixed the problem. [...]
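Editorial sketch on the "why" asked in the reply: the character '\udcfc' in the error message is what Python produces when it decodes the byte 0xFC (the letter ü in Latin-1) with the "surrogateescape" error handler, which is how Python 3 represents filename bytes that are not valid UTF-8. Encoding such a string back with a strict UTF-8 codec raises exactly the "surrogates not allowed" error. The snippet below only demonstrates this Python behaviour. Where gsyncd performs the strict encode is not shown in this thread, so that part is an assumption, and the helper names are hypothetical.

```python
def decode_like_filesystem(raw: bytes) -> str:
    """Decode filename bytes the way Python 3 does for OS paths on a UTF-8
    system: undecodable bytes become lone surrogates (0xFC -> U+DCFC)."""
    return raw.decode("utf-8", "surrogateescape")


def strict_encode_fails(name: str) -> bool:
    """Return True if a strict UTF-8 encode of name raises UnicodeEncodeError."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


raw = b"f\xfcr.txt"                 # "für.txt" stored in Latin-1, not UTF-8
name = decode_like_filesystem(raw)

print(ascii(name))                  # shows the lone surrogate in escaped form
print(strict_encode_fails(name))    # strict encoding rejects the surrogate

# Encoding with the same error handler restores the original bytes losslessly.
print(name.encode("utf-8", "surrogateescape") == raw)
```

Under this reading, a filename that is already valid UTF-8 (an umlaut stored as two bytes) would not trigger the error. Only names stored in a legacy single-byte encoding would, which may explain why renaming resolved it on the other cluster.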