Hello,

On Thursday, March 07, 2013 09:47:21 AM Michael Wood wrote:
> Hi
>
> On 6 March 2013 19:09, Ali Bendriss <ali.bendr...@gmail.com> wrote:
> > On Wednesday, March 06, 2013 06:50:46 PM Michael Wood wrote:
> >> Hi
> >>
> >> On 6 March 2013 16:43, Ali Bendriss <ali.bendr...@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I'm running samba 4.0.3.
> >> > when I query the operatingsystem attribute using
> >> > ldapsearch ... -P 3 "(objectCategory=computer)"
> >> >
> >> > The operatingsystem value returned for "Windows 7 Professionnel N"
> >> > is operatingSystem:: V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=
> >> > which translate to Windows 7 Professionnel N
> >> > But when I look at it using dsa.msc I can read "Windows 7 Professionnel
> >> > N"
> >>
> >> Are you worried about the "Â"? That's actually a non-breaking space
> >> character (like in HTML).
> >
> > my mistake in fact it return Windows + something not convertible to utf8.
>
> It is encoded as UTF-8. It should not be "converted to" UTF-8.
>
> That base64 encoded string decodes to:
>
> $ python -c 'print repr("V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=".decode("base64"))'
> 'Windows\xc2\xa07 Professionnel N'
>
> which Python is quite happy to interpret as UTF-8:
>
> $ python -c 'print repr("V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=".decode("base64").decode("utf-8"))'
> u'Windows\xa07 Professionnel N'
>
> If you look here:
>
> http://en.wikipedia.org/wiki/Non-breaking_space#Encodings
>
> you will see that the UTF-8 encoding of a non-breaking space is the
> two bytes 0xC2 and 0xA0 which is exactly what your data contains. And
> the Unicode code point is U+00A0, which Python prints as u'\xa0'.
>
> So it seems something else is going on between getting the information
> from Samba and sending it to Postgres.
>
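
For anyone following along with Python 3, the decoding steps quoted above can be sketched as below. The last part is only an assumption: it shows one hypothetical way the bytes from the PostgreSQL error reported in this thread (0xe2 0xa0 0x37) could arise, namely lowercasing the raw bytes as if they were Latin-1 instead of decoding them as UTF-8 first. The variable names are made up for illustration and are not taken from any actual routine.

```python
import base64

# Value returned by ldapsearch for the operatingSystem attribute.
encoded = "V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4="

# Step 1: base64 gives back the raw bytes stored in the directory.
raw = base64.b64decode(encoded)

# Step 2: those bytes are already valid UTF-8; 0xC2 0xA0 is U+00A0.
text = raw.decode("utf-8")

# Lowercasing the decoded text and re-encoding keeps the data valid.
good = text.lower().encode("utf-8")

# Assumption (hypothetical bug): lowercasing the bytes as Latin-1.
# 0xC2 is "Â" in Latin-1 and lowercases to "â" (0xE2), which breaks
# the two-byte UTF-8 sequence for the non-breaking space.
broken = raw.decode("latin-1").lower().encode("latin-1")

try:
    broken.decode("utf-8")
    broken_is_valid = True
except UnicodeDecodeError:
    broken_is_valid = False

print(repr(raw))
print(repr(text))
print(good.hex(" "))
print(broken.hex(" "), "valid UTF-8:", broken_is_valid)
```

If the lowercasing is done on the decoded string rather than on the bytes, the non-breaking space survives unchanged.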
Thank you for your valuable input. You are perfectly correct: the culprit
was my "to_lower_case" routine.

May I ask you for some information about the date format used in Samba?
For example, the attributes whenCreated and whenChanged:

whenCreated: 20120402125316.0Z
whenChanged: 20130208010036.0Z

I can see %Y%m%d, and what follows is probably %H%M%S.
But what is ".0Z"?

> > I'm trying to get the computers info in a postgresql database and get in
> > postgresql log file
> >
> > ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa0 0x37
> >
> >> > For other system, it's fine, I've got "Windows XP Professional", "Mac
> >> > OS
> >> > X", "Windows 7 Professionnel"
> >> > I've got only the problem for the 'N' version.
> >> >
> >> > Could someone let me know if he can see or not the same problem.
> >> >
> >> > thanks
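
On the whenCreated / whenChanged question: those values look like LDAP Generalized Time (RFC 4517), where the part after the dot is a fraction of a second and the trailing "Z" means UTC. A minimal Python 3 sketch for parsing them follows, assuming the value always has the form shown in this thread (fourteen digits, a dot, a fraction, then "Z"). The helper name parse_generalized_time is made up for illustration, and other variants permitted by the syntax (no fraction, numeric offsets) are not handled.

```python
from datetime import datetime, timezone


def parse_generalized_time(value: str) -> datetime:
    """Parse a value such as '20120402125316.0Z' into an aware UTC datetime.

    Hypothetical helper; only the 'YYYYmmddHHMMSS.fZ' form is supported.
    """
    if not value.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z': %r" % value)
    # Drop the 'Z' and let %f consume the fractional seconds ('0' here).
    naive = datetime.strptime(value[:-1], "%Y%m%d%H%M%S.%f")
    return naive.replace(tzinfo=timezone.utc)


when_created = parse_generalized_time("20120402125316.0Z")
when_changed = parse_generalized_time("20130208010036.0Z")
print(when_created.isoformat())
print(when_changed.isoformat())
```

The resulting datetime objects carry an explicit UTC timezone, which should map cleanly onto a PostgreSQL "timestamp with time zone" column.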