Re: [rt-users] Bad characters in names loaded from LDAP (AD)

2016-10-11 Thread Bill Cole

On 11 Oct 2016, at 5:51, Jan Burian wrote:


> Hi Bill,
>
> thank you for your response. Sorry for not mentioning our database.
> We use PostgreSQL.
> After I wrote the first email, I also checked the encoding in the
> database.
>
> The database had the following parameters:
>
>  Name | Encoding |   Collate   |    Ctype
> ------+----------+-------------+-------------
>  rt4  | UTF8     | en_US.UTF-8 | en_US.UTF-8

And so my beautiful theory is destroyed by your brutal facts. :)


> 1) I dumped the database with the UTF-8 encoding parameter.
> 2) Then I dropped the database.
> 3) I created a new database with the following parameters:
>
>  Name | Encoding |   Collate   |    Ctype
> ------+----------+-------------+-------------
>  rt4  | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8
>
> 4) And then I imported the database from the dump.
>
> But after that change, names are still loaded from LDAP with bad
> characters :-/.


Indeed: the Collate and Ctype parameters are encoding-specific rulesets 
for how characters are related to each other, not variations on 
encoding.



> When a user sends a first email to a queue, the user is also
> autocreated, as unprivileged. If the From header contains his/her
> name, it is used as the RealName attribute in RT. In this case,
> however, the name is saved correctly.


> *Example from the log - autocreated from LDAP:*
> [6937] [Tue Sep 27 15:59:25 2016] [info]:
> RT::User::CanonicalizeUserInfoFromExternalAuth returning Disabled: ,
> EmailAddress: no...@vsup.cz, Gecos: novak, Name: novak, Privileged: 1,
> RealName: Matouš Novák, WorkPhone:
> (/opt/rt4/sbin/../lib/RT/User.pm:811)
> [6937] [Tue Sep 27 15:59:25 2016] [info]: Autocreated external user
> novak ( 61 ) (/opt/rt4/sbin/../lib/RT/Authen/ExternalAuth.pm:356)
> [6937] [Tue Sep 27 15:59:25 2016] [info]:
> RT::Authen::ExternalAuth::LDAP::GetAuth External Auth OK ( My_LDAP ):
> novak (/opt/rt4/sbin/../lib/RT/Authen/ExternalAuth/LDAP.pm:348)
> [6937] [Tue Sep 27 15:59:26 2016] [info]:
> RT::User::CanonicalizeUserInfoFromExternalAuth returning EmailAddress:
> no...@vsup.cz, Name: novak, *RealName: Matouš Novák*, WorkPhone:
> (/opt/rt4/sbin/../lib/RT/User.pm:811)
>
> *Example from the log - autocreated from email:*
> [6026] [Mon Oct 10 06:26:02 2016] [info]:
> RT::User::CanonicalizeUserInfoFromExternalAuth returning Comments:
> Autocreated on ticket submission, Disabled: , EmailAddress:
> tereza.skvar...@seznam.cz, Name: tereza.skvar...@seznam.cz,
> Privileged: , *RealName: Tereza Škvárová*
> (/opt/rt4/sbin/../lib/RT/User.pm:811)
>
> Any other ideas?


Yes: At least one of your FCGI handlers (PID 6937) is using an 8-bit 
encoding and at least one (PID 6026) is using UTF-8.


Note that both of those cases are being logged by the 
RT::User::CanonicalizeUserInfoFromExternalAuth method, which uses LDAP 
to retrieve the attribute it uses for the "RealName" field in RT. The 
first was logged by process 6937, the second by process 6026.


The *reason* for that is a bit of a mystery. It's clear that the 2
processes were not started near the same time (unless that server is
VERY busy spawning processes), so if you can determine what was
different about how they were launched (most likely a locale
environment variable such as LANG or LC_ALL), you can probably make
sure that the improper launch doesn't happen.
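
One way to compare how the two handlers were launched is sketched below
in Python. It assumes a Linux host where /proc/<pid>/environ is readable
(usually as root or as the user that owns the FCGI processes); that file
holds the environment a process was started with. The function names are
hypothetical helpers for illustration, not part of RT.

```python
from pathlib import Path

# Locale-related variables worth comparing between processes.
LOCALE_VARS = ("LANG", "LC_ALL", "LC_CTYPE")


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE block from /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def locale_settings(raw: bytes) -> dict:
    """Keep only the locale variables; unset ones map to None."""
    env = parse_environ(raw)
    return {name: env.get(name) for name in LOCALE_VARS}


def locale_of_pid(pid: int) -> dict:
    """Read the launch environment of a running process (Linux only)."""
    return locale_settings(Path(f"/proc/{pid}/environ").read_bytes())


def differing_vars(a: dict, b: dict) -> list:
    """Names of locale variables whose values differ between two processes."""
    return sorted(name for name in LOCALE_VARS if a.get(name) != b.get(name))
```

Calling differing_vars(locale_of_pid(6937), locale_of_pid(6026)) while
both handlers are still running would list the variables that differ;
an empty list suggests the cause lies elsewhere.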

-
RT 4.4 and RTIR training sessions, and a new workshop day! 
https://bestpractical.com/training
* Boston - October 24-26
* Los Angeles - Q1 2017

Re: [rt-users] Bad characters in names loaded from LDAP (AD)

2016-10-11 Thread Jan Burian
Hi all,

I finally resolved the issue with support from the RT engineers, so big
thanks to them.
I'm posting the fix here so that anyone interested (maybe in the
future) can find it in the list archive.

Here is the answer from the RT engineers:

> We use Net::LDAP and there is an option called 'raw' that might properly
> convert the incoming content to utf8. That's the first thing to try
> since we pass parameters through to Net::LDAP and you can put it right
> in the config file.
> https://metacpan.org/pod/distribution/perl-ldap/lib/Net/LDAP.pod
>
> However, there is likely another bit of code we need to add to RT to be
> explicit about the incoming text and treat it as utf8 when told to do
> so. We can file it as a bug, or provide some commercial assistance if
> you are interested.

So I added

raw => qr/(?i:^jpegPhoto|;binary)/

as a net_ldap_args parameter in RT_SiteConfig.pm.
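
For readers wondering what that pattern selects: according to the
Net::LDAP documentation, attributes whose names match the raw regex are
treated as binary, and the values of all other attributes are converted
to Perl UTF-8 strings. The Python sketch below only illustrates which
names the pattern matches; it uses Python's re module rather than Perl,
and the attribute names in the loop are examples chosen for
illustration, not taken from my configuration.

```python
import re

# The pattern from the post; it uses only syntax that Perl and Python share.
RAW_ATTRIBUTES = re.compile(r"(?i:^jpegPhoto|;binary)")


def stays_binary(attribute_name: str) -> bool:
    """True if the name matches the pattern, i.e. the value is left as raw bytes."""
    return RAW_ATTRIBUTES.search(attribute_name) is not None


# Example attribute names (illustrative only).
for name in ("jpegPhoto", "JPEGPHOTO", "userCertificate;binary", "displayName", "cn"):
    kind = "raw bytes" if stays_binary(name) else "decoded as UTF-8 text"
    print(f"{name}: {kind}")
```

Name attributes such as displayName do not match, so their values are
decoded as text, which is the behaviour that fixed the RealName import.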

Now everything works fine: the names are imported correctly from LDAP
(MS AD, LDAP protocol version 3).
I also suggested adding information about the raw option, with an
example, to the RT docs.

Best regards
Jan Burian




Re: [rt-users] Bad characters in names loaded from LDAP (AD)

2016-10-11 Thread Jan Burian
Hi Bill,

thank you for your response. Sorry for not mentioning our database.
We use PostgreSQL.
After I wrote the first email, I also checked the encoding in the
database.

The database had the following parameters:

 Name | Encoding |   Collate   |    Ctype
------+----------+-------------+-------------
 rt4  | UTF8     | en_US.UTF-8 | en_US.UTF-8

1) I dumped the database with the UTF-8 encoding parameter.
2) Then I dropped the database.
3) I created a new database with the following parameters:

 Name | Encoding |   Collate   |    Ctype
------+----------+-------------+-------------
 rt4  | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8

4) And then I imported the database from the dump.

But after that change, names are still loaded from LDAP with bad
characters :-/.

When a user sends a first email to a queue, the user is also
autocreated, as unprivileged. If the From header contains his/her name,
it is used as the RealName attribute in RT. In this case, however, the
name is saved correctly.

*Example from the log - autocreated from LDAP:*
[6937] [Tue Sep 27 15:59:25 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning Disabled: ,
EmailAddress: no...@vsup.cz, Gecos: novak, Name: novak, Privileged: 1,
RealName: Matouš Novák, WorkPhone:  (/opt/rt4/sbin/../lib/RT/User.pm:811)
[6937] [Tue Sep 27 15:59:25 2016] [info]: Autocreated external user
novak ( 61 ) (/opt/rt4/sbin/../lib/RT/Authen/ExternalAuth.pm:356)
[6937] [Tue Sep 27 15:59:25 2016] [info]:
RT::Authen::ExternalAuth::LDAP::GetAuth External Auth OK ( My_LDAP ):
novak (/opt/rt4/sbin/../lib/RT/Authen/ExternalAuth/LDAP.pm:348)
[6937] [Tue Sep 27 15:59:26 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning EmailAddress:
no...@vsup.cz, Name: novak, *RealName: Matouš Novák*, WorkPhone: 
(/opt/rt4/sbin/../lib/RT/User.pm:811)

*Example from the log - autocreated from email:*
[6026] [Mon Oct 10 06:26:02 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning Comments:
Autocreated on ticket submission, Disabled: , EmailAddress:
tereza.skvar...@seznam.cz, Name: tereza.skvar...@seznam.cz,
Privileged: , *RealName: Tereza Škvárová*
(/opt/rt4/sbin/../lib/RT/User.pm:811)

Any other ideas?

Best regards
Jan Burian


Re: [rt-users] Bad characters in names loaded from LDAP (AD)

2016-10-10 Thread Bill Cole

On 10 Oct 2016, at 16:26, Jan Burian wrote:


> Hi all,
>
> we have RT 4.4.0 on CentOS 7 and Perl v5.22.1. And we are starting to
> use RT in production.
>
> We configured RT to authenticate users via LDAP
> (RT::Authen::ExternalAuth::LDAP). Our LDAP server is MS AD (Win 2008
> R2).

[...]

> Authentication is working fine. Users can log in; if the user doesn't
> exist in RT, the account is autocreated. All the configured attributes
> are transferred.


This is a strong sign that the LDAP part is working correctly. If the 
LDAP server (AD) and client (Perl's Net::LDAP module) are using 
mismatched encodings, it is likely to show up in authentication failures 
due to incompatible encodings of the same (logical) characters that 
8-bit encodings assign to byte values 0x80-0xff.


Fortunately, it is somewhere between arcane and impossible to make 
Net::LDAP use anything other than UTF-8. There's *probably* some way to 
make it do T.61 for ancient-history compatibility, but that's mostly 
pointless.


[...]

> We had a similar problem with Moodle. When we configured Moodle
> against Active Directory and set cp1250 encoding, it did exactly the
> same thing. After we changed the encoding for the LDAP connector to
> utf-8, the names were corrected.


Which makes sense: LDAP v3 by default uses UTF-8 and you have a modern 
system with a mature LDAP client. I know of no way to configure a CentOS 
7/Perl 5.22 system such that the LDAP interaction with an AD LDAP server 
talking UTF-8 would be the source of this sort of encoding conflict. I'm 
mildly surprised that anything talking LDAPv3 can be made to use cp1250 
encoding, but I suppose Microsoft makes their own rules to go along with 
their own unique code pages.


[...]
> Also, I read that MS AD in LDAP protocol version 3 returns any string
> to the LDAP client in utf-8 encoding.
> I really don't know where the problem could be.


The most likely place is in your database. I'm guessing that you are 
using MySQL, which defaults to latin1 encoding. When you store a UTF-8 
string into a latin1 table, it breaks any multi-byte characters into 2 
or 3 characters, but the right bits are still there. This issue has come 
up a few times on this list over the past decade and I think Best 
Practical has documented how to safely convert an RT database with that 
sort of problem from latin1 to utf8. It is probably worth looking 
through their docs (possibly one of the UPGRADING* files?) and the RT 
Wiki for a solution. I expect it could be done with a binary dump of the 
database, altering of any latin1 tables to use utf8, and a re-import of 
the binary dump. I'm not enough of a MySQL expert to detail that process 
(I generally use Postgres where possible.)
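
The "right bits are still there" point can be illustrated with a short
Python sketch. It only demonstrates the general effect of reading UTF-8
bytes as Latin-1; it is not a model of how MySQL or RT store data, and
the sample name is simply the one from the log excerpt in this thread.

```python
name = "Matouš Novák"  # RealName from the log excerpt in this thread

utf8_bytes = name.encode("utf-8")

# Reading UTF-8 bytes as Latin-1 maps every byte to one character,
# so each two-byte character turns into two unrelated characters.
garbled = utf8_bytes.decode("latin-1")

# No byte was lost, so reversing the wrong decoding restores the name.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(name), len(garbled))
print(repaired == name)
```

The garbled string is longer than the original because "š" and "á" each
occupy two bytes in UTF-8, yet the round trip recovers the name exactly.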
