Re: [Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-15 Thread David Schmitt

On 14.06.2012 19:07, Deepak Giridharagopal wrote:

At least there is a potential for some user guidance.  For example,
would the problem be adequately addressed if all manifests and data
were encoded in UTF-8 and the agent were ensured to run in a
UTF-8-based locale?



Correct on all counts, I think. I'll add that suggestion to the
ticket. Ultimately, this needs to be fixed in core.


JFTR: +1 for recommending/enforcing UTF-8 on manifests. File contents
and data are of course something completely different, but you know that
anyway.



Best Regards, David

--
You received this message because you are subscribed to the Google Groups Puppet 
Users group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.



Re: [Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-14 Thread Antidot SAS
Hi,

I have no idea how I can help; tell me what to do and I would be glad to
help.


Regards,
Jeremy MAURO


On Thu, Jun 14, 2012 at 12:11 AM, Chris Price ch...@puppetlabs.com wrote:

 Because the serialization format (JSON) and the database both require
 UTF-8 character encoding for their data, puppetdb needs to encode strings
 before it sends them from the puppet master to the puppetdb server.  Due to
 limitations in Puppet's representation of strings (character encoding is
 not explicitly specified), it's not possible for us to do anything too
 fancy when we encounter a byte sequence that is not directly representable
 in UTF-8.  Thus, when this scenario occurs, you will see the warning that
 you mentioned.  This does mean that we will be discarding the invalid bytes.

  Whether or not this is cause for concern in your particular case depends
 on which resource triggered the warning, and what your use case for that
 resource is.  If the offending resource is an exported resource that other
 nodes are relying on, then this could cause problems.  If the offending
 resource is one that you query or report on, then your data could be skewed
 slightly.  Otherwise, this is effectively harmless for you.

  One thing that we should do on our end, though, is try to provide a bit
 more context to the warning message to help you try to identify which
 resource is causing the warning.  To that end I've filed the following
 ticket:

 http://projects.puppetlabs.com/issues/15016

 (Also worth noting: in the existing/old storeconfigs, the behavior for
 handling this scenario is undefined... so for us, this warning is a first
 step towards providing comprehensive, robust support for handling string
 encoding.)

 We are definitely interested in hearing more details about your setup if
 this does cause you any problems.

 Thanks for the feedback!
 Chris

 On Wednesday, June 13, 2012 6:06:38 AM UTC-7, jcbollinger wrote:



 On Wednesday, June 13, 2012 5:51:22 AM UTC-5, A_SAAS wrote:

 Me again regarding puppetdb, I have the following warning message:
 Jun 13 12:49:15 puppetmaster puppet-master[28444]: Ignoring invalid
 UTF-8 byte sequences in data to be sent to PuppetDB

 Do I have to worry?


 I don't know any relevant specifics about PuppetDB, but on general
 principles I would say that to the extent you rely on the data curated by
 PuppetDB to be correct, yes, you should worry.  The message suggests data
 stream corruption between PuppetDB and whatever other part of the master is
 talking to it at that point.  Probably they disagree about what character
 encoding to use, but whatever the cause of the problem, the message
 suggests that PuppetDB interpreted the data in question differently than
 its source intended.  There is a bug of some kind in there, so I would file
 a ticket.


 John






Re: [Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-14 Thread Chris Price
No action necessary; we should be able to create repro scenarios that will 
help us provide more info in the warning message (and resolve the ticket 
that I mentioned).  If you happen to know (or are able to identify) which 
resource in your system is triggering the warning (because of a String that 
contains a non-UTF-8 byte sequence), it would be interesting to see what 
your resource looked like.  Otherwise, since the odds are high that the 
warning should be harmless, just let us know if you notice any other 
unusual behavior or problems that you suspect might be related to this.

Thanks again for the feedback!

On Thursday, June 14, 2012 3:20:07 AM UTC-7, A_SAAS wrote:

 Hi,

 I have no idea how I can help; tell me what to do and I would be glad to
 help.


 Regards,
 Jeremy MAURO


 On Thu, Jun 14, 2012 at 12:11 AM, Chris Price ch...@puppetlabs.com wrote:

 Because the serialization format (JSON) and the database both require 
 UTF-8 character encoding for their data, puppetdb needs to encode strings 
 before it sends them from the puppet master to the puppetdb server.  Due to 
 limitations in Puppet's representation of strings (character encoding is 
 not explicitly specified), it's not possible for us to do anything too 
 fancy when we encounter a byte sequence that is not directly representable 
 in UTF-8.  Thus, when this scenario occurs, you will see the warning that 
 you mentioned.  This does mean that we will be discarding the invalid bytes.

  Whether or not this is cause for concern in your particular case depends 
 on which resource triggered the warning, and what your use case for that 
 resource is.  If the offending resource is an exported resource that other 
 nodes are relying on, then this could cause problems.  If the offending 
 resource is one that you query or report on, then your data could be skewed 
 slightly.  Otherwise, this is effectively harmless for you.

  One thing that we should do on our end, though, is try to provide a bit 
 more context to the warning message to help you try to identify which 
 resource is causing the warning.  To that end I've filed the following 
 ticket:

 http://projects.puppetlabs.com/issues/15016

 (Also worth noting: in the existing/old storeconfigs, the behavior for 
 handling this scenario is undefined... so for us, this warning is a first 
 step towards providing comprehensive, robust support for handling string 
 encoding.)

 We are definitely interested in hearing more details about your setup if 
 this does cause you any problems.

 Thanks for the feedback!
 Chris

 On Wednesday, June 13, 2012 6:06:38 AM UTC-7, jcbollinger wrote:



 On Wednesday, June 13, 2012 5:51:22 AM UTC-5, A_SAAS wrote:

 Me again regarding puppetdb, I have the following warning message:
 Jun 13 12:49:15 puppetmaster puppet-master[28444]: Ignoring invalid 
 UTF-8 byte sequences in data to be sent to PuppetDB

 Do I have to worry?


 I don't know any relevant specifics about PuppetDB, but on general 
 principles I would say that to the extent you rely on the data curated by 
 PuppetDB to be correct, yes, you should worry.  The message suggests data 
 stream corruption between PuppetDB and whatever other part of the master is 
 talking to it at that point.  Probably they disagree about what character 
 encoding to use, but whatever the cause of the problem, the message 
 suggests that PuppetDB interpreted the data in question differently than 
 its source intended.  There is a bug of some kind in there, so I would file 
 a ticket.


 John








Re: [Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-14 Thread Antidot SAS
Hi again,

Can I run facter and dump the result? Would that be enough? I get the warning
on every client, so I would say that the scenario is pretty much reproducible.
The only custom facts that I use are shell scripts, run with the facts
function from:
https://github.com/ripienaar/facter-facts/tree/master/facts-dot-d

One example of such a script would be:
--
#!/bin/bash

echo network_site=dmz

--
Could that be the problem?
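
One way to check a facts script like this is sketched below in Ruby (2.0 or
later, for String#b). It assumes the script prints key=value lines and flags
any line whose bytes are not valid UTF-8. The helper name invalid_fact_lines
is hypothetical. The sample input is the single line the script above prints,
which is plain ASCII and therefore passes.

```ruby
# Return the lines of a facts script's output that are not valid UTF-8.
# `output` is the raw text printed by the script (one key=value per line).
def invalid_fact_lines(output)
  output.b.each_line.map(&:chomp).reject do |line|
    line.force_encoding('UTF-8').valid_encoding?
  end
end

# The output of the script above: nothing is flagged.
puts invalid_fact_lines("network_site=dmz\n").inspect
```

If every custom fact passes a check like this, the offending bytes are more
likely to come from somewhere else, such as file content in a manifest.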


Regards,
JM


On Thu, Jun 14, 2012 at 2:41 PM, Chris Price ch...@puppetlabs.com wrote:

 No action necessary; we should be able to create repro scenarios that will
 help us provide more info in the warning message (and resolve the ticket
 that I mentioned).  If you happen to know (or are able to identify) which
 resource in your system is triggering the warning (because of a String that
 contains a non-UTF-8 byte sequence), it would be interesting to see what
 your resource looked like.  Otherwise, since the odds are high that the
 warning should be harmless, just let us know if you notice any other
 unusual behavior or problems that you suspect might be related to this.

 Thanks again for the feedback!





[Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-14 Thread jcbollinger


On Wednesday, June 13, 2012 5:11:49 PM UTC-5, Chris Price wrote:

 [...]  Due to limitations in Puppet's representation of strings (character 
 encoding is not explicitly specified), it's not possible for us to do 
 anything too fancy when we encounter a byte sequence that is not directly 
 representable in UTF-8.


Is Puppet's representation of strings distinct from Ruby's representation?  
In any case, it seems like a fundamental problem that Puppet is working 
with a bunch of strings whose encoding is uncertain.  Why can't that be 
tackled farther upstream with a mechanism for ensuring that Puppet uses a 
consistent and known encoding for strings?  Or even that it uses UTF-8 
internally, so that no transcoding is needed when sending data to puppetdb?

Furthermore, what do you mean by a byte sequence that is not directly 
representable in UTF-8?  UTF-8 encodes characters as bytes, not bytes as 
bytes.  No byte sequence is inherently non-representable.  For example, you 
can encode any byte sequence in UTF-8 by assuming that it represents a 
sequence of Latin1-encoded characters, so that the bytes are also the 
characters' Unicode scalar values.  Do you perhaps mean a byte sequence 
that isn't already valid UTF-8?
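
The Latin-1 point can be illustrated with a short Ruby sketch (Ruby 1.9 or
later, where strings carry an encoding). The helper name latin1_to_utf8 is
made up for this example. ISO-8859-1 assigns a character to every byte value,
and those characters are the code points U+0000 through U+00FF, so the
conversion cannot fail and can always be reversed.

```ruby
# Treat arbitrary bytes as ISO-8859-1 (Latin-1) and transcode them to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

all_bytes = (0..255).to_a.pack('C*')   # every possible byte value
utf8      = latin1_to_utf8(all_bytes)

puts utf8.valid_encoding?                                 # result is valid UTF-8
puts utf8.codepoints == (0..255).to_a                     # byte values became code points
puts utf8.encode('ISO-8859-1').bytes == all_bytes.bytes   # lossless round trip
```

Whether the result is the intended text is a separate question: bytes that
were really Shift-JIS would come out as mojibake, so this is only a fallback
that avoids losing data.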

I understand that Ruby 1.8 has pretty dismal character encoding support, 
but there are ways to deal with it.  Surely you can do better than just an 
improved warning and a "don't do that."

At least there is a potential for some user guidance.  For example, would 
the problem be adequately addressed if all manifests and data were encoded 
in UTF-8 and the agent were ensured to run in a UTF-8-based locale?
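
As a rough illustration of that guidance, here is a Ruby sketch that checks
the two conditions. The helper names and the approach are assumptions made
for illustration; nothing here is part of Puppet.

```ruby
# True if the file's bytes form valid UTF-8 (plain ASCII counts as valid).
def utf8_file?(path)
  File.binread(path).force_encoding('UTF-8').valid_encoding?
end

# True if the effective locale names UTF-8, following the usual POSIX
# precedence: LC_ALL, then LC_CTYPE, then LANG.
def utf8_locale?(env = ENV)
  locale = [env['LC_ALL'], env['LC_CTYPE'], env['LANG']].find { |v| v && !v.empty? }
  !!(locale.to_s =~ /utf-?8/i)
end

# Usage (hypothetical file name): ruby check_utf8.rb manifests/*.pp
if __FILE__ == $PROGRAM_NAME
  ARGV.each { |path| puts "#{path}: not valid UTF-8" unless utf8_file?(path) }
  puts 'locale is not UTF-8 based' unless utf8_locale?
end
```

A check like this only verifies the inputs. It does not change how Puppet
itself handles strings internally.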


John




Re: [Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-14 Thread Deepak Giridharagopal
On Thu, Jun 14, 2012 at 9:22 AM, jcbollinger john.bollin...@stjude.org wrote:



 On Wednesday, June 13, 2012 5:11:49 PM UTC-5, Chris Price wrote:

 [...]  Due to limitations in Puppet's representation of strings
 (character encoding is not explicitly specified), it's not possible for us
 to do anything too fancy when we encounter a byte sequence that is not
 directly representable in UTF-8.


 Is Puppet's representation of strings distinct from Ruby's
 representation?  In any case, it seems like a fundamental problem that
 Puppet is working with a bunch of strings whose encoding is uncertain.  Why
 can't that be tackled farther upstream with a mechanism for ensuring that
 Puppet uses a consistent and known encoding for strings?  Or even that it
 uses UTF-8 internally, so that no transcoding is needed when sending data
 to puppetdb?


Agreed, it can and should be tackled upstream! I believe there's already a
ticket for that, but I'll verify that assumption.

Your suspicions are correct: Puppet doesn't currently track the encoding of
any strings inside the language. Once a string is inside of Puppet, we no
longer know what its original encoding was. All we have are bytes. Nothing
is converted to an internal, neutral encoding, nor do we maintain
metadata about the original character set. A string in Puppet could contain
ASCII, Latin-1, UTF-8, Shift-JIS, binary data, etc...and we unfortunately
don't have any way to distinguish between them.
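
To make this concrete, here is a small Ruby sketch (1.9 or later) showing
that one byte sequence is a different string under each assumed encoding.
The byte values and the helper name interpret are chosen for this example
only.

```ruby
# Decode the same raw bytes under an assumed source encoding.
# Returns the UTF-8 text, or nil if the bytes are invalid in that encoding.
def interpret(bytes, assumed)
  s = bytes.dup.force_encoding(assumed)
  s.valid_encoding? ? s.encode('UTF-8') : nil
end

raw = [0xC3, 0xA9].pack('C*')   # two bytes, no encoding information attached

%w[UTF-8 ISO-8859-1 Shift_JIS US-ASCII].each do |enc|
  puts "#{enc}: #{interpret(raw, enc).inspect}"
end
```

The bytes alone give no way to choose between the candidate readings, which
is the information that is lost once a string is inside Puppet.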

So until this issue is fixed in core Puppet, if we need to send those bytes
over-the-wire to a system that actually cares about the precise encoding of
what you're sending, our options are limited. What we do in the PuppetDB
terminus is apply a heuristic: we attempt to convert the string to UTF-8.
For things like ASCII (which in our research represents the lion's share of
Puppet code out there) this works fine, and preserves all data. For things
like Latin-1 etc., though, which we can't transcode losslessly without
knowing the original encoding, we emit the warning and try to preserve as
much of the original data as we can. Once the root cause is fixed, though,
PuppetDB can take advantage of
it with very minor changes to our terminus code.
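
A minimal sketch of a heuristic along these lines, in Ruby 2.1 or later (for
String#scrub). This is not the PuppetDB terminus code: the method name
to_utf8_lossy, the constant name, and the warning callback are assumptions
made for illustration. It treats the bytes as UTF-8, passes them through
untouched when they are valid (which covers all ASCII), and otherwise warns
and drops only the invalid bytes.

```ruby
UTF8_WARNING = 'Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB'

# Returns a valid UTF-8 copy of `str`; calls `warn_with` when bytes were dropped.
def to_utf8_lossy(str, warn_with = ->(msg) { warn msg })
  candidate = str.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?   # ASCII and real UTF-8: lossless

  warn_with.call(UTF8_WARNING)
  candidate.scrub('')                             # keep every valid character
end

p to_utf8_lossy('notify')             # => "notify", no warning
p to_utf8_lossy("caf\xE9 au lait")    # warns, then => "caf au lait"
```

In the second call the Latin-1 byte for the accented e is dropped rather
than converted, because nothing says the input was Latin-1.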


Furthermore, what do you mean by a byte sequence that is not directly
 representable in UTF-8?  UTF-8 encodes characters as bytes, not bytes as
 bytes.  No byte sequence is inherently non-representable.  For example, you
 can encode any byte sequence in UTF-8 by assuming that it represents a
 sequence of Latin1-encoded characters, so that the bytes are also the
 characters' Unicode scalar values.  Do you perhaps mean a byte sequence
 that isn't already valid UTF-8?


Yes, that phrasing is more accurate. :)



 I understand that Ruby 1.8 has pretty dismal character encoding support,
 but there are ways to deal with it.  Surely you can do better than just an
 improved warning and a "don't do that."

 At least there is a potential for some user guidance.  For example, would
 the problem be adequately addressed if all manifests and data were encoded
 in UTF-8 and the agent were ensured to run in a UTF-8-based locale?



Correct on all counts, I think. I'll add that suggestion to the ticket.
Ultimately, this needs to be fixed in core... then downstream services
(PuppetDB, Foreman, Dashboard, etc) can all benefit. But certainly, I
believe that improved guidelines would help. And the ticket that Chris
filed earlier against PuppetDB specifically will at least help users in the
interim figure out precisely which resources are giving us trouble.

Cheers,
Deepak

--
Deepak Giridharagopal / Puppet Labs / @grim_radical




[Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-13 Thread jcbollinger


On Wednesday, June 13, 2012 5:51:22 AM UTC-5, A_SAAS wrote:

 Me again regarding puppetdb, I have the following warning message:
 Jun 13 12:49:15 puppetmaster puppet-master[28444]: Ignoring invalid UTF-8 
 byte sequences in data to be sent to PuppetDB

 Do I have to worry?


I don't know any relevant specifics about PuppetDB, but on general 
principles I would say that to the extent you rely on the data curated by 
PuppetDB to be correct, yes, you should worry.  The message suggests data 
stream corruption between PuppetDB and whatever other part of the master is 
talking to it at that point.  Probably they disagree about what character 
encoding to use, but whatever the cause of the problem, the message 
suggests that PuppetDB interpreted the data in question differently than 
its source intended.  There is a bug of some kind in there, so I would file 
a ticket.


John




[Puppet Users] Re: puppetdb: UTF-8 byte sequence

2012-06-13 Thread Chris Price
Because the serialization format (JSON) and the database both require UTF-8 
character encoding for their data, puppetdb needs to encode strings before 
it sends them from the puppet master to the puppetdb server.  Due to 
limitations in Puppet's representation of strings (character encoding is 
not explicitly specified), it's not possible for us to do anything too 
fancy when we encounter a byte sequence that is not directly representable 
in UTF-8.  Thus, when this scenario occurs, you will see the warning that 
you mentioned.  This does mean that we will be discarding the invalid bytes.

Whether or not this is cause for concern in your particular case depends
on which resource triggered the warning, and what your use case for that 
resource is.  If the offending resource is an exported resource that other 
nodes are relying on, then this could cause problems.  If the offending 
resource is one that you query or report on, then your data could be skewed 
slightly.  Otherwise, this is effectively harmless for you.
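
A hypothetical Ruby illustration of why this matters most for exported
resources: once invalid bytes are discarded, two values that were distinct
on the master can become identical, or can stop matching what a collecting
node expects. The titles below are invented for the example, and
String#scrub (Ruby 2.1 or later) stands in for whatever discarding PuppetDB
actually performs.

```ruby
# Drop bytes that are not valid UTF-8, as a stand-in for the discarding step.
def discard_invalid(str)
  str.dup.force_encoding('UTF-8').scrub('')
end

# Two distinct Latin-1 encoded titles: "node-" followed by e-acute or e-grave.
titles = ["node-\xE9".b, "node-\xE8".b]

stored = titles.map { |t| discard_invalid(t) }
puts stored.inspect        # both become "node-"
puts stored.uniq.length    # 1: the two titles now collide
```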

One thing that we should do on our end, though, is try to provide a bit
more context to the warning message to help you try to identify which 
resource is causing the warning.  To that end I've filed the following 
ticket:

http://projects.puppetlabs.com/issues/15016
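
Until the warning carries that context, something like the following Ruby
sketch could narrow it down by hand. The data layout (a hash of resource
titles to parameter hashes) and the method name offending_parameters are
assumptions for illustration and do not reflect Puppet's internal catalog
classes.

```ruby
# Return [resource, parameter] pairs whose string value is not valid UTF-8.
def offending_parameters(resources)
  resources.flat_map do |title, params|
    bad = params.select do |_, value|
      value.is_a?(String) && !value.dup.force_encoding('UTF-8').valid_encoding?
    end
    bad.keys.map { |name| [title, name] }
  end
end

# Invented sample data: the motd content holds a Latin-1 encoded a-grave.
catalog = {
  'File[/etc/motd]'  => { 'content' => "Bienvenue \xE0 bord\n".b, 'mode' => '0644' },
  'Notify[greeting]' => { 'message' => 'hello' }
}

offending_parameters(catalog).each do |title, name|
  puts "#{title}: parameter '#{name}' contains invalid UTF-8"
end
```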

(Also worth noting: in the existing/old storeconfigs, the behavior for 
handling this scenario is undefined... so for us, this warning is a first 
step towards providing comprehensive, robust support for handling string 
encoding.)

We are definitely interested in hearing more details about your setup if 
this does cause you any problems.

Thanks for the feedback!
Chris

On Wednesday, June 13, 2012 6:06:38 AM UTC-7, jcbollinger wrote:



 On Wednesday, June 13, 2012 5:51:22 AM UTC-5, A_SAAS wrote:

 Me again regarding puppetdb, I have the following warning message:
 Jun 13 12:49:15 puppetmaster puppet-master[28444]: Ignoring invalid 
 UTF-8 byte sequences in data to be sent to PuppetDB

 Do I have to worry?


 I don't know any relevant specifics about PuppetDB, but on general 
 principles I would say that to the extent you rely on the data curated by 
 PuppetDB to be correct, yes, you should worry.  The message suggests data 
 stream corruption between PuppetDB and whatever other part of the master is 
 talking to it at that point.  Probably they disagree about what character 
 encoding to use, but whatever the cause of the problem, the message 
 suggests that PuppetDB interpreted the data in question differently than 
 its source intended.  There is a bug of some kind in there, so I would file 
 a ticket.


 John


