Bug#788327: Swift in Debian

2019-01-04 Gregory Williams
Hello,

Jonas pointed me at this thread. I’m the author of the RDF triplestore[1] he
referenced, and would *love* to see the Swift tools packaged and available in
Debian. I suspect the ABI stability work coming in Swift 5 might bring benefits
and wider adoption to a future swift-on-debian package, but I would personally
have an immediate use for such a package, whether it shipped Swift 4.x or a
future 5.x.

Thanks!
Greg Williams

[1] https://github.com/kasei/kineo 



Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-07 Gregory Williams
On Aug 7, 2017, at 8:26 AM, gregor herrmann wrote:
> 
> 
> This looks indeed much better than my crude workarounds, thanks for
> that!
> 
> Do you think you can take this up with upstream?

Yes, I think Kjetil and I can work on getting this merged upstream.

Thanks,
Greg



Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-06 Gregory Williams
On Sat, 5 Aug 2017 12:16:04 -0400 gregor herrmann wrote:
> What helps is:
> - replace in lib/HTML/HTML5/Parser.pm
>   $response->{decoded_content} with $response->{content}
>   which feels a bit dangerous
> - or in lib/HTML/HTML5/Parser/UA.pm's get:
>   move the
>   if ($uri =~ /^file:/i)
>   up so it's the first alternative and then _get_fs is used
> 
> 
> The latter change would be, as a diff:
> 
> --- a/lib/HTML/HTML5/Parser/UA.pm
> +++ b/lib/HTML/HTML5/Parser/UA.pm
> @@ -18,14 +18,14 @@ sub get
>  {
>      my ($class, $uri, $ua) = @_;
>  
> +    if ($uri =~ /^file:/i)
> +    { goto \&_get_fs }
>      if (ref $ua and $ua->isa('HTTP::Tiny') and $uri =~ /^https?:/i)
>      { goto \&_get_tiny }
>      if (ref $ua and $ua->isa('LWP::UserAgent'))
>      { goto \&_get_lwp }
>      if (UNIVERSAL::can('LWP::UserAgent', 'can') and not $NO_LWP)
>      { goto \&_get_lwp }
> -    if ($uri =~ /^file:/i)
> -    { goto \&_get_fs }
> 
> While this helps for reading local files, I guess the _get_lwp() case
> might still be buggy.


I also looked into this and found another possible fix:

diff -ru HTML-HTML5-Parser-0.301/lib/HTML/HTML5/Parser.pm HTML-HTML5-Parser-0.301-patched/lib/HTML/HTML5/Parser.pm
--- HTML-HTML5-Parser-0.301/lib/HTML/HTML5/Parser.pm	2013-07-08 07:12:25.0 -0700
+++ HTML-HTML5-Parser-0.301-patched/lib/HTML/HTML5/Parser.pm	2017-08-06 12:42:58.0 -0700
@@ -13,6 +13,7 @@
 use HTML::HTML5::Parser::TagSoupParser;
 use Scalar::Util qw(blessed);
 use URI::file;
+use Encode qw(encode_utf8);
 use XML::LibXML;
 
 BEGIN {
@@ -102,6 +103,11 @@
 {
     # XXX AGAIN DO THIS TO STOP ENORMOUS MEMORY LEAKS
     my ($errh, $errors) = @{$self}{qw(error_handler errors)};
+
+    if (utf8::is_utf8($text)) {
+        $text = encode_utf8($text);
+    }
+
     $self->{parser}->parse_byte_string(
         $opts->{'encoding'}, $text, $dom,
         sub {


Part of the underlying issue is that many variables and methods in these
modules have confusing names: they expect (or require) encoded bytes, but their
names suggest decoded character strings. For example, the $text handed to
parse_byte_string() above has to be a byte string, despite its name.
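To make the bytes-versus-characters distinction concrete, here is a minimal standalone sketch of the guard the patch adds. The helper name to_parser_bytes is hypothetical and is not part of HTML::HTML5::Parser. Note that utf8::is_utf8 reports Perl's internal representation rather than whether a string was "decoded", so the sketch uses U+263A, a code point above U+00FF that Perl always stores with the UTF8 flag set, to keep the behaviour predictable.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper mirroring the guard in the patch: if $text is a
# character string (internal UTF8 flag set), encode it to UTF-8 bytes;
# otherwise assume it already holds bytes and pass it through unchanged.
sub to_parser_bytes {
    my ($text) = @_;
    $text = encode_utf8($text) if utf8::is_utf8($text);
    return $text;
}

my $chars = "smile \x{263A}";        # character string: 7 characters
my $bytes = "smile \xE2\x98\xBA";    # same text as UTF-8 bytes: 9 bytes

# Both inputs reach the parser as the same 9-byte UTF-8 sequence.
printf "%d %d\n", length(to_parser_bytes($chars)), length(to_parser_bytes($bytes));
# prints "9 9"
```

Without the guard, calling encode_utf8 on the already-encoded $bytes would re-encode each byte and yield 12 bytes instead of 9.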

The above patch should also handle the LWP case, which the previously suggested
patch avoids. It still passes the test suite (which should probably be extended
to cover this case), and it also fixes the test case detailed in this bug
report. However, I believe the test script included by Vincent Lefevre has a
double-encoding bug of its own: $doc->toString() actually returns UTF-8-encoded
bytes, which the :encoding(UTF-8) PerlIO layer on stdout then encodes a second
time.
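The double encoding in the test script can be reproduced without XML::LibXML. This minimal sketch assumes, as stated above, that $doc->toString() returns UTF-8 bytes; it stands in for that output with a literal byte string and writes it through an in-memory filehandle carrying the same :encoding(UTF-8) layer instead of stdout. The helper name through_utf8_layer is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: print $data through an :encoding(UTF-8) layer into
# an in-memory buffer and return the raw bytes that the layer produced.
sub through_utf8_layer {
    my ($data) = @_;
    my $buf = '';
    open(my $fh, '>:encoding(UTF-8)', \$buf) or die "open: $!";
    print {$fh} $data;
    close($fh) or die "close: $!";
    return $buf;
}

# Stand-in for $doc->toString(): "<p>\x{263A}</p>" already encoded as UTF-8.
my $bytes = "<p>\xE2\x98\xBA</p>";

# Wrong: the layer treats each byte as a Latin-1 character and encodes it
# again, so the 3-byte smiley becomes 6 bytes of mojibake.
my $double = through_utf8_layer($bytes);

# Right: decode first so the layer encodes exactly once. (Printing $bytes
# to a handle without an encoding layer would also avoid the problem.)
my $once = through_utf8_layer(decode_utf8($bytes));

printf "%d %d %d\n", length($bytes), length($double), length($once);
# prints "10 13 10"
```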

thanks,
.greg