Re: [elinks-users] Support for unicode characters in URL

2013-06-28 Thread Chris Jones
On Wed, Jun 26, 2013 at 08:44:11AM EDT, Lars Bjørndal wrote:
> Hi

> I wrote:

> > On intranet at work, there sometimes happens to be unicode (UTF-8)
> > characters such as a Norwegian ø in the filename. With lynx I can
> > retrieve these files, but not with elinks. Is there something I can
> > do to get elinks work also with these URLs?

> Let me describe the problem some more:
> 
> - I'm using text console only from a Fedora system, with charset
>   iso-8859-1. (Don't think that matters.)

You could switch your locale to UTF-8 and see if it makes any
difference.

> - I use Elinks 0.13.GIT with ECMAScript (SpiderMonkey) built in.

The elinks.or.cz page does not mention anything higher than 0.12pre6.

> - The intranet solution is based on Microsoft SharePoint 2010.

Ouch.. there we have it.. the culprit, I mean :-)

> - Some files that I want to download, such as pdf or docx, have UTF-8
>   characters in their file names, e.g. \303\270 as Will mentioned
>   (thank you). When selecting such a link, Elinks asks me if I want to
>   save the file, and I save it. The file content, however, is only
>   this line: 404 NOT FOUND

Sounds like, for some reason, Elinks's download routine is confused
and is sending out a mangled URL to the server.. suggests some problem
related to converting between the one-byte 0xf8 latin1 and the two-byte
0xc3 0xb8 UTF-8 encodings of the "latin small letter o with stroke"..
maybe..?
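
A minimal Python sketch of the two encodings mentioned above, and of what
happens when bytes in one encoding are read as the other. It only
illustrates the byte values; it does not show what Elinks actually does
internally, which this thread does not establish.

```python
# The character under discussion: LATIN SMALL LETTER O WITH STROKE (U+00F8).
char = "\u00f8"

latin1_bytes = char.encode("iso-8859-1")  # one byte
utf8_bytes = char.encode("utf-8")         # two bytes

print(latin1_bytes.hex())  # f8
print(utf8_bytes.hex())    # c3b8

# UTF-8 bytes read as Latin-1 turn into two separate characters.
misread = utf8_bytes.decode("iso-8859-1")
print(len(misread))        # 2

# A lone Latin-1 0xf8 byte is not valid UTF-8 at all.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)          # False
```

Either kind of mix-up would produce a file name the server does not
recognise, which would be consistent with a 404.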

One heavy-handed way of finding out if this assumption is correct and if
so, what actually gets sent out would be to use tcpdump or such to
capture/filter the actual dialog between Elinks and your server.. since
I'm not aware of a debug option in Elinks that would help here.

You could obviously do the same against lynx and see where they differ..
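
Once the two requests have been captured, comparing them comes down to
looking at the request line. The Python sketch below assumes plain
(unencrypted) HTTP/1.x and uses made-up sample bytes; the path, the host
name and both requests are hypothetical and are NOT real captures from
Elinks or lynx.

```python
def request_target(raw_request: bytes) -> bytes:
    """Return the request target from the first line of an HTTP/1.x request."""
    request_line = raw_request.split(b"\r\n", 1)[0]
    # A request line has the form: METHOD SP TARGET SP VERSION
    _method, target, _version = request_line.split(b" ", 2)
    return target


# Hypothetical sample requests for illustration only.
sample_a = b"GET /docs/s%C3%B8knad.pdf HTTP/1.1\r\nHost: intranet.example\r\n\r\n"
sample_b = b"GET /docs/s%F8knad.pdf HTTP/1.1\r\nHost: intranet.example\r\n\r\n"

target_a = request_target(sample_a)
target_b = request_target(sample_b)

print(target_a)
print(target_b)
print(target_a == target_b)  # False
```

If the real captures differ in the same place, that would point at the
encoding of the file name rather than at the download code as such.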

I ran another test with Elinks 0.12pre5 and tried to access a site whose
url is http://snl.no/Sønner_af_Norge.. then downloaded the web page via
the "Save As"..  and "Save formatted document" under Elinks' File menu..
no problem.

I switched my locale to en_ISO8859-1 and logged into a linux console
(assuming that's what you mean by text console).. no problem either.

I went as far as generating the nn_NO.ISO-8859-1 locale.. logged in
again and was still able to download. 

I also ran command-line Elinks with the "-dump" option, copy-pasting the
url and redirecting the output to a file.. and no problem either.

But then I run debian rather than fedora and both versions of Elinks may
have different patches applied.. 

> - The file content is preserved if I use Lynx to download and save the
>   file.

So maybe this is a bug that occurs in a very specific context that may
be specific to the fedora version.. Perhaps you could build an Elinks
executable in your $HOME from the 0.12pre6 tarball to clarify.

As a workaround, you could try replacing the "ø"s in the URLs with
html's Ø.. see if that helps.
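
Inside a URL, non-ASCII characters travel as percent-encoded bytes, so
a percent-encoded form is another thing that could be typed by hand.
The Python sketch below uses the snl.no path from the test above. Which
encoding the SharePoint server expects is an assumption here; UTF-8
seems likely given the \303\270 file names reported, but the thread does
not confirm it.

```python
from urllib.parse import quote, unquote

# Path taken from the test URL mentioned earlier in this message.
path = "/S\u00f8nner_af_Norge"

utf8_form = quote(path, encoding="utf-8")
latin1_form = quote(path, encoding="iso-8859-1")

print(utf8_form)    # /S%C3%B8nner_af_Norge
print(latin1_form)  # /S%F8nner_af_Norge

# Decoding with the matching encoding gives the original path back.
round_trip = unquote(utf8_form, encoding="utf-8")
print(round_trip == path)  # True
```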

Oh, and please note that I am not an "encodings expert" or an Elinks
developer.. 

CJ

-- 
ALL YOUR BASE ARE BELONG TO US!
___
elinks-users mailing list
elinks-users@linuxfromscratch.org
http://linuxfromscratch.org/mailman/listinfo/elinks-users


Re: [elinks-users] Support for unicode characters in URL

2013-06-26 Thread Lars Bjørndal
Hi

I wrote:

> On intranet at work, there sometimes happens to be unicode (UTF-8)
> characters such as a Norwegian ø in the filename. With lynx I can
> retrieve these files, but not with elinks. Is there something I can do
> to get elinks work also with these URLs?

Let me describe the problem some more:

- I'm using text console only from a Fedora system, with charset
  iso-8859-1. (Don't think that matters.)

- I use Elinks 0.13.GIT with ECMAScript (SpiderMonkey) built in.

- The intranet solution is based on Microsoft SharePoint 2010.

- Some files that I want to download, such as pdf or docx, have UTF-8
  characters in their file names, e.g. \303\270 as Will mentioned
  (thank you). When selecting such a link, Elinks asks me if I want to
  save the file, and I save it. The file content, however, is only
  this line: 404 NOT FOUND

- The file content is preserved if I use Lynx to download and save the
  file.

Hope this clarifies things.

Thanks and regards,
Lars


Re: [elinks-users] Support for unicode characters in URL

2013-06-25 Thread Will Mengarini
> On Tue, Jun 25, 2013 at 03:48:37AM EDT, Lars Bjørndal wrote:
>> On intranet at work, there sometimes happens to be unicode (UTF-8)
>> characters such as a Norwegian ø in the filename. With lynx I can
>> retrieve these files, but not with elinks. [...]

* Chris Jones  [13-06/25=Tu 18:37 -0400]:
> I did the following to create a test file:
> % echo 'øø' > /tmp/file-ø
> Pointed elinks to /tmp/file-ø
> and was able to display the file's content successfully.
> Vim tells me that the characters in the file are U+00F8. [...]

Lars's email, including the From header, was encoded in ISO-8859-1
(aka Latin-1), not UTF-8, and \370 is the Latin-1 encoding of small
letter o with stroke; the UTF-8 encoding for that would be \303\270.

Lars also says "unicode (UTF-8)", suggesting a confusion; they are not
synonymous.  Chris reports that Vim reports that the file was encoded in
Latin-1.  Perhaps Lars is using multiple encodings without realizing it.
It's particularly easy for an X-based desktop to have encodings different
from those selected by environment variables in terminal sessions,
and those encodings might differ from that of the filesystem.
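
A rough Python sketch for telling the two byte sequences apart, using
the octal escapes quoted above. The helper name guess_encoding and the
file names are hypothetical. The Latin-1 answer is only a fallback
guess, since every byte string decodes as Latin-1 without error.

```python
def guess_encoding(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'iso-8859-1' as a last-resort guess."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8; Latin-1 is assumed, not proven.
        return "iso-8859-1"


latin1_name = b"file-\370"      # octal 370 == 0xf8
utf8_name = b"file-\303\270"    # octal 303 270 == 0xc3 0xb8

print(guess_encoding(latin1_name))  # iso-8859-1
print(guess_encoding(utf8_name))    # utf-8
print(guess_encoding(b"file-o"))    # ascii
```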


Re: [elinks-users] Support for unicode characters in URL

2013-06-25 Thread Chris Jones
On Tue, Jun 25, 2013 at 03:48:37AM EDT, Lars Bjørndal wrote:

> On intranet at work, there sometimes happens to be unicode (UTF-8)
> characters such as a Norwegian ø in the filename. With lynx I can
> retrieve these files, but not with elinks. Is there something I can do
> to get elinks work also with these URLs?

I did the following to create a test file:

% echo 'øø' > /tmp/file-ø

Pointed elinks to /tmp/file-ø

and was able to display the file's content successfully.

Vim tells me that the characters in the file are U+00F8.
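
U+00F8 is a code point, so on its own it does not say which bytes are
on disk. The Python sketch below writes the same two characters in both
encodings to temporary files and compares the sizes; the file names are
hypothetical and nothing here depends on Elinks or Vim.

```python
import os
import tempfile
import unicodedata

text = "\u00f8\u00f8\n"  # same content as the echo test above
print(unicodedata.name(text[0]))  # LATIN SMALL LETTER O WITH STROKE

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for encoding in ("utf-8", "iso-8859-1"):
        path = os.path.join(tmp, "test-" + encoding)
        # newline="\n" keeps the line ending to one byte on every platform.
        with open(path, "w", encoding=encoding, newline="\n") as handle:
            handle.write(text)
        sizes[encoding] = os.path.getsize(path)

print(sizes)  # {'utf-8': 5, 'iso-8859-1': 3}
```

Checking the size or the raw bytes of /tmp/file-ø in the same way would
show which encoding the shell actually used.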

% elinks --version
ELinks 0.12pre5

I mention the latter because I remember that maybe 2-3 years ago I did
have some trouble getting the version of Elinks that came with debian to
work with my en_UTF8 locale. I had to build my own from a more recent
tarball AFAICR..

With the current version that comes with debian stable, I don't remember
doing any kind of customization, so my guess is that whatever problem
I had with UTF-8 in the past was fixed and that this should work out of
the box with any recent version of Elinks..

CJ

-- 
HOW ARE YOU GENTLEMEN?