UTF-8 fix for Shift+Enter

2006-08-03 Thread Koblinger Egmont
Hi,

With the UTF-8 patches, line editing (both in the command line and in other
widgets: create dir, rename file...) goes crazy if a literal newline
(Shift+Enter, or Ctrl+Q followed by Enter) is entered. This special
character can no longer be removed from the buffer, the cursor walks to the
left, and so on...

I created a patch called 00-82-utf8-shift-enter.patch; it is available
either from this bug entry:
  http://savannah.gnu.org/bugs/?func=detailitem&item_id=17268
or from the usual place of our mc patches:
  https://svn.uhulinux.hu/packages/dev/mc/patches/

The savannah bug entry as well as the comments at the top of the patch file
explain the bug and its solution in more detail.


bye,
Egmont
___
Mc-devel mailing list
http://mail.gnome.org/mailman/listinfo/mc-devel


highlight not on first line after a cd

2005-10-10 Thread Koblinger Egmont
Hi,

Tried with 4.6.0 and 4.6.1:

Log in as root, whose home directory is /root (as usual under Linux).

Create an entry (regular file or directory, doesn't matter) under one of the
topmost directories, e.g. touch /tmp/root.

Start mc, having your home (/root) in one of the panels.

Type cd /tmp here.

/tmp will be listed, but the highlighted line will be positioned on its
root file, not at the top of the panel as one would expect.



-- 
Egmont


Re: utf8 patch for mc, slang 2 version

2005-09-26 Thread Koblinger Egmont
Hi,

 I have patches for the NFC / NFD issue and two other patches for the
 Darwin/Mac Platform for the current UTF-8 version, with all patches
 applied (I don't know where to post it, so I post here, sorry):
 
 config.h @ 3
 +#ifdef __APPLE__
 +#define unix 1
 +#endif

I guess developers prefer patches created with diff -u rather than just
some pseudo-code. Extract the original source code to a directory called
mc-4.6.1.orig (or actually whatever you want), copy it to
mc-4.6.1 (or whatever you like), make your modifications under the latter
one and then run diff -Naurdp mc-4.6.1.orig mc-4.6.1 or something similar.
Then attach the result to the mail; attachments are easier to handle than
patches cut out of the message body. Developers will correct me if I'm wrong.

 The Israeli (ivrit) and others like Chinese and Japanese are showing
 up correctly, but they crash the windowing, because they are
 multi-byte chars.
 
 Question: how can the length of these multi-byte chars be determined?
 Does anyone have any idea?

If I understand you correctly, by "length" you actually mean how many
character cells these multi-byte characters occupy on the screen. This is
usually called the width of a character. See the manual pages of wcwidth()
and wcswidth().



-- 
Egmont


Re: utf8 patch for mc, slang 2 version

2005-09-21 Thread Koblinger Egmont
On Tue, Sep 20, 2005 at 10:11:28PM +0200, Bálint Kardos wrote:

 But even with all patches and stuff, I see the following Unicode glitches:
 
 - the utf-8 chars are not displayed in the dir list (on Ubuntu, everything is
 OK)
 for ÉÁŰŐÚÖÜÓ I see EAUOUOUO (upper, lowercase all wrong)
 
 - the files/dirs that contain the unicode chars, are still not
 properly aligned to the grids
 
 What could cause Darwin to behave so unpredictably?
 In the filesystem, there's another error:
 if you do 'ls', the alignment of the columns after the unicode chars
 are broken as well.

Unix systems conventionally use NFC, while MacOS uses the NFD representation
of accents (at least for filenames; I don't know about file contents). NFC
means each accented character has its own precomposed value, that is, one
Unicode entity, which is usually stored as two (maybe three) bytes in UTF-8.
NFD decomposes the character into two Unicode entities: first the
unaccented letter, followed by the accent on its own (a combining
character). Its UTF-8 representation hence takes three bytes (one for the
unaccented letter and two more for the accent).

There are different implementation levels of Unicode support specified; I
guess supporting NFD requires a higher level of conformance since it's a
harder job than supporting NFC. I bet mc's UTF-8 patch only supports NFC.



-- 
Egmont


Re: Multi byte charset support vs. single byte

2005-07-31 Thread Koblinger Egmont
On Sun, Jul 24, 2005 at 04:20:45PM +0200, Leonard den Ottolander wrote:

Hi,

 I was wondering if when we will implement multi byte charset support we
 should still keep a compile option to build for (low memory) single byte
 systems, or even a runtime option to choose between the two modes, as
 the use of wchar_t's is relatively memory hungry.

Please see my earlier mail at:
http://mail.gnome.org/archives/mc-devel/2005-April/msg00029.html

There I discuss why I think this whole wchar_t approach is completely
wrong for a file manager / file viewer / file editor.

The most important point is that a file does not always mean text. MC
should keep on working perfectly with out-of-locale filenames or file
contents (e.g. binary files), and the editor should remain binary safe even
for out-of-locale characters etc. This cannot be implemented with the "think
in characters" philosophy suggested by the use of wchar_t's. MC should still
think in pure single 8-bit bytes and only group some of them together for
display purposes. This is the only way for the user to be able to delete
or rename a file with an invalid UTF-8 filename under a UTF-8 system etc.,
which is, I think, a pretty much required feature of any file manager.
This is discussed in more detail in the mail mentioned above.

On the other hand, wchar_t can perfectly be used to store the information
about which character we want to be displayed in a cell of the terminal.
Used this way, the growth caused by the char to wchar_t change is negligible,
so it is not worth offering any compile-time option to turn it off.



e.


Re: glibc and getgrouplist

2005-07-22 Thread Koblinger Egmont
 (strcmp(gnu_get_libc_version(), 2.3.3)  0)

 Why not? Note the  comparison. Since 2.3.3, this should be definitely 
 fixed.

Please use strverscmp() or something similar. According to strcmp, 2.3.10 is
smaller than 2.3.3.



-- 
Egmont


Re: glibc and getgrouplist

2005-07-22 Thread Koblinger Egmont
On Fri, Jul 22, 2005 at 09:30:11AM +0200, Roland Illig wrote:

 Hmmm, you're right. As we are sure to have a glibc, there's also a 
 function strverscmp, which we can use. I had known this issue, but as 
 glibc-2.2 only got up to 2.2.6, I thought it would suffice.

Also, AFAIK the current 2.3.5 is the last release of the 2.3 series and the
next one will be 2.4 IIRC, but you never know...



-- 
Egmont


Re: cons.saver not suid root

2005-06-08 Thread Koblinger Egmont
On Wed, Jun 08, 2005 at 04:49:11PM +0200, Oswald Buddenhagen wrote:

 we have no portable (even across
 linuxes) way to create a vcsa user, so there is no other option than
 root.

How about not creating a user or group, but observing the installed system?

IMHO if all the vcsa devices are owned by the same user or same group whose
uid/gid is below 100, then we could assume that that one is a vcsa-like user
or group, and install mc setgid or setuid to that particular gid or uid.

If this is not the case, that is, the owner user or group varies or its id
is 100 or above (which might mean that a real user is logged in on all the
ttys), then we could still fall back to setuid root.


bye,

Egmont


Re: cons.saver not suid root

2005-06-08 Thread Koblinger Egmont
On Wed, Jun 08, 2005 at 06:06:03PM +0200, Tomasz Koczko wrote:

 BTW: on Linux I now don't see cons.saver being used.
 
 $ strace -o out mc; grep cons.saver out
 
 pressing Ctrl-O a few times during strace shows nothing (?).
 
 Q: on Linux, is cons.saver still necessary?

Yes. If mc is running on a text console, you need raw read-write access to
/dev/vcsa* in order to get Ctrl-O working. There are two approaches to this
situation: one is the setuid/setgid cons.saver wrapper, the other is
granting the user access to the vcsa devices, e.g. making vcsa owned by
the user. The problem with the latter is that there's no way to revoke
this access from the user under Linux: once you open the device you can
hold it open as long as the system is up, and hence you can watch other
people's data there when someone else logs in on that console...



-- 
Egmont


byte vs. character approach [was: Terminology concerning strings]

2005-04-06 Thread Koblinger Egmont
Hi all,

 According to
 http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html
 wchar_t on GNU systems is 4 bytes by default. Internal representation of
 multibyte strings always uses fixed widths or something like x[3] wouldn't
 work (without scanning the string). So in case x in the above example is a
 wchar_t you overflow the buffer nicely ;) .

As I see it, this is a completely different approach to the whole situation
than the one the current UTF-8 hack patchset uses.

The current UTF-8 patchset still _thinks_ in _bytes_, but tries to correctly
display them using UTF-8 or whatever the current locale is.

Using wchar_t all over the source gives me a feeling that this approach
wants mc to _think_ in _characters_.

I'm not sure at all that this is the right way to go for a file manager and
text editor.

Unix philosophy says filenames are sequences of bytes (as opposed to Windows,
which says filenames are sequences of characters). Whenever you use a
multibyte locale, you might face filenames that are not valid according to
this locale. But these are still valid filenames on the system; they just
cannot be displayed in your current locale, though maybe they're okay in
another locale. I expect a file manager to handle these kinds of files
without a problem. Hence filenames should be handled as byte sequences, and
mc should do its best to display them as well as possible; but even if it
cannot display a name correctly and needs to use some question marks, it
should still be perfectly able to remove, rename or edit the file, invoke an
external command on it etc. Typing a command and using Esc+Enter to put the
filename into the command line should also work. So the name should be
converted from the original byte stream to any other representation only
for display purposes, and stored as the original byte stream in mc's
memory.

Similar things happen e.g. with file editing. Suppose I receive a large
English text file and I find a typo and want to fix that. I do it in mcedit
and then save the file. I didn't even realize that this file also contained
some French words encoded in Latin-1, while my whole system is set to UTF-8.
mcedit must save the file leaving the original Latin-1 accents unchanged,
even though they are not valid UTF-8. It's definitely a bug if these
characters disappear from the file or if mc cannot handle them in any
other way.

Actually, will mcedit be able to edit UTF-8 encoded files inside a Latin-1
terminal? Or edit Latin-1 files inside a UTF-8 terminal? Will mc be able to
assume UTF-8 filenames while the terminal is Latin-1? ...


I recommend that everyone take a look at the 'joe' text editor, version 3.1
or 3.2, to see how it handles charsets. I don't mean the implementation,
only the user-visible behavior of the software. IMHO this is the way things
have to work.

'joe' thinks the file being edited is always a byte stream. It knows the
behavior of the terminal from the locale settings; this is not overridable
in joe, which is a perfect decision (as opposed to vim) since this is
exactly what the locale environment variables are for. The default encoding
assumed for a file is the current locale; however, you can easily change it
at any time by pressing ^T E. Changing this assumed character set does not
change anything in the file, it just changes the way the file is displayed
on the screen, what bytes a keypress will insert, how many bytes a
backspace, delete or overtype will remove etc. Obviously, byte sequences
that are invalid in the selected charset are displayed using some special
symbol, maybe in a special color. This whole approach guarantees that joe
can edit files of arbitrary encodings over arbitrary terminals, and at the
same time it is still binary safe and keeps the byte sequence unchanged even
if it is not valid according to the assumed character set.

As opposed to joe, take a look at Gnome and KDE, especially KDE, and their
bugzillas etc. to see how many bug reports they have about accented
filenames. The whole KDE system thinks of filenames as sequences of
human-readable characters and hence it usually fails to handle
out-of-locale filenames.

Just think how many complaints and bug reports you would receive when
someone uses a modern Linux system with its default UTF-8 locale,
recursively downloads some stuff from an ftp server and then blames
mc-4.7.0 because it cannot cope with these filenames (whoops, they're in
Latin-1) and cannot access, delete, rename etc. them. These users would have
to quit to the shell to rename them properly, which means that mc fails to
perform one of its most basic jobs. I hope this won't happen.


So while the approach of thinking in characters is better for most
desktop applications, I'm pretty sure that for file managers such as mc and
text editors such as mcedit, thinking in bytes and converting the byte
stream solely for display purposes is the right way to go.



-- 
Egmont

Re: Terminology concerning strings

2005-04-04 Thread Koblinger Egmont
Hi,

On Mon, Apr 04, 2005 at 11:35:44AM +0200, Roland Illig wrote:

 * the _size_ of a string (as well as for other objects) is the number of
   bytes that is allocated for it. For arrays, it is the number of
   entries of the array. For strings it is at least _length_ + 1.
 
 * the _length_ of a string is the number of characters in it, excluding
   the terminating '\0'.
 
 * the _width_ and _height_ of a string are the size of a box on the
   screen that would be needed to display the string.

It seems to me that this terminology is not yet multibyte-aware. Since UTF-8
is becoming an everyday issue and AFAIR is planned for mainstream mc 4.7.0,
IMHO it is very important to create a clear terminology for it, even if it's
not officially implemented yet.

Hence:

Byte and character are two completely different notions. The meaning of a
byte is clear. A character is a human-visible entity, e.g. an accented
letter. A character may be represented by one or more bytes. It should be
clarified whether a composing symbol (e.g. one that puts an accent on top
of the previous letter) is a character on its own or not. Pressing a letter
on the keyboard usually inserts one character, and a backspace/delete is
supposed to remove one character, not one byte.

Is the _length_ of a string the number of bytes in it or the number of
characters in it? If it is the number of bytes, then the second definition
(in the quoted part) should be corrected. If it is the number of characters,
then the last sentence of the first definition doesn't really have a
meaning, since size and length then have nothing to do with each other, and
hence the size >= length + 1 constraint is misleading (even though it isn't
false, supposing that every character takes at least one byte to
represent).

Actually, what does string mean? Is it an arbitrary sequence of bytes
terminated by the first zero byte in it, which we sometimes try to display
somehow, or is it a technical representation of human-readable text? These
two approaches might lead to completely different programming philosophies.
I recommend the latter, since it thinks in the terms that matter most for
the user interface: the meaning of the byte sequence rather than the pure
byte sequence on its own. Another consequence is that according to the
second definition the byte sequence must always be valid according to one
well-defined character set (e.g. valid UTF-8), while the first version also
allows invalid byte sequences that still should be displayed somehow.

Furthermore, it should be emphasized that the width of a character is not
necessarily 1, so the number of bytes, number of characters and the width of
a string may be three completely different values.


-- 
Egmont


Re: MC translation

2005-01-24 Thread Koblinger Egmont
On Mon, Jan 24, 2005 at 06:34:45PM +0100, Leonard den Ottolander wrote:

 On Sun, 2005-01-23 at 23:02, Arpad Biro wrote:
  New Hungarian MC translation attached, someone please commit it.
 
 Maybe somebody could proof read this translation? Egmont?

Árpád does a good job when translating: he has contributed translations
to many other projects, and he's even participating in the development of a
Hungarian spell checker, so most likely he has also spellchecked his .po
file. So IMHO we can trust his work. Whenever I find a bug, I contact him
about it anyway.


-- 
Egmont


^X^P sometimes doesn't put trailing slash

2003-08-23 Thread Koblinger Egmont
Hi,

^X^P doesn't put a trailing slash under certain circumstances, e.g. if
you're in /tmp/asdf/x while the other panel is /tmp/foo/x.

The bug is in main.c copy_other_pathname():

    command_insert (cmdline, opanel->cwd, 0);
    if (cpanel->cwd[strlen (opanel->cwd) - 1] != PATH_SEP)
        ^^^^^^
        command_insert (cmdline, PATH_SEP_STR, 0);


that one should be opanel, too.




bye,
Egmont



sort order

2003-07-23 Thread Koblinger Egmont
Hi,

mc sorts the files in standard ASCII order (it uses strcmp() instead of
strcoll()).

IMHO it would be better if it sorted them according to the current locale.
Those who don't like this could still set LC_COLLATE=C to get back the
current behavior.

(Tested with 4.6.0)


bye,
Egmont



AltGr doesn't work...

2002-12-29 Thread Koblinger Egmont
Hi!

It's me again ;)

In the official Hungarian keyboard layout many common symbols are reached
using the AltGr modifier, e.g. '@' is AltGr + V. These work on the Linux
console after you issue the command 'loadkeys hu'. They worked fine with
mc-4.5.55. However, they don't insert the symbols with mc-4.6.0-pre2.
(The bug only occurs on the Linux console. Under X everything is okay.)


bye,
Egmont




glibc error messages

2002-12-27 Thread Koblinger Egmont
Hi!

When compiled with glib2, error messages received from glibc are not
displayed with the correct character set. E.g. launch 'LC_ALL=fr_FR mc'
and try to remove /bin/bash as a normal user. The accented letter in the
error box is incorrect.

If compiled with glib1, it is okay.


mc-4.6.0-pre2.



cheers,
Egmont




Re: Issues with /tmp/mc-$USER directory

2002-12-25 Thread Koblinger Egmont
Hi!

 I don't want to delete the directory on exit because there are many
 reasons why mc can exit (including crash and killing it when rebooting the
 system).  Considering that the temporary directory may have huge files in
 it, I would prefer to have a fixed name for it, so that it could be easily
 cleaned up by scripts if mc exits without cleaning up some files.

Removal could only happen when mc exits cleanly. E.g. the removal of that
directory tree could be executed when F10 is pressed (and confirmed). This
way it isn't removed when mc crashes or is killed.

Using fixed file names under /tmp is impossible without risking security.
You can use fixed file names under your home, or a unique non-existent
filename under /tmp. However, you can still use filenames that match the
pattern /tmp/mc-$USER-* or similar, which might help cleanup scripts a
lot. It is also possible to check whether /tmp/mc-$USER exists (or try to
create it and see whether that succeeds), use it if it's yours, and only
fall back to some other ugly name if it has been stolen by someone else.

 mkdtemp would be great if it was more portable.  info libc says it comes
 from OpenBSD, so I don't think you can find mkdtemp on every UNIX.

Maybe a configure check for whether it is available... Sorry, I'm not
familiar with any Unices other than Linux. Alternatively, an own
implementation would be good which creates random filenames and tries to
open them with O_EXCL. If it succeeds, that file/dir is yours. Maybe take
a look at the glibc sources to see how these calls are implemented there :))

 It is important to have a fallback for the case if something is wrong with
 the temporary directory.  Midnight Commander should be useful even on
 systems with all filesystems mounted read-only.

But I guess we only expect the very basic features of mc to work, not the
virtual file system, nor the -P option, when the system doesn't even have a
writable /tmp. Am I right?


 Any help with this fix will be appreciated.  All other issues have been
 addressed.  As soon as this issue is fixed, 4.6.0-pre2 will be released.

Shall I write patches and send them to you to see if you like them? Or
should I just try to figure out what should be done and leave the
implementation to you?


bye,
Egmont
