UTF-8 fix for Shift+Enter
Hi,

With the UTF-8 patches, line editing (in the command line as well as in other widgets: create dir, rename file...) goes haywire if a literal newline (Shift+Enter, or Ctrl+Q followed by Enter) is entered. This special character can no longer be removed from the buffer, the cursor walks to the left, and so on.

I created a patch called 00-82-utf8-shift-enter.patch. It is available either from this bug entry:
http://savannah.gnu.org/bugs/?func=detailitem&item_id=17268
or from the usual place of our mc patches:
https://svn.uhulinux.hu/packages/dev/mc/patches/

The Savannah bug entry and the comments at the top of the patch file explain the bug and its solution in more detail.

bye,
Egmont
___
Mc-devel mailing list
http://mail.gnome.org/mailman/listinfo/mc-devel
highlight not on first line after a cd
Hi,

Tried with 4.6.0 and 4.6.1:

Log in as root, whose home directory is /root (as usual under Linux). Create an entry (regular file or directory, it doesn't matter) called "root" under one of the topmost directories, e.g. touch /tmp/root. Start mc with your home (/root) in one of the panels and type cd /tmp there. /tmp is listed, but the highlight is positioned on its "root" entry rather than at the top of the panel, as would be expected.

-- Egmont
Re: utf8 patch for mc, slang 2 version
Hi,

> I have patches for the NFC / NFD issue and two other patches for the Darwin/Mac platform for the current UTF-8 version, with all patches applied (I don't know where to post it, so I post here, sorry):
>
> config.h @ 3
> +#ifdef __APPLE__
> +#define unix 1
> +#endif

I guess developers prefer patches created with diff -u rather than just some pseudo-code. Extract the original source code to a directory called mc-4.6.1.orig (or whatever you want to call it), copy it to mc-4.6.1 (again, the name is up to you), make your modifications under the latter and then run

    diff -Naurdp mc-4.6.1.orig mc-4.6.1

or something similar. Then attach the result to the mail; attachments are easier to handle than parts cut out of the message body. Developers will correct me if I'm wrong.

> Hebrew (ivrit) and others like Chinese and Japanese are showing up correctly, but they crash the windowing because they are multi-byte chars. Question: how can the length of these multi-byte chars be determined? Does anyone have any idea?

If I understand you correctly, by "multi-byte" and "length" you actually mean how many character cells they occupy on the screen. This is usually called the width of a character. See the manual pages of wcwidth() and wcswidth().

-- Egmont
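To make the wcwidth()/wcswidth() pointer concrete, here is a minimal sketch of measuring how many terminal columns a multibyte string occupies. The function name display_width is made up for this illustration and is not part of mc; the sketch assumes a POSIX system where wcswidth() is available once _XOPEN_SOURCE is defined.

```c
#define _XOPEN_SOURCE 700   /* exposes wcswidth() in <wchar.h> on glibc */
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return the number of terminal columns needed to
 * show the multibyte string `s` in the current locale, or -1 if `s` is
 * not valid in that locale or contains a non-printable character. */
int display_width(const char *s)
{
    size_t n;
    wchar_t *w;
    int cols;

    assert(s != NULL);

    n = mbstowcs(NULL, s, 0);          /* number of wide characters needed */
    if (n == (size_t)-1)
        return -1;                     /* invalid in the current locale */

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;

    mbstowcs(w, s, n + 1);             /* convert, including the final L'\0' */
    cols = wcswidth(w, n);             /* -1 if any character is non-printable */
    free(w);
    return cols;
}
```

A real program would call setlocale(LC_ALL, "") once at startup so that the conversion follows the user's locale; without that call the "C" locale is in effect. In a UTF-8 locale one would expect wide CJK characters to count as two columns and combining accents as zero, but the exact values depend on the C library's character tables.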
Re: utf8 patch for mc, slang 2 version
On Tue, Sep 20, 2005 at 10:11:28PM +0200, Bálint Kardos wrote:
> But even with all patches and stuff, I see the following Unicode glitches:
> - the utf-8 chars are not displayed in the dir list (on Ubuntu, everything is OK): for ÉÁŰŐÚÖÜÓ I see EAUOUOUO (upper, lowercase all wrong)
> - the files/dirs that contain the unicode chars are still not properly aligned to the grids
>
> What could cause Darwin to behave so unpredictably? In the filesystem, there's another error: if you do 'ls', the alignment of the columns after the unicode chars is broken as well.

Unices use NFC, while MacOS uses the NFD representation of accents (at least for filenames; I don't know about file contents). NFC means each accented character has its own precomposed value, that is, a single Unicode entity, which is usually stored as two (maybe three) bytes in UTF-8. NFD builds the character from two Unicode entities: first the unaccented letter, followed by the accent on its own. Its UTF-8 representation hence typically takes three bytes (one for the unaccented letter and two more for the accent).

Different levels of Unicode support are specified; I guess supporting NFD requires a higher level of conformance, since it's a harder job than supporting NFC. I bet mc's UTF-8 patch only supports NFC.

-- Egmont
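The byte counts above can be checked by hand for one letter. The sketch below spells out "é" in both forms as raw UTF-8 bytes and counts code points by skipping continuation bytes. The names utf8_codepoints, e_acute_nfc and e_acute_nfd are invented for this example, and the counter assumes its input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count Unicode code points in a UTF-8 string by counting every byte
 * that is not a continuation byte (10xxxxxx).  Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* "é" in both normalization forms, written out as UTF-8 bytes. */
static const char e_acute_nfc[] = "\xC3\xA9";   /* U+00E9, precomposed */
static const char e_acute_nfd[] = "e\xCC\x81";  /* U+0065 + U+0301 (combining acute) */
```

The NFC form is two bytes and one code point; the NFD form is three bytes and two code points, although both should be drawn in a single screen cell. Code that assumes one code point per cell would therefore misalign NFD filenames, which may be what the column glitches described in the quoted report come from.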
Re: Multi byte charset support vs. single byte
On Sun, Jul 24, 2005 at 04:20:45PM +0200, Leonard den Ottolander wrote:
> Hi,
>
> I was wondering whether, when we implement multi byte charset support, we should still keep a compile option to build for (low memory) single byte systems, or even a runtime option to choose between the two modes, as the use of wchar_t's is relatively memory hungry.

Please see my earlier mail at:
http://mail.gnome.org/archives/mc-devel/2005-April/msg00029.html
There I discuss why I think this whole wchar_t approach is completely wrong for a file manager / file viewer / file editor.

The most important point is that "file" does not always mean "text". MC should keep working perfectly with out-of-locale filenames and file contents (e.g. binary files), the editor should remain binary safe even for out-of-locale characters, etc. This cannot be implemented with the "think in characters" philosophy suggested by the use of wchar_t's. MC should still think in plain 8-bit bytes and only group some of them together for display purposes. This is the only way for the user to be able to delete or rename a file whose name is not valid UTF-8 on a UTF-8 system. Which is, I think, a pretty much required feature of any file manager... This is discussed in more detail in the mail mentioned above.

On the other hand, wchar_t is perfectly suitable for storing which character we want displayed in a given cell of the terminal. But then the growth caused by the char -> wchar_t change is negligible, and it is not worth offering any compile-time option to turn it off.

e.
Re: glibc and getgrouplist
> (strcmp(gnu_get_libc_version(), "2.3.3") 0)
>
> Why not? Note the comparison. Since 2.3.3, this should be definitely fixed.

Please use strverscmp() or something similar. According to strcmp(), 2.3.10 is smaller than 2.3.3.

-- Egmont
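strverscmp() is a GNU extension, so as an illustration of the "or something similar" part, here is a minimal portable sketch that compares dotted numeric versions component by component. The name version_cmp is made up for this example; it only understands plain numbers separated by dots and makes no attempt to handle suffixes such as "-pre2".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: compare dotted numeric version strings such as
 * "2.3.10" component by component.  Returns <0, 0 or >0 like strcmp().
 * A missing component counts as 0, so "2.3" equals "2.3.0". */
int version_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' || *b != '\0') {
        char *end_a, *end_b;
        unsigned long na = strtoul(a, &end_a, 10);
        unsigned long nb = strtoul(b, &end_b, 10);

        if (na != nb)
            return na < nb ? -1 : 1;
        if (end_a == a && end_b == b)
            break;                      /* no digits left on either side */

        /* Step over the separating dot, if there is one. */
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}
```

With this, version_cmp("2.3.10", "2.3.3") is positive, whereas strcmp() on the same strings is negative because it compares the characters '1' and '3'.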
Re: glibc and getgrouplist
On Fri, Jul 22, 2005 at 09:30:11AM +0200, Roland Illig wrote:
> Hmmm, you're right. As we are sure to have a glibc, there's also a function strverscmp, which we can use. I had known this issue, but as glibc-2.2 only got up to 2.2.6, I thought it would suffice.

Also, AFAIK the current 2.3.5 is the last member of the 2.3 series and the next one will be 2.4 IIRC, but you can never know...

-- Egmont
Re: cons.saver not suid root
On Wed, Jun 08, 2005 at 04:49:11PM +0200, Oswald Buddenhagen wrote:
> we have no portable (even across linuxes) way to create a vcsa user, so there is no other option than root.

How about not creating a user or group, but observing the installed system? IMHO if all the vcsa devices are owned by the same user or the same group, and its uid/gid is below 100, then we could assume that it is a vcsa-like user or group, and install cons.saver setuid or setgid to that particular uid or gid. If this is not the case, that is, the user or group varies or is 100 or above (which might mean that a real user is logged in on all the ttys), then we could still fall back to setuid root.

bye,
Egmont
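The decision rule proposed above can be sketched as a small pure function. This is only an illustration: the name common_system_id and the constant SYSTEM_ID_LIMIT are invented here, and the caller is assumed to have already collected the owning uids (or gids) of the vcsa devices, e.g. via stat().

```c
#include <assert.h>
#include <stddef.h>

/* Ids below this value are treated as system accounts (the threshold
 * suggested in the mail; the macro name is hypothetical). */
#define SYSTEM_ID_LIMIT 100UL

/* Given the owning uids (or gids) of all vcsa devices, return 1 and
 * store the common id in *out if they are all identical and below the
 * limit.  Return 0 otherwise, in which case the caller would fall back
 * to setuid root. */
int common_system_id(const unsigned long *ids, size_t n, unsigned long *out)
{
    size_t i;

    assert(out != NULL);
    if (ids == NULL || n == 0)
        return 0;

    for (i = 0; i < n; i++)
        if (ids[i] != ids[0] || ids[i] >= SYSTEM_ID_LIMIT)
            return 0;

    *out = ids[0];
    return 1;
}
```

Running the check once for owners and once for groups at install time would cover both the setuid and the setgid variant of the idea.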
Re: cons.saver not suid root
On Wed, Jun 08, 2005 at 06:06:03PM +0200, Tomasz Koczko wrote:
> BTW: in the case of Linux I don't see cons.saver being used now.
>
> $ strace -o out mc; grep cons.saver out
>
> with ctrl-o pressed a few times during the strace shows nothing (?).
> Q: is cons.saver still necessary in the case of Linux?

Yes. When mc is running on the text console you need raw read-write access to vcsa* in order to get ctrl-o working. There are two approaches to this: one is the setuid/setgid cons.saver wrapper, the other is granting the user access to the vcsa devices, e.g. making vcsa owned by the user. The problem with the latter is that under Linux there's no way to revoke this access: once the user has opened the device, they can hold it open for as long as the system is up, and hence watch other people's data if someone else logs in there...

-- Egmont
byte vs. character approach [was: Terminology concerning strings]
Hi all,

> According to http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html wchar_t on GNU systems is 4 bytes by default. Internal representation of multibyte strings always uses fixed widths or something like x[3] wouldn't work (without scanning the string). So in case x in the above example is a wchar_t you overflow the buffer nicely ;) .

As I see it, this is a completely different approach to the whole situation from the one the current UTF-8 hack patchset uses. The current UTF-8 patchset still _thinks_ in _bytes_, but tries to display them correctly using UTF-8 or whatever the current locale is. Using wchar_t all over the source gives me the feeling that this approach wants mc to _think_ in _characters_. I'm not sure at all that this is the right way to go for a file manager and text editor.

Unix philosophy says filenames are sequences of bytes (as opposed to Windows, which says filenames are sequences of characters). Whenever you use a multibyte locale, you might face filenames that are not valid according to this locale. These are still valid filenames on the system; they just cannot be displayed with your current locale, though they may be fine with another one. I expect a file manager to handle this kind of file without a problem.

Hence filenames should be handled as byte sequences, and mc should do its best to display each filename as well as possible. Even if it cannot display the name correctly and needs to use some question marks, it should be perfectly able to remove, rename or edit the file, invoke an external command on it, etc. Typing a command and using Esc+Enter to put the filename into the command line should also work. So the name should be converted from the original byte stream to any other sequence only for display purposes, and stored as the original byte stream inside mc's memory.

Similar things happen e.g. with file editing.
Suppose I receive a large English text file, find a typo and want to fix it. I do it in mcedit and then save the file. I didn't even realize that the file also contained some French words encoded in Latin-1, while my whole system is set to UTF-8. mcedit must save the file leaving the original Latin-1 accents untouched, even though they are not valid UTF-8. It would definitely be a bug if these characters disappeared from the file or if mc failed to handle them in any other way.

Actually, will mcedit be able to edit UTF-8 encoded files inside a Latin-1 terminal? Or edit Latin-1 files inside a UTF-8 terminal? Will mc be able to assume UTF-8 filenames while the terminal is Latin-1? ...

I recommend everyone take a look at the 'joe' text editor, version 3.1 or 3.2, to see how it handles charsets. I don't mean the implementation, only the user-visible behavior of the software. IMHO this is the way things have to work.

'joe' treats the file being edited as a byte stream, always. It knows the behavior of the terminal from the locale settings. This cannot be overridden in joe, which is a perfect decision (as opposed to vim), since this is exactly what the locale environment variables are for. The default encoding assumed for a file is that of the current locale; however, you can easily change it at any time by pressing ^T E. Changing the assumed character set does not change anything in the file. It only changes the way the file is displayed on the screen, which bytes a keypress inserts, how many bytes a backspace, delete or overtype removes, etc. Obviously, byte sequences that are invalid in the selected charset are displayed with some special symbol, maybe in a special color.

This whole approach guarantees that joe can edit files of arbitrary encodings over arbitrary terminals and, at the same time, is still binary safe and keeps the byte sequence unchanged even if it is not valid according to the assumed character set.
In contrast to joe, take a look at Gnome and KDE (especially KDE), their bugzillas etc., to see how many bug reports they have about accented filenames. The whole KDE system thinks of filenames as sequences of human-readable characters, and hence it usually fails to handle out-of-locale filenames.

Just think how many complaints and bug reports you would receive from someone who uses a modern Linux system with its default UTF-8 locale, recursively downloads some stuff from an ftp server and then blames mc-4.7.0 because it cannot cope with these filenames (whoops, they're in Latin-1) and cannot access, delete or rename them. These users would have to quit to the shell to rename them properly, which means that mc would fail to perform one of its most basic jobs. I hope this won't happen.

So while thinking in characters is the better approach for most desktop applications, I'm pretty sure that for file managers such as mc and text editors such as mcedit, thinking in bytes is the right way to go, converting the byte stream solely for display purposes.

-- Egmont
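As a rough sketch of "think in bytes, convert only for display", the following builds a display copy of a name in which every byte that does not start a well-formed UTF-8 sequence is shown as '?', while the original byte string is never modified. The names utf8_seq_len and display_copy are invented for this illustration, and the validity check is deliberately simplified: it looks only at lead-byte ranges and continuation bytes, so overlong encodings and surrogates are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the UTF-8 sequence starting at p, or 0 if the bytes there
 * do not form one.  Simplified check: lead-byte range plus the right
 * number of continuation bytes (10xxxxxx). */
static size_t utf8_seq_len(const unsigned char *p)
{
    size_t len, i;

    if (p[0] < 0x80)
        return 1;
    else if (p[0] >= 0xC2 && p[0] <= 0xDF)
        len = 2;
    else if (p[0] >= 0xE0 && p[0] <= 0xEF)
        len = 3;
    else if (p[0] >= 0xF0 && p[0] <= 0xF4)
        len = 4;
    else
        return 0;

    for (i = 1; i < len; i++)
        if ((p[i] & 0xC0) != 0x80)   /* also stops at the terminating NUL */
            return 0;
    return len;
}

/* Write a displayable copy of `raw` into `out`, which must have room
 * for strlen(raw) + 1 bytes.  `raw` itself is left untouched. */
void display_copy(const char *raw, char *out)
{
    const unsigned char *p = (const unsigned char *)raw;

    assert(raw != NULL && out != NULL);
    while (*p != '\0') {
        size_t len = utf8_seq_len(p);

        if (len == 0) {
            *out++ = '?';            /* placeholder for an undisplayable byte */
            p++;
        } else {
            while (len-- > 0)
                *out++ = (char)*p++;
        }
    }
    *out = '\0';
}
```

A Latin-1 name such as "caf\xE9" would be shown as "caf?" on a UTF-8 terminal, yet rename() or unlink() would still be called with the original four bytes, which is the property argued for above.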
Re: Terminology concerning strings
Hi,

On Mon, Apr 04, 2005 at 11:35:44AM +0200, Roland Illig wrote:
> * the _size_ of a string (as well as of other objects) is the number of bytes that is allocated for it. For arrays, it is the number of entries of the array. For strings it is at least _length_ + 1.
> * the _length_ of a string is the number of characters in it, excluding the terminating '\0'.
> * the _width_ and _height_ of a string are the size of a box on the screen that would be needed to display the string.

It seems to me that this terminology is not yet multibyte-aware. Since UTF-8 is becoming an everyday issue and AFAIR is planned for mainstream mc 4.7.0, IMHO it is very important to create a clear terminology for it, even if it's not yet officially implemented. Hence:

Byte and character are two completely different notions. It is clear what a byte means. A character is a human-visible entity, e.g. an accented letter. A character may be represented by one or more bytes. It should be clarified whether a composing symbol (e.g. one that puts an accent on top of the previous letter) is a character on its own or not. Pressing a letter on the keyboard usually inserts one character, and a backspace/delete is supposed to remove one character, not one byte.

Is the _length_ of a string the number of bytes in it or the number of characters in it? If it is the number of bytes, then the second definition (in the quoted part) should be corrected. If it is the number of characters, then the last sentence of the first definition doesn't really have a meaning, since the size and the length then have nothing to do with each other, and the size >= length + 1 constraint is misleading (even though it isn't false, supposing that every character takes at least one byte to represent).

Actually, what does "string" mean? Is it an arbitrary sequence of bytes, terminated by the first zero byte, that we sometimes try to display somehow, or is it a technical representation of a human-readable text?
These two approaches might lead to completely different programming philosophies. I recommend the latter, since it thinks in the terms that matter most for the user interface: the meaning of the byte sequence rather than the plain byte sequence itself. Another consequence is that, according to the second definition, the byte sequence must always be valid in one well-defined character set (e.g. valid UTF-8), while the first definition also allows invalid byte sequences that still have to be displayed somehow.

Furthermore, it should be emphasized that the width of a character is not necessarily 1, so the number of bytes, the number of characters and the width of a string may be three completely different values.

-- Egmont
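The remark that backspace should remove one character rather than one byte can be illustrated for UTF-8 with a few lines. The sketch below finds where the character just before a given byte offset starts; the name utf8_prev_char is made up for this example, and it assumes the buffer holds valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset at which the character ending just before
 * byte offset `pos` starts.  Assumes `buf` holds valid UTF-8, so
 * stepping back over continuation bytes (10xxxxxx) reaches the lead
 * byte of that character. */
size_t utf8_prev_char(const char *buf, size_t pos)
{
    assert(buf != NULL);

    if (pos == 0)
        return 0;

    pos--;
    while (pos > 0 && ((unsigned char)buf[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}
```

A backspace at cursor offset pos would then delete the bytes from utf8_prev_char(buf, pos) up to pos. Note that this treats a combining accent as a character of its own, which is exactly the open question raised above; a different answer would require stepping back further.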
Re: MC translation
On Mon, Jan 24, 2005 at 06:34:45PM +0100, Leonard den Ottolander wrote:
> On Sun, 2005-01-23 at 23:02, Arpad Biro wrote:
> > New Hungarian MC translation attached, someone please commit it.
>
> Maybe somebody could proof read this translation? Egmont?

Árpád does a good job when translating, he has contributed translations to many other projects, and he's even participating in the development of a Hungarian spell checker, so most likely he has also spellchecked his .po file. So IMHO we can trust his work. Whenever I find a bug, I contact him about it anyway.

-- Egmont
^X^P sometimes doesn't put trailing slash
Hi,

^X^P doesn't put a trailing slash under certain circumstances, e.g. if you're in /tmp/asdf/x while the other panel is /tmp/foo/x.

The bug is in main.c, copy_other_pathname():

    command_insert (cmdline, opanel->cwd, 0);
    if (cpanel->cwd[strlen (opanel->cwd) - 1] != PATH_SEP)
        ^^^^^^
        command_insert (cmdline, PATH_SEP_STR, 0);

The marked one should be opanel, too.

bye,
Egmont
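One way to make this kind of mix-up harder is to move the check into a helper that only ever sees a single path. The sketch below is purely illustrative: needs_trailing_sep is an invented name, and PATH_SEP is redefined locally as a stand-in for mc's own macro.

```c
#include <assert.h>
#include <string.h>

#define PATH_SEP '/'   /* local stand-in for mc's PATH_SEP */

/* Return 1 if a separator has to be appended after `dir`, i.e. if the
 * path is non-empty and does not already end in PATH_SEP. */
int needs_trailing_sep(const char *dir)
{
    size_t len;

    assert(dir != NULL);
    len = strlen(dir);
    return len > 0 && dir[len - 1] != PATH_SEP;
}
```

Because the string and its length come from the same argument, indexing one panel's path with the other panel's length can no longer happen. Returning 0 for an empty path is a choice made for this sketch (it also avoids indexing at -1); mc may want different behavior there.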
sort order
Hi,

mc sorts files in plain ASCII order (strcmp() instead of strcoll()). IMHO it would be better if it sorted them according to the current locale. Those who don't like this could still set LC_COLLATE=C to get the current behavior back. (Tested with 4.6.0)

bye,
Egmont
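A minimal sketch of the suggested change, using qsort() with a strcoll()-based comparator. The names cmp_collate and sort_names are invented for this example and have nothing to do with mc's actual sorting code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two strings by the LC_COLLATE rules of the
 * current locale instead of by raw byte values. */
static int cmp_collate(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Sort an array of `n` names in place according to the current locale. */
void sort_names(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    if (n > 0)
        qsort(names, n, sizeof *names, cmp_collate);
}
```

The program has to call setlocale(LC_ALL, "") (or at least LC_COLLATE) at startup for this to have any effect. In the "C" locale strcoll() orders strings the same way strcmp() does, which is why LC_COLLATE=C restores the old order.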
AltGr doesn't work...
Hi!

It's me again ;)

In the official Hungarian keyboard layout many common symbols are reached using the AltGr modifier, e.g. '@' is AltGr + V. These work on the Linux console after you issue the command loadkeys hu. They worked without problems in mc-4.5.55. However, they don't insert the symbols in mc-4.6.0-pre2. (The bug only occurs on the Linux console. Under X everything is okay.)

bye,
Egmont
glibc error messages
Hi!

When mc is compiled with glib2, error messages received from glibc are not displayed in the correct character set. E.g. launch 'LC_ALL=fr_FR mc' and try to remove /bin/bash as a normal user. The accented letter in the error box is wrong. If compiled with glib1, it is okay. mc-4.6.0-pre2.

cheers,
Egmont
Re: Issues with /tmp/mc-$USER directory
Hi!

> I don't want to delete the directory on exit because there are many reasons why mc can exit (including crash and killing it when rebooting the system). Considering that the temporary directory may have huge files in it, I would prefer to have a fixed name for it, so that it could be easily cleaned up by scripts if mc exits without cleaning some files.

Removal could be restricted to the case when mc exits cleanly. E.g. the removal of that directory tree could be executed when F10 is pressed (and confirmed). This way it isn't removed when mc crashes or is killed.

Using fixed file names under /tmp is impossible without risking security. You can use fixed file names under your home, or a unique non-existent filename under /tmp. However, you can still use filenames that match the pattern /tmp/mc-$USER-* or similar, which might help cleanup scripts a lot. It is also possible to check whether /tmp/mc-$USER exists (and try to create it and see if that succeeds), use it if it's yours, and only fall back to some other ugly name if it has been stolen by someone else.

mkdtemp would be great if it was more portable. info libc says it comes from OpenBSD, so I don't think you can find mkdtemp on every UNIX. Maybe some configure check for whether it is available... Sorry, I'm not familiar with any Unices other than Linux. Alternatively, a custom implementation would be good which creates random filenames and tries to open them with O_EXCL. If it succeeds, that file/dir is yours. Maybe take a look at the glibc sources to see how these calls are implemented there :))

> It is important to have a fallback for the case if something is wrong with the temporary directory. Midnight Commander should be useful even on systems with all filesystems mounted read-only.

But I guess we only expect the very basic features of mc to work, not the virtual file system or the -P option, when the system doesn't even have a writable /tmp. Am I right?

> Any help with this fix will be appreciated. All other issues have been addressed. As soon as this issue is fixed, 4.6.0-pre2 will be released.

Shall I write patches and send them to you to see if you like them? Or should I just try to figure out what should be done and leave the implementation to you?

bye,
Egmont
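The "use /tmp/mc-$USER if it's yours" idea could look roughly like the sketch below. The function name claim_private_dir is invented for this illustration; the checks shown (real directory, owned by us, no group/other access) are one plausible reading of "yours", not necessarily what mc ended up doing.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to obtain `path` as a private directory.  Returns 0 if, after
 * the call, `path` is a real directory (not a symlink), owned by the
 * effective user and inaccessible to group and others.  Returns -1
 * otherwise, in which case the caller should fall back to another
 * name. */
int claim_private_dir(const char *path)
{
    struct stat st;

    assert(path != NULL);

    /* Creating it ourselves is the good case; EEXIST needs checking. */
    if (mkdir(path, 0700) != 0 && errno != EEXIST)
        return -1;

    /* lstat() rather than stat(): a symlink planted by someone else
     * must not be followed. */
    if (lstat(path, &st) != 0)
        return -1;

    if (!S_ISDIR(st.st_mode) || st.st_uid != geteuid()
        || (st.st_mode & 077) != 0)
        return -1;

    return 0;
}
```

If the call fails, the fallback described above would kick in: a unique name matching /tmp/mc-$USER-*, created either with mkdtemp() where available or with a loop that generates random names and relies on mkdir() failing with EEXIST when the name is already taken.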