[xz-devel] List is now public
I added this to mail-archive.com. I will update the home page in a few hours once I see that this message is visible on mail-archive.com.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ Utils 5.0.1
XZ Utils 5.0.1 is available at <http://tukaani.org/xz/>. It fixes a few minor bugs. Here is an extract from the NEWS file:

    * xz --force now (de)compresses files that have setuid, setgid, or sticky bit set and files that have multiple hard links. The man page had it documented this way already, but the code had a bug.

    * gzip and bzip2 support in xzdiff was fixed.

    * Portability fixes

    * Minor fix to Czech translation

(As written on <http://tukaani.org/xz/lists.html>, I will send release announcements to xz-devel also in the future, so there's no need to subscribe to both xz-devel and xz-announce.)

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Detecting .lzma-compressed files
On 2011-03-17 Mark wrote:
> What is the best way to detect data which was compressed using
> lzma_alone (i.e. .lzma files)?

There is no easy answer. It depends on whether you want to detect only the typical .lzma files (over 99.9 % of .lzma files) or also the uncommon ones. The typical .lzma files have been created with LZMA Utils 4.32.x (any compression settings), XZ Utils (with the most common settings), or LZMA SDK (default settings). LZMA SDK and LZMA Utils can decode the uncommon .lzma files too, but some of the uncommon files cannot be decompressed with XZ Utils.

> I'm developing a patch for the star archiver to support xz-compressed
> files. While detection of an XZ-format file is easy enough, .lzma
> doesn't seem to be. (This is so star invokes the correct program to
> decompress the data.)

With GNU tar I used a patch that checked that the first three bytes are 0x5D 0x00 0x00 ("]\0\0"). It caught all typical .lzma files and didn't conflict with other compressors. It did have a false positive if the first file inside the .tar was named "]", so I don't know if this solution is acceptable to you.

A more complex hack with fewer false positives but also some false negatives is used in XZ Utils:

  - The first byte must be in the range [0x00, 0xE0]. In most files it is 0x5D (']').

  - The next four bytes are read as an unsigned 32-bit little endian integer. This indicates the dictionary size. In typical files it is 2^n or 2^n + 2^(n-1). XZ Utils accepts only these sizes and UINT32_MAX. The .lzma format allows other sizes too, though, and LZMA Utils 4.32.x and LZMA SDK accept any dictionary size.

  - The next eight bytes are read as an unsigned 64-bit little endian integer. This indicates the uncompressed size of the file. It should be either UINT64_MAX (meaning that the size is unknown) or the actual size in bytes. XZ Utils rejects files having a known size greater than 2^38 bytes (256 GiB).

Parts of is_format_lzma() in src/xz/coder.c in the XZ Utils source tree might be useful to you.
Reading doc/lzma-file-format.txt might help a little too.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
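As a rough illustration of the heuristic described in this message, the header checks could be sketched in C like this. This is a hypothetical looks_like_lzma() sketch, not the actual is_format_lzma() code from XZ Utils:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Apply the three checks described above to the 13-byte .lzma header.
 * Returns true if the buffer looks like a typical .lzma file. */
bool
looks_like_lzma(const uint8_t *buf, size_t size)
{
	if (size < 13)
		return false;

	/* Properties byte: must be in [0x00, 0xE0]; typically 0x5D (']'). */
	if (buf[0] > 0xE0)
		return false;

	/* Dictionary size as unsigned 32-bit little endian. Accept only
	 * 2^n, 2^n + 2^(n-1), and UINT32_MAX, like XZ Utils does. */
	const uint32_t dict = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8)
			| ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
	if (dict != UINT32_MAX) {
		bool ok = false;
		for (unsigned n = 0; n < 32 && !ok; ++n) {
			const uint32_t p = UINT32_C(1) << n;
			if (dict == p || (n > 0 && dict == p + (p >> 1)))
				ok = true;
		}
		if (!ok)
			return false;
	}

	/* Uncompressed size as unsigned 64-bit little endian: either
	 * UINT64_MAX (unknown) or at most 2^38 bytes (256 GiB). */
	uint64_t usize = 0;
	for (int i = 0; i < 8; ++i)
		usize |= (uint64_t)buf[5 + i] << (8 * i);
	return usize == UINT64_MAX || usize <= (UINT64_C(1) << 38);
}
```

Like the heuristic itself, this sketch accepts some non-.lzma data (false positives) and rejects some valid but uncommon .lzma files (false negatives).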
Re: [xz-devel] Detecting .lzma-compressed files
On 2011-03-17 ma...@clara.co.uk wrote:
> Lasse Collin wrote:
>
> > A more complex hack with fewer false positives but also some false
> > negatives is used in XZ Utils:
> > ...
> > - The next eight bytes are read as unsigned 64-bit little
> >   endian integer. This indicates the uncompressed size of
> >   the file. It should be either UINT64_MAX (meaning that the
> >   size is unknown) or some size as bytes. XZ Utils rejects
> >   files having a known size greater than 2^38 bytes (256 GiB).
>
> Can xz be forced to work with a file whose uncompressed size field is
> larger than 256GiB? That does seem a bit small these days; it's
> conceivable that some users might be compressing files that large.

Currently no. I will reconsider if it is a real-world problem for someone. Note that the limit applies only to .lzma files with a known uncompressed size. Files created in a pipe have an unknown uncompressed size, and .lzma files created with XZ Utils always have an unknown uncompressed size (simpler code).

Most new files will use .xz instead of .lzma, so the limitations of the .lzma support in XZ Utils don't matter much, I hope.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ in Java
There is now something for decompressing .xz files in Java:

    http://tukaani.org/xz/java.html

It currently lacks the BCJ filters but otherwise it supports everything from the .xz specification. It hasn't been tested much, but at least it behaves correctly with the test files from XZ Utils.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ Utils 5.0.2
XZ Utils 5.0.2 is available at <http://tukaani.org/xz/>. It fixes a few minor bugs. Here is an extract from the NEWS file:

    * The LZMA2 decompressor now correctly accepts LZMA2 streams with no uncompressed data. Previously it considered them corrupt. The bug can affect applications that use raw LZMA2 streams. It is very unlikely to affect .xz files because no compressor creates .xz files with empty LZMA2 streams. (Empty .xz files are a different thing than empty LZMA2 streams.)

    * "xz --suffix=.foo filename.foo" now refuses to compress the file because it already has the suffix .foo. It was already documented on the man page, but the code lacked the test.

    * "xzgrep -l foo bar.xz" works now.

    * Polish translation was added.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ Utils 5.1.1alpha
XZ Utils 5.1.1alpha is available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

    * All fixes from 5.0.2

    * liblzma fixes that will also be included in 5.0.3:

      - A memory leak was fixed.

      - lzma_stream_buffer_encode() no longer creates an empty .xz Block if encoding an empty buffer. Such an empty Block with LZMA2 data would trigger a bug in 5.0.1 and older (see the first bullet point in the 5.0.2 notes). When releasing 5.0.2, I thought that no encoder creates this kind of file, but I was wrong.

      - Validate function arguments better in a few functions. Most importantly, specifying an unsupported integrity check to lzma_stream_buffer_encode() no longer creates a corrupt .xz file. Probably no application tries to do that, so this shouldn't be a big problem in practice.

      - Document that lzma_block_buffer_encode(), lzma_easy_buffer_encode(), lzma_stream_encoder(), and lzma_stream_buffer_encode() may return LZMA_UNSUPPORTED_CHECK.

      - The return values of the _memusage() functions are now documented better.

    * Support for multithreaded compression was added using the simplest method, which splits the input data into blocks and compresses them independently. Other methods will be added in the future. The current method has room for improvement, e.g. it is possible to reduce the memory usage.

    * Added the options --single-stream and --block-size=SIZE to xz.

    * xzdiff and xzgrep now support .lzo files if lzop is installed. The .tzo suffix is also recognized as a shorthand for .tar.lzo.

    * Support for short 8.3 filenames under DOS was added to xz. It is experimental and may change before it gets into a stable release.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] strerror-like functionality in liblzma
Implementing a function to convert lzma_ret to a string is tricky, because the same return values have slightly different meanings when returned by different functions. This is a design mistake in the API, but it cannot be fixed without breaking the API, which I don't want to do.

One possibility would be to provide a few strerror-like functions that could be used with the return values of different functions. This doesn't sound nice though.

Letting liblzma construct the error message when the error occurs allows more detailed error messages than what one could get by converting lzma_ret to a string. E.g. when LZMA_OPTIONS_ERROR is returned, the error message could include which compression option was the problem.

Functions that work on lzma_stream could store the message in the lzma_stream structure. This is what zlib does. liblzma has many functions that don't use lzma_stream, so this isn't a solution for those functions.

A thread-local variable to store an error message would work with all functions and also in threaded programs. Would this be OK? Does someone have alternative ideas?

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] strerror-like functionality in liblzma
On 2011-05-17 Thorsten Glaser wrote:
> Lasse Collin dixit:
> >A thread-local variable to store an error message would work with
> >all
>
> This wouldn’t be portable at all.

To be more exact, I meant a function that would return a pointer to a thread-specific char array. POSIX has pthread_key_create() and pthread_once() that can be used to implement this. I think those are fairly portable. I'm aware that Windows might give some gray hair, but I won't worry about that too much.

Maybe there is a problem if liblzma is loaded with dlopen() and later unloaded with dlclose(). It could leak the memory allocated for the thread-specific data and leak the resources associated with a pthread_key_t. glibc supports destructor functions that are called before dlclose() returns. I think that would prevent this issue, but such destructors aren't portable.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
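A minimal sketch of the pthread_key_create() + pthread_once() approach discussed here. The names my_set_error()/my_get_error() and the 512-byte buffer are illustrative, not part of any real liblzma API, and the sketch deliberately exhibits the dlclose() caveat: the key is never deleted.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MY_ERRBUF_SIZE 512

static pthread_key_t my_errbuf_key;
static pthread_once_t my_errbuf_once = PTHREAD_ONCE_INIT;

static void
my_errbuf_init(void)
{
	/* free() is the destructor: it releases the per-thread buffer
	 * when each thread exits. The key itself is never deleted,
	 * which is exactly the leak that worries me with dlclose(). */
	pthread_key_create(&my_errbuf_key, free);
}

/* Return the calling thread's last error message, or "" if none. */
const char *
my_get_error(void)
{
	pthread_once(&my_errbuf_once, my_errbuf_init);
	const char *msg = pthread_getspecific(my_errbuf_key);
	return msg != NULL ? msg : "";
}

/* Store an error message in the calling thread's buffer, allocating
 * it lazily on first use. */
void
my_set_error(const char *msg)
{
	pthread_once(&my_errbuf_once, my_errbuf_init);
	char *buf = pthread_getspecific(my_errbuf_key);
	if (buf == NULL) {
		buf = malloc(MY_ERRBUF_SIZE);
		if (buf == NULL)
			return; /* Out of memory: drop the message. */
		pthread_setspecific(my_errbuf_key, buf);
	}
	snprintf(buf, MY_ERRBUF_SIZE, "%s", msg);
}
```

Each thread gets its own buffer, so concurrent callers never clobber each other's messages.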
Re: [xz-devel] strerror-like functionality in liblzma
On 2011-05-17 Thorsten Glaser wrote:
> Lasse Collin dixit:
> >To be more exact, I meant a function that would return a pointer to
> >thread-specific char array. POSIX has pthread_key_create() and
>
> Oh sure. Let’s just force all xz users to link in libpthread…

It already does that unless you pass --disable-threads to configure when compiling XZ Utils. 5.1.1alpha does threaded compression, so most people don't want to disable threading support.

It's the dlopen/dlclose situation that worries me. GNU and Solaris call functions registered with atexit() when a library is unloaded, but that trick isn't supported e.g. on BSDs. GCC's __attribute__((destructor)) seems to work on a few other systems too, but it requires that the compiler supports GNU C extensions, which isn't an acceptable requirement in this case.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] strerror-like functionality in liblzma
On 2011-05-19 Jonathan Nieder wrote:
> Lasse Collin wrote:
> > It already does that unless you pass --disable-threads to configure
> > when compiling XZ Utils.
>
> It seems like a sane worry. If someone uses --disable-threads, does
> that mean that person won't need a thread-safe way to get error
> messages?

--disable-threads already means that liblzma might become thread-unsafe. It is documented in --help and in INSTALL. Currently the thread-unsafe situation occurs only if --enable-small is also used.

> (One reasonable answer might be "yes, such a person can
> read the documentation and figure out what happened from the error
> numbers, and at least they won't be worse off than they started." If
> it proves to be annoying, it's possible to introduce _r variants that
> return the error message through a parameter later.)

It's possible that I will do something like this as the only method. With functions that use lzma_stream, the message can be stored there. For some other functions, a method to pass a pointer to a buffer to hold the message may be needed.

> > It's the dlopen/dlclose situation that worries me. GNU and Solaris
> > call functions registered with atexit() when a library is
> > unloaded, but that trick isn't supported e.g. on BSDs. GCC's
> > __attribute__((destructor)) seems to work on a few other systems
> > too, but it requires that the compiler supports GNU C extensions,
> > which isn't an acceptable requirement in this case.
>
> C1X has[1] a _Thread_local keyword that might work well. So in a
> decade or so a person will be able to write
>
>     _Thread_local const char *lzma_error_message;
>
> and rely on compilers setting up the appropriate constructors and
> destructors behind the scenes. Today, GCC has[2] __thread and
> Microsoft C has[3] __declspec(thread).
>
> I haven't played around with it much, but maybe that can help.

It can help when the new standard has been out for a few years. Before that, I cannot rely on GNU C extensions.
Currently the code can be compiled with several compilers, and that's how I want it to be in the future too. The current portable method for thread-specific data is pthread_key_create().

My current understanding of the interactions of pthread_key_create() and dlclose():

  - With C++ I could use a global object whose destructor is run when the library is unloaded with dlclose(). The destructor would free the thread-specific data and call pthread_key_delete().

  - Non-portable operating system, C compiler, or linker extensions would make it possible to have a destructor function that is called when the library is unloaded.

  - I could require that developers call some initialization and destruction functions when they start and stop using the library. This would be annoying, and it's easy to forget to call the destructor.

  - If the library is never unloaded with dlclose(), then there is no problem with pthread_key_create(). This isn't an acceptable limitation for liblzma.

In short, if one wants to use only C code and the functionality provided by POSIX.1-2008, it's not possible to use thread-specific data in a shared library without restricting or complicating the use of that library. I'm happy if someone will show that I'm wrong.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [RFC/PATCH] using versioned symbols in liblzma
On 2011-05-19 Jonathan Nieder wrote:
> Well, that would be unpleasant. Consider a program foo that links to
> both libkdecore5 and libdw1. The installed version of libdw1 has
> been rebuilt against liblzma6, while the local copy of libkdecore5
> is still linked against liblzma5. What happens?

If those two libraries exchange pointers to liblzma structures, things go wrong even with symbol versions, right? Most libraries don't do that, but I suppose you still need to carefully track which libraries do.

> -liblzma_la_LDFLAGS = -no-undefined -version-info 5:99:0
> +liblzma_la_LDFLAGS = -no-undefined -version-info 5:99:0 \
> +	-Wl,--version-script=$(top_srcdir)/src/liblzma/Versions

This option is specific to GNU ld, so it must not be used unconditionally. zlib enables symbol versioning if uname -s matches any of these:

    Linux* | linux* | GNU | GNU/* | *BSD | DragonFly

zlib doesn't use Autoconf, so those need to be converted to the format used by Autoconf, although it's not clear to me yet whether symbol versioning is wanted on all these systems in upstream liblzma. E.g. FreeBSD ships xz in the base system with its own symbol versioning file. On the other hand, maybe FreeBSD's map file could be used elsewhere too.

    http://svnweb.freebsd.org/base/head/lib/liblzma/

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Re: [RFC/PATCH] using versioned symbols in liblzma
On 2011-05-19 Jonathan Nieder wrote:
> Sadly the symbol versioning mechanism doesn't seem to be documented
> nicely in the style of a manpage anywhere.

Thanks for the links. I'm fine with Texinfo myself. :-)

> Short-term question: would you mind if Debian carries this patch for
> the time being? In particular, do the version node names [...]
> seem reasonable to standardize on (in environments that will be using
> symbol versions)?

I'm not sure about the names yet. See the FreeBSD example in another email.

> The main unfortunate effect is warnings when running binaries linked
> against the versioned symbols in an environment not providing them.

Those are annoying, but I guess it's not a big deal in this case.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [RFC/PATCH] using versioned symbols in liblzma
On 2011-05-19 Jonathan Nieder wrote:
> >> -liblzma_la_LDFLAGS = -no-undefined -version-info 5:99:0
> >> +liblzma_la_LDFLAGS = -no-undefined -version-info 5:99:0 \
> >> +	-Wl,--version-script=$(top_srcdir)/src/liblzma/Versions
> >
> > This option is specific to GNU ld, so it must not be used
> > unconditionally. zlib enables symbol versioning if uname -s matches
> > any of these:
> >     Linux* | linux* | GNU | GNU/* | *BSD | DragonFly
> >
> > zlib doesn't use Autoconf so those need to be converted to the
> > format used by Autoconf, although it's not clear to me yet if
> > symbol versioning is wanted on all these systems in upstream
> > liblzma.
>
> Would it make sense to add an autoconf test and to use
> --version-script by default on platforms that support it? Then
> users and packagers could pass --disable-symbol-versioning to
> configure when appropriate.

I don't know. If both GNU ld and some other linker are available and only the GNU version supports symbol versions, does it make sense that the choice of linker affects whether versioning is used or not? Would that be a mess? Also, the other linker might support versioning too but use a different command line option (e.g. Solaris ld uses -M).

> > E.g. FreeBSD ships
> > xz in the base system with its symbol versioning file. On the other
> > hand, maybe FreeBSD's map file could be used elsewhere too.
> >
> >     http://svnweb.freebsd.org/base/head/lib/liblzma/
>
> Ah, thanks for the pointer. I'll use FreeBSD's version names for the
> public symbols (so, XZ_5.0 and XZ_5.1).

I guess it is mostly OK. I think at least alpha versions should be e.g. XZ_5.1.1alpha, because I won't try to keep those symbols stable.

> Another question: what can I assume with regard to ABI stability of
> development versions? For example, is every symbol that appears in a
> beta part of the ABI, while symbols in alphas are subject to change?

That's how I hope it will go, but if there is clear need, I will change things even in beta.
-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
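For illustration, a GNU ld version script in the spirit of FreeBSD's liblzma map file might look roughly like this. The symbol selection is a guess made for the sketch; only lzma_stream_encoder_mt and the node names XZ_5.0/XZ_5.1 come from the discussion above:

```
XZ_5.0 {
global:
	lzma_code;
	lzma_end;
	/* ... the rest of the 5.0 API would be listed here ... */
local:
	*;
};

/* New symbols added in the 5.1 series inherit from the 5.0 node. */
XZ_5.1 {
global:
	lzma_stream_encoder_mt;
} XZ_5.0;
```

A file like this would be passed to the linker with -Wl,--version-script=..., as in the patch quoted earlier; the "local: *" catch-all hides everything not explicitly exported.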
[xz-devel] XZ Utils 5.0.3
XZ Utils 5.0.3 is available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

    * liblzma fixes:

      - A memory leak was fixed.

      - lzma_stream_buffer_encode() no longer creates an empty .xz Block if encoding an empty buffer. Such an empty Block with LZMA2 data would trigger a bug in 5.0.1 and older (see the first bullet point in the 5.0.2 notes). When releasing 5.0.2, I thought that no encoder creates this kind of file, but I was wrong.

      - Validate function arguments better in a few functions. Most importantly, specifying an unsupported integrity check to lzma_stream_buffer_encode() no longer creates a corrupt .xz file. Probably no application tries to do that, so this shouldn't be a big problem in practice.

      - Document that lzma_block_buffer_encode(), lzma_easy_buffer_encode(), lzma_stream_encoder(), and lzma_stream_buffer_encode() may return LZMA_UNSUPPORTED_CHECK.

      - The return values of the _memusage() functions are now documented better.

    * Fix command name detection in xzgrep. xzegrep and xzfgrep now correctly use egrep and fgrep instead of grep.

    * French translation was added.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [RFC/PATCH] using versioned symbols in liblzma
On 2011-05-21 Jonathan Nieder wrote:
> All else being equal, I'd prefer to allow testers to try
>
>     1. update liblzma
>     2. update xz
>
> without breaking xz in the window between steps 1 and 2, except in
> the obvious case when a function that turned out to be a bad idea
> was changed or removed.

It will work between stable releases, but I wouldn't like to make such a promise for alpha releases. Maybe I could promise it for beta versions, but I'm not sure yet. Since it is likely that in some alpha-to-alpha upgrades your wish wouldn't work, I think it is simpler and safer to just assume that new things in development releases aren't stable. So with non-stable releases, keep xz and liblzma always in sync. I'm not sure if distros should ship alpha or beta versions of a *shared* liblzma at all.

> That would mean that even after the alpha
> is over, lzma_stream_encoder_mt would stay as
>
>     lzma_stream_encoder_mt@XZ_5.1.1alpha

This made me think: what should I do when I extend old functions, e.g. by adding support for a new flag? It doesn't affect backward compatibility, but it means that new applications that use the new functionality won't work with older liblzma versions. Should this kind of extension be visible in symbol versions too, or is incrementing the minor soname enough?

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [RFC/PATCH] using versioned symbols in liblzma
On 2011-05-23 Jonathan Nieder wrote:
> Lasse Collin wrote:
> > This made me think, what should I do when I extend old functions
> > e.g. by adding support for a new flag? It doesn't affect backward
> > compatibility, but it means that new applications that use the new
> > functionality won't work with older liblzma versions. Should this
> > kind of extensions be visible in symbol versions too, or is
> > incrementing the minor soname enough?
>
> The old version of the function would return LZMA_OPTIONS_ERROR,
> right? So just incrementing the minor library version seems safe
> enough. :)

Right.

> If on the other hand you want to make running a new program with the
> old library into a hard error, then the only ways I know are ugly.

I suppose it's not required. On the other hand, I have understood that some systems will give a warning or an error at program startup if the program was linked against a newer minor soname than what is currently installed.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] strerror-like functionality in liblzma
On 2011-05-28 Guillem Jover wrote:
> Adding the _r (or _e for error or whatever) counterparts seems like
> the most portable solution (with the C/POSIX restrictions you
> mention), at the cost of API bloat (as we discussed on IRC), and
> probably more code churn? The normal functions could then be made to
> be tiny shims just passing NULL as the error argument.

Yes, it needs more changes to the code than using a thread-local variable.

> Passing just a pointer to a buffer might be problematic due to the
> size being unknown to the caller, so ideally it should be a pointer
> to pointer to buffer, and the function allocating the message or
> assigning from a static string table, or a pointer to an int (or
> some other integral type) to just assign an extended lzma_ret code.

Defining a big enough maximum size for the message should be enough, e.g. 512 bytes. A caller can allocate such a buffer on the stack. I don't want to dynamically allocate memory because that can fail, and the memory needs to be freed too.

Using a custom string allows more specific messages, e.g. including what value was seen in the file vs. what was expected. It is also safer when liblzma is loaded with dlopen() because there is no static string that could disappear.

> I'd add the TLS storage class specifier as an option, as it seems to
> be supported by quite a few compilers; it obviously depends on which
> ones you want to support currently.

So far my goal has been to support anything that supports enough of C99, sometimes using something other than an Autotools-based build system. In practice this has excluded GCC 2 and Microsoft's compilers. It is very annoying if liblzma provides a function that is available only on some platforms. So if TLS is used, it needs to be supported almost everywhere to be acceptable for XZ Utils.
> It seems (from [0] and [1]) that at least these compilers support
> something like __thread or __declspec(thread):
>
> * Borland C++ Builder
> * Digital Mars C/C++
> * GNU C/C++
> * HP Tru64 UNIX C/C++
> * IBM XL C/C++
> * Intel C/C++
> * Sun Studio C/C++
> * Visual C++

That covers quite a lot, but I'm not confident that it is enough. One must keep in mind that compiler support isn't enough: it needs operating system support too. So having GCC >= 3.3 available doesn't imply that TLS is supported.

> > - If the library is never unloaded with dlclose(), then there
> >   is no problem with pthread_key_create(). This isn't an
> >   acceptable limitation for liblzma.
>
> Well, pthread_key_create() allows providing a destructor function,
> so as long as that function is not part of liblzma (free(3)), then
> it can do proper cleanup once the thread terminates, regardless of
> liblzma having been unloaded.

The pthread_key_t is in the memory of liblzma and thus may be gone when liblzma is unloaded. A pointer to the destructor function might be stored in the key. The key would also be leaked, which can be a real-world problem if the library is loaded and unloaded multiple times, because the system might support only a limited number of keys per process:

    https://svn.boost.org/trac/boost/ticket/4639

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
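To make the caller-provided-buffer idea discussed in this thread concrete, here is a tiny sketch of what an _r-style interface could look like. my_check_props() and MY_ERRMSG_MAX are made-up names for illustration, not proposed liblzma API:

```c
#include <stdio.h>

/* Documented maximum message size; the caller allocates at least this
 * much, typically on the stack, so no dynamic allocation is needed. */
#define MY_ERRMSG_MAX 512

/* Hypothetical decoder step that formats a detailed message (what was
 * seen vs. what was expected) into the caller's buffer on failure.
 * Returns 0 on success, nonzero on error. Passing NULL for errmsg
 * skips message generation, like a shim for the non-_r variant would. */
int
my_check_props(unsigned char seen, char *errmsg)
{
	const unsigned char expected = 0x5D;
	if (seen == expected)
		return 0;
	if (errmsg != NULL)
		snprintf(errmsg, MY_ERRMSG_MAX,
			"unsupported properties byte: got 0x%02X, "
			"expected 0x%02X", seen, expected);
	return 1;
}
```

This avoids both thread-local storage and dynamic allocation: the message lives entirely in the caller's buffer, which also sidesteps the dlclose() concerns.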
Re: [xz-devel] [RFC/PATCH] using versioned symbols in liblzma
I have added symbol versioning to liblzma. Please check that it looks sane. It is enabled by default on GNU-based systems and FreeBSD.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Straightforward memory-to-memory compression&decompress in C?
On 2011-06-24 Dan Stromberg wrote:
> I'm looking for some example code (C preferred, something else if
> need be) that will:
>
> 1) Demonstrate using liblzma (or whatever library xz-utils produces),
>    but producing output in the xz format, not the lzma format.
>
> 2) Demonstrate using liblzma (or whatever) for memory-to-memory
>    compression, and memory-to-memory decompression (I'm only
>    compressing smallish chunks, and wish to do my own I/O so I can
>    sidestep the buffer cache)

There are two example programs in the doc/examples directory in the XZ Utils source. They use the multi-call mode to compress big files in a pipe. Data is passed to and from liblzma via buffers.

If you want a single-call interface (one function to encode or decode a buffer holding a complete .xz file), see lzma_easy_buffer_encode, lzma_stream_buffer_encode, and lzma_stream_buffer_decode in src/liblzma/api/lzma/container.h (or /usr/include/lzma/container.h).

Note that the above functions work on .xz files, not .lzma files, even though the names might suggest otherwise. The names are what they are for historical reasons: originally .xz was supposed to be .lzma, replacing the old .lzma format.

It would be nice to have more tutorial programs for different use cases, but so far I haven't written anything like that.

> BTW, does the underlying library API (need to) change much?

The liblzma API is stable. Things will be added, but old things won't change in incompatible ways.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Straightforward memory-to-memory compression&decompress in C?
(No need to CC anyone, since everyone who can post to the list is a subscriber. Majordomo doesn't prevent delivery of duplicate emails if someone uses CC.)

On 2011-07-06 Dan Stromberg wrote:
> Is it safe to assume that lzma_stream_buffer_decode is the way to go
> for decompression, irrespective of whether one has used
> lzma_easy_buffer_encode or lzma_stream_buffer_encode to create the
> compressed input?

Yes. "easy" refers to the way the compression options are set. It doesn't affect the file format, so there's no need to have an "easy" function for decompression.

> > It would be nice to have more tutorial programs for different use
> > cases, but so far I haven't written anything like that.
>
> Maybe we can kill two birds with one stone here - since I'm
> prototyping in C, would you find it more useful if it were done as
> an example program, or as a unit test?

Example programs with good comments for every step can be used as tutorials. The test suite is currently poor, so improving it would be welcome. But I don't think improving the test suite helps improve the documentation.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] “xzdiff a.xz b.xz” exit status should reflect whether the files differ
On 2011-07-26 Jonathan Nieder wrote:
> xzdiff was clobbering the exit status from diff in a case statement
> used to analyze the exit statuses from "xz" when its operands were
> two compressed files. Save and restore diff's exit status to fix
> this.

The fix looks OK. The test suite addition needs minor changes.

> +temporaries="tmp_preimage.xz tmp_samepostimage.xz tmp_otherpostimage.xz"
> +rm -f $temporaries
> +trap "rm -f $temporaries" 0

I'm not sure how well "trap" behaves with ancient shells. You can use the included test files instead of temp files:

    "$srcdir/files/good-1-check-crc32.xz"
    "$srcdir/files/good-1-check-crc64.xz"
    "$srcdir/files/good-1-lzma2-1.xz"

> +PATH=$(pwd)/../src/xz:$PATH

Ancient pre-POSIX /bin/sh implementations don't support $(pwd), so it's better to use `pwd` here. The same shells need a separate "export PATH" to update the environment.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] “xzdiff a.xz b.xz” exit status should reflect whether the files differ
On 2011-07-29 Jonathan Nieder wrote:
> +sh "$XZDIFF" "$preimage" "$samepostimage" >/dev/null

I missed this during the first round. It's not necessarily sh that ends up as @POSIX_SHELL@ in the scripts, so it's possible that this will use a different shell to run xzdiff than normal use of xzdiff would.

Ancient pre-POSIX /bin/sh doesn't run xzdiff and other scripts correctly. That's why there's @POSIX_SHELL@, which gets replaced by configure. (The test suite doesn't rely on @POSIX_SHELL@, so the test scripts themselves still need to work with old shells.)

Solaris 10 is an example with a problematic /bin/sh. However, there's a better sh earlier in the PATH, so maybe it isn't a problem in practice; I didn't test now.

Even though the above might be a problem, I have committed the patches. Thank you.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] “xzdiff a.xz b.xz” exit status should reflect whether the files differ
On 2011-07-31 Jonathan Nieder wrote:
> Maybe it could make sense to
> teach the Makefile instead of the configure script to generate the
> scripts so they could be marked executable at build time and then used
> directly.

Maybe. The current way was used because it was the laziest and had a low risk of new build system bugs.

I haven't merged your patch into v5.0 yet, because I haven't decided if the test script is safe enough there. It would be annoying if someone got a test failure in a stable release just because the test script uses a wrong shell.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Re: [PATCH] “xzdiff a.xz b.xz” exit status should reflect whether the files differ
On 2011-08-03 Jonathan Nieder wrote:
> Makes sense. Just for kicks, here's a try based on advice from
> <http://www.gnu.org/s/hello/manual/automake/Scripts.html>. It
> probably makes more sense to use the "AC_CONFIG_FILES([src/my_script],
> [chmod +x src/my_script])" approach.

I used the AC_CONFIG_FILES approach. I guess I had missed that part of the manual earlier. Thanks.

> > I haven't merged your patch to v5.0 yet. I haven't decided if the
> > test script is safe enough there yet. It would be annoying if
> > someone got a test failure in a stable release just because the
> > test script uses a wrong shell.
>
> No need to hurry. I don't mind if you merge the fix without the
> test. :)

Maybe it is OK now, unless the chmod causes trouble on some not-so-POSIX systems. :-) I think 5.0.4 should be released before the end of this month.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Re: [PATCH] “xzdiff a.xz b.xz” exit status should reflect whether the files differ
On 2011-08-07 Jonathan Nieder wrote:
> On a completely unrelated note, I finally found time to start reading
> the xz-java implementation, and it's been very pleasant. Thanks for
> writing it. :)

Thanks! I will need to do a little more Java coding, so new development on XZ Utils will unfortunately have to wait a little longer. The good news is that the lessons learned while working on the Java code should help with XZ Utils.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ for Java 0.4
XZ for Java 0.4 is now available: http://tukaani.org/xz/java.html There are some minor fixes to the old code. Support for random access decompression is a new feature. Threading is missing but it's easier to add it to this code than to liblzma in XZ Utils. This is probably the last release before 1.0. I don't have anything planned before 1.0 but maybe someone finds something that could be improved. :-) -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Integration with TrueZIP
On 2011-09-12 Christian Schlichtherle wrote: > I just found your website and would be interested to write a driver > module for TAR.XZ files for TrueZIP (http://truezip.java.net). I > wonder if anyone has already done this because I do not want to > reinvent the wheel. Probably not. > I had a brief look at the code and I noticed that the 0.4 > distribution of XZ for Java contains more classes than what the > online Javadoc has emitted. Is this required? Classes that aren't part of the public API aren't documented in the API docs. Non-public classes are needed by the public classes. It tries to be a complete and pedantic implementation, not a size-optimized implementation. So it's a bit bloated. > For integration with TrueZIP, I would like to add XZ for Java to the > Maven Central directory. Would anybody mind if I do this for you? It would be nice if you could do it. Having the code in a Maven repository would be useful also for Apache Commons Compress integration. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Integration with TrueZIP
On 2011-09-12 Christian Schlichtherle wrote: > > It would be nice if you could do it. Having the code in a Maven > > repository would be useful also for Apache Commons Compress > > integration. > > No problem. I could write a pom.xml for you. I could then either > upload the generated artifact to Maven Central or you could do it. If > you want me to do it, as a side effect I would take "ownership" with > the groupId at oss.sonatype.org. If that's not what you want, you > would need to sign up for an account at oss.sonatype.org and deploy > the artifacts yourself. Up to you, of course. Let's start with pom.xml. I'll decide later if I want to upload it myself. Thanks. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Use of XZ/LZMA compression in the ZIP file format
On 2011-09-14 Christian Schlichtherle wrote: > I am looking into integrating LZMA compression to my TrueZIP Driver > Zip > <http://truezip.java.net/truezip-driver/truezip-driver-zip/index.html> > as explained in the ZIP File Format Specification > <http://www.pkware.com/documents/casestudies/APPNOTE.TXT> . Now I > wonder what I need to do to restrict the compression to LZMA-only, > not LZMA2 or XZ, or if I should not restrict it at all because > supporting method 14 (LZMA) may imply supporting LZMA2 or XZ, too. Internally LZMA2 uses the same code as the original LZMA, but it needs some work to adapt the LZMA2 Java code to do LZMA streams: - See the comment about RangeEncoder.cacheSize in the code. - You will need to modify RangeEncoder.shiftLow so that it writes directly to an output stream (instead of to a buffer). - You need to add a function to write the LZMA end-of-stream marker: rc.encodeBit(isMatch[state.get()], posState, 1); rc.encodeBit(isRep, state.get(), 0); encodeMatch(0xFFFFFFFF, MATCH_LEN_MIN, posState); - A little code is needed to glue the components together. LZMA2OutputStream does this for LZMA2. Note that LZMA doesn't support flush() like LZMA2 does. LZMA might be needed for Apache Commons Compress. I don't know yet. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Use of XZ/LZMA compression in the ZIP file format
On 2011-09-14 Jonathan Nieder wrote: > - The version information header refers to the version of Igor > Pavlov's LZMA SDK used to compress (one byte major, one byte > minor). The LZMA SDK never used versions in the range [5, 8], so > maybe some lie like "5.0" would be appropriate. ;-) Maybe it would be better to fake a low SDK version. Maybe decompressors check it and reject too big versions as unsupported. I'm guessing here, I have no idea about the real-world implementations. > - I am not sure if the constraints on compression parameters > mentioned at [1] would ever trip in decompressing ZIP files. > Probably not in practice. The spec doesn't mention them, alas. The Java version doesn't support LZMA1. If it is adapted to support it, there's no similar limit of lc + lp <= 4 as there is in liblzma because in Java the arrays are allocated one by one. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ for Java 1.0
XZ for Java 1.0 was released earlier this week: http://tukaani.org/xz/java.html The code is available also in Maven Central. The actual Java code is identical to the version 0.4, but I made a new release to make it clear that the code and API should now be stable. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Is the xz format stable?
On 2011-11-06 Tom Trauth wrote: > I am trying to submit a patch to an open source project to add xz > support to it, but before accepting it the maintainer wants me to get > a promise from the xz developers that the xz format is now stable and > will have no backwardly incompatible modifications in the future. It is stable in the sense that new tools will always be able to decompress old .xz files that have been created with a stable release of XZ Utils. It is possible and even somewhat likely that new features will be added in the future which old programs won't support. Compare to the .zip format. It has got support for new compression methods and other features over the years, including LZMA support. When maximum portability is needed, people stick to the Deflate algorithm, which all non-ancient .zip implementations support. > But he > apparently had a bad experience with the lzma format changing its > format several times and therefore does not trust xz. The old .lzma format hasn't changed since it was introduced in the LZMA SDK and also used by LZMA Utils. There were development versions of the .xz format that also used the .lzma suffix, but no one has claimed that those alpha versions would be stable. If someone has thought the development versions were stable, it has been a major misunderstanding. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Is the xz format stable?
On 2011-11-06 Tom Trauth wrote: > Given an xz file, > is there a way to determine which version of the xz format it uses. > Something like: > > xz-get-version foo.xz --> foo.xz uses XZ format version 1.0.4 Right now there is no way to get a version number of the format. I could make xz -lvv show the oldest XZ Utils version that will decompress the file. It can only work for files that are supported by the xz tool, so it's not possible to make an old xz tool display how much newer xz is required for a given file; the old tool could only tell that it doesn't support it. I don't know if this could be good enough for you. To understand the reason for the above, it's good to understand how incompatible additions may happen: (1) A new filter/method ID may be added into the official .xz format specification. Old tools will show that there is an unsupported filter ID and cannot decompress such files (will display an error). (2) Third-party developers may use custom filter IDs which aren't in the official specification and aren't supported by XZ Utils. If they don't deviate from the .xz specification in any other way, this is OK. Old tools cannot distinguish this situation from (1). (3) A new .xz format specification may add new features to the container format. The old tools will detect such files as unsupported (they won't claim them to be corrupt). With old tools, the difference to (1) and (2) is that the old tools won't be able to list even the filter IDs. If incompatible additions are made, the xz tool won't use them by default. They might become a default after several years have passed and old xz versions aren't common anymore. But it won't be done easily, because it would make people angry if the default settings created files that many wouldn't be able to decompress without extra work. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Is the xz format stable?
On 2011-11-06 Tom Trauth wrote: > This is what the maintainer wants, in his own words: > We need a way to verify that a specific named version of xz, which > might be older than the version someone has installed, will be able > to uncompress a particular file. When compressing with a future version of xz, don't enable incompatible features. If you already have a .xz file and want to find out if an old xz will support it too, there's no simple way right now, but I think I will make xz -lvv show the minimum required XZ Utils version. > His main fear is that someone will create an xz file with a version > of xz which is newer than the version of xz that his project > supports, and then his project will not be able to read that file. Such a situation will probably be possible in the future if someone enables an incompatible feature when compressing. A comparable situation is technically possible even now because it is possible to have a .xz file that requires 4 GiB of memory to decompress, which is too much for many systems. No one creates such files in practice though. :-) If one wants to be extra safe, one could define what features and memory usage are allowed to guarantee compatibility. This is already required when creating files for XZ Embedded, which doesn't support all .xz features. XZ Embedded is used e.g. in Linux as an option to compress the kernel and initramfs and for Squashfs images. > However, your last paragraph implies xz, including future versions, > will create a file that is unreadable by current versions of xz > unless special parameters are used, because to do otherwise would > anger a lot of people. Is that true? If so, I think it will help > allay his fears. Assuming that you meant "will not create", yes, it is true with one possible exception: If a new very nice but incompatible feature is added in 2012, I might consider making that a default in 2018-2020. At that point the old versions should have pretty much vanished. 
Even then it will be possible to create files that are compatible with XZ Utils 5.0.0 by using extra options. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Is the xz format stable?
On 2011-11-06 Tom Trauth wrote: > From an e-mail from the maintainer, it looks like your proposal to > add the minimum required version of xz-utils to the output of "xz > -lvv" will be enough to meet his requirements. He wants to be able > to check which version of xz he needs, and this will enable him to do > it. Any idea which future version of xz-utils may contain this > enhancement? The feature is now available in the git repository. It will be in 5.1.2alpha, but I don't know when it will be released. It won't be in 5.0.x because I won't add any new features into a stable branch. The info is also in xz -lvv --robot output so it should be easy to parse. The idea of --robot is to make parsing simple and stable across xz versions. I didn't update the man page yet. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz startup time for small files
On 2011-11-28 Stefan Westerfeld wrote: > Now the problem is that for those files I cannot predict the size. > Often they will be quite small, but they also could be 100 MB in size > or more. So I use xz -9 to get the best compression. > > The problem is now that xz takes a lot of time to start: > > stefan@ubuntu:/tmp$ time echo "foo" | xz -9 >/dev/null > > real 0m0.155s > user 0m0.052s > sys 0m0.096s The match finder hash table has to be initialized. It cannot be avoided. The bigger the dictionary, the bigger the hash table. It's about 64 MiB when using a 64 MiB dictionary (xz -9). With an 8 MiB dictionary (xz -6) it's about 16 MiB. So at a lower setting the initialization is faster. xz allocates much more memory for other things. Most of that memory isn't initialized beforehand. Uninitialized memory doesn't cause a significant speed penalty because many kernels don't physically allocate large allocations before the memory will actually be used. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] Memory usage limits again
Using the XZ_DEFAULTS environment variable to set default memory usage limits isn't liked by everyone who wants to enable limits: - If you log in interactively with ssh, the shell startup scripts are executed and XZ_DEFAULTS will be set. But if ssh is used to run a remote command (e.g. "ssh myhost myprogram"), the startup scripts aren't read and XZ_DEFAULTS won't be there. - /etc/profile or equivalent usually isn't executed by initscripts when starting daemons. Some daemons use xz. - People don't want to pollute the environment with variables that affect only one program. Having a configuration file would fix the above problems, but XZ Utils is already an over-engineered pig, so I'm not so eager to add config file support. I have thought about adding configure options that would allow setting default limits for compression and decompression. Someone may think that it can confuse things even more, but on the other hand some people already patch xz to have a limit for compression by default. I haven't thought much about memory usage limits with threading, but below are some preliminary thoughts. With compression, -T0 in 5.1.1alpha sets the number of threads to match the number of CPU cores. If no memory usage limit has been set, it may end up using more memory than there is RAM. Pushing the system to swap with threading is silly, because the point of threading in xz is to make it faster. So it might make sense to have some kind of default soft limit that is used to limit the number of threads when an automatic number of threads is requested. With threaded decompression (not implemented yet) and no memory usage limit, the worst case is that xz will try to read the whole input file to memory, which is silly. So it probably will need some sort of soft default limit to keep the maximum memory usage sane. The definition of sane is unclear though. It's not necessarily the same as for compression. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz startup time for small files
On 2011-11-28 Stefan Westerfeld wrote: > Just a thought: could performance be improved if xz requested the > memory via mmap(), like > > char *buffer = (char *) mmap (NULL, 64 * 1024 * 1024, > PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > > I wrote a little test program which seems to indicate that mmap() is > much faster for getting zero initialized memory than malloc() + > memset(). But that's for the case where the application does not > access the memory. For xz the question is how much of the memory will > be accessed, and how much not having to zero-initialize the memory > will save. With tiny input the memory won't be accessed much. With the BT4 match finder, it's one read and one write per uncompressed input byte. Each read and write is a 32-bit integer. Since it's a hash table, it's random access. There are actually three hash tables in BT4, which are allocated at the same time, but the other two tables are small. If you do a few thousand random 32-bit reads and writes, the mmap method can still be faster, but it's not as huge a difference as your test makes it look. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz startup time for small files
On 2011-11-28 Thorsten Glaser wrote: > Stefan Westerfeld dixit: > > >Just a thought: could performance be improved if xz requested the > >memory via mmap(), like > > No, because any self-respecting modern malloc(3) implementation > uses mmap(2) internally, see omalloc for example. (That’s Otto > Moerbeek’s last one, found e.g. in OpenBSD.) The point was that with mmap you are guaranteed that the memory is already zeroed (or will be zeroed when the kernel does the physical allocation). With malloc the contents of the memory are undefined. There's also calloc, but with a quick and inaccurate test with glibc, it doesn't seem faster than malloc+memset. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz startup time for small files
On 2011-11-28 Thorsten Glaser wrote: > Lasse Collin dixit: > If xz does indeed know it needs a zero’d allocation and > can express that in page sizes (pretty non-portable), > _and_ has fallback code for mmap-less architectures (e.g. > several POSIX-for-Windows systems or ancient OSes) then > sure. But I’d say, leave malloc speedups to the OS. Or > the porter; they should know what they do. I'm not interested in playing with mmap in liblzma. > (calloc is indeed faster than malloc+memset here for > large allocations. About 1750 vs. 20 milliseconds.) Add a few thousand random reads and writes, which liblzma will do even with small files. Maybe the calloc is so much faster because it just mmaps memory and doesn't touch it, so the kernel doesn't physically allocate and initialize it either. I know that using calloc is the right way to get a zeroed allocation. In liblzma I have allocations and initializations separated, because it allows reusing the existing allocations when (de)compressing many streams. I could still use calloc and skip memset as a special case, but currently I think it's not worth it at all. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz startup time for small files
On 2011-11-29 Stefan Westerfeld wrote: > Of course it's your code base, and you can use mmap() or not; there > are some performance gains, which can be bought with additional > code complexity. Your mmap test isn't very realistic because it doesn't touch the allocated memory at all. Below is a modified version of your test program. The loop count 5000 simulates the input file size in bytes. x is there just in case, to ensure that the memory reads aren't optimized away.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main (int argc, char **argv)
{
	unsigned int i;
	unsigned int x;
	unsigned int *buffer;

	assert (argc == 2);

	if (strcmp (argv[1], "malloc") == 0) {
		buffer = malloc (64 * 1024 * 1024);
		memset (buffer, 0, 64 * 1024 * 1024);
	} else {
		buffer = mmap (NULL, 64 * 1024 * 1024,
				PROT_READ|PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	}

	for (i = 0; i < 5000; ++i) {
		x = buffer[rand() % (16 * 1024 * 1024)];
		buffer[rand() % (16 * 1024 * 1024)] = x + 1;
	}

	return x & 1;
}

The mmap version will still be faster, but the difference isn't so enormous anymore. If the mmap in the test program is replaced with calloc, it's as slow as malloc+memset on GNU/Linux x86-64, but it may very well be as fast as mmap on some other OS. Creating a separate xz process for every file wastes even more time than the hash table initialization. I tested with this on tmpfs:

TIMEFORMAT=%3R   # To get times in seconds with bash
mkdir test
cd test
for I in {0..4999} ; do printf '%300s\n' $I > $I ; done
for OPT in -1e -6 -9 ; do
	echo
	echo $OPT
	echo Separate processes:
	time for I in * ; do xz -k $OPT $I ; done
	rm *.xz
	echo Single process:
	time xz -k $OPT *
	rm *.xz
done

My results (times are in seconds):

                     -1e    -6    -9
Separate processes:  39.3  49.8  146
Single process:       2.6  14.7   57

So even at -9, using a single xz process would help much more than optimizing the hash table initialization. One way to do this could be to use the --files or --files0 option with xz. You would leave xz running and give it new filenames via stdin. 
It's good to note that combining the single-process approach and mmap is not a good idea. If you compress multiple files, the memory won't be reallocated for every file. Using memset to reset the old allocation is faster than munmap+mmap as long as the memory will also be accessed at least a little, like it will be in xz. It would be possible to use a small hash table at first, and switch to a bigger one if the input size exceeds a predefined value. This would probably have some speed penalty too, and the code wouldn't look so fun either.

> But I think maybe it's better to take a step back and see what I was
> trying to do in the first place: compressing files which vary in
> size. From the documentation I've found that using levels bigger than
> -7, -8 and -9 doesn't change anything if the file is small enough. So
> I can do this:
>
> def xz_level_for_file (filename):
>     size = os.path.getsize (filename)
>     if (size <= 8 * 1024 * 1024):
>         return "-6"
>     if (size <= 16 * 1024 * 1024):
>         return "-7"
>     if (size <= 32 * 1024 * 1024):
>         return "-8"
>     return "-9"
>
> in my code before calling xz. This will get around initializing the
> 64M of memory for small files, and results in quite a bit of a
> performance improvement in my test (probably even more than using
> mmap).
>
> It would still be cool if xz could do this automatically (or do it
> with a special option) so that not every xz user needs to adapt the
> compression settings according to the file size. Basically, it could
> detect the file size and adjust the compression level downwards if
> that will not produce worse results.

Adding an option to do this shouldn't be too hard. I added it to my to-do list. At one point it was considered to enable such a feature by default. I'm not sure if it is good as a default, because then compressing the same data from stdin will produce different output than when the input size is known. 
Usually this doesn't matter, but sometimes it does, so if it is a default, there probably needs to be an option to disable it. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Memory usage limits again
On 2011-11-29 Jonathan Nieder wrote: > Lasse Collin wrote: > > > With compression, -T0 in 5.1.1alpha sets the number of threads to > > match the number of CPU cores. If no memory usage limit has been > > set, it may end up using more memory than there is RAM. Pushing the > > system to swap with threading is silly, because the point of > > threading in xz is to make it faster. So it might make sense to > > have some kind of default soft limit that is used to limit the > > number of threads when automatic number of threads is requested. > > How about something like this patch, to start? With it applied, I am > happy using > > XZ_DEFAULTS='--no-adjust --threads=0 --memlimit=1080MiB' After the patch, --no-adjust doesn't prevent auto-adjusting the number of threads. It would only prevent auto-adjustments of the LZMA2 dictionary size. I'm not sure if I like this or not. I guess that your idea is to use --no-adjust to catch situations where the specified settings are too high even for single-threaded operation, and use more than one thread when the memory limit allows. I don't know if someone would want to use --no-adjust to prevent xz from scaling down the number of threads. Maybe your use case is more likely. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] lzma orig file and decompressed file are different.
On 2011-12-09 stompdagg...@yahoo.com wrote: > I'm writing a c++ program that takes a sql text file compressed in > lzma (created by running lzma -9) and unlzma it. but for some > reason the result file is different, see here: > http://paste.pocoo.org/show/518724/ one of the lines is cut and a 1 > char has been added to some other ones. the relevant code can be > found here: http://paste.pocoo.org/show/518725/ Maybe the problem happens if the output buffer is filled partially and the input buffer becomes empty. After getting more input, the contents of the partially filled output buffer are lost. There is also a problem that the code doesn't check that the decoding ends with LZMA_STREAM_END. This means that you won't detect if the file is truncated. The code may read past the end of the output buffer (dDataArr) when writing the data to the file on line 38. It could be better to omit the memset on line 27 and use mystream.write instead of operator<<. > OT: the list's registering procedure is very unclear, it would be > wise to make the return make more clearer as it took me few mails to > get registered and understand I'm registered, I'm registered to > another few lists so I can say it can be done. I'm sorry to hear that. I cannot affect how the actual subscribing is done over email. My hosting provider has Majordomo and that's what I have to use (or move the list elsewhere). I can improve the instructions on the tukaani.org web site if you can explain what should be clarified. Thanks. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] lzma orig file and decompressed file are different.
On 2011-12-12 stompdagg...@yahoo.com wrote: > in regards to the registration, the main issue is that the return > mails are not clear, for example when I send confirmation, the answer > was not clear if it was successful so I've had to resend it, only > then I've scanned the return mail and found out that the original > answer confirmed my registration. Unfortunately I cannot change it without moving the list elsewhere and to another mailing list software. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] lzma orig file and decompressed file are different.
On 2011-12-13 stompdagg...@yahoo.com wrote: > back to topic, I've taken the pipe decompress example and started > modifying it, when I got to read from file, decompress and write file > using c functions it worked but when I changed it to c++ stream > handling see here: http://dpaste.com/673199/ the output file is > identical to the original but I get error code 10. > > how is that possible? I'm not sure. Using in.readsome looks suspicious. You may want to use in.read and in.gcount instead. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] lzma orig file and decompressed file are different.
On 2011-12-16 stompdagg...@yahoo.com wrote: > yes! problem solved, the code can be viewed at > http://gitorious.org/open-source-soccer-manager/ossm/blobs/master/src/Utilities/Utils-General.cpp#line240 Good that it works. There's still a bug that it doesn't detect truncated files. You need to check that lzma_code has returned LZMA_STREAM_END to know that the end of the file was reached successfully. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] lzma orig file and decompressed file are different.
On 2011-12-16 stompdagg...@yahoo.com wrote: > in that case, what action should I take? Simply check that the last call to lzma_code has returned LZMA_STREAM_END. If it returned LZMA_OK, the decoder didn't decode the last bytes of the file and thus the file was truncated or otherwise corrupt. I see you copied the bug from xz_pipe_decomp.c. I'm sorry about that. I should have reviewed the example programs more carefully before accepting them. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Memory requirement for linux kernel compression/decompression
On 2012-02-21 Gilles Espinasse wrote: > On 2.6.32, only the lzma option is available and by default, the kernel > uses lzma -9 and this requires much more memory than needed. As one > compiler of my distrib reported a compilation breakage on a 512 MB VM > during kernel compression, I started hacking scripts/Makefile.lib, > removed -9 and added -vv. I then played with information displayed > during compression to adjust xz memory requirement. lzma -9 from LZMA Utils uses a 32 MiB dictionary and requires 311 MiB of memory. xz -9 uses a 64 MiB dictionary and requires 674 MiB of memory. The lzma emulation in xz uses the same presets as xz, so lzma -9 from XZ Utils needs 674 MiB of memory. So the emulation isn't very good, although by default both XZ Utils and LZMA Utils use an 8 MiB dictionary. Using a dictionary bigger than the uncompressed file is a waste of memory. So if the kernel image is small, switching to a much smaller dictionary doesn't affect the compression ratio. > Should not a patch be pushed on LKML to at least remove the -9 part? I don't know. If -9 is removed, then a kernel bigger than 8 MiB may compress worse than it does now. The -9 probably was put there when XZ Utils hadn't taken over LZMA Utils, so the memory usage was much lower. Using a high setting is fine from the decompression point of view, because in the specific case of kernel decompression the dictionary size doesn't affect the decompressor memory usage. So from that point of view it is fine to use a high setting "just in case". scripts/xz_wrap.sh uses a 32 MiB dictionary (370 MiB memory) to compress a kernel image with xz. Maybe that would work on 512 MiB VMs but it can still be a bit annoying on them. An alternative to local patching is to set a memory usage limit for xz when compiling the kernel: $ XZ_OPT=-M100MiB make xz will then scale down the dictionary size. It does this also when emulating lzma. > Secondly, could I trust the decompression memory requirement > displayed by xz? 
It can be trusted but: - It's rounded *up* to the nearest MiB, so it's not very precise when memory requirements are low. This could be fixed since a more accurate number is known internally already. - The number assumes that the decompressor needs to allocate a separate dictionary buffer. This isn't always the case. Linux kernel decompression doesn't need a dictionary buffer but initramfs and initrd decompression does. > Is the kernel decompressor really requiring the same > memory size that xz display during compression? No. Kernel decompression with an XZ-compressed kernel requires about 30 KiB of memory. The dictionary size doesn't matter because the output buffer is used as the dictionary buffer. This is done even when a BCJ filter is used. I think with an LZMA-compressed kernel the memory usage is very similar to XZ. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Oddities with --lzma2 options
On 2012-03-05 Gilles Espinasse wrote: > I find strange here that with a dictionary size even a bit bigger > than with bare -8e the compressed file is a bit bigger. This can sometimes happen, but it shouldn't be the common case. A bigger dictionary might allow encoding some sections of the file better, but it can cause the internal state to be less optimal for other sections of the file. So with bad luck a bigger dictionary gives a tiny bit bigger output. > Trying to add -e doesn't change the result (and time to compress) when > nice=273 depth=512 are set. If you specify something like "xz --lzma2 -e", the -e option is ignored. The -e only affects the presets -0 ... -9. If you want to take -8e as the starting point and then adjust the dictionary size, use this: xz --lzma2=preset=8e,dict=${DICT}KiB -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] The right data for Embedded XZ?
On 2012-03-28 Mike Melanson wrote: > *However!* Studying the source code in that directory demonstrated > what was wrong in my own sample app-- I need to call xz_crc32_init() > before the other functions. I see that's mentioned near the end of > xz.h; perhaps it warrants an earlier mention. XZ Embedded has been written primarily for the Linux kernel and there you don't need xz_crc32_init() except in decompress_unxz.c. So in the Linux context it's better to keep the xz_crc32_init() docs at the end of xz.h. I'm sorry that you didn't notice this or the existence of xzminidec earlier. I added a reference to xzminidec.c to the README. > Anyway, I got past that problem. It should be noted that > "--check=crc32" really is necessary for compressed data if Embedded > XZ will be chewing on it. Yes. This is mentioned in linux/Documentation/xz.txt. The existence of this file is pointed out in the README. > The library returns XZ_OPTIONS_ERROR otherwise, but only on the first > call. If you call xz_dec_run() again, decoding will proceed fine (and > accurately). I figured this out when I made a mistake in my decode > loop and didn't terminate on error. Calling xz_dec_run() again after XZ_OPTIONS_ERROR leads to undefined behavior in the sense that I haven't thought about what will happen. liblzma would keep returning the same error code if one calls lzma_code() again after an error, but in XZ Embedded I skipped such things to make the code smaller. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] The right data for Embedded XZ?
On 2012-03-29 Thorsten Glaser wrote:
> Mike Melanson dixit:
>
> >gcc -std=gnu89 -I../linux/include/linux -I. -DXZ_DEC_X86
>          ^^
> You probably want -std=gnu99 here.

gnu89 should work (gnu99 should work too). The Linux kernel is compiled with gnu89, so XZ Embedded needs to conform to that too.

> >-DXZ_DEC_IA64 -DXZ_DEC_ARM -DXZ_DEC_ARMTHUMB -DXZ_DEC_SPARC
> >-DXZ_DEC_ANY_CHECK -ggdb3 -O2 -pedantic -Wall -Wextra -c -o
> >boottest.o boottest.c
> >In file included from ../linux/lib/decompress_unxz.c:235:0,
> >                 from boottest.c:22:
> >../linux/lib/xz/xz_dec_lzma2.c: In function ‘xz_dec_lzma2_run’:
> >/usr/include/bits/string3.h:56:1: sorry, unimplemented: inlining
> >failed in call to ‘memmove’: redefined extern inline functions are
> >not considered for inlining
>
> Yes well, that will of course break.

It works for me but I'm not sure why. In boottest.c I want to use the memmove() and other functions from decompress_unxz.c instead of libc. Maybe it would be enough to avoid <string.h> in boottest.c and replace strcmp() with something else.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] when a stable version with multithreaded compression support will be available?
On 2012-05-29 valentin wrote:
> I'm working on yoctoproject (www.yoctoproject.org) and xz is used
> pretty much by the build system. I would like to upgrade the xz
> package to the development version (5.1.1alpha) for its support for
> multithreaded compression to speed up some tasks. I know this version
> is not stable and I'm asking if someone knows when the stable version
> will be available!?

I don't know. A few days ago I started working on getting 5.0.4 ready to be released. Before this it had been quiet for several months. 5.1.1alpha has a few annoying bugs that have since been fixed in the git repository. I might release 5.1.2alpha around the same time as 5.0.4. There are no known critical bugs (e.g. data corruption), but getting the code to beta or stable will still take more work.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] keep and hard links
On 2012-06-10 Ariel wrote: > xz won't compress a file if it has hard links, even if --keep is > specified. > > I think this should be changed since if the file is not being deleted > hard links don't matter. You aren't the first one requesting this change. I'm not sure if it is safe to change it. In theory someone could rely on the current feature although that doesn't sound so likely. Currently --force does what you request, but --force also makes xz overwrite existing files, which you might not want. If overwriting isn't an issue, then --keep --force does exactly what you want. If --keep is modified, I think it should also allow (de)compression of symlinks and setuid, setgid, and sticky files. This way it would match what --force does. I would like to hear what people think about this. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
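To make the rules being debated concrete, here is a rough Python sketch (my own illustration, not actual xz code) of the refusal logic under discussion: skip files with multiple hard links or setuid/setgid/sticky bits, and skip symlinks, unless forced.

```python
import os
import stat
import tempfile

def ok_to_compress(path: str, force: bool = False) -> bool:
    """Roughly mirror the refusal rules discussed above (not xz's real code)."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return force                 # plain xz skips symlinks; --force follows them
    if not stat.S_ISREG(st.st_mode):
        return False
    if force:
        return True                  # --force also overrides the checks below
    if st.st_nlink > 1:
        return False                 # file has multiple hard links
    if st.st_mode & (stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX):
        return False                 # setuid/setgid/sticky bit set
    return True

d = tempfile.mkdtemp()
path = os.path.join(d, "a")
open(path, "w").close()
assert ok_to_compress(path)
os.link(path, os.path.join(d, "b"))          # now the file has two links
assert not ok_to_compress(path)
assert ok_to_compress(path, force=True)      # what --force (or -kf) permits
```

The question in the mail is whether --keep alone should behave like the force=True case for hard links, since the original file isn't deleted anyway.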
Re: [xz-devel] keep and hard links
On 2012-06-11 Christoph Biedl wrote:
> A while ago I considered asking for a few --force-* options that allow
> finer control about what things are acceptable that usually are not.
> Their names would be something like --force-overwrite,
> --force-symlink, --force-links and so on. That would allow overriding
> xz's sane defaults in certain aspects only, without using --force and
> doing something potentially really harmful and undesired.

It's not hard to add these, but I'm unsure how useful they would be in practice. Maybe you had some specific use case in mind. Anyway, special --force-foo switches aren't the answer to the original question of what --keep should do. If you need the feature often, it isn't practical to type such long options on the command line when "xz -k foo" or "xz -kf foo" could be enough.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] Example program bug fix and new example programs
A bug was fixed in doc/examples/xz_pipe_decomp.c. It didn't detect truncated files. I recently wrote new example programs that have more comments. I moved xz_pipe_comp.c and xz_pipe_decomp.c to doc/examples_old so that new example programs can be put into doc/examples. I will keep the old programs for now in examples_old. If someone has copied the structure from xz_pipe_decomp.c he can then see how to easily fix the bug. I would like to get feedback about the new example programs. They are now in the master branch in the git repository. Gitweb: http://git.tukaani.org/?p=xz.git;a=commit;h=3a0c5378abefaf86aa39a62a7c9682bdb21568a1 I would like to include them into 5.0.4, so if there's something wrong with them, I would like to hear about it soon. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
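The bug class fixed here, treating a truncated file as valid, is easy to reproduce through liblzma's bindings. A minimal sketch using Python's stdlib lzma wrapper (my own illustration, not the example program itself): a decoder must verify that the stream actually reached its end, not merely that no error occurred.

```python
import lzma

def decompress_checked(data: bytes) -> bytes:
    """Decompress one .xz stream, rejecting truncated input.

    The analogue of the xz_pipe_decomp.c bug: if you never verify
    that the decoder reached the end of the stream (LZMA_STREAM_END
    in liblzma, the .eof flag here), a truncated file decodes
    without any error being reported.
    """
    d = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    out = d.decompress(data)
    if not d.eof:
        raise ValueError("input is truncated or incomplete")
    return out

blob = lzma.compress(b"important data\n" * 100)
assert decompress_checked(blob) == b"important data\n" * 100
try:
    decompress_checked(blob[:-10])   # chop off the end: must be rejected
    raise AssertionError("truncation went unnoticed")
except ValueError:
    pass
```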
[xz-devel] XZ Utils 5.0.4
XZ Utils 5.0.4 is available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

* liblzma:

  - Fix lzma_index_init(). It could crash if memory allocation
    failed.

  - Fix the possibility of an incorrect LZMA_BUF_ERROR when a BCJ
    filter is used and the application only provides exactly as much
    output space as is the uncompressed size of the file.

  - Fix a bug in doc/examples_old/xz_pipe_decompress.c. It didn't
    check if the last call to lzma_code() really returned
    LZMA_STREAM_END, which made the program think that truncated
    files are valid.

  - New example programs in doc/examples (old programs are now in
    doc/examples_old). These have more comments and more detailed
    error handling.

* Fix "xz -lvv foo.xz". It could crash on some corrupted files.

* Fix output of "xz --robot -lv" and "xz --robot -lvv" which
  incorrectly printed the filename also in the "foo (x/x)" format.

* Fix exit status of "xzdiff foo.xz bar.xz".

* Fix exit status of "xzgrep foo binary_file".

* Fix portability to EBCDIC systems.

* Fix a configure issue on AIX with the XL C compiler. See INSTALL
  for details.

* Update French, German, Italian, and Polish translations.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] next development release
On 2012-06-28 Denis Excoffier wrote: > Is a xz-5.1.2alpha (or xz-5.1.1beta) release planned soon? 5.1.2alpha will be released soon. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ Utils 5.1.2alpha
XZ Utils 5.1.2alpha is available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

* All fixes from 5.0.3 and 5.0.4

* liblzma:

  - Fixed a deadlock and an invalid free() in the threaded encoder.

  - Added support for symbol versioning. It is enabled by default on
    GNU/Linux, other GNU-based systems, and FreeBSD.

  - Use a SHA-256 implementation from the operating system if one is
    available in libc, libmd, or libutil. liblzma won't use e.g.
    OpenSSL or libgcrypt, to avoid introducing new dependencies.

  - Fixed liblzma.pc for static linking.

  - Fixed a few portability bugs.

* xz --decompress --single-stream now fixes the input position after
  successful decompression. Now the following works:

      echo foo | xz > foo.xz
      echo bar | xz >> foo.xz
      ( xz -dc --single-stream ; xz -dc --single-stream ) < foo.xz

  Note that it doesn't work if the input is not seekable or if there
  is Stream Padding between the concatenated .xz Streams.

* xz -lvv now shows the minimum xz version that is required to
  decompress the file. Currently it is 5.0.0 for all supported .xz
  files, except that files with empty LZMA2 streams require 5.0.2.

* Added an *incomplete* implementation of --block-list=SIZES to xz.
  It only works correctly in single-threaded mode and when
  --block-size isn't used at the same time. --block-list allows
  specifying the sizes of Blocks, which can be useful e.g. when
  creating files for random-access reading.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
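The --single-stream behavior, decoding exactly one .xz Stream and leaving the position at the byte after it, can be mimicked at the library level. A sketch with Python's stdlib lzma (my own illustration, not from the release notes):

```python
import lzma

def decode_single_stream(blob: bytes):
    """Decode the first .xz Stream; return (output, remaining bytes).

    Like the simple case in the release notes, this doesn't handle
    Stream Padding between concatenated Streams.
    """
    d = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    out = d.decompress(blob)
    if not d.eof:
        raise ValueError("first stream is incomplete")
    return out, d.unused_data

# Two concatenated streams, like "echo foo | xz > f; echo bar | xz >> f":
blob = lzma.compress(b"foo\n") + lzma.compress(b"bar\n")
first, rest = decode_single_stream(blob)
second, rest = decode_single_stream(rest)
assert (first, second, rest) == (b"foo\n", b"bar\n", b"")
```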
[xz-devel] XZ for Java 1.1
XZ for Java 1.1 is available at <http://tukaani.org/xz/java.html> and in the Maven Central (groupId = org.tukaani, artifactId = xz). Here is an extract from the NEWS file: * The depthLimit argument in the LZMA2Options constructor is no longer ignored. * LZMA2Options() can no longer throw UnsupportedOptionsException. * Fix bugs in the preset dictionary support in the LZMA2 encoder. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Wrong content of sources JAR on Maven Central
Sorry about slow reply. On 2012-07-08 Christian Schlichtherle wrote: > The sources JAR on Maven Central seems to contain a copy of the source > repository so that you could rebuild XZ 1.1 from it. However, this is > not what should be in there. A sources JAR is not meant to be used for > rebuilding the release. Instead, it should exactly match the > directory tree of the classes JAR so that tools like an IDE can look > up the sources by substituting the .class suffix with .java. So the > sources JAR should just contain a top directory with the name org > which contains the rest of the package structure for XZ 1.1 OK, thanks for reporting the bug. I didn't know this and I still don't know where it is documented. I have hopefully fixed it in the git repository now. I assume that -sources.jar doesn't require any manifest. > If you want to make the source code available for rebuilding, then > this is better done by providing an online source code repository > (it's Git for XZ, isn't it) with a special tag for the release, say > "xz-1.1". Right. There is a source .zip on tukaani.org and releases have been tagged in the git repository. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] cache-aware match finder for blocks of 2**17 bytes
On 2012-10-09 John Reiser wrote: > I'm interested in speeding up compression for mksquashfs, which uses > independent blocks of input length 2**17 bytes. I have in mind a > specialized match finder which would take advantage of the small > fixed block size, and tailor its memory usage to the common L2 cache > size of 256KB. Is anyone else looking into this? I'm not aware of anyone working on something like this. I think one needs to modify more than the match finder to fit all data structures into 256 KiB. For example, the dictionary buffer has some fixed extra size to prevent too frequent memmove calls. Even then 256 KiB might be hard to achieve without affecting compression ratio much. You may need to use mode=fast since mode=normal uses slightly more memory. Maybe that is OK for you since you are looking for fast compression anyway. If you have trouble reading the code, see also XZ for Java, which I think is currently the most readable version. (I'm not suggesting that you should use the Java code, I just mean that it might help understanding liblzma.) liblzma should be made more readable too. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
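For readers unfamiliar with match finders: given a position in the input, a match finder reports earlier occurrences of the data starting there, and its data structures (a hash table plus chains or a binary tree) dominate the memory use that would have to fit in 256 KiB here. A toy hash-chain sketch in Python (purely illustrative and far simpler than liblzma's match finders):

```python
def longest_match(data: bytes, pos: int, heads: dict, max_chain: int = 16):
    """Toy hash-chain match finder: return (distance, length) of the best
    earlier match at `pos`, or None. `heads` maps a 3-byte prefix to the
    list of earlier positions where it occurred; real match finders bound
    this structure's size, which is what the 256 KiB budget constrains."""
    key = data[pos:pos + 3]
    best = None
    if len(key) == 3:
        # Walk the most recent occurrences first, up to max_chain of them.
        for j in reversed(heads.get(key, [])[-max_chain:]):
            length = 0
            while pos + length < len(data) and data[j + length] == data[pos + length]:
                length += 1
            if best is None or length > best[1]:
                best = (pos - j, length)
        heads.setdefault(key, []).append(pos)
    return best

data = b"abcabcabc"
heads = {}
matches = [longest_match(data, i, heads) for i in range(len(data))]
assert matches[3] == (3, 6)   # "abcabc" repeats at distance 3
```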
Re: [xz-devel] random-access reading and the "--block-size" option
On 2012-11-14 Jack Duston wrote:
> Given the time involved in compressing and the quantity of data, I am
> hesitant to use the 5.1.2alpha code.
> I am paying attention when your web page says it should be considered
> unstable!

It is good to be cautious with unstable releases. There are people using 5.1.2alpha and I haven't received bug reports. I'm not aware of any data corruption bugs. So in this particular case I think it's not too dangerous to use the development version. You get threading in addition to the --block-size option.

Another option is to use for example pixz: https://github.com/vasi/pixz

XZ Utils 5.1.2alpha and pixz can both create a single .xz stream that contains many blocks, and thus make random-access reading possible. I haven't truly used pixz myself so I cannot say anything else about it.

> I see in the Release Notes that the "--block-size" option was added
> to the April, 2011 alpha release, and we are fast approaching 2013.
> I don't know how complex the code change is, or if it goes against
> your release policy, but would you consider back-porting the
> "--block-size" option to a 5.0.5 Stable Release? I surely can't be
> the only one who would love to make use of the option.

I don't like to add any new features in a stable branch. I'm doing this to (hopefully) make it easier for downstream distributions to include bug fixes in stable distributions, where the distro maintainers want only bug/security fixes to minimize the risk of new bugs. Adding --block-size isn't a huge patch, but in this particular case I think it should be safe to try 5.1.2alpha.

> My ultimate end is to incorporate the XZ library or Embedded into our
> application to search and read the compressed files directly.
> In any case, thanks again for all the work you've put into xz, I will
> be compressing with your utility either way.

Random access can be done with liblzma, but the provided APIs are too low level to make it nice to use. 
See src/xz/list.c in XZ Utils for what kind of things you need to do. There is an old plan to have a file I/O library that makes things easy for the most common use cases. I even started writing it long ago but didn't get very far. There is random access support in XZ for Java, but I guess it doesn't help you. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH/RFC] xzless: Make "less -V" parsing more robust
On 2012-11-19 Jonathan Nieder wrote: > In v4.999.9beta~30 (xzless: Support compressed standard input, > 2009-08-09), xzless learned to parse ‘less -V’ output to figure out > whether less is new enough to handle $LESSOPEN settings starting > with “|-”. That worked well for a while, but the version string from > ‘less’ versions 448 (June, 2012) is misparsed, producing a warning: [...] Thanks for the patch. I have committed it as is. > The implementation uses "awk" for simplicity. Hopefully that’s > portable enough. I guess it is. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] Add manifest attributes required by OSGi
On 2013-01-04 Mikolaj Izdebski wrote:
> For xz-java to be usable as an OSGi bundle certain attributes
> required by the OSGi specification need to be present in the
> manifest.

K. Daniel visited #tukaani on Thursday but I got online five minutes too late, so I couldn't reply. Then I got an email from Stefan Bodewig (Apache Commons Compress developer) on the same day about the same subject. I was busy during the next few days, so I'm sorry that I didn't reply earlier.

> The above patch was applied[1] in the Fedora GNU/Linux distribution
> and it was tested by Fedora developers. It would be nice if OSGi
> manifests were included in upstream xz-java too.

In addition to Bundle-SymbolicName, Bundle-Version, and Export-Package, Stefan Bodewig suggested adding Bundle-ManifestVersion, Bundle-Name, and Bundle-DocURL. I don't have much clue about any OSGi stuff, but I checked the OSGi wiki and these sounded reasonable.

> [1]
> http://pkgs.fedoraproject.org/cgit/xz-java.git/commit/?id=cd63efa72e4150b6303f995e829b31465bfcd6e9

Note that the committed patch hardcodes the bundle version number while the patch you included in your email doesn't. On the other hand, I think it is OK to hardcode "org.tukaani.xz" in build.xml.

I committed a patch: http://git.tukaani.org/?p=xz-java.git;a=commitdiff;h=101303b7e10e9618a83a06b1bfccd0b87e33acd6

Please let me know if you think it has a problem. Maybe I should make a new release some day to include this and the xz-x.x-sources.jar fixes, even though no actual Java code has been changed after 1.1. Thanks!

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
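For reference, a META-INF/MANIFEST.MF carrying the attributes named above might look roughly like this. This is an illustrative sketch with made-up values; the exact attributes shipped in xz.jar are in the commit linked above, not reproduced here:

```
Bundle-ManifestVersion: 2
Bundle-SymbolicName: org.tukaani.xz
Bundle-Name: XZ data compression
Bundle-Version: 1.2
Bundle-DocURL: http://tukaani.org/xz/java.html
Export-Package: org.tukaani.xz
```

OSGi containers read these headers to identify the bundle, resolve its version, and learn which packages it exports to other bundles.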
Re: [xz-devel] [PATCH] Add manifest attributes required by OSGi
I got an email about a small optimization and I want to include it in the next release. I will make a new release once the reporter has confirmed that the patch makes a difference. If anyone is interested, I committed the patch already: http://git.tukaani.org/?p=xz-java.git;a=commitdiff;h=ec224582e44776a53874346dfe703dbbfcc6bd15 -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ for Java 1.2
XZ for Java 1.2 is available at <http://tukaani.org/xz/java.html> and in the Maven Central (groupId = org.tukaani, artifactId = xz). Here is an extract from the NEWS file: * Use fields instead of reallocating frequently-needed temporary objects in the LZMA encoder. * Fix the contents of xz-${version}-sources.jar. * Add OSGi attributes to xz.jar. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] xzless: There is no need to call awk for this.
Thanks. Committed. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Re: cache-aware match finder for blocks of 2**17 bytes
On 2013-03-18 John Reiser wrote:
> I've got my specialized match-finder working (minimize cache misses
> while processing blocks of size 2**17.) Now I find that it is slow,
> mainly because it finds *all* matches, even those that "obviously"
> are not good candidates for encoding.
>
> It seems to me that there are two missing parameters to the match
> finder:
>
> 1) the current four offsets which have very low encoding cost (many
> bits less than other nearby offsets)

As the first step you could add some hack to pass that information to the match finder. In the fast mode (lzma_encoder_optimum_fast.c) this should be relatively straightforward. In the normal mode (_normal.c) one needs to modify the code more to update the reps in the opts array earlier, but the code is ugly. It should be cleaned up some day (e.g. in XZ for Java the equivalent code is much nicer). So for now it may be better to test with the fast mode code only.

> 2) for each of the four match lengths 2, 3, 4, and 5: the maximum
> offset that yields a savings when encoded, but ignoring the
> possibility of special savings due to using one of the four
> most-recent offsets. For instance, gzip won't even consider any
> offset greater than 4096 ("TOO_FAR") for gzip's minimum match
> length of 3.

I don't have a complete answer for that at the moment. In fast mode, distances longer than 128 bytes are ignored when length is 2 bytes. In other cases it's more complicated.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
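The kind of pre-filtering being discussed fits in a few lines. This is my own illustration of the idea, not actual encoder code; the only thresholds taken from the mail are gzip's TOO_FAR of 4096 for length-3 matches and liblzma's 128-byte cutoff for length-2 matches in fast mode. Any other entries would be invented.

```python
# Maximum useful distance per match length; lengths not listed have
# no distance limit. 128 for length 2 comes from liblzma's fast mode;
# 4096 for length 3 mirrors gzip's TOO_FAR.
TOO_FAR = {2: 128, 3: 4096}

def match_is_worth_reporting(length, distance, reps=()):
    """Cheap pre-filter a match finder could apply before reporting a match."""
    if distance in reps:           # one of the four recent offsets: cheap anyway
        return True
    limit = TOO_FAR.get(length)
    return limit is None or distance <= limit

assert match_is_worth_reporting(2, 100)
assert not match_is_worth_reporting(2, 1000)
assert match_is_worth_reporting(2, 1000, reps=(1000,))   # recent-offset exception
assert match_is_worth_reporting(5, 1 << 20)              # long match: no limit
```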
Re: [xz-devel] Re: cache-aware match finder for blocks of 2**17 bytes
I'm sorry that I'm so slow at replying.

On 2013-03-29 John Reiser wrote:
> Below is a summary of encoding costs that I gleaned by inspecting the
> code. Notable:
> A match of length 3 is no shorter than 3 literal bytes when
> 64K <= offset. A match of length 1 at rep0 == offset is an important
> special case.

Your calculations look correct to me. They are useful as is in fast mode, but in normal mode it's not so simple. In normal mode one shouldn't make such simplified assumptions about the costs.

> LZMA encoding costs (in bits) before RangeEncoder (after
> RangeDecoder.) RangeEncoder often reduces the cost in bits, but it
> depends on history and is difficult to compute. [On average is it a
> small constant factor?]

A key thing in the normal mode (compared to the fast mode) is that the algorithm takes into account the costs after the range encoding (the code uses the term "price"). Up to 4 KiB of uncompressed data is analyzed at a time. The cheapest combination of LZMA symbols to represent the analyzed range as a whole is chosen. To speed this up, some things are cached in lookup tables that are updated only now and then. This means that the calculation isn't always done with the exact prices, as the real prices drift away from the cached values.

The price of a symbol depends on the alignment in the uncompressed data via pos_state (position bits (pb) setting). Alignment also affects literal encoding via literal position bits (lp), but that is usually zero to indicate one-byte alignment. The price of a symbol also depends on the previous two or three LZMA symbols(*) via the "state" variable. For example, if the situation "the previous symbols were a normal match and a literal, and the current pos_state (derived from the position) is 3" has occurred several times earlier, and in most cases the next symbol has been, for example, a repeated match, then encoding a repeated match in such a situation has become a little bit cheaper than it would be in the base state. 
This may make the encoder choose, for example, a shorter repeated match over a longer normal match in the same situation in the future. (Not a great example but you get the idea, I hope.) (*) lzma_common.h seems to talk about events, but later I've switched to using the term "LZMA symbol" or plain "symbol" to mean a literal, normal match, or repeated match. There are variables named "symbol" too, but I don't speak about those in this email. Some code cleanup would be good to do. :-| If you want to read the normal mode code, see LZMAEncoderNormal.java and other files in XZ for Java. Those are way more readable than the equivalent code currently in XZ Utils even if you had never seen Java code before. Your original question was about what kind of "obviously bad" matches a match finder could throw away. The results you calculated might be useful for the fast mode, but using those for the normal mode may harm the compression ratio. Maybe with some safety margins it would work well, but that is purely a guess. You could try different values or you could even analyze what kind of symbol combinations the encoder currently creates. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xzgrep and '-h' option
On 2013-04-03 Pavel Raiskup wrote: > Hi all, would you please consider the following patch? It is adding > support for the '-h' grep option into xzgrep also. The author is Jeff > Bastian. Thanks. Committed. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] [PATCH] doc: man and --help/--long-help sync
On 2013-04-09 Pavel Raiskup wrote:
> * src/xz/message.c (message_help): Cover --uncompress, --to-stdout and
> mention that --memory is an alias for --memlimit.

No, thanks, for these reasons:

- Those spellings probably shouldn't have been supported in the first place.

- The --help text should be as short as reasonably possible. Documenting --uncompress and --to-stdout would add two new lines.

- Listing just one spelling in the easiest-to-find location hopefully encourages people to use only that spelling.

- People should be able to find the alternative spellings from the manual if they find a script that uses those options and cannot otherwise guess their meaning.

> * src/xz/xz.1: Mention the obsoleted --memory option.

If --memory is marked as an old/alternative spelling, then --to-stdout and --uncompress should be too. I'm not sure if such a change would clarify things much.

> * src/xzdec/xzdec.c: Mention --to-stdout and --uncompress option in
> help, better describe why options are ignored and move all ignored
> options to the end of the whole list.

I like to keep the order similar to xz --help instead of putting the ignored options at the end. Now I see that the order didn't match xz --help or xzdec's man page, so I've fixed it. I added descriptions but left them in parentheses even if it looks a bit silly. This way I find it easier to distinguish the ignored vs. non-ignored options when skimming the list. Thanks!

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ for Java 1.3
XZ for Java 1.3 is available at <http://tukaani.org/xz/java.html> and in the Maven Central (groupId = org.tukaani, artifactId = xz). Here is an extract from the NEWS file:

* Fix a data corruption bug when flushing the LZMA2 encoder or when
  using a preset dictionary.

* Make information about the XZ Block positions and sizes available
  in SeekableXZInputStream by adding the following public functions:
  - int getStreamCount()
  - int getBlockCount()
  - long getBlockPos(int blockNumber)
  - long getBlockSize(int blockNumber)
  - long getBlockCompPos(int blockNumber)
  - long getBlockCompSize(int blockNumber)
  - int getBlockCheckType(int blockNumber)
  - int getBlockNumber(long pos)
  - void seekToBlock(int blockNumber)

* Minor improvements to javadoc comments were made.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Random access to xz files
Note that read() may return less than "size" without hitting end of file or error. I don't know if Linux makes extra guarantees over POSIX when reading from a regular file, but even if it does, I still wouldn't rely on it.

After those small things I think it should have a good chance to work once you add code to decompress the requested part of the block into xzfile_pread(). While that is still some work, don't get discouraged now: you have the messiest parts mostly done already. Obviously what you are doing should have been abstracted into a nice file I/O library long ago, but so far that doesn't exist.

> is this stuff documented anywhere?

The documentation is poor. The API headers have reference-like docs, but so far there are only example programs for the most basic compression and decompression, so there are no examples about random access. (I don't count list.c in the xz sources as an example program.)

The liblzma APIs for random access are low level and thus require a lot of code to use. One also needs to understand the .xz file format structure. A reason for such low-level APIs is that liblzma takes its input and gives its output via buffers provided by the application. Callback functions or file I/O functions aren't used. My idea was and still is to have a separate file I/O library that would handle not only .xz files but also uncompressed, .gz, and .bz2 files. There is some old pre-pre-alpha code in libxzfile.git on git.tukaani.org, but in its current state it's not interesting since it's so incomplete and there's almost no compression-related code yet.

It is a bit backwards that right now, compared to XZ Utils, XZ for Java has much cleaner code, better docs, and an *easy*-to-use random-access decompressor class. On the other hand XZ for Java works on streams instead of passing *both* input and output via caller-provided buffers.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Random access to xz files
On 2013-06-24 Richard W.M. Jones wrote:
> I have now completed a simple random-access XZ NBD server plugin:
>
> https://github.com/libguestfs/nbdkit/tree/master/plugins/xz
>
> which may be of interest. It works enough that I can read out some
> xz-compressed Windows guest disks, which is a fairly good test.

Nice that you got it working. Below are a few more things that I noticed that you may or may not find interesting.

"xz --block-size=SIZE" is available only in the 5.1.x branch, which is still officially in alpha stage. I don't know if the required xz version is worth mentioning in the docs. A modified version of 5.1.x is shipped in Debian, and maybe some other distros (Fedora maybe) ship 5.1.x too. In practice it seems to work quite OK since I haven't received bug reports.

With "xz --block-size=SIZE", SIZE can be e.g. 16MiB, which is easier to type than 16777216. Most options in xz that take numbers accept such suffixes. It is documented on the man page but it's not mentioned again for each option (maybe it should be?).

I noticed that both list.c and your code lack a check for the lzma_stream_flags.version field. It doesn't affect anything for now since the version is always 0. But it quite probably won't always be zero in the future (e.g. if metadata support is added to .xz) and then the current code may misbehave instead of giving a clear error message. Here's what I added to list.c:

http://git.tukaani.org/?p=xz.git;a=commitdiff;h=84d2da6c9dc252f441deb7626c2522202b005d4d

xzfile.c lines 467-488:

- Changing the action argument to LZMA_FINISH from LZMA_RUN when no more input is coming is fine, but when decompressing blocks it is not required. It is fine to only use LZMA_RUN if you want.

- After successful decoding, lzma_code() must have returned LZMA_STREAM_END. On line 488 the code also accepts LZMA_OK, which looks suspicious. 
In practice there is no bug, because the "while" condition on line 486 ensures that the value cannot be LZMA_OK, making the check for LZMA_OK on line 488 a no-op. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
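To illustrate where that stream-flags version byte lives: the last 12 bytes of every .xz file are the Stream Footer (CRC32, Backward Size, Stream Flags, and the magic bytes "YZ"), and the first of the two Stream Flags bytes is the one that is currently always zero. A quick stdlib sketch (my own, loosely following the .xz format specification) that performs the kind of check described:

```python
import binascii
import lzma
import struct

def check_stream_footer(blob: bytes) -> int:
    """Validate the 12-byte .xz Stream Footer; return the check ID."""
    footer = blob[-12:]
    crc32, backward_size = struct.unpack("<II", footer[:8])
    flags = footer[8:10]
    if footer[10:12] != b"YZ":
        raise ValueError("footer magic bytes missing")
    # The stored CRC32 covers Backward Size + Stream Flags.
    if binascii.crc32(footer[4:10]) != crc32:
        raise ValueError("footer CRC32 mismatch")
    # This is the check the mail talks about: the first stream-flags
    # byte must (currently) be zero; a future format revision may
    # change that, and then old readers should fail loudly instead
    # of misbehaving.
    if flags[0] != 0:
        raise ValueError("unsupported stream flags")
    return flags[1] & 0x0F   # low nibble: integrity check ID

blob = lzma.compress(b"hello\n", check=lzma.CHECK_CRC32)
assert check_stream_footer(blob) == lzma.CHECK_CRC32   # CRC32 has ID 1
```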
Re: [xz-devel] xz-utils streaming patch
On 2013-06-26 Alexander Clouter wrote:
> Attached is a patch that enables 'streaming' support for xz output;
> in short, LZMA_SYNC_FLUSH is called every X milliseconds.

I like the idea.

The patch uses LZMA_SYNC_FLUSH after every X milliseconds even if all read() calls are able to fill the buffer without blocking. A possible alternative could be to flush when at least X milliseconds have passed and read() gives EAGAIN. That is, don't flush as long as input is coming faster than xz can compress it. I don't know if this is a good or bad idea. It might mean much higher latency, especially in threaded mode (which doesn't support LZMA_SYNC_FLUSH yet).

A few other thoughts:

The timeout must be disabled when --list (MODE_LIST) is used.

gettimeofday() shouldn't fail as long as the first argument is sane and the second argument is NULL, so there's no need to test the return value (I hope).

It could be good to use clock_gettime(CLOCK_MONOTONIC, ...) when it is available. It makes a difference if the system time jumps for some reason. The threading code in liblzma uses it already, so it's not a new dependency. Currently message.c uses gettimeofday() and that could use clock_gettime() too.

If select() gives EINTR, there should be a test for user_abort. Otherwise, if there is no input, xz won't react to SIGINT until the timeout has expired.

I noticed that there is a race condition in signal handling in the existing xz code. If e.g. SIGINT is sent after the value of user_abort has been checked but before a blocking read() or write(), the read/write will block and another signal is needed to make xz notice that user_abort has been set. This affects the same code as your patch, so I think this should be fixed first.

Could signals be a good way to set a flag indicating when to flush? It would allow triggering flushing from another process. xz already supports SIGUSR1/SIGINFO for showing progress info if --verbose wasn't used. A possible problem is how to raise such signals within xz. 
timer_create() and friends look nice but after checking a few OSes I think they aren't portable enough. setitimer() could be more portable but in practice it would mean using SIGALRM. Currently xz uses alarm() for the progress indicator. Creating a thread solely for sending timer signals should work, but I'm not sure I like that. Maybe just polling the time like your patch does is the way to go. > The patch is for 5.0.0 (what is currently in Debian > 'oldstable/squeeze') but if the community likes the look of the > patch, I can roll a version for whatever is at the HEAD of the git > tree. It won't apply directly because there's new code that uses LZMA_FULL_FLUSH. But let's not worry about it until I have fixed the race condition with signals and user_abort. It may need select() or poll(), which may then be used to implement flushing too. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
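The polling approach under discussion, flushing only when input has been quiet for the timeout and using a monotonic clock, can be sketched like this in Python. This is my own illustration of the control flow, not the code that went into xz:

```python
import os
import select
import time

def pump_with_flush_timeout(read_fd, on_data, on_flush, timeout_ms=500):
    """Read from read_fd until EOF; call on_flush() when no input has
    arrived for timeout_ms (a stand-in for LZMA_SYNC_FLUSH).

    time.monotonic() plays the role of clock_gettime(CLOCK_MONOTONIC),
    so a system time jump can't break the timeout. select() with a
    timeout replaces the blocking read(), which also gives a natural
    place to check flags like user_abort after EINTR.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        wait = max(0.0, deadline - time.monotonic())
        ready, _, _ = select.select([read_fd], [], [], wait)
        if ready:
            chunk = os.read(read_fd, 8192)
            if not chunk:          # EOF: the final flush happens at stream end
                return
            on_data(chunk)         # input keeps coming: no flush yet
        else:                      # timeout expired with no input: flush now
            on_flush()
            deadline = time.monotonic() + timeout_ms / 1000.0
```

Because the deadline is only reset after a flush, this matches the "flush when at least X milliseconds have passed and read() would block" alternative rather than flushing unconditionally every X milliseconds.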
[xz-devel] XZ Utils 5.0.5
XZ Utils 5.0.5 is available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

* lzmadec and liblzma's lzma_alone_decoder(): Support decompressing
  .lzma files that have less common settings in the headers
  (dictionary size other than 2^n or 2^n + 2^(n-1), or uncompressed
  size greater than 256 GiB). The limitations existed to avoid false
  positives when detecting .lzma files. The lc + lp <= 4 limitation
  still remains since liblzma's LZMA decoder has that limitation.

  NOTE: xz's .lzma support and liblzma's lzma_auto_decoder() are NOT
  affected by this change. They still consider uncommon .lzma headers
  as not being in the .lzma format. Changing this would give way too
  many false positives.

* xz:

  - Interaction of preset and custom filter chain options was made
    less illogical. This affects only certain less typical use
    cases, so few people are expected to notice this change.

    Now, when a custom filter chain option (e.g. --lzma2) is
    specified, all preset options (-0 ... -9, -e) earlier on the
    command line are completely forgotten. Similarly, when a preset
    option is specified, all custom filter chain options earlier on
    the command line are completely forgotten.

    Example 1: "xz -9 --lzma2=preset=5 -e" is equivalent to "xz -e",
    which is equivalent to "xz -6e". Earlier, -e didn't put xz back
    into preset mode, and thus the example command was equivalent to
    "xz --lzma2=preset=5".

    Example 2: "xz -9e --lzma2=preset=5 -7" is equivalent to
    "xz -7". Earlier, a custom filter chain option didn't make xz
    forget the -e option, so the example was equivalent to "xz -7e".

  - Fixes and improvements to error handling.

  - Various fixes to the man page.

* xzless: Fixed to work with "less" versions 448 and later.

* xzgrep: Made -h an alias for --no-filename.

* Include the previously missing debug/translation.bash which can be
  useful for translators.

* Include a build script for Mac OS X. 
This has been in the Git repository since 2010 but due to a mistake in Makefile.am the script hasn't been included in a release tarball before. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
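The preset vs. custom filter chain rule from the 5.0.5 notes can be modeled in a few lines (an illustrative Python sketch; this is not xz's actual argument parser, and the function name and default preset constant are made up to match the behavior described above):

```python
DEFAULT_PRESET = 6

def resolve_options(args):
    """Return ("preset", level, extreme) or ("custom", option_string).
    A custom filter chain option forgets earlier preset options,
    and a preset option forgets earlier custom chain options."""
    preset, extreme = DEFAULT_PRESET, False
    custom = None
    for arg in args:
        if arg.startswith("--lzma2"):
            custom = arg
            preset, extreme = DEFAULT_PRESET, False   # presets forgotten
        elif arg == "-e":
            custom = None                             # custom chain forgotten
            extreme = True
        elif arg.startswith("-") and arg[1:].rstrip("e").isdigit():
            custom = None                             # custom chain forgotten
            preset = int(arg[1])
            extreme = arg.endswith("e")
    if custom is not None:
        return ("custom", custom)
    return ("preset", preset, extreme)
```

This reproduces both NEWS examples: "-9 --lzma2=preset=5 -e" resolves to preset 6 with extreme (i.e. -6e), and "-9e --lzma2=preset=5 -7" resolves to plain -7.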
Re: [xz-devel] xz-utils streaming patch
On 2013-06-26 Alexander Clouter wrote:
> Attached is a patch that enables 'streaming' support for xz output,
> in short LZMA_SYNC_FLUSH is called every X milliseconds.

There is now this kind of feature in the git repository that can be tested. I named the option --flush-timeout=TIMEOUT where the timeout is in milliseconds.

In contrast to your patch, the committed code calls read() as long as read() can fill the buffer completely. poll() is only called when read() would block, and only then is the flush timeout checked. Thus, the system time isn't polled with clock_gettime() or gettimeofday() on every call to io_read() when the flush timeout is active.

Neither --long-help nor the man page has been updated yet. It is possible that this feature isn't in its final form yet.

> We find it
> helpful so that we can effectively do:
>
> tail -f foobar.log.xz | nc w.x.y.z 1234
>
> Meanwhile foobar.log.xz is effectively being generated with:
>
> tail -f foobar.log | xz -c --select-timeout 500 > foobar.log.xz
>
> This means the receiver then gets something that is decodeable in X
> milliseconds rather than having to wait for a whole block to be
> generated and flushed, which might be a considerable time if whatever
> is writing to foobar.log is low volume (100 bytes per second for
> example).

For now, xz cannot be used for the decompression side because xz does too much buffering. It is similar with XZ for Java unless one reads one byte at a time. xz should naturally be usable for the decompression side too. I haven't decided yet how to fix it (e.g. require an option or perhaps always disable buffering). -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
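The read/poll behavior of the committed code can be illustrated like this (a Python sketch under the stated design; the function name is hypothetical, and xz's io_read() is of course C):

```python
import os
import select

def read_until_full_or_timeout(fd, bufsize, timeout_ms):
    """Keep calling read() while it can fill the buffer; fall back to
    poll()/select() only when read() would block, and check the flush
    timeout only then. fd must be in non-blocking mode."""
    buf = bytearray()
    while len(buf) < bufsize:
        try:
            chunk = os.read(fd, bufsize - len(buf))
        except BlockingIOError:
            ready, _, _ = select.select([fd], [], [], timeout_ms / 1000.0)
            if not ready:
                break          # flush timeout expired: hand out a partial buffer
            continue
        if not chunk:
            break              # end of input
        buf += chunk
    return bytes(buf)
```

The point of this structure is that the clock is consulted only on the slow path (input stalled), not once per read() call.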
[xz-devel] XZ for Java 1.4
XZ for Java 1.4 is available at <http://tukaani.org/xz/java.html> and in the Maven Central (groupId = org.tukaani, artifactId = xz). Here is an extract from the NEWS file: * Add LZMAInputStream for decoding .lzma files and raw LZMA streams. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
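As an aside, the same .lzma (LZMA_Alone) format can be exercised from Python's stdlib lzma module, which is also built on liblzma; this illustration (not part of XZ for Java) also shows the 0x5D properties byte that the .lzma detection discussion earlier on this list relied on:

```python
import lzma

data = b"hello, .lzma format" * 50

# FORMAT_ALONE produces a raw .lzma stream, the format that the new
# LZMAInputStream class reads on the Java side.
blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# With the default settings (lc=3, lp=0, pb=2) the first header byte is
# (pb * 5 + lp) * 9 + lc = 93 = 0x5D, i.e. ']'.
assert blob[0] == 0x5D
assert lzma.decompress(blob, format=lzma.FORMAT_ALONE) == data
```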
[xz-devel] XZ Utils 5.1.3alpha
XZ Utils 5.1.3alpha is available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

* All fixes from 5.0.5

* liblzma:

  - Fixed a deadlock in the threaded encoder.

  - Made the uses of lzma_allocator const correct.

  - Added lzma_block_uncomp_encode() to create uncompressed .xz Blocks using LZMA2 uncompressed chunks.

  - Added support for native threads on Windows and the ability to detect the number of CPU cores.

* xz:

  - Fixed a race condition in the signal handling. It was possible that e.g. the first SIGINT didn't make xz exit if reading or writing blocked and one had bad luck. The fix is non-trivial, so as of writing it is unknown if it will be backported to the v5.0 branch.

  - Made the progress indicator work correctly in threaded mode.

  - Threaded encoder now works together with --block-list=SIZES.

  - Added preliminary support for --flush-timeout=TIMEOUT. It can be useful for (somewhat) real-time streaming. For now the decompression side has to be done with something other than the xz tool due to how xz does buffering, but this should be fixed.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Parallel xzcat
On 2013-10-21 Richard W.M. Jones wrote:
> Here is a parallel implementation of xzcat:
>
> http://git.annexia.org/?p=pxzcat.git;a=tree
>
> Some test results:
>
> 4 cores: xzcat: 23.8 s pxzcat: 8.1 s speed up: 2.9
> 8 cores: xzcat: 26.8 s pxzcat: 10.5 s speed up: 2.55
>
> I just wrote this as a quick hack in a couple of hours, so while it
> may be of interest it's not a long term solution. (It would be better
> to get the xzcat -T flag working).

Sounds nice! Threaded decoding should be included in liblzma, but it will need to wait past 5.2.0. In liblzma it will work for streamed decompression, but it also means using quite a bit of memory.

> (2) I have not tested it with multi-stream files, but it should work
> with them.

I tested two-stream files without and with stream padding, and neither worked with pxzcat. Commands to create the files:

echo foobar | xz --block-size=3 > test1.xz
echo bazqux | xz --block-size=4 >> test1.xz

echo foobar | xz --block-size=3 > test2.xz
dd if=/dev/zero bs=100 count=1 >> test2.xz
echo bazqux | xz --block-size=4 >> test2.xz

I didn't investigate why it doesn't work, sorry.

> Notes on performance:
>
> - Scalability is not too bad on my laptop (4 core machine above) but
> much worse on a theoretically higher performing machine with SSDs (8
> core machine above). I don't really understand why that is.

A few wild guesses:

- Eight cores, or threads (hyperthreading)?

- If all cores share the same L3 cache and memory controller, maybe memory access becomes a bottleneck.

- Maybe scattered I/O has something to do with it. Testing with the write calls commented out might give some hints.

> - For reasons I don't understand, both regular xzcat and pxzcat cause
> the output file to be flushed to disk after the program exits. This
> causes any program which consumes the output of the file to slow down.

I have no idea. I see you committed something that seems to be related to this after your email.
From a quick reading I don't understand it well; it seems to be working around some issue with ftruncate() on ext4. xz doesn't use ftruncate(), though, so if xz has a problem, the cause cannot be ftruncate(). If sparseness is the problem, test --no-sparse, although with very sparse files it creates a different performance problem, of course. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
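For reference, the two-stream case (without stream padding) is easy to reproduce outside the shell as well; a sketch using Python's stdlib lzma module, which handles concatenated streams the way a conforming .xz decoder should:

```python
import lzma

# Two complete .xz streams concatenated back to back, like test1.xz above.
part1 = lzma.compress(b"foobar\n")
part2 = lzma.compress(b"bazqux\n")
two_streams = part1 + part2

# A conforming decoder decodes both streams in sequence and returns
# the concatenated uncompressed data.
assert lzma.decompress(two_streams) == b"foobar\nbazqux\n"
```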
Re: [xz-devel] Creating an archive without timestamps
On 2013-11-10 Ernestas Lukoševičius wrote:
> How do I create an XZ compressed archive, which could be compared by
> md5?
>
> Right now, running "tar cJfp" creates a tarball, gives it a timestamp
> and the rest is history, because the timestamp is always different
> and I cannot compare such an archive... Is it possible to avoid it?
>
> gzip has -n, what about XZ and its implementation in Tar?

GNU gzip has -n because by default gzip saves the timestamp and other metadata in the .gz header. xz doesn't do such things. In fact xz doesn't even support metadata for now (it probably will in the future, but it won't use it by default).

Probably your tar implementation creates a different .tar file on each run. E.g. GNU tar 1.27 seems to do this if using --format=pax. With --format=ustar the output doesn't vary.

It is also good to keep in mind that future xz versions might create different output with the same command line options, e.g. if the compression engine is updated. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
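To see that the variation comes from tar metadata rather than from xz, one can pin the metadata explicitly. A sketch with Python's tarfile module (hypothetical file name and payload; ustar format plus a fixed mtime makes the output byte-identical across runs, so checksums match):

```python
import hashlib
import io
import tarfile

def make_tar(payload):
    # ustar format carries no varying pax timestamp records, and a
    # fixed mtime removes the remaining source of variation.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name="foobar.log")
        info.size = len(payload)
        info.mtime = 0
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

# Two runs produce identical bytes, so the md5 sums compare equal.
assert (hashlib.md5(make_tar(b"data")).hexdigest()
        == hashlib.md5(make_tar(b"data")).hexdigest())
```

Compressing the identical tar bytes with the same xz version and options then also gives identical .tar.xz files.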
Re: [xz-devel] Parallel xzcat
On 2013-10-27 Richard W.M. Jones wrote:
> On Sat, Oct 26, 2013 at 08:06:45PM +0300, Lasse Collin wrote:
> > > - For reasons I don't understand, both regular xzcat and pxzcat
> > > cause the output file to be flushed to disk after the program
> > > exits. This causes any program which consumes the output of the
> > > file to slow down.
> >
> > I have no idea. I see you committed something that seems to be
> > related to this after your email. With a quick reading I don't
> > understand it well, it seems to be working around some issue with
> > ftruncate() with ext4.
>
> That's right. It turned out to be a misfeature in ext4: if you
> truncate a file from a non-zero size down to a zero size, ext4 flags
> the file and flushes it on close. ("Truncate" includes both O_TRUNC
> and ftruncate). This can be disabled with the noauto_da_alloc mount
> option, but of course that is not the default.

If I remember correctly, that "misfeature" was added because there are too many programs that write config files by overwriting the old files with O_TRUNC. By flushing quickly, ext4 tries to avoid zero-length config files after a crash or a power failure.

With xz there's nothing I can do to work around the flushes (and I'm not sure I would want to do anything about it even if I could). The truncation is done by the shell when one redirects the output from xz to a file, so to avoid the flushes one would need to patch the shell. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz: Make --block-list and --block-size work together in
On 2013-11-02 James M Leddy wrote: > This makes --block-list and --block-size work together in > single-thread mode, as per the FIXME Thanks and sorry for a slow reply. The patch looks very good. I will commit it in a day or two. > I've verified this works by testing with --block-size=3000 > --block-list=1024,2048,4096 as well as stepping through the block > decoder in the debugger. xz -lv or xz -lvv is useful for checking block sizes. > For some reason, the single threaded mode still yields smaller > files. I'm looking into that. It is because in single-threaded mode the encoder doesn't store the block size information into block headers. In multi-threaded mode each encoded block is fully buffered in RAM and as the last step the block header is written. Currently there is no option to disable this. The header info will allow streamed multi-threaded decompression some day. Single-threaded mode doesn't buffer the blocks so it cannot write the block header after finishing a block. I think this difference will remain in 5.2.0. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xz: Make --block-list and --block-size work together in
On 2013-11-02 James M Leddy wrote: > This makes --block-list and --block-size work together in > single-thread mode, as per the FIXME I have committed the patch. I made a few minor edits but hopefully I didn't break anything. Thanks again. The man page was updated too. I added a note about the difference in output in single-threaded vs. multi-threaded mode to both --block-size and --block-list. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] LZMA documentation
On 2013-12-16 Kevin Ingwersen wrote:
> But then I was kinda surprised to not find any LZMA documentation,
> although a lzma.h file is installed into the system’s default include
> path.
>
> I also couldn’t find any link on the official xz utils site. So if
> anyone could link me to the correct place with the documentation,
> that’d be nice.

The API docs are in the header files. See at least these:

$prefix/include/lzma/base.h
$prefix/include/lzma/container.h

Those docs alone aren't nice when learning the basics. There are a few example programs (more would be needed though) in $prefix/share/doc/xz/examples which work as a kind of tutorial. It helps a little if you are already familiar with zlib's API.

If you cannot find the example programs on your system, download the source package. The examples are in doc/examples and the API headers with the API docs are in src/liblzma/api. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
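For readers starting outside C, the same zlib-style streaming pattern that the liblzma examples teach is visible through Python's stdlib lzma module (which wraps liblzma). This is only an illustration of the pattern, not one of the shipped C examples:

```python
import lzma

# Streaming compression: feed input in chunks, collect output as it
# becomes available, then finish the stream -- conceptually the same
# loop as calling lzma_code() with LZMA_RUN and finally LZMA_FINISH.
comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=6)
out = b"".join([
    comp.compress(b"first chunk "),
    comp.compress(b"second chunk"),
    comp.flush(),              # like LZMA_FINISH: emit all remaining output
])
assert lzma.decompress(out) == b"first chunk second chunk"
```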
Re: [xz-devel] Inserting Compressed Data Into Compressed File
On 2014-02-02 Brandon Fergerson wrote:
> With that being said I've been trying to find out how to write
> compressed data to a compressed file. My game's map structure is
> formatted in a way such that a collection of tiles is grouped into a
> block. So each XZ Block contains a certain amount of tiles. What I
> would like to do is when a player changes tiles of the map the map
> would find the XZ Block that the modified tile is in and rewrite that
> entire compressed block but with the new compressed data.

The new compressed XZ Block might be bigger than the old one because the new data might be less compressible. Then the new XZ Block cannot fit in the place of the old one, and one has to rewrite the rest of the file to make space for the new, bigger XZ Block. This is slow if the file is big, so it's not really random access in practice.

> Is there any reason a SeekableXZOutputStream was not made or needed?

XZ isn't suitable for random-access writing. The data is required to be in sequential order and decompressible in streamed mode. Good random-access writing would probably allow the data to be in non-sequential order and thus break the streamability requirement.

> So my question is how possible is this and how hard would it be? Are
> there indexes of the locations of the XZ Blocks somewhere that would
> have to be updated?

The XZ Index is near the end of the file. It stores the compressed and uncompressed sizes of the XZ Blocks. But this doesn't help you due to the reasons explained above.

The simplest solution for you could be to write each group of tiles into a separate compressed file. Then you can overwrite the file when the tiles in it have been updated. A downside of this is that you may end up creating even thousands of files and, depending on the file system, things can slow down.

There are several ways to keep the tiles in a single file. For example, you could put a fixed-size index at the beginning of the file and the compressed tile groups after the index.
The index would be small enough to keep in RAM when the game is running. To update a tile group, find a big enough unused hole (you need a way to track unused space) in which to store the new data, or if there is no big enough hole, append the data at the end of the file.

There may be better ways to do it, and I wouldn't be surprised if there were good libraries for this kind of thing (I don't know any, but I haven't searched either). You don't need a full file system; something relatively simple should be enough. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
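A toy version of that single-file layout might look like this (entirely hypothetical names and sizes, not an XZ feature; a bytearray stands in for the file, and old data is simply abandoned as a hole rather than tracked for reuse):

```python
import struct

class TileStore:
    """Fixed-size index at the start of the 'file', compressed tile
    groups after it. Updating a group appends the new bytes and
    repoints the index entry; the old bytes become an unused hole."""

    NGROUPS = 64
    ENTRY = struct.Struct("<QI")            # (offset, length) per group
    INDEX_SIZE = NGROUPS * ENTRY.size

    def __init__(self):
        self.blob = bytearray(self.INDEX_SIZE)   # stands in for the file

    def write_group(self, n, data):
        offset = len(self.blob)             # simplest policy: always append
        self.blob += data
        self.ENTRY.pack_into(self.blob, n * self.ENTRY.size,
                             offset, len(data))

    def read_group(self, n):
        offset, length = self.ENTRY.unpack_from(self.blob,
                                                n * self.ENTRY.size)
        return bytes(self.blob[offset:offset + length])
```

A real implementation would track the holes so that big enough ones can be reused instead of always appending, and would of course use real file I/O.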
Re: [xz-devel] Inserting Compressed Data Into Compressed File
On 2014-02-03 Brandon Fergerson wrote:
> I realize the new XZ Block would be bigger which is why I'm looking
> to insert the data instead of overwrite it. I imagined something
> like: find block, insert new compressed data over block (while
> pushing down the blocks below), and then update indices to reflect
> changes. Or would this not be efficient?

Unfortunately most file systems don't support inserting new data in the middle of a file (not even as multiples of the file system block size), so "pushing down" means rewriting everything after that file offset. If the file is big, it gets slow. If the file is small, the question doesn't matter much since you could rewrite the whole file anyway.

> I suppose my problem is that I don't really understand what it is I'm
> looking for. I knew I wanted the maps to be compressed and I knew I
> wanted them to support random access. What else should I be looking
> for?

I fully agree with Alexander's advice. First write something simple that works even if it wastes a few hundred megabytes of disk space per map. Make the map available with getTile(int x, int y) or something like that. When it and other major parts of the game work, you can replace the map implementation without affecting the rest of the game code.

From your first post I guess you are using Java, and with my limited Java experience I cannot say if there's a good enough mmap() equivalent, but it should be very easy for you to write a class that writes the whole map into a single uncompressed file, each tile taking a fixed amount of space.

Once you have that working, you could try a multi-file approach and write for example 16x16 tiles per file, again with each tile taking a fixed amount of space. Naturally you need to name each file so that you know which file contains which tiles. A new empty map can consist of no files: any missing file is considered to be full of empty tiles. That way small maps don't take much space.
Later you can add compression easily by compressing each file separately. Don't worry about a single-file compressed map format for now. Feel free to ask if you have more questions, but I hope you can at least get started now. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
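The fixed-size uncompressed map file suggested above boils down to one offset computation. A Python sketch with io.BytesIO standing in for the file (the tile size and map width are made-up values):

```python
import io

TILE_BYTES = 16          # fixed space per tile (hypothetical)
MAP_WIDTH = 100          # tiles per row (hypothetical)

def tile_offset(x, y):
    # Fixed-size tiles make the file position a pure function of (x, y).
    return (y * MAP_WIDTH + x) * TILE_BYTES

f = io.BytesIO(bytes(MAP_WIDTH * MAP_WIDTH * TILE_BYTES))

# Overwrite one tile in place...
f.seek(tile_offset(2, 3))
f.write(b"G" * TILE_BYTES)

# ...and read it back with a single seek.
f.seek(tile_offset(2, 3))
assert f.read(TILE_BYTES) == b"G" * TILE_BYTES
```

Because every tile occupies the same amount of space, an update can never grow the file, which is exactly why this layout avoids the insertion problem discussed above.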
Re: [xz-devel] [java] assert in SimpleInputStream's constructor
On 2014-02-28 Stefan Bodewig wrote: > and SimpleInputStream does > > SimpleInputStream(InputStream in, SimpleFilter simpleFilter) { > ... > assert simpleFilter == null; > > which is obviously wrong. I think != is intended (and matches the > comment right in front of the assert). Thanks! The fix is now in the git repository. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Solaris packages (done) and C99 code removal
On 2014-03-02 Mark Ashley wrote:
> I've compiled up xz 5.0.4 on the following machines:

Was there any reason to avoid 5.0.5?

> The Solaris 7 was more problematic, the C99 support is very minimal
> in Sun Studio 8.

I haven't tried myself, but at least the Sun Studio 8 manual lists quite a bit of C99 support (see the last link; I included the link chain because the last page doesn't mention the Sun Studio version):

http://docs.oracle.com/cd/E19059-01/stud.8/index.html
-> Sun Studio 8: C User's Guide
http://docs.oracle.com/cd/E19059-01/stud.8/817-5064/index.html
-> D. Supported Features of C99
http://docs.oracle.com/cd/E19059-01/stud.8/817-5064/c99.app.html

According to the manual there shouldn't be too much trouble with the compiler; at least one shouldn't need to make the code C89. The C library is another question: for example, snprintf() in Solaris 7 is pre-C99, I think, but let's focus on the compiler first.

It seems that Sun Studio 8 should be in C99 mode by default. Just in case, you could try to force it to C99 mode:

./configure CC="cc -xc99"

If configure still fails, try what section 4.1 in INSTALL suggests:

./configure CC="cc -xc99" ac_cv_prog_cc_c99=

Or without -xc99:

./configure CC=cc ac_cv_prog_cc_c99=

Maybe you already tried all these and they didn't help. In that case I'd like to know a little more about the problem. If the Sun Studio 8 manual is simply wrong about C99 support, that alone is useful information. If Sun Studio 8 really cannot be made to work, a recent enough GCC should be available for Solaris 7. I don't know if that is an acceptable solution for you.

> I took out the C99 specific code in the xz source
> tree, making it C89 friendly (and thus portable to a lot more
> compilers - you should do this to the main code base IMHO). See the
> attached diff. I didn't do this in the test/* files.

So far the list of non-C99 compilers consists of GCC 2.95.3 (released in 2001) and Microsoft Visual C.
The ancient GCC naturally won't get any C99 support, but MSVC 2013 is getting close. Unfortunately there's still one really stupid MSVC bug that prevents it from compiling liblzma from XZ Utils' git repository. Of course there might be other compilers that people would like to use to compile XZ Utils but which don't support enough C99, but I don't remember hearing any C99 complaints about compilers other than the two I mentioned. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ for Java 1.5
XZ for Java 1.5 is available at <http://tukaani.org/xz/java.html> and in the Maven Central (groupId = org.tukaani, artifactId = xz). Here is an extract from the NEWS file: * Fix a wrong assertion in BCJ decoders. * Use a field instead of reallocating a temporary one-byte buffer in read() and write() implementations in several classes. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] marking version 5.1 stable?
On 2014-05-23 Pavel Raiskup wrote:
> Looking at the http://tukaani.org/xz/ page for some time, I am curious
> whether we could "stabilize" the version 5.1. Almost all
> distributions are shipping alpha/beta versions of xz* packages which
> is probably not what especially library users want.

Yes, the current situation isn't good.

> What are plans on this topic? I checked the TODO file and didn't find
> what exactly we need to fix to mark xz 5.1 stable.

For the past year or more, the plan has been to just get 5.2.0 out. It's so horribly late that I don't plan to do anything except fix bugs and possibly do some simple enhancements. The rest must wait until after 5.2.0. Somehow months just pass and I get little done (with xz or anything else).

Anyway, here are some things that I plan to do before 5.2.0:

* Skim through some of the new code in case I can spot problems that should be fixed before 5.2.0.

* Ensure that the new APIs look OK for long-term support (I like to keep the API & ABI stable).

* Check that the new xz features are correctly documented on the xz man page.

* Once I'm sure I won't change any message strings, I need to ask for updated translations from the translators.

The test suite is very poor, but considering how few bug reports I've got about the alpha versions, I guess the important features work well enough. The liblzma API is another question though: I guess no one has used the threaded encoder API yet, because in 5.1.3alpha the preset support was completely broken and I only noticed it when writing an example program for it. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xzgrep should success if at least one file matches
On 2014-06-11 Pavel Raiskup wrote:
> Hi, in RHBZ, there was a reported problem with xzgrep: we should exit 0
> when at least one file contains a matching string. Grep behaves
> similarly.
>
> Original bugreport:
> https://bugzilla.redhat.com/show_bug.cgi?id=1108085

Thanks. I fixed a typo in a comment in xzgrep (>=2 instead of >2) and simplified the test quite a bit. The original test didn't work for out-of-tree builds and there was a typo (exho). The new test doesn't test as much, but I didn't quickly see a good fix for it. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
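The grep-style exit status convention being adopted here can be summarized as follows (an illustrative sketch of the semantics only, not the actual xzgrep shell code; the function name is made up):

```python
def aggregate_exit_status(per_file_statuses):
    """Combine per-file grep exit statuses the way grep does over
    multiple files: 0 if at least one file matched, 1 if no file
    matched, and >1 if an error occurred."""
    if any(status > 1 for status in per_file_statuses):
        return 2
    return 0 if 0 in per_file_statuses else 1
```

So a run over three files where only one contains the pattern still exits 0, which is the behavior the fix gives xzgrep.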
Re: [xz-devel] xzgrep should success if at least one file matches
On 2014-06-11 Pavel Raiskup wrote:
> Btw., I am just curious, what is the reason for '(exit X)' statements
> in the test_scripts.sh file? Apart from that it sets the
> "last-command" exit status -- '$?', which is an empty operation for us
> anyway, I don't see a reason. I followed that style but I doubt that
> it is necessary.

The Autoconf manual has a few examples where such a construct is needed to work around differences and bugs in shells:

info --i exit autoconf
info --i trap autoconf
info '(autoconf)Shell Functions'

However, it sounds like the situations mentioned in the manual don't apply here. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] xzgrep should success if at least one file matches
On 2014-06-12 Pavel Raiskup wrote:
> Just a note - I wanted the exact output to be
> compared (not just check the exit value), that is the reason why I
> test the '-h/-H/-l' options because that could reveal similar bugs
> like that which was fixed e.g. by commits bd5002f5 or 40277998. But
> that really needs a not-so-naive test suite. Would you be interested in
> an autotest solution?

Maybe in the future, but maybe not before 5.2.0. I should get familiar with the new things too so that I know what I'm maintaining.

I committed something simple to get the result you wanted, I hope. I used cmp -s (it's in SUSv2) instead of diff -u because diff -u isn't in POSIX before POSIX.1-2008. Maybe I should have used plain diff, but it's not much extra to type if the test happens to fail, which should be rare. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Disabling CRC/SHA-256 checks on decompression
On 2014-07-31 Florian Weimer wrote: > Would it be possible to add a flag to disable these checks during > decompression? I think so. I will look at this relatively soon since it shouldn't be hard to implement. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Disabling CRC/SHA-256 checks on decompression
On 2014-07-31 Florian Weimer wrote: > Would it be possible to add a flag to disable these checks during > decompression? This feature is available in xz.git now. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Disabling CRC/SHA-256 checks on decompression
On 2014-08-05 Florian Weimer wrote: > Could you add something similar to the xz-java as well? Probably. I'll try to look at it next week. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: [xz-devel] Disabling CRC/SHA-256 checks on decompression
On 2014-08-05 Florian Weimer wrote: > Could you add something similar to the xz-java as well? Done. I don't have any plans about a new release of XZ for Java yet, but if one is needed for this feature, let me know and I'll do it next week. -- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
[xz-devel] XZ Utils 5.0.6 and 5.1.4beta
XZ Utils 5.0.6 and 5.1.4beta are available at <http://tukaani.org/xz/>. Here is an extract from the NEWS file:

5.0.6 (2014-09-14)

* xzgrep now exits with status 0 if at least one file matched.

* A few minor portability and build system fixes

5.1.4beta (2014-09-14)

* All fixes from 5.0.6

* liblzma: Fixed the use of presets in threaded encoder initialization.

* xz --block-list and --block-size can now be used together in single-threaded mode. Previously the combination only worked in multi-threaded mode.

* Added support for LZMA_IGNORE_CHECK to liblzma and made it available in xz as --ignore-check.

* liblzma speed optimizations:

  - Initialization of a new LZMA1 or LZMA2 encoder has been optimized. (The speed of reinitializing an already-allocated encoder isn't affected.) This helps when compressing many small buffers with lzma_stream_buffer_encode() and in other similar situations where an already-allocated encoder state isn't reused. This speed-up is visible in xz too if one compresses many small files one at a time instead of running xz once and giving all files as command-line arguments.

  - Buffer comparisons are now much faster when unaligned access is allowed (configured with --enable-unaligned-access). This speeds up encoding significantly. There is arch-specific code for 32-bit and 64-bit x86 (32-bit needs SSE2 for the best results and there's no run-time CPU detection for now). For other archs there is only generic code which probably isn't as optimal as arch-specific solutions could be.

  - A few speed optimizations were made to the SHA-256 code. (Note that the built-in SHA-256 code isn't used on all operating systems.)

* liblzma can now be built with MSVC 2013 update 2 or later using windows/config.h.

* Vietnamese translation was added.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode