Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
OK, here is the source of the problem: the cache file generated by webazolver. Based on the webalizer documentation: cached DNS addresses have a TTL (time to live) of 3 days. This may be changed at compile time by editing the dns_resolv.h header file and changing the value of DNS_CACHE_TTL. The cache file is processed each night, and records older than 3 days are removed, but somehow that file becomes a sparse file in the process, and when copied elsewhere it shows its real size. In my case that file was using a bit over 4 million blocks more than it should have, which gave me the 4GB+ difference when mirroring the content. So, as far as I can see it, this process of expiring records from the cache file, which is always reused, doesn't really shrink the file, but somehow just marks the records inside the file as bad, or something like that.

So, nothing to do with OpenBSD at all, but based on what I see of its usage I would think there is a bug in that portion of webalizer. Now the source of the problem has been found, and many thanks to all who stuck with me along the way. Always feels good to know in the end! Thanks to Otto, Ted and Tom.

Daniel
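For anyone who wants to reproduce the symptom Daniel describes in miniature: a program that seeks past the end of a file and writes leaves a hole that costs almost no disk blocks, yet the file's apparent size covers the whole range. This is a generic sketch (the filename is made up, not webalizer's actual cache file):

```shell
# Write a single byte at a ~100MB offset; everything before it is a hole.
dd if=/dev/zero of=dns_cache.example bs=1 count=1 seek=104857600 2>/dev/null

# Apparent size: the largest offset ever written (about 100MB).
ls -l dns_cache.example

# Disk usage: typically only a few KB, since the hole occupies no blocks.
du -k dns_cache.example
```

A naive copy of such a file reads the hole as real zero bytes and writes them out, which is why a mirrored copy can balloon to the full apparent size.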
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
On Tue, 17 Jan 2006, Daniel Ouellet wrote:

> [...] but somehow that file becomes a sparse file in the process, and
> when copied elsewhere it shows its real size. [...]

You are wrong in thinking sparse files are a problem. Having sparse files is quite a nifty feature, I would say.

-Otto
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
On Tue, Jan 17, 2006 at 02:15:57PM +0100, Otto Moerbeek wrote:

> You are wrong in thinking sparse files are a problem. Having sparse
> files is quite a nifty feature, I would say.

Are we talking about webazolver or OpenBSD? I'd argue that relying on the OS handling sparse files this way, instead of handling your own log data in an efficient way, *is* a problem, as evidenced by Daniel's post. After all, it's reasonable to copy data to, say, a different drive and expect it to take about as much space as the original.

On the other hand, I agree with you that handling sparse files efficiently is rather neat in an OS.

Joachim
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
On Tue, 17 Jan 2006, Joachim Schipper wrote:

> Are we talking about webazolver or OpenBSD? I'd argue that relying on
> the OS handling sparse files this way instead of handling your own log
> data in an efficient way *is* a problem, as evidenced by Daniel's post.
> After all, it's reasonable to copy data to, say, a different drive and
> expect it to take about as much space as the original.

Now that's a wrong assumption. A file is a row of bytes. The only thing I can assume is that if I write a byte at a certain position, I will get the same byte back when reading the file. Furthermore, the file size (not the disk space used!) is the largest position written. If I assume anything more, I'm assuming too much.

For an application, having sparse files is completely transparent. The application doesn't even know the difference. How the OS stores the file is up to the OS. Again, assuming a copy of a file takes up as much space as the original is wrong.

-Otto
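Otto's two points — the file size is the largest position written, and sparseness is transparent to applications — can be demonstrated directly from the shell (filenames here are illustrative):

```shell
# Write one byte at offset 1000; nothing before it is ever written.
printf 'X' | dd of=hole.example bs=1 seek=1000 2>/dev/null

# The file size is 1001 bytes: the largest position written, plus one.
wc -c < hole.example

# Reading inside the never-written region returns NUL bytes, so the
# application just sees an ordinary run of zeros -- it cannot tell
# whether the filesystem stored them as a hole or as real blocks.
dd if=hole.example bs=1 skip=500 count=4 2>/dev/null | od -An -c
```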
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
> After all, it's reasonable to copy data to, say, a different drive and
> expect it to take about as much space as the original.

Just as feedback: the original file on OpenBSD showed something like 150MB in size. Using rsync to copy it over made it almost 5GB in size; well, I wouldn't call that good. But again, before I say a definite no, there is always something that I may not understand, so I am willing to leave some room for that here. But not much! (:

> On the other hand, I agree with you that handling sparse files
> efficiently is rather neat in an OS.

I am not sure whether the OS handles it well or not. Again, no punch intended, but if it did, why copy the empty data then? Obviously something I don't understand, for sure. However, here is something I didn't include in my previous email with all the stats that may be very interesting to know. I didn't think it was so important at the time, but if we're talking about handling it properly, it might be relevant.

The tests were done with three servers. The file showing ~150MB in size was on www1. Copying it to www2, even with the -S switch in rsync, got it to ~5GB. Then copying the same file from www2 to www3 using the same rsync -S setup got that file back to the size it was on www1. So why not on www2 in that case? Is it the OS, or is it rsync? Was it handled properly or wasn't it? I am not sure. If it was, then the www2 file should not have been ~5GB, should it?
So the picture was www1 -> www2 -> www3:

www1 cache DB shows ~150MB
rsync -e ssh -aSuqz --delete /var/www/sites/ [EMAIL PROTECTED]:/var/www/sites
www2 cache DB shows ~5GB
rsync -e ssh -aSuqz --delete /var/www/sites/ [EMAIL PROTECTED]:/var/www/sites
www3 cache DB shows ~150MB

Why not 150MB on www2???

One thing that I haven't tried, and regret not having done, is simply copying that file on www1 to a different name, then copying it back to its original name and checking the size at the end, and also transferring that file without the -S switch, to see whether the OS copied the empty data or not. I guess the question would be: should it, or shouldn't it? My own opinion right now is that the file should show the size it really is. So if it is 5GB and only 100MB of it is good, shouldn't it show as 5GB? I don't know; better minds than mine surely have the answer to this one. Right now, I do not, for sure.
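Daniel's question of why the copy ballooned comes down to what a byte-for-byte copy does with holes: a copier that just read()s and write()s sees only zeros and writes them out as real blocks. A local sketch, with no rsync involved (exact du numbers are filesystem- and tool-dependent; some modern tools detect zero runs and re-sparsify):

```shell
# Create a ~1MB sparse file: one byte at the end, a hole before it.
dd if=/dev/zero of=orig.example bs=1 count=1 seek=1048576 2>/dev/null

# A plain byte-for-byte copy reads the hole as zeros and writes them out.
cat orig.example > copy.example

# Both files report the same apparent size...
wc -c < orig.example
wc -c < copy.example

# ...but du will often show the copy using the full ~1MB of blocks,
# while the original uses almost none.
du -k orig.example copy.example
```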
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
On Tue, Jan 17, 2006 at 05:49:24PM +0100, Otto Moerbeek wrote:

> Now that's a wrong assumption. A file is a row of bytes. The only
> thing I can assume is that if I write a byte at a certain position, I
> will get the same byte back when reading the file. Furthermore, the
> file size (not the disk space used!) is the largest position written.
> If I assume anything more, I'm assuming too much.

Okay, I understand your logic, and yes, I do know about sparse files and how they are typically handled. And yes, you are right that there are very good reasons for handling sparse files this way. And yes, applications are right to make use of this feature where applicable.

However, in this case it's a simple log file, and what the application did, while very much technically correct, clearly violated the principle of least astonishment, for no real reason I can see. Sure, trying to make efficient use of every single byte may not be very efficient; but just zeroing out the first five GB of the file is more than a little hackish, and not really necessary.

Joachim
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
On Tue, Jan 17, 2006 at 02:36:44PM -0500, Daniel Ouellet wrote:

[...]

> But having a file that is, let's say, 1MB of valid data grow very
> quickly to 4 and 6GB, take time to rsync between servers, and in one
> instance fill the file system and create other problems (: I wouldn't
> call that a feature.

As Otto noted, you have to distinguish between file size (that's what stat(2) and friends report, and at the same time it's the number of bytes you can read sequentially from the file) and a file's disk usage. For more explanation, see the RATIONALE section at http://www.opengroup.org/onlinepubs/009695399/utilities/du.html (you may have to register, but it doesn't hurt). See also the reference to lseek(2) mentioned there.

> But at the same time, I wasn't using the -S switch in rsync, so my own
> stupidity there. However, why spend lots of time processing empty
> files? I still don't understand that.

Please note that -S in rsync does not *guarantee* that source and destination files are *identical* in terms of holes or disk usage. For example:

$ dd if=/dev/zero of=foo bs=1m count=42
$ rsync -S foo host:
$ du foo
$ ssh host du foo

Got it? The local foo is *not* sparse (no holes), but the remote one has been optimized by rsync's -S switch.

We recently had a very controversial (and flaming) discussion at our local UG about such optimizations (or heuristics, as in GNU cp). IMO, if they have to be explicitly enabled (like -S for rsync), that's OK. The other direction (a copy is *not* sparse by default) is exactly what I would expect. Telling whether a sequence of zeroes is a hole or just a (real) block of zeroes isn't possible in userland -- it's a filesystem implementation detail. To copy the *exact* contents of an existing filesystem, including all holes, to another disk (or system), you *have* to use filesystem-specific tools, such as dump(8) and restore(8). Period.

> I did research on google for sparse files to try to get more
> information about it.
> In some cases, I would assume, like if you do round-robin database
> type stuff where you have a fixed file that you write into at various
> places, it would be good and useful; but a sparse file that keeps
> growing uncontrolled over time -- I may be wrong, but I don't call
> that a useful feature.

Sparse files for databases under heavy load (many insertions and updates) are the death of performance -- you'll get files with blocks spread all over your filesystem. OTOH, *sparse* databases such as quota files (potentially large, but growing very slowly) are good candidates for sparse files.

Ciao, Kili
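Since the whole thread hinges on the gap between apparent size and disk usage, here is a small helper (the function name is made up for illustration) that flags a file as likely sparse by comparing the two, using only portable tools:

```shell
# Hypothetical helper: report a file as sparse when its disk usage is
# clearly below its apparent size.
is_sparse() {
    f=$1
    apparent=$(wc -c < "$f")
    # du -k reports disk usage in 1K blocks; convert to bytes.
    used=$(( $(du -k "$f" | cut -f1) * 1024 ))
    [ "$used" -lt "$apparent" ]
}

# Example: a ~10MB file that is all hole except one byte.
dd if=/dev/zero of=big.example bs=1 count=1 seek=10485760 2>/dev/null
if is_sparse big.example; then
    echo "big.example looks sparse"
else
    echo "big.example is not sparse (or the filesystem stores no holes)"
fi
```

Note this is only a heuristic: a filesystem that does not support holes will store the zeros as real blocks, and the check will say "not sparse".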
Re: df -h stats for same file systems display different results on AMD64 than on i386 (Source solved)
Hi all,

First, let me start with my apology to some of you for having wasted your time! As much as this was/is interesting and puzzling to me, and as much as I am obviously trying to get my head around this issue and the usage of sparse files, the big picture of it is obviously something missing in my understanding at this time. I am doing more research on my own, so let's kill this thread, and sorry to have wasted any of your time with my lack of understanding of this aspect! I am not trying to be a fucking idiot on the list, but it's obvious that I don't understand this at this time. So let's drop it and I will continue my homework! Big thanks to all who tried to help me as well!

Daniel