Re: directory replication between two servers
I have two Linux servers with an rsync server running on both. Now I am replicating directories between the servers with the command "rsync -avz". My requirement is: if I make any changes on the first server, say server A, I want to see the changes on the second server immediately, something similar to MySQL database replication. How can I do that?

... a vague question. It depends on the application.

In high-availability environments it's best to do the replication in the application so that the application can deal with or work around any failure conditions. In the case of a database, database replication methods work better than depending on the filesystem. The filesystem does not know the state of transactions within the database. Imagine this: instead of having your client application write to one filesystem, have it write to two filesystems before saying the write was completed or committed. If one system fails, the other is updated just as well as the failed filesystem (caveat: I'm ignoring race conditions!).

If you need read-write access on both local and remote servers and have partitioned data sets (i.e., you don't need to depend on block-level locking), consider having both servers use a dedicated high-availability network-attached storage server (HA solution). Both can access an NFS server, or the second server can mount the filesystem from the first server (not an HA solution).

If you need read-write access on one server and need to replicate data to a read-only server _and_ the replication process can be asynchronous, doing multiple rsyncs can work:

    while true
    do
        rsync -avz source destination
        if [ $? != 0 ]; then
            get_help    # alert an operator
        fi
    done

If you know where your applications are doing writes, you might limit your replication to the subdirectories or files on which writes are performed to help speed up the search process. Note, though, that rsync-based replication methods are efficient only in network traffic, not on the disks or filesystems.
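The retry loop above can be made a little more concrete. Here's a minimal, self-contained sketch in which `do_sync` is a stand-in for the real "rsync -avz source destination" invocation (an assumption for the demo: it fails on the first two attempts and succeeds on the third, just to exercise the error path):

```shell
#!/bin/sh
# Sketch of the retry loop above. do_sync stands in for the real
# rsync call; here it fails twice, then succeeds.
attempts=0
do_sync() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]    # nonzero exit status on attempts 1 and 2
}
until do_sync
do
    # This is the "get help" step: alert an operator, then retry.
    echo "sync failed (attempt $attempts); retrying" >&2
done
echo "synced after $attempts attempts"
```

In a real deployment you would add a sleep between iterations and replace the echo with paging or mail; polling like this is still asynchronous, so the remote copy always lags the source by up to one loop interval.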
Imagine reading _all_ of your data over and over and over again when only a few blocks might change periodically.

If you need read-write access on one server, need to replicate data to a read-only server, and need synchronous operation (i.e., the write must be completed on the remote server before returning to the local server), then you need operating-system-level or storage-level replication products.

    Veritas Volume Replicator: It's not available on Linux yet, but it
    performs block-level incremental copies to keep two OS-level
    filesystems in sync.  $$

    Veritas File Replicator: Based (interestingly enough) on rsync, and
    runs under a virtual filesystem layer. It is only as reliable as a
    network-wide NFS mount, though. (I haven't seen it used much on a
    WAN.)  $$

    Andrew File System (AFS): This advanced filesystem has replication
    methods built in, but they have a high learning curve to make them
    work well. I don't see support for Linux, though.  $

    Distributed File System (DFS): Works a lot like AFS, built for DCE
    clusters, commercially supported (for Linux too).  $$$

    NetApp, Procom (et al.): Several network-attached-storage providers
    have replication methods built into their products. The remote side
    is kept up to date, but integrity of the remote data depends on the
    application's use of snapshots.  $$$

    EMC, Compaq, Hitachi (et al.): Storage companies have replication
    methods and best practices built into their block-level storage
    products.

Another alternative (cheaper, too) is to just use a database, period. People who worry about data storage, data integrity, failover, and replication have put a lot of thought into their database products. If you can modify your application to depend on a database and not a filesystem, you may be better off in the long run. Lazy people use filesystems as their database. It works just fine up to the point where you need to worry about real-time replication.

Again, it really depends on the application.
If others know of other replication methods or distributed filesystem work, feel free to chime in. -- Eric Ziegast -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: -c Option
Quick question: can anyone explain to me when the data in a file might change without changing the mtime, ctime, or size? I'm not sure I've ever come across that before. An example might help me determine if I can safely remove -c.

It's possible on Unix systems, but not practical. An example script:

    #!/bin/sh
    # Run on a BSD Unix system; your touch(1) arguments may vary
    echo foo > File
    touch -t 200205300800 File
    ls -l File
    echo foo > File
    ls -l File
    touch -t 200205300800 File
    ls -l File

Output:

    -rw-r--r--  1 ziegast  ziegast  4 May 30 08:00 File
    -rw-r--r--  1 ziegast  ziegast  4 May 30 11:19 File
    -rw-r--r--  1 ziegast  ziegast  4 May 30 08:00 File

Here's one example where I might use -c: hackers are not known for being practical. They like to cover their tracks as best they can by setting the owner, group, permissions, size, mtime, and ctime of their fake programs to be the same as the original programs. If you're distributing system software using rsync or rdist, you may want to force checksum comparisons just to be sure. One might also use rsync with -n -c to help compare a gold copy of OS files with an active system. Then again, tripwire was designed to do this better.

Another example is when some network-based filesystems delay updating their metadata even though the content has changed. I remember once using a client machine to update a file on a busy NFS server. It took several seconds for the change to be seen by another client machine. If I were using rsync on the second client machine, its view of the file might be inconsistent. Then again, best practices would dictate running rsync on the NFS server itself, not on clients that might be inconsistent with the server (to keep network traffic down and reduce overhead).

-- Eric Ziegast
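To make the scenario concrete, here's a small self-contained sketch (throwaway temp files, made-up contents) showing two files that a size-plus-mtime quick check cannot distinguish, while a content comparison can:

```shell
#!/bin/sh
# Two files with identical size and mtime but different contents.
a=$(mktemp); b=$(mktemp)
printf 'foo\n' > "$a"
printf 'bar\n' > "$b"
touch -t 200205300800 "$a" "$b"
# A size/mtime quick check says the files look the same...
same_size=$([ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ] && echo yes || echo no)
# ...but a content comparison (what -c's checksums buy you) disagrees.
same_content=$(cmp -s "$a" "$b" && echo yes || echo no)
echo "same size: $same_size, same content: $same_content"
rm -f "$a" "$b"
```

Running this prints "same size: yes, same content: no", which is exactly the case where rsync's default quick check would wrongly skip the file and -c would catch it.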
Re: Rsync dies
In my humble opinion, this problem with rsync growing a huge memory footprint when large numbers of files are involved should be #1 on the list of things to fix.

I think many would agree. If it were trivial, it'd probably be done by now.

Fix #1 (what most people do): Split the files/paths to limit the size of each job. What someone could/should do here is at least edit the BUGS section of the manual to talk about the memory restrictions.

Fix #2 (IMHO, what should be done to rsync): File caching of results (or using a file-based database of some sort) is the way to go. Instead of maintaining a data structure entirely in memory, open a (g)dbm file or add hooks into the db(3) libraries to store file metadata and checksums. It'll be slower than an all-memory implementation, but large jobs will at least finish predictably.

Fix #3 (what I did): If you really, really need to efficiently transfer large numbers of files, come up with your own custom process. I used to run a large web site with thousands of files and directories that needed to be distributed to dozens of servers atomically. Using rsync, I'd run into memory problems and worked around them with Fix #1. Another problem was running rsync in parallel: the source directory was scanned order(N) times when it needed to be scanned only once, and the source content server was pummeled by the multiple simultaneous instances. I resorted to making my own single-threaded rsync-like program in Perl that behaves more like Fix #2 and runs very efficiently. I've spent some time cleaning up this program so that I can publish it, but priorities (*) are getting in the way. When I get some time, you'll see it posted here.

-- Eric Ziegast

(*) Looking for a full-time job is a full-time job. :^(  Will consult for food.
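A toy illustration of the Fix #2 idea, sketched in shell over throwaway temp files (a real implementation would use (g)dbm or db(3) and store checksums as well as sizes): persist a per-file inventory to disk between runs, then diff the new scan against the cache so only changed files need further work.

```shell
#!/bin/sh
# Persist a "size path" inventory to a cache file on disk, then find
# only the files whose entries changed since the previous run.
work=$(mktemp -d)
cache="$work/cache.txt"
mkdir "$work/tree"
echo one > "$work/tree/a"
echo two > "$work/tree/b"
scan() { ( cd "$work/tree" && find . -type f -exec wc -c {} \; | sort ); }
scan > "$cache"               # first run: remember everything on disk
echo three > "$work/tree/b"   # a file changes between runs
# Second run: comm -13 prints lines found only in the new scan,
# i.e. the changed entries; awk keeps just the path.
changed=$(scan | comm -13 "$cache" - | awk '{print $2}')
echo "changed: $changed"
rm -rf "$work"
```

The point is that the full inventory lives in a file, not in process memory, so the working set stays small no matter how many files are under the tree.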
Re: Does any rsync-based diff, rmdup, cvs software exist?
I'd like to be able to run GNU-diff type comparisons, but use rsync technology to make it efficient to see what's different without transmitting all that data.

Rsync is great at synchronizing data between a source and destination. For diff-like comparisons, perhaps something like CVS might be more appropriate.

Another thing I'd like to do using the rsync protocol is what I call rmdup -- remove duplicates. This would allow me to recursively (like diff -r) compare files in two (!!MUST BE!!) different directories and remove one (or the other) of the duplicates.

A shell script that does something similar to what you want without using rsync:

    #!/bin/sh
    # Our md5 checksum program (rsync uses MD4, but the concept is the same)
    MD5=md5sum   # On RedHat 7.1
    #MD5=md5     # On *BSD

    # Inventory the source directory
    cd $SOURCE_DIR
    src=/var/tmp/find.$$.src
    find -x -type f -print | xargs $MD5 | awk '{print $2, $1}' | sort > $src

    # Inventory the destination directory
    cd $DESTINATION_DIR
    dst=/var/tmp/find.$$.dst
    find -x -type f -print | xargs $MD5 | awk '{print $2, $1}' | sort > $dst

    # Remove duplicates in the destination directory
    cd $DESTINATION_DIR
    comm -12 $src $dst | sed -e 's/ .*//' | xargs rm -i
    # rm $src $dst

Note: comm -12 does a line-by-line comparison of the two checksum lists; the output is the lines common to both files. If a filename/checksum pair matches for both the source and destination directory, the file in the destination directory is the duplicate (per the definition in the e-mail) and is piped to xargs rm for removal.

Note: Configuring for use with a source or destination directory on a remote host would involve the strategic use of rsh or ssh. The good news is that because only a list of checksums is needed for comparison, the bandwidth needed between servers is minimized (like rsync).

Again, the rsync protocol could be useful in configuration management, for computing the deltas that must be stored. CVS (or even RCS) is more useful for configuration management and updates of text files.
It also archives changes over time. As far as I'm aware (without looking at the source code), rsync does block-level comparisons, not line-by-line.

-- Eric Ziegast
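An aside on the comm -12 step in the script above, with made-up filenames and checksums: given two sorted "filename checksum" lists, it emits only the lines present in both, i.e. the true duplicates (b.txt below has a different checksum on each side, so it survives).

```shell
#!/bin/sh
# comm -12 keeps only lines common to both sorted lists; the sed then
# strips the checksum, leaving just the duplicate filenames.
src=$(mktemp); dst=$(mktemp)
printf 'a.txt 1111\nb.txt 2222\n' > "$src"
printf 'a.txt 1111\nb.txt 9999\nc.txt 3333\n' > "$dst"
dups=$(comm -12 "$src" "$dst" | sed -e 's/ .*//')
echo "duplicates: $dups"
rm -f "$src" "$dst"
```

This prints "duplicates: a.txt"; note that comm requires both inputs to be sorted, which is why the inventory script sorts each list before comparing.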
Re: rsynch related question
Uma asks:

    I had a question on rsync'ing; please help me with this. I have a
    scenario where I need to sync two copies of software across the
    network, on different AFS cells. For example: first1 -
    /afs/tr/software, second1 - /afs/ddc/software. Both copies are the
    same; the first1 cell will be constantly updating, and I need to
    sync this software to second1. In this scenario, what command
    should I use?

There are many ways to do it based on your needs, and on where you want to drive the process from.

Push using local filesystems

If both AFS trees are on the same LAN with low latency and high bandwidth available, you can just access them directly:

    # On any server...
    cd /afs/tr/software
    rsync -ax . /afs/ddc/software

Push to remote server using rsh/ssh

If the AFS trees exist in different locations with significant delay between them or not much bandwidth, then it is more efficient to use rsync between servers at both locations to minimize bandwidth needs between locations. Each server (e.g., TR-SERVER and DDC-SERVER) would scan the directory trees locally and transmit only inventory information and changes to files over the WAN.

    # On TR-SERVER...
    cd /afs/tr/software
    rsync -ax . USER@DDC-SERVER:/afs/ddc/software

The above pushes files out using rsh. If you want to use ssh or a Kerberized rsh, consider -e ssh or -e 'rsh -K'. If the content is usually compressible, consider using -z to save more bandwidth.

Suck from remote server with rsyncd

You can also suck files down by setting up an rsync server at the /afs/tr node and having rsync clients on the net connect to that server. I haven't used rsyncd before, but the syntax might look something like this:

    # On DDC-SERVER...
    cd /afs/ddc/software
    rsync -ax USER@TR-SERVER::software .

    # In /etc/rsyncd.conf on TR-SERVER...
    [software]
        path = /afs/tr/software
        ... other options based on access/security ...

See the rsync(1) man page for more information about syntax with an rsync server.
See rsyncd.conf(5) for more info about configuring rsync servers. There are examples of how to set up an rsync server here:

    http://everythinglinux.org/rsync/
    http://www.freeos.com/articles/4042/

Suck from remote server with rsh/ssh

Another, simpler way to use rsync to suck files over the network using rsh (or ssh) is:

    # On DDC-SERVER...
    cd /afs/ddc/software
    rsync -ax [-e ssh] USER@TR-SERVER:/afs/tr/software .

There are many other command options you might consider, but they depend more on the content than on the connectivity.

-- Eric Ziegast
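For completeness, a fuller sketch of the rsyncd.conf module mentioned above. The module name and path come from the example; the remaining options are illustrative assumptions (all standard rsyncd.conf(5) settings), chosen for a read-only export restricted to the pulling server:

```ini
# /etc/rsyncd.conf on TR-SERVER (illustrative sketch)
[software]
    path = /afs/tr/software
    comment = TR software tree
    read only = yes
    uid = nobody
    gid = nobody
    hosts allow = DDC-SERVER
```

With this in place, the pull side's "rsync -ax USER@TR-SERVER::software ." addresses the [software] module rather than a raw path, and the daemon enforces the access restrictions.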