Re: Skipping hardlinks in a copy

2007-03-09 Thread Phil Howard
On Thu, Mar 08, 2007 at 01:34:03PM -0800, Sriram Ramkrishna wrote:

| OK, I wasn't aware that you couldn't hardlink a directory to another
| directory.

You're not supposed to be able to.  But some systems allow it in a
restricted manner, just for very special purposes such as recovery of
an otherwise corrupt filesystem.  Others might allow it due to a bug.

Hard linking directories can produce loops, and this is a major
reason for disallowing them in the normal case.  Another is that an
empty directory with 2 or more links will not appear empty in the
normal manner.  All these things could be worked around, but having
to work that way is detrimental to the system.

Sequent Dynix, around 1990 or so, allowed rename(2) to run
simultaneously on different processors, each naming a different
target, without proper locking, and the result was hard links.  I
encountered this with regular files, then tested it with directories,
and it allowed that, too.  The directories then could not be removed,
even though they were empty.  I used the same bug to reverse the
process that created them, renaming the two directories back to one
name to get out of the mess.


| OK.  Looks like I just have to deal with each cycle I encounter and
| break it.  Joy. :-)

What does that system's find do?  If the system allows hard links to
directories, then its tools should know how to work around them.

I hope your application doesn't depend on these hard linked directories.


| I'm at a loss then at what I'm looking at.  Maybe it's following
| symlinks and I have not checked the arguments properly.  It might
| be that symbolic links are causing the issue, but in that case it
| doesn't seem to explain why it's taking days to copy a level 0 copy.
| I leaped on the link tree issue, because strace on an rsync was
| showing it going through the same progression of directories.
| I'm going to have to go back and run it again and see if I can
| catch it.

Normally, symlinks to directories are not followed.  I've never seen
rsync or find do that.  I've written a tree recursion function in a
library and it doesn't follow symlinks.
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Skipping hardlinks in a copy

2007-03-08 Thread Phil Howard
On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:

| Hi folks, I've been googling around for a while, but I can't seem to find
| an answer to my question.
| 
| I have a number of filesystems that contain thousands of hard links due
| to some bad organization of data.  Rsync, cpio and various other
| utilities fail to copy this data because I think there might be some
| cycles in it.  (you know you have troubles if cpio can't copy it!)
| 
| What I thought I would do instead is to copy the data but skip any files
| that are hard links.  Then, after the copy is finished, I will use some
| kind of find . -type l command that finds the hard links, and then
| make a script to recreate them.  This saves me a lot of trouble by not
| having to stat the files and not having the receiving side balloon up.
| 
| Is there a way to have it skip hard links when doing an rsync?
| Or is there some other mystic incantation that I can use that might
| accomplish the same thing.

The following command pipeline can give you a list from which you could
isolate just the first occurrence of each file sharing the same inode:

find . ! -type d -printf '%10i %P\n' | awk '{n=substr($0,12);if(a[$1]==1){print "other",n;}else{a[$1]=1;print "first",n;}}'

Note the above is 123 characters long.  You may have issues with mail
programs that truncate or wrap it, so be careful.  The fixed-size
formatting of the inode number in the find output makes it easy to
extract the name, or the name plus the symlink target, in the awk
command using substr().

One approach in your situation, if the filesystem is not corrupt (which
it might be, because plain files don't create cycles), is to create a
list of files keyed by inode number and hardlink each file to one named
by its inode number.  Just rsync the directory full of inode numbers,
then re-expand on the destination based on that list.
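
A rough sketch of that staging idea (untested; .inodes/ and
inode-map.txt are made-up names, and the re-expansion assumes an empty
destination tree with staging/ on the same filesystem):

  mkdir .inodes
  find . -path ./.inodes -prune -o ! -type d -printf '%i %P\n' |
  while read ino name; do
      ln "$name" ".inodes/$ino" 2>/dev/null  # first occurrence creates it
      echo "$ino $name"                      # record every name per inode
  done > /tmp/inode-map.txt
  rsync -a .inodes /tmp/inode-map.txt dest:staging/
  # on the destination, run from the root of the new tree:
  while read ino name; do
      mkdir -p "$(dirname "$name")"
      ln "staging/.inodes/$ino" "$name"      # recreates the link sets
  done < staging/inode-map.txt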

You should not be following symlinks in a file tree recursion.  Rsync,
find, cpio, and others know not to.

But I suspect some kind of filesystem corruption, or at least some hard
links applied to directories.  The latter can create cycles if not done
carefully (and there is virtually never a case for doing that
intentionally).

I do not consider it bad organization to have lots of files hardlinked.
In fact, I have a program that seeks out identical files and hardlinks
them to save space (not safe in all cases, but safe in most).

The command find . -type l will only find symlinks.  You can find files
that have hard links with find . ! -type d -links +1 -print.  Note that
all file types can have hard links, even symlinks.  Do exclude
directories, as those have many links for other reasons (1 for the self
reference, 1 for the entry in the parent directory, and 1 for each
subdirectory within).
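
For example, this pipeline (untested) lists every hard-linked
non-directory grouped by inode number, so the link sets are easy to
see:

  find . ! -type d -links +1 -printf '%i %P\n' | sort -n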

-- 
|---/--|
| Phil Howard KA9WGN (ka9wgn.ham.org)  /  Do not send to the address below |
| first name lower case at ipal.net   /  [EMAIL PROTECTED] |
|/-|
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: about lock.

2007-03-08 Thread Phil Howard
On Thu, Mar 08, 2007 at 12:08:55PM +0100, Alejandro Feijóo wrote:

| mmm and is it possible to corrupt data???
| 
| for example if rsync is on a file called A.txt and at the same time
| mysql is reading A.txt... what happens? that is what I don't understand.

It can get mixed data.  The file contents could appear corrupt to
another mysql running on the target.  You can use rsync to back up
database files while mysql is running and get _MOST_ of the content
transferred while the database stays up.  Repeat the rsync until very
little changes between runs.  Then shut down mysql cleanly and do a
final rsync to bring the files over in a non-corrupt state.  At least
this keeps the mysql downtime minimal.
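
A minimal sketch of that cycle (the host name and paths are made up):

  # while mysql is still running, repeat until a pass transfers little
  rsync -a /var/lib/mysql/ backuphost:/var/lib/mysql/
  rsync -a /var/lib/mysql/ backuphost:/var/lib/mysql/
  # brief downtime: clean shutdown, one final consistent pass, restart
  mysqladmin shutdown
  rsync -a /var/lib/mysql/ backuphost:/var/lib/mysql/
  mysqld_safe &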

You may also want to employ methods of backing up data from within
mysql itself on a row/column basis as transactions update things, to a
live backup mysql engine running elsewhere that can take over serving
clients should the first die.  This has the advantage of a hot
failover, AND you could run the rsync backups from that backup engine,
since only it needs to be shut down for the final rsync
synchronization.  Just be sure the in-mysql backups know how to
re-synchronize everything that changed while the backup mysql was
briefly down for the last rsync run.

                             /--> rsync copy to site 2
main mysql ==> backup mysql ----> rsync copy to site 3
                             \--> rsync copy to site 4

-- 
|---/--|
| Phil Howard KA9WGN (ka9wgn.ham.org)  /  Do not send to the address below |
| first name lower case at ipal.net   /  [EMAIL PROTECTED] |
|/-|
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


synchronizing file content into device content

2007-03-07 Thread Phil Howard
I have repeatable cases where the contents of a (very large) file are
the intended contents of a device (partition) of that exact size.  The
target device already holds mostly that content, but a few blocks have
been changed.  I want to restore the device contents back to what the
file originally was.  I'm trying to find a way to have rsync's writing
end open the device as a file and do its thing to synchronize the
content efficiently, so I don't have to transfer the entire content
each time by some other means.  But rsync is very persistent about
replacing the device node with a regular file, which, because the file
is new, causes it to transfer the entire content of the source file,
filling up the filesystem /dev is on and requiring the device node to
be restored with mknod.

So, how can I get rsync to synchronize the _content_ of the device
with the _content_ of the source file (or source device)?

I don't even have any luck synchronizing device content to device
content (e.g. without --devices, which would just replicate the node
and its major,minor numbers ... it says it is just skipping the
non-regular file naming the source device).  But I do need to do this
with the source being a file, though if I could get device content to
device content to work, I guess I could fake this with the loopback
device.
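
(A much later rsync release grew an option for exactly this; a sketch,
assuming rsync 3.2.0 or newer on both ends, with made-up paths:)

  # --write-devices treats the receiving device node as a regular file
  # and writes the synchronized data into it; it implies --inplace
  rsync --write-devices bigfile.img root@target:/dev/sdb3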

Any ideas?

-- 
|---/--|
| Phil Howard KA9WGN (ka9wgn.ham.org)  /  Do not send to the address below |
| first name lower case at ipal.net   /  [EMAIL PROTECTED] |
|/-|
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: trying to understand --include and --exclude

2005-12-02 Thread Phil Howard
On Sun, Nov 27, 2005 at 05:13:30PM -0500, Aaron Morris wrote:

| I think you are trying to include too many things in a single
| include/exclude statement.  You need to break everything down to
| files, folders, and sets of files in a single folder.  For example, if
| you want to transfer a file 3 directories deep, each parent directory
| needs to be included if there is a global exclude at the end of your
| statements.

What if I want to include a particular pattern of files whose parent
directory is a given name, where the combination can occur at various
levels, such as:

foobar/2005/07.tar.bz2
foo/bar/2005/07.tar.bz2
fo/ob/ar/2005/07.tar.bz2

I would think the ** pattern covers multiple levels, so the pattern
**/2005/07.tar.bz2 should cover that.  But the issue is: what about
the directories?

I think the problem here is that I also have to match every possible
parent directory, since each one is individually tested against all the
patterns; otherwise the directory won't be created.  The better
solution, it seems to me, would be a general option like
--create-parent-directories, which would basically mean that for
anything that does match, the directories necessary for it are created,
much like the -p option of mkdir.
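
Short of such an option, the usual workaround is to include every
directory and prune the empty ones afterward; a sketch (untested, and
--prune-empty-dirs only appeared in an rsync release later than the
one discussed here):

  rsync -a --prune-empty-dirs --include='*/' --include='**/2005/07.tar.bz2' --exclude='*' src/ dst/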


| The following will probably give you what you want:
| 
| --include /*  (include all files at root of transfer)
| --include /*/2005/ --include /*/2005/07.tar.bz2 (include file
| 07.tar.bz2, you must also include the folder that contains the file
| because of the next statement)
| --exclude *  (exclude everything not explicitly included)
| 
| I haven't tested it, so YMMV.

I ended up getting it working with this combination:

--include */2005
--include */2005/07.tar.bz2
--exclude */*
--include *
--exclude */**

(in reality a much larger list, handling much more than shown here)

I don't recall whether I tried what you suggested; I went through so
many combinations.  One problem is that there are no messages
explaining why a file is, or is not, matched, so this is very hard to
figure out.  I suspect I still don't really grasp the whole picture of
how this works.  It's certainly not the way I would have designed it.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


parallel tree recursion

2005-12-02 Thread Phil Howard
How hard would it be to have rsync do the recursive file scan on both
the source tree and the destination tree at the same time, in parallel?
Would that require a protocol change, or would a program change be
enough?

I have some very large directories I'd like to synchronize.  The total
time to scan through these millions of files is a substantial portion
of an hour, or even exceeds it, and it only gets slower when these
large blocks of time have to be spent sequentially.

It would also be nice if rsync didn't have to collect the entire tree
in RAM all at once; instead, both source and destination could recurse
their respective trees in sync with each other and copy, create, and
delete as the parallel recursion proceeds.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


trying to understand --include and --exclude

2005-11-27 Thread Phil Howard
I was under the impression that --include and --exclude work by
matching patterns in the order given, with the first match, whether
include or exclude, determining the action for that file.  I have a big
directory from which I am attempting to transfer selected files.  I
want all files where the first-level directory is anything, the
second-level directory is 2005, and the third level is the file
07.tar.bz2.  So I started with these options (a different try on each
line):

--include '**/2005/07.tar.bz2' --exclude '**'
--include '**/2005/07.tar.bz2' --exclude '*'
--include '*/2005/07.tar.bz2' --exclude '**'
--include '*/2005/07.tar.bz2' --exclude '*'

I also tried numerous other random variations.  Either I get absolutely no
files at all matching, or I get everything in the entire directory tree to
match.

I hope what I want is clear to you.  How can I make it clear to rsync?

I've had several other cases of this in the past, and stumbled on something
that worked without really understanding why.  So I guess the explanations
in the man page just didn't really give me the correct understanding of how
this works.  Maybe someone can try explaining in another way.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: mkstemp fails but data still transferred

2005-03-24 Thread Phil Howard
On Tue, Mar 22, 2005 at 11:27:57PM -0600, John Van Essen wrote:

| On Tue, 22 Mar 2005, Steve Harris [EMAIL PROTECTED] wrote:
|  
|  What seems to be happening is that even though the directory doesn't exist
|  and the temporary file can't be created the data is transferred but
|  not written anywhere
| 
|  I guess what I'm getting at is that if rsync can't create the temporary
|  file shouldn't it just bail ?
| 
| The Documentation section of the rsync web site has a How Rsync Works page:
| 
|   http://rsync.samba.org/how-rsync-works.html
| 
| originally written by the late JW Schultz.
| 
| In the pipeline section you'll see that communication is unidirectional.
| One of rsync's many advantages is streaming unidirectional pipelines.
| Not having two-way chatter helps speed up file transfers.
| 
| Since it is optimized for the normal case where there are no problems
| on the receiving end, the receiver has no way to tell the sender to
| stop sending file content when there is a problem, and must accept and
| discard the remainder of the file (which is what you are seeing).
| Subsequent file transfers might be successful, so it can't abort, yet.
| 
| Hope that helps.

An option to specify action would help more, IMHO.  Since there is no
two-way chatter, the choices are obviously limited.  How about:

  --create-fail-action=discard (default)
  --create-fail-action=mkdir
  --create-fail-action=abort

or some such?

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync is slowing down

2004-04-07 Thread Phil Howard
On Sat, Apr 03, 2004 at 12:23:59PM -0800, Wayne Davison wrote:

| You can implement such optimizations on top of rsync using either
| excludes or the --files-from option.  For instance, if the sending
| side maintained an exclude file of old directories that didn't need
| to be transferred, you could write a script that would look for
| updated items and remove the appropriate exclusion.  An exclude list
| would have to be grabbed first from the remote side before it could
| be used, though.

How would the sending side know what directories are old for a given
receiver?  One receiving side may run its update today for an old
directory that had one file changed.  But another receiving side may
not run its update for a few more days, or even weeks.

This sounds like the sending side would need to keep track of what each
receiver does or doesn't have.  That's what I used to do before rsync.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: --include vs. --exclude

2004-04-03 Thread Phil Howard
On Sun, Mar 28, 2004 at 07:16:26PM -0600, John Van Essen wrote:

| On Fri, 26 Mar 2004, Phil Howard [EMAIL PROTECTED] wrote:
| 
|  So I have on my server a big file tree.  I want to use rsync to download
|  only the PDF files, which make up a small portion of that tree.  So I try
|  it this way:
|  
|rsync -aHPvz --include '*.pdf' --exclude '**' [EMAIL PROTECTED]:source destination
|  
|  which gives me nothing.
| 
| Hi Phil,
| 
| Your goal seems to be to create a tree that consists only of *.pdf
| files and their necessary directory nodes.  Right?

Right.


| The closest you can get to that with standard includes/excludes is
| with this combination:
| 
|   --include '*/' (process/create all directory nodes)
|   --include '*.pdf'  (include all *.pdf files)
|   --exclude '*'  (exclude everything else)
| 
| The disadvantage is that you will get the entire tree, even the
| branches with no pdf files.

I was not including that first --include '*/' for the directories.
I was under the impression rsync simply created directories as needed.


| rsync 2.6.0 has a new 'files-from' option which will do what you need,
| but you have to create the file list yourself (easy enough to do with
| the 'find' command).  But you are at a disadvantage in that you are
| doing a pull and may not have access to the source.
| 
| Here's a possibility.  Do a --dry-run first to determine the names
| of the pdf files.  Grep the -v output of rsync for '\.pdf$', store
| that list in a file, then do a real run with the files-from option.
| 
| I didn't try this - so I haven't verified that this will work.

Maybe a feature like --make-needed-directories would do well.  Of
course, for directories that do exist at the source, the same metadata
should still be applied.
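
For reference, the two-pass suggestion above might look like this (an
untested sketch; pdf.list is a made-up name):

  rsync -aHvzn [EMAIL PROTECTED]:source destination | grep '\.pdf$' > pdf.list
  rsync -aHPvz --files-from=pdf.list [EMAIL PROTECTED]:source destination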

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


rsync is slowing down

2004-04-03 Thread Phil Howard
The cause is, of course, that the tree being synchronized is getting
larger, so of course rsync is slowing down.  But in the case of my
particular file tree, there is a way it could be sped up, though this
would obviously also need a change in the rsync protocol.  Any tree
that has major unchanged sub-branches would benefit from this.

The file tree I'm synchronizing in this case has archived data being
deposited under a YYYY/MM/DD directory structure.  Hundreds to
thousands of files are added each day, and I'm even considering
breaking it down further by hour.  In theory, I could do the
synchronizing by date.  On the receiving side, which is also the
initiating side, I could have it record when it last fully completed a
synchronization, and re-run that date as well as any subsequent dates,
rather than the entire tree (which could easily reach a quarter
million files or more per year).

But I have one catch.  Occasionally, an older file is updated, and
that older file needs to be synchronized as well.  The
receiving/initiating side won't know whether an old file has, or has
not, been updated.

What I think would improve rsync speed in this scenario, and in
similar ones where many tree branches go unchanged for extended
periods, is to collect the timestamps (and checksums, if enabled) for
each entire branch, hash them, and transfer only the hash of that
metadata.  It would need to be a strong hash like MD5 or SHA1, since
if the hashes are equal, the tree branch would be skipped and none of
the filenames within would be transferred.  For branches where the
hash is not equal, the same hashing would be done recursively on the
sub-branches, until either unchanged sub-branches are found and
skipped, or changed files are found (and transferred).
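
Outside of rsync, such a per-branch hash could be approximated by
hashing a sorted listing of name, mtime, and size for everything under
the branch; a sketch of the idea (untested, with $branch naming the
sub-tree):

  find "$branch" -printf '%P %T@ %s\n' | sort | md5sum

If both sides produce the same hash for a branch, the whole branch
could be skipped without exchanging any of the filenames in it.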

The catch with this mechanism is that nothing would be exchanged
between the rsync processes until the entire tree had been scanned and
all the timestamps collected (worse if doing checksums).  On networks
that are slow relative to the total volume of data to be synchronized,
though, this could still be a major saving of time as well as traffic.
But it clearly would have to be a special option to enable it.

Has anything like this been considered before?

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


--include vs. --exclude

2004-03-26 Thread Phil Howard
I guess I still haven't figured out the entire semantics of the
--include and --exclude options.  From reading the man page, it seems
to say that each file being checked is tested against each pattern in
order; when one matches, the tests end, and whether that pattern is
--include or --exclude determines whether the file is included or
excluded.

So I have on my server a big file tree.  I want to use rsync to download
only the PDF files, which make up a small portion of that tree.  So I try
it this way:

  rsync -aHPvz --include '*.pdf' --exclude '**' [EMAIL PROTECTED]:source destination

which gives me nothing.  For reference, I try:

  rsync -aHPvz --include '*.pdf' [EMAIL PROTECTED]:source destination

which starts downloading other files.  That confirms that the default final
action is equivalent to --include '**' or something like that.

So it seems the include pattern isn't matching.  So I try variations:

  rsync -aHPvz --include '*.pdf' --exclude '*' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '**.pdf' --exclude '**' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '**.pdf' --exclude '*' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '**/*.pdf' --exclude '**' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '**/*.pdf' --exclude '*' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '/**.pdf' --exclude '**' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '/**.pdf' --exclude '*' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '/**/*.pdf' --exclude '**' [EMAIL PROTECTED]:source destination
  rsync -aHPvz --include '/**/*.pdf' --exclude '*' [EMAIL PROTECTED]:source destination

None of these work.

So finally, I replicate the file tree on the server with:

  cp -al source alternatename

And proceed to remove all non-PDF files:

  find alternatename -type f ! -name '*.pdf' -exec rm -f {} ';'

Then I do:

  rsync -aHPvz --include '*.pdf' --exclude '**' [EMAIL PROTECTED]:alternatename destination

which now works.

Can rsync do this by itself?  Is there a way to tell rsync to download
only files with this particular extension?  How SHOULD I have done
this?

I generally understand things best by knowing what sequence of steps is
performed.  I thought I understood this for rsync based on what the man
page said.  I guess one of us is wrong.

I'm running:  rsync  version 2.6.0  protocol version 27

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Change in reporting for --dry-run in 2.6.x

2004-01-28 Thread Phil Howard
On Wed, Jan 28, 2004 at 12:38:11PM -0500, Alberto Accomazzi wrote:

| I just noticed that there is an extra blank line in the output generated 
| by rsync when the --dry-run (-n) flag is used.  This seems to have 
| started with 2.6.0.  Is this desired?  The reason why I'm asking is 
| because I use scripts that parse the output from rsync and little 
| modifications in verbosity can make or break things easily.

While I understand your concern, the output of programs like rsync is
designed for humans to read, not for machines.

That said, I think it might be nice to have an option that specifies a
filename in which to store final status information, orthogonal to any
human-readable verbosity.  The format of that status should be easy to
parse, perhaps name=value, one per line.

wrote=78
read=1204
bytes/sec=2564.00
totalsize=457622413
speedup=356959.76
exitstatus=0
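
Such a file would be trivial for a script to consume, e.g. (assuming
the hypothetical format above, stored in a file named rsync.status):

  exitstatus=$(sed -n 's/^exitstatus=//p' rsync.status)
  [ "$exitstatus" = 0 ] || echo "rsync reported status $exitstatus"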

However, this would be a lot of work for the rsync developers for not a lot
of gain, so I don't think it would likely be adopted.  Maybe you can write
a patch for it.  I might if I get the time.

When I run rsync, I do like to be able to run it via screen or a
headless virtual console, where I can spy on the current run any time
I want.  Adding the ability for a script to extract that info, while
still being able to see the normal stdout and stderr output, would be
good.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync 2.6.0: ./configure goes into a loop

2004-01-04 Thread Phil Howard
On Sat, Jan 03, 2004 at 01:00:09PM -0800, Wayne Davison wrote:

| On Sat, Jan 03, 2004 at 02:11:04AM -0600, Phil Howard wrote:
|  After doing a fresh extraction of the source for 2.6.0, I execute
|  ./configure and it enters a loop with no output before or during.
| 
| What shell are you using?  I'd suggest trying to run it as
| bash configure and see if that helps.

GNU bash, version 2.05b.0(1)-release (i386-slackware-linux-gnu)

Doing bash configure goes into a (presumably the same) loop.

There is a file called configure.lineno in the source directory that is
being rewritten over and over and over.  It looks like the configure
script with $LINENO replaced by line numbers.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


rsync 2.6.0: ./configure goes into a loop

2004-01-03 Thread Phil Howard
After doing a fresh extraction of the source for 2.6.0, I execute
./configure and it enters a loop with no output before or during.  I
let it run for 20 minutes just to see if it would ever do anything,
and it never did.

Host system has:
Linux:  2.4.23
Slackware:  9.0
bash:   2.05b.0
gcc:3.2.2
glibc:  2.3.1

I have a big (huge) strace of it here:
http://phil.ipal.org/rsync-configure-loop.strace.bz2
http://phil.ipal.org/rsync-configure-loop-1000.strace.bz2

The one with 1000 is just the first 1000 lines of the strace.

Ideas?

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: any way to get --one-file-system in rsyncd.conf?

2003-06-22 Thread Phil Howard
On Sun, Jun 22, 2003 at 10:34:17AM -0700, [EMAIL PROTECTED] wrote:

|  I would like to specify an entry in /etc/rsyncd.conf such that it
|  operates on a --one-file-system basis always.  The path will point
|  to a filesystem mount point, but there is another filesystem that
|  is mounted in a subdirectory.  I want to back up only those files
|  in the pointed to filesystem, and not the one mounted within (in
|  that run, anyway).  I do not see such an option in man rsyncd.conf.
|  Is there an undocumented one available?
| 
| I don't think so.  But an alternative is use the exclude option in
| rsyncd.conf to exclude any mount points.  There are some caveats -
| see the man page.

Thanks for the response.

I'll probably avoid the exclude and just use a separate set of bind
mounts of the same filesystems in a non-overlapping way.  I was hoping
to cleanly avoid that, but bind mounts are reasonably clean even if
they do clutter /etc/mtab a bit.  Since I'm doing this on Linux, this
is an option.  I'm not sure what my options will be on other systems
if/when I need to run those.
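
On Linux, the bind-mount variant is a one-liner per filesystem (paths
are made up for illustration):

  mount --bind /home /backup-view/home

A plain --bind does not carry sub-mounts along (that would be
--rbind), so the filesystem mounted inside stays out of the backup
view.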

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://ka9wgn.ham.org/|
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


rsync over ssl (again)

2002-08-22 Thread Phil Howard

A while back, I asked whether there had been any consideration of
making rsync support SSL directly (as opposed to just ssh).  I've been
looking around for a secure way (e.g. encrypted, so passwords are
never in the clear and even content is obscured from sniffers) to
allow a set of limited-trust users (limited-trust being defined as
mostly customers, whom you trust with their own data but not with
shell accounts and such) to access data using rsync (or I guess we
might call it srsync).

Fortunately, things like pop3 and imap4 have secure equivalents.  But
I also need to give users the ability to upload and download their own
data securely, and the only good tool for that which doesn't grant
them something that might open a shell is https.  For large transfers,
though, that just does not work very well, and I think rsync would be
a much better answer if the security issue can be worked out.

The server side can be readily done through something like stunnel.
But on the client side, stunnel would be cumbersome considering the
user base involved.
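
With an stunnel version that takes a configuration file, the server
side might be wrapped roughly like this (a sketch; the outside port
number is an arbitrary choice):

  [rsync]
  accept  = 874
  connect = 127.0.0.1:873

Clients would then need an stunnel (or equivalent) on their end to
reach that port, which is exactly the cumbersome part noted above.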

If this can be done, I'd also like to see whether there is some way to
overload port 873 for both insecure and secure use.  I don't know how
the protocol works, but if it does enough of the right startup
negotiation, it might be possible to decide safely whether the session
needs to be secure, then switch to secure mode and restart the session
negotiation (including validation of the server's identity
certificate).  Any thoughts on this?

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://ka9wgn.ham.org/|
-
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



option --copy-unsafe-links

2002-07-03 Thread Phil Howard

I presume the option --copy-unsafe-links really means to copy the
contents of the file a symlink points to, even outside the tree being
copied, rather than make a symlink on the destination.

What I find is that if a symlink on the source is dangling, that is,
it points to nothing that exists, that symlink is not created at the
destination.

What I want is for all symlinks to be reproduced exactly as symlinks,
regardless of whether what they point to happens to exist at the
moment.  How can I get this?  The option --links only works when the
symlink actually points to something (e.g. stat() succeeds).  But if
stat() fails, even though lstat() would succeed, the symlink is not
copied.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



-z and -B65536 causing file corruption in 2.5.5 w/zlib 1.1.4

2002-06-22 Thread Phil Howard
 
hadar.ipal.net::slackware-8.1/PACKAGES.TXT /home2/rsync/hadar/mirrors/slackware/rsync/ftp.slackware.com/slackware/slackware-8.1/PACKAGES.TXT
Welcome to the Internet Power and Light public file archive!

Please limit repeated updates to not more than once per hour.

receiving file list ...
1 file to consider
wrote 116 bytes  read 224 bytes  136.00 bytes/sec
total size is 228031  speedup is 670.68
phil@polaris:/home/phil 28 rsync --version
rsync  version 2.5.5  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
  IPv6, 64-bit system inums, 64-bit internal inums

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.
phil@polaris:/home/phil 29 which rsync
/usr/bin/rsync
phil@polaris:/home/phil 30 ls -l /usr/bin/rsync
-rwxr-xr-x1 root root   548639 Jun 22 03:47 /usr/bin/rsync
phil@polaris:/home/phil 31 strings /usr/bin/rsync | fgrep 1.1.4
1.1.4
 deflate 1.1.4 Copyright 1995-2002 Jean-loup Gailly
1.1.4
 inflate 1.1.4 Copyright 1995-2002 Mark Adler
1.1.4
phil@polaris:/home/phil 32
=

Checking out the server:
=
root@hadar:/root 20 which rsync
/usr/bin/rsync
root@hadar:/root 21 which rsyncd
/usr/sbin/rsyncd
root@hadar:/root 22 ls -l /usr/bin/rsync
-rwxr-xr-x1 root root   548639 Jun 22 03:47 /usr/bin/rsync
root@hadar:/root 23 ls -l /usr/sbin/rsyncd
-rwxr-xr-x1 root root   548639 Jun 22 03:47 /usr/sbin/rsyncd
root@hadar:/root 24 cmp /usr/bin/rsync /usr/sbin/rsyncd
root@hadar:/root 25 strings /usr/bin/rsync | fgrep 1.1.4
1.1.4
 deflate 1.1.4 Copyright 1995-2002 Jean-loup Gailly
1.1.4
 inflate 1.1.4 Copyright 1995-2002 Mark Adler
1.1.4
root@hadar:/root 26 strings /usr/sbin/rsyncd | fgrep 1.1.4
1.1.4
 deflate 1.1.4 Copyright 1995-2002 Jean-loup Gailly
1.1.4
 inflate 1.1.4 Copyright 1995-2002 Mark Adler
1.1.4
root@hadar:/root 27
=

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Improving the rsync protocol (RE: Rsync dies)

2002-05-20 Thread Phil Howard

On Mon, May 20, 2002 at 10:58:33PM +1000, Donovan Baarda wrote:

| On Mon, May 20, 2002 at 09:35:04PM +1000, Martin Pool wrote:
|  On 17 May 2002, Wayne Davison [EMAIL PROTECTED] wrote:
|   On Fri, 17 May 2002, Allen, John L. wrote:
| [...]
|  I've been thinking about this too.  I think the top-level question is
|  
|Start from scratch with a new protocol, or try to work within the
|current one?
| 
| tough question... to avoid backwards breakage and yet implement something
| significantly better you would probably have to make two rsyncs in one
| executable; the new protocol, and the old one for a compatible fallback
| talking to old versions. After enough time had passed and all old rsync
| implementations had been purged, the old code could be dropped, leaving a
| nice clean small solution.
| 
| I tend to think that once a delta compressing http extension gets mainstream
| acceptance, rsync will fade away _unless_ it offers significantly better
| performance by avoiding http overheads (which is why ftp lives on, despite
| being a bastard protocol from hell).

I don't see how HTTP can hope to displace RSYNC.  Both have flaws, but
HTTP has far more.  Work is chipping away at them, but the results are
often not very good.  Persistent connections, for example, are a
bastardization that just isn't handled effectively (for example, they
almost always break with dynamic content such as CGI).

IMHO, it is HTTP that needs to be rewritten entirely from scratch.

Also, I'm really not in favor of so many of these one-size-fits-all
solutions.  An example is XML.  It's a format.  But as we try to
overload it more and more as a solution to all problems, we just end
up squeezing it down to yet another layer in an ever-growing stack,
which only ends up with more and more needless layers of software.

As for RSYNC, it has a basic purpose.  As long as it stays focused, it
can serve that purpose well.  Of course I have ideas to make it
better, but I certainly don't want to see it go away even if none of
them are ever adopted (it's too useful the way it is).

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: exit code 23 - inappropriate error for copying symlinks?

2002-05-20 Thread Phil Howard

On Mon, May 20, 2002 at 10:59:41AM -0400, Crisler, Jon wrote:

| Error 23 is an incomplete copy.  As I understand it, there is an
| inconsistency between the first check for differences and the start
| of the copy process.
| 
| I run into this all the time, but for my implementation it is a
| legitimate error.

If the receiver creates the symlink, isn't that complete?  If so, why
would there need to be an error just because the symlink happens to be
dangling/lame?

If the source tree has a dangling symlink, IMHO, it should be copied
and made dangling in the destination, without that being treated as an
error (there are tools to scan file trees for dangling links).  You
wouldn't treat a device node as an error just because the device it
identifies doesn't happen to be configured in the system, would you?

OTOH, if the symlink is valid at the source but points to something
that cannot be copied by rsync (outside the tree, or excluded), and so
would become dangling at the destination, then I can see treating that
as an error in some way.

These semantics are often difficult to deal with, so options that
describe what the user prefers to happen seem to be the solution.  One
thing I'd like to see is symlinks that work their way through points
outside of what rsync is copying and back in (such as absolute links,
but also relative links going outside and then back in) be translated
so they point to the same object, by the same means, on the
destination system.  For example, an absolute link would remain an
absolute link on the destination, but changed to reflect any change in
filesystem location at the destination.  Relative links would be more
complicated.  Clearly an option is needed to enable this, as it could
potentially break established setups not expecting it.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



limiting metadata updates

2002-04-25 Thread Phil Howard

Is there a way to tell rsync that metadata should NOT be updated
unless the object was created or the contents of a file were modified?

Specifically, what I want to do is use rsync to install a file tree of
a few changes (a package being replicated to multiple machines after
it has been compiled).  The problem is that the file tree created to
hold the files from the install does not have the correct metadata on
its directories unless the package itself installs that directory.
For example, /usr/bin normally has permission 0755, but I find that
the same directory within the tree derived from building the package
does not have 0755.  Sometimes directories are owned by, or have a
group other than, root for a reason.  When I replicate these few files
over to the target systems, I want the metadata on the target systems
to stay as it was, and not be updated by the package (except, of
course, for new objects created by the package).

Any way to do this in rsync?

I'd prefer not to have to reproduce that metadata back into the
package tree being built (this would still cause the metadata to
change, though it would not be changed differently).  The reason is
that at some point the metadata may not be universally the same across
different machines, and would need to be retained on each machine the
way it was originally set up.

I can do this by staging the tree in /tmp first and running a complex
script to apply the updates.  I would prefer to find something cleaner
that already does it right, if possible.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: delete fails to delete everything it should like dangling symlinks

2002-03-08 Thread Phil Howard

On Fri, Mar 08, 2002 at 09:46:46AM -0700, [EMAIL PROTECTED] wrote:

| Try it without --delete-after.  I'm pretty sure that --delete-after also 
| affects --force, so I think it's trying to write to the directory pointed 
| to by the symlink, which doesn't exist.  I could be wrong, but that looks 
| like the most likely interaction.

Same problem:

=
root@pollux:/home/root/src 347 rsync --checksum --copy-unsafe-links --compress '--exclude=*~' '--exclude=#*#' --force --group --links --owner --partial --perms --progress --recursive --timeout=150 --times --verbose --delete --delete-excluded --ignore-errors --force /home/root/src-pub/ /home/root/src/
rsync: building file list...
rsync: 809 files to consider.
readlink openssl-0.9.6b/work/openssl-0.9.6b/crypto/des/asm/perlasm: No such file or directory
readlink openssl-0.9.6c/tmp/openssl-0.9.6c/crypto/des/asm/perlasm: No such file or directory
wrote 34720 bytes  read 20 bytes  2395.86 bytes/sec
total size is 85523958  speedup is 2461.83
rsync error: partial transfer (code 23) at main.c(576)
root@pollux:/home/root/src 348
=

The --copy-unsafe-links seems to be the culprit.  If I take that out,
it then deletes these symlinks.  ATM I can't recall why I put that in.
I'm sure I'll find out soon enough, unless it was a workaround for
some bug in an older version that is now fixed.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



delete fails to delete everything it should like dangling symlinks

2002-03-07 Thread Phil Howard

I think someone posted this before, but I can't find it in the archives.

I am using rsync to pull down source files to be compiled.  The delete
options are used to clear out any old files left over from previous
runs.  Normally this works.  I've run into one case where it
persistently fails.

Within the directory created during compiling is a symlink to another
directory, also created during compiling (literally, during the tar
extraction done by the script that does the compiling).  When rsync is
run to re-synchronize, which should delete all created files
(including all those extracted from the tar file), the directory that
is the target of the symlink apparently is deleted first.  Then, when
the symlink is encountered, I get an error saying that readlink gets
No such file or directory.  This doesn't make sense, since readlink
should work on a dangling link.

Here's a paste of manually running the rsync within the same host
(warning: the first command line is 318 characters long):

=
root@pollux:/home/root/src 152 rsync --checksum --copy-unsafe-links --compress '--exclude=*~' '--exclude=#*#' --force --group --links --owner --partial --perms --progress --recursive --timeout=150 --times --verbose --delete --delete-after --delete-excluded --ignore-errors --force /home/root/src-pub/ /home/root/src/
rsync: building file list...
rsync: 809 files to consider.
readlink openssl-0.9.6b/work/openssl-0.9.6b/crypto/des/asm/perlasm: No such file or directory
readlink openssl-0.9.6c/tmp/openssl-0.9.6c/crypto/des/asm/perlasm: No such file or directory
wrote 34720 bytes  read 20 bytes  4087.06 bytes/sec
total size is 85523954  speedup is 2461.83
rsync error: partial transfer (code 23) at main.c(576)
root@pollux:/home/root/src 153 readlink openssl-0.9.6c/tmp/openssl-0.9.6c/crypto/des/asm/perlasm
../../perlasm
root@pollux:/home/root/src 154 rsync --version
rsync  version 2.5.2  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, IPv6,
  32-bit system inums, 64-bit internal inums
root@pollux:/home/root/src 155
=

As you can see, I'm using a number of options that should, according
to the man page, make sure things get deleted.  But that is not
happening.

I assume the partial transfer (code 23) error message is because rsync
did recognize that it failed to get things completely in sync.

BTW, after manually deleting the 2 symlinks, it works fine:

=
root@pollux:/home/root/src 155 rm -f openssl-0.9.6b/work/openssl-0.9.6b/crypto/des/asm/perlasm
root@pollux:/home/root/src 156 rm -f openssl-0.9.6c/tmp/openssl-0.9.6c/crypto/des/asm/perlasm
root@pollux:/home/root/src 157 rsync --checksum --copy-unsafe-links --compress '--exclude=*~' '--exclude=#*#' --force --group --links --owner --partial --perms --progress --recursive --timeout=150 --times --verbose --delete --delete-after --delete-excluded --ignore-errors --force /home/root/src-pub/ /home/root/src/
rsync: building file list...
rsync: 809 files to consider.
deleting directory openssl-0.9.6c/tmp/openssl-0.9.6c/crypto/des/asm
deleting directory openssl-0.9.6c/tmp/openssl-0.9.6c/crypto/des
deleting directory openssl-0.9.6c/tmp/openssl-0.9.6c/crypto
deleting directory openssl-0.9.6c/tmp/openssl-0.9.6c
deleting directory openssl-0.9.6c/tmp
deleting directory openssl-0.9.6b/work/openssl-0.9.6b/crypto/des/asm
deleting directory openssl-0.9.6b/work/openssl-0.9.6b/crypto/des
deleting directory openssl-0.9.6b/work/openssl-0.9.6b/crypto
deleting directory openssl-0.9.6b/work/openssl-0.9.6b
deleting directory openssl-0.9.6b/work
wrote 34720 bytes  read 20 bytes  4087.06 bytes/sec
total size is 85523954  speedup is 2461.83
root@pollux:/home/root/src 158
=

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Why does one of these work and the other doesn't

2001-12-03 Thread Phil Howard

On Sun, Dec 02, 2001 at 09:31:25PM -0500, Mark Eichin wrote:

|  Perhaps a trailing / instead of training /. is supposed to work.  I do
|  not remember why I didn't start using it, but I am sure I would have tried
| 
| Quite possibly because you've been bitten by classic cp/rcp; cp is
| not idempotent, in that if you cp -r foo bar where foo is a dir and
| bar doesn't exist, you end up with a bar that has the contents of foo
| (i.e. foo/zot -> bar/zot), and if you do it twice, cp sees that bar
| is a dir and inserts foo into it instead (so foo -> bar/foo,
| foo/zot -> bar/foo/zot).  To make it worse, on BSD-ish systems,
| traditionally adding a trailing slash makes it treat bar as a
| directory (bar/ == bar/.), but under sysv-ish systems it doesn't
| change the interpretation (bar/ == bar, even if bar doesn't exist.)
| 
| Partially *because* of this horror, rsync is (and is documented to be)
| consistent, and to have an explicit interpretation of trailing slash
| (that is consistent with bar/ == bar/. as far as destinations are
| concerned)  and is independent of the existence of the destination, so
| you can expect it to do the same thing when run twice.  This is one
| reason i'll often run rsync -a on local files rather than cp -r...

I have certainly been bitten by that, and it is not limited to cp and
rcp, either.  Another example I know has bitten with disastrous
effects is the ln -s command.  If the destination does not exist, it
puts a symlink there.  If the destination exists and is (even by way
of a symlink that points to) a directory, it puts the new symlink
inside the directory named.  So doing ln -s twice with a directory
target can produce two different symlinks.  Even hard links have a
problem with directory targets, although the twice issue is not
relevant, since you can't hard link a directory itself (if your system
is not broken, unlike pre-ptx Dynix way back in time).  On some
systems the -n option of ln -s gets around this.

I'll do some tests with a trailing / instead of /. to see if that
works for me now with 2.5.0.  It may have been a bug in an older
version.  If I get any unexpected results with 2.5.0, I'll report back
with those.

Consistency is a great value.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: rsync internationalization?

2001-12-03 Thread Phil Howard

On Mon, Dec 03, 2001 at 12:16:46AM +1100, Martin Pool wrote:

| Does anybody care about supporting non-English message locales in
| rsync?  (Do all sysadmins speak English? :-)  Would anybody contribute
| translations if we had the framework?

Based on the quality of security and non-open-relay configurations I
have seen throughout the internet, I'd say that the vast majority of
sysadmins do not read English (or even any language) at all :-)

As long as the internationalization does not stuff the executable with
messages in all languages, I'm all for it.  For messages that are
going to be compiled into the executable, it might be nice to have a
configure-time option to specify a language to be integrated, if you
want something other than the default.

I'd contribute Texan but I've only lived in Texas for about 8 years so
I really don't know all of the language here, yet :-)

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Why does one of these work and the other doesn't

2001-12-03 Thread Phil Howard

On Mon, Dec 03, 2001 at 09:55:53AM -0700, [EMAIL PROTECTED] wrote:

| rsync already has a memory-hogging issue.  Imagine having it search
| your entire directory tree, checksumming all files, storing and
| sending them all, and comparing both lists looking for matching
| date/time/checksums to guess where you've moved files to.  You'd be
| better off using a wrapper around the tools you move files with,
| keeping a replayable log, and having your mirrors retrieve and replay
| that log before doing the rsync.

I don't think so.  I would like to have that kind of smart capability
fully integrated into a useful tool.  And rsync already has most of
the pieces such a thing would need.  I am NOT suggesting that it be
the default.  As you say, it would be memory hogging.  But it is
already memory hogging now, and adding a checksum for every file in
the tree would be only 32 more bytes per file.

In some cases I definitely want LESS memory hogging, such as when
replicating trees of millions of files.  In other cases I do want the
checksumming, to get FEWER files redundantly transferred.

What I have done in the past to accomplish this is to build a tar file
of the entire tree on both sides, then sync the tar files, making sure
the rsync block size matches correctly.  That still takes a lot of
time, because rsync sends a LOT of checksums for small blocks.  If I
could get tar to place the files on very large block boundaries, I
could specify a larger block size to rsync and do the transfer much
faster.  But it would make just as much sense to send one checksum per
file and, in cases where a whole-file checksum matches (though at a
different name on the destination), to copy, hardlink, or move (as
appropriate) the file to the new location.
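
The tar trick, roughly (untested; the block size is an arbitrary large
value):

  # on each side, pack the tree into one large file
  tar -cf tree.tar tree/
  # then sync the tar files with a large fixed block size
  rsync -B 65536 remotehost:tree.tar tree.tar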

Inventing a whole new tool to do this when rsync has most of the logic
in place is absurd.  I just don't understand the rsync internals or
protocol well enough to write such a patch myself, so my only option
is to offer the suggestion and hope someone likes it.  Again, I am not
suggesting that this be the default option, so it would not impact
anyone unless they wanted it to.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Why does one of these work and the other doesn't

2001-12-01 Thread Phil Howard

On Fri, Nov 30, 2001 at 07:42:17AM -0700, [EMAIL PROTECTED] wrote:

| from man rsync:
|  a trailing slash on the  source  changes  this  behavior  to
|  transfer all files from the directory src/bar on the machine
|  foo into the /data/tmp/.  A trailing  /  on  a  source  name
|  means  copy  the  contents  of  this directory.  Without a
|  trailing slash it means copy the directory.  This  differ-
|  ence  becomes particularly important when using the --delete
|  option.
| Wonderful things, those manuals.  Warning:  in my experience, this gives 
| unpredictable results.  it does NOT, in fact, always detect all the 
| content of the directory, and as a result, a --delete can have 
| catastrophic consequences.  I have not had time to try to figure out why 
| this happens, but my few tests aren't even repeatable... if there are more 
| than maybe 10 entries in the directory, something is always left out, but 
| rarely the same thing twice.  Needless to say, I never use that syntax.

If the source is a file and the destination is a file, or
non-existent, then you get a straight replication.  However, if the
destination is a directory, it puts the file _into_ the directory.
And this happens even if the source is a directory (i.e. the source
directory goes _into_ the destination directory).  This is classic
UNIX behaviour, and from that I presume correct for rsync.  However,
this behaviour (be it in rsync or anywhere else, such as cp) is a big
pitfall.  On a local machine, it's easy enough to test the target
before executing the command.  On a remote machine it's somewhat more
cumbersome.

I have found that for rsync, when I want to replicate a directory from
one machine to another and want to be certain that I am not putting
one into the other, but instead making one become the other (i.e.
treating them as peers), the syntax of putting /. at the end of both
source and destination does the trick.  Whatever is _in_ the source
goes _into_ the destination.  It does require that the destination
exist in advance, or else the reference to /. in the destination will
fail (and thus so will the transfer).  But at least I can use ssh to
make the directory first (which might fail if the destination is
already a file, but I don't have to worry about getting a reliable
status from ssh for that, because rsync will subsequently fail in that
case, too).  The end result is that I get expected results or I get a
failure, but I don't get unexpected results (like filling up a disk
because files went to the wrong place, or deleting unintended files).

Perhaps a trailing / instead of a trailing /. is supposed to work.  I
do not remember why I didn't start using it, but I am sure I would
have tried it, so maybe I encountered some problem.  But /. on the end
works for me and is what I have been using in all my backup scripts.
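
In other words (paths and host are placeholders):

  ssh remotehost mkdir -p /path/dst
  rsync -a /path/src/. remotehost:/path/dst/.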

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




linking rsync-2.5.0 to libz.so

2001-12-01 Thread Phil Howard

There does not seem to be an option in configure to get rsync 2.5.0
to link libz as a shared library.  Is there any way to do this?

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




rsync transfers of data from Windows to Unix

2001-09-13 Thread Phil Howard

Are there any clients and/or servers for Windows (clients only
for Win98/ME) which can use the rsync protocol, or especially
rsync over SSL (e.g. like stunnel, not ssh), which would allow
setting up some well controlled and secure bulk file exchanging
between Windows and Unix?  SMB is not going to be an option and
a VPN may not be an option, either (there are technical reasons
for that but they are outside the scope of an rsync discussion).
FTP is already used in one of the cases I'm exploring and it is
problematic, and I would prefer to go with the rsync logic to
transfer the data since it is more of a synchronization kind of
thing anyway.

--
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




rsync and SSL

2001-09-13 Thread Phil Howard

I'm finding even less on rsync and SSL.  I would have imagined someone
would have done something with this already, but apparently not.  So
I guess I need to ask and see for sure: has anyone worked on issues of
using rsync via SSL, such as with stunnel?  I want to have encrypted
access, either anonymous or authenticated, but without granting any SSH
access to anyone (e.g. the rsync users won't be in the /etc/passwd
user space).
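For what it's worth, here is the sort of arrangement I have in mind,
sketched with stunnel 3.x syntax (ports, host name, and module name are
hypothetical, and I have not verified this end to end):

    # on the server, wrap the rsync daemon port with SSL:
    rsync --daemon
    stunnel -d 874 -r localhost:873

    # on the client, offer a local cleartext port that tunnels out:
    stunnel -c -d 873 -r server.example.com:874
    rsync -av localhost::module/ /some/dest/

Authentication could then be handled by the usual auth users mechanism
in rsyncd.conf, inside the tunnel.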

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Two issues (rsync 2.4.6)

2001-07-13 Thread Phil Howard

Paul D. Smith wrote:

> I have an anon rsync (2.4.6) daemon set up in my inetd on a Solaris 2.6
> server.
>
> I'm seeing two strange things, and I wonder if anyone has any comments.
> I tried using Google and other searches for these but didn't come up
> with anything very useful (a list of error messages rsync generates
> along with their meanings would be a great bonus!)
>
>  1) At the beginning of my transfers (on the client side), I see this
>     message:
>
>       IO error encountered - skipping file deletion
>
>     Does that mean it's not deleting any files?  Why?

I suspect that it got an I/O error in a situation (like reading a
directory) that creates a non-zero probability that the list of
files that should not be deleted is incomplete.  The deletion is
skipped in order to avoid the risk of deleting something that should
not be deleted.

I would guess it's not deleting any files because it cannot be sure
that it would be deleting only the ones really to be deleted.  The
thing to do is find out why there is an I/O error, or what it thinks
is an I/O error.  It may, for example, be getting an incomplete list
from the server.


>  2) At the end of a transfer that seems otherwise successful, in the
>     server log I invariably see this message:
>
>       transfer interrupted (code 11) at main.c(295)
>
>     As far as I can tell it happens _every_ time a transfer is
>     completed.  Also as far as I can tell, the transfer appears to
>     work... but this seems like something that might be serious.

Is this via RSH/SSH or the rsyncd port?  This could be related to #1.

--
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Anti-hang comments?

2001-07-05 Thread Phil Howard

Dave Dykstra wrote:

> You shouldn't have to have it be in the foreground in order for strace -f

You're right, I was not aware of that option.  And I thought I
knew my way around strace.

Here's what strace shows me:

[pid 14576] open("/tmp/rsyncd.lock", O_RDWR|O_CREAT|0x8000, 0600) = 4
[pid 14576] fcntl(4, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0

But the source looks just right:

connection.c[39,42]:
/* find a free spot */
for (i=0;i<max_connections;i++) {
	if (lock_range(fd, i*4, 4)) return 1;
}

util.c[494,506]:
/* lock a byte range in a open file */
int lock_range(int fd, int offset, int len)
{
	struct flock lock;

	lock.l_type = F_WRLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start = offset;
	lock.l_len = len;
	lock.l_pid = 0;

	return fcntl(fd,F_SETLK,&lock) == 0;
}

I guess maybe there's a library issue involved.  But why it would
stomp on a structure element is unclear.  I'm putting together a
couple of new systems with Slackware 8.0, which has glibc 2.2.3, so
I'll probably just try it there first and see if the problem
persists or not.
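If anyone wants to reproduce this outside of rsync, a standalone test
of the same locking pattern might look like this (my own throwaway
harness, not rsync code; the file name and record count are arbitrary):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	struct flock lock;
	int fd, i;

	fd = open("/tmp/locktest.lock", O_RDWR|O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* claim the first free 4-byte record, as connection.c tries to */
	for (i = 0; i < 16; i++) {
		memset(&lock, 0, sizeof lock);
		lock.l_type = F_WRLCK;
		lock.l_whence = SEEK_SET;
		lock.l_start = i * 4;
		lock.l_len = 4;
		if (fcntl(fd, F_SETLK, &lock) == 0) {
			printf("locked record %d (bytes %d..%d)\n",
			       i, i*4, i*4 + 3);
			pause();	/* hold the lock so other copies must skip it */
			return 0;
		}
	}
	fprintf(stderr, "no free record\n");
	return 1;
}

Two copies run side by side should end up on records 0 and 1 if F_SETLK
is honoring l_start/l_len; if the second copy immediately reports no
free record, then something between the struct and the syscall really
is being stomped on.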

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Anti-hang comments?

2001-07-04 Thread Phil Howard

Wayne Davison wrote:

> We certainly do need to be careful here, since the interaction between
> the various read and write functions can be pretty complex.  However, I
> think that the data flow of my move-files patch stress-tests this code
> fairly well, so once we've done some more testing I feel that we will
> not leave rsync worse off than it was before the patch.
>
> Along those lines, I've been testing the new code plus I ported a
> version of my move-files patch on top of it.  The result has a couple
> fewer bugs and seems to be working well so far.
>
> The latest non-expanding-buffer-nohang patch is in the same place:
>
> http://www.clari.net/~wayne/rsync-nohang2.patch
>
> and the new move-files patch that works with nohang2 is here:
>
> http://www.clari.net/~wayne/rsync-move-files2.patch
>
> I'll keep banging on it.  Let me know what you think.

So far it is working for me.  Now I can kill my client side and know
that my daemon side will properly close down and exit and not leave
a dangling lock.

But the problem I still have (not quite as bad as before, because of
no more hangs) is that the locking that controls the number of daemons
is still working wrong.  It's still locking the whole lock file instead
of the first lockable 4-byte record.  I still don't know if it is
rsync or Linux causing the problem.  The code in both looks right to
me.  But lslk shows:

SRC   PID  DEV   INUM SZ TY M ST WH END LEN NAME
rsyncd  24401  3,5 44  0  w 0  0  0   0   0 /tmp/rsyncd.lock

(note, I've been moving the lock file around to see if it might be
sensitive to filesystem mounting options I'm using, etc).

I'd like to find a way to start rsync in daemon mode AND leave it
in the foreground so I can run it via strace and maybe see if the
syscall is being done right.
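Something like this is what I am after (assuming a build that accepts
--no-detach, which I have not verified; failing that, strace -p on the
already-running daemon's pid should show the same thing):

    strace -f -e trace=fcntl -o /tmp/rsyncd.trace rsync --daemon --no-detach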

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




lingering rsync daemons

2001-06-28 Thread Phil Howard

If transfers complete normally, all the processes go away.  But if I
hit ^C to stop the client side when connected to an rsync daemon, the
daemon does not go away.  It also still has the file it was sending
open for read (according to lsof).  Also, lsof shows the socket
descriptor is still open, although netstat shows the connection in
CLOSE_WAIT for about 2 to 4 seconds, then gone.  That sounds like
it is doing a shutdown() but not closing the descriptor.
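That distinction is easy to demonstrate in isolation; in the toy below
(my own sketch, not rsync code), the peer sees EOF as soon as shutdown()
is called, but the descriptor itself -- and its lsof entry -- survives
until close():

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int sv[2];
	char c;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		perror("socketpair");
		return 1;
	}
	shutdown(sv[0], SHUT_RDWR);	/* conversation over on the wire */
	/* the peer now reads EOF, as with a real TCP teardown */
	printf("peer sees EOF: read() = %zd\n", read(sv[1], &c, 1));
	/* sv[0] is still an open descriptor here; lsof would still list it */
	close(sv[0]);			/* only now does the fd go away */
	close(sv[1]);
	return 0;
}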

This is version 2.4.6 with the no-hang patch, but the problem was
also seen without that patch (so I know it didn't cause it).

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: unexpected EOF in read_timeout

2001-05-29 Thread Phil Howard

Dale Phillips wrote:

> I am trying to rsync a rather large amount of
> data. (i.e. clone a box) - What causes the
>
> unexpected EOF in read_timeout?

I get these.  Apparently it is ssh.  Are you running this through ssh?

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




problems encountered in 2.4.6

2001-05-25 Thread Phil Howard
rsync 26704 root  mem   REG   3,2    79276       4239 /lib/ld-2.1.3.so
rsync 26704 root  mem   REG   3,2  1013224       4249 /lib/libc-2.1.3.so
rsync 26704 root  mem   REG   3,2    40360       4274 /lib/libnss_compat-2.1.3.so
rsync 26704 root  mem   REG   3,2    75500       4272 /lib/libnsl-2.1.3.so
rsync 26704 root    0u  unix  0xce9cd9a0    135813769 socket
rsync 26704 root    2w  REG   3,4  568981778435 /home/root/backup-hda-to-hdb/home.log
rsync 26704 root    6u  unix  0xcecaccc0    135814164 socket

If I kill 26704, nothing in particular happens.  If I kill 26652,
then I get unexpected EOF in read_timeout (not surprising), which
tells me 26651 was waiting for something from 26652 which was not
happening.  Maybe 26652 was waiting for something from 26651 first.
A deadly embrace?  It seems possible.

I'm also curious why 26704 has no fd 1.

3 =
@ERROR: max connections (16) reached - try again later

This occurs after just one connection is active.  It behaves as if
I had specified max connections = 1.  On another server I set it
to 40, and it showed:

@ERROR: max connections (40) reached - try again later

so it obviously is parsing and keeping the value I configure, but it
isn't using it correctly.

Also, if I ^C the client, then I get this error every time until I
restart the daemon (running in standalone daemon mode, not inetd).
So it seems like it counts clients wrong.  But I can't get more
than 1 right after restarting the server, so there is a little more
to it than that somewhere.

===

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: problems encountered in 2.4.6

2001-05-25 Thread Phil Howard

Dave Dykstra wrote:

> > 2 =
> > When synchronizing a very large number of files, all files in a large
> > partition, rsync frequently hangs.  It's about 50% of the time, but
> > seems to be a function of how much work there was to be done.  That
> > is, if I run it soon after it just ran, it tends to not hang, but if
> > I run it after quite some time (and lots of stuff to synchronize) it
> > tends to hang.  It appears to have completed all the files, but I
> > don't get any stats.  There are 3 rsync processes sitting idle with
> > no files open in the source or target trees.
> >
> > At last count there were 368827 files and 8083 symlinks in 21749
> > directories.
> >
> > df shows:
> > /dev/hda4 42188460  38303916   3884544  91% /home
> > /dev/hdb4 42188460  38301972   3886488  91% /mnt/hdb/home
> >
> > df -i shows:
> > /dev/hda4  2662400  398419 2263981   15% /home
> > /dev/hdb4  2662400  398462 2263938   15% /mnt/hdb/home
> >
> > The df numbers are not exact because change is constantly happening
> > on this active server.  Drives hda and hdb are identical and are
> > partitioned alike.
> >
> > The command line is echoed from the script that runs it:
> >
> > rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
>
> Use the -W option to disable the rsync algorithm.  We really ought to make
> that the default when both the source and destination are local.

I don't want to copy everything every time.  That's why I am using
rsync to do this in the first place.  I don't understand why this
would be what's hanging.

> > A deadly embrace?  It seems possible.
>
> No, the receiving side of an rsync transaction splits itself into two
> processes for the sake of pipelining: one to generate checksums and one to
> accept updates.  When you're sending and receiving to the same machine then
> you've got one sender and 2 receivers.

Right.  But what I was suggesting was a deadly embrace in that the
process killed was waiting for something, and the parent was waiting
for something.

I'm not using the -c option, so why would checksums be generated?

> > I'm also curious why 26704 has no fd 1.
>
> I don't know.  When I tried it all 3 processes had an fd 1.

Were you looking at it after it hung?  Or is it not hanging for you?
I am curious if the lack of fd 1 is related to the hang.  It is being
started with 1 and 2 redirected to a log file _and_ the whole thing
is being run via the script command for a big picture logfile.
It was set up this way with the intent to run it from cron, although
I haven't actually added it to crontab, yet, due to the problems.


> > 3 =
> > @ERROR: max connections (16) reached - try again later
> >
> > This occurs after just one connection is active.  It behaves as if
> > I had specified max connections = 1.  On another server I set it
> > to 40, and it showed:
> >
> > @ERROR: max connections (40) reached - try again later
> >
> > so it obviously is parsing and keeping the value I configure, but it
> > isn't using it correctly.
> >
> > Also, if I ^C the client, then I get this error every time until I
> > restart the daemon (running in standalone daemon mode, not inetd).
> > So it seems like it counts clients wrong.  But I can't get more
> > than 1 right after restarting the server, so there is a little more
> > to it than that somewhere.
>
> I don't know, I never used max connections.  Could indeed be a bug.
> The code looks pretty tricky.  It's trying to lock pieces of the file
> /var/run/rsyncd.lock in order for independent processes to coordinate.
> Are you running as root (the lsof above suggests you are)?  If not, you
> probably need to specify another file that your daemon has access to in the
> lock file option.  Otherwise it would probably help for you to run some
> straces.

I would have presumed since there was a daemon process running
(as opposed to running from inetd) that the daemon itself could
simply track the connection count.

One possibility here is that I do have /var/run symlinked to /ram/run
which is on a ramdisk.  So the lock file is there.  The file is there
but it is empty.  Should it have data in it?  BTW, it was in ramdisk
in 2.4.4 and this max connections problem did not exist, so if there
is a ramdisk sensitivity, it's new since 2.4.4.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: problems encountered in 2.4.6

2001-05-25 Thread Phil Howard

David Bolen wrote:

> The discovery phase will by default just check timestamps and sizes.
> You can adjust that with command line options, including the use of -c
> to include a full file checksum as part of the comparison, if for
> example, files might change without affecting timestamp or size.
>
> Once rsync knows what it needs to transfer, then it works its way
> through the file list, and for each file it performs a transfer.  By
> default, that transfer is the rsync protocol - which involves the full
> process of dividing the file into chunks with both a strong and
> rolling checksum, and doing the computations to figure out what parts
> to send and so on.

This is where the docs were a bit confusing.  There was no clear
distinction of checksum types related to the -c option.  This implied
to me that without -c there would be no checksum at all; what I thought
the default behaviour was turns out to be what -W actually does.
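For anyone else who found this confusing, the weak rolling checksum idea
looks roughly like the sketch below (a paraphrase of the published
algorithm, not code from the rsync source; the real protocol also pairs
each chunk's weak sum with a strong MD4 checksum).  The useful property
is that the sum for a window shifted by one byte is computable from the
previous window's sum, which is what makes scanning every offset cheap:

#include <stdio.h>
#include <stdint.h>

/* weak checksum of one window: s1 = byte sum, s2 = sum of running s1,
   combined into one 32-bit value as s1 + (s2 << 16) */
static uint32_t weak_sum(const unsigned char *buf, int len)
{
	uint16_t s1 = 0, s2 = 0;
	int i;

	for (i = 0; i < len; i++) {
		s1 += buf[i];
		s2 += s1;
	}
	return s1 | ((uint32_t)s2 << 16);
}

/* slide the window one byte: drop "out" from the front, add "in" at
   the back, without rescanning the whole window */
static uint32_t roll(uint32_t sum, int len, unsigned char out, unsigned char in)
{
	uint16_t s1 = sum & 0xffff, s2 = sum >> 16;

	s1 = s1 - out + in;
	s2 = s2 - len * out + s1;
	return s1 | ((uint32_t)s2 << 16);
}

int main(void)
{
	const unsigned char data[] = "the quick brown fox";
	int win = 4, i;
	uint32_t sum = weak_sum(data, win);

	for (i = 0; i + win < (int)(sizeof data - 1); i++) {
		sum = roll(sum, win, data[i], data[i + win]);
		printf("offset %2d: rolled %08x  rescanned %08x\n", i + 1,
		       (unsigned)sum, (unsigned)weak_sum(data + i + 1, win));
	}
	return 0;
}

The two columns printed should always agree, which is the whole trick.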


 That's why the -W option is really the only logical thing to use with
 a single rsync and local (on-system or network share/mount) copies.
 Under such circumstances, the rsync protocol isn't going to help at
 all, and will probably slow things down and take more memory instead.
 With -W rsync becomes an intelligent copier (in terms of figuring out
 what changed), but that's about it.

Actually, the lack of -W isn't helping me at all.  The reason is that
even for the stuff I do over the network, 99% of it is compressed with
gzip or bzip2.  If the files change, the originals were changed and a
new compression is made, and usually most of the file is different.

It definitely helped for transferring ISO images where the whole image
would be changed if some files changed.  I set the chunk size to 2048
for that.  Why it defaults to 700 seems odd to me.
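(For reference, the knob for that is -B/--block-size; something like

    rsync -av -B 2048 mirror::isos/ /mirror/isos/

with a hypothetical module and path.)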

There is a feature I would like, and I notice that even with -c this
does not happen, but I think it could based on the way rsync works.
What I'd like to have is when a whole file is moved from one directory
to another, rsync would detect a new file with the same checksum as an
existing (potentially to be deleted) file, and copy, move, or link, as
appropriate.  In theory this should apply to anything anywhere in the
whole file tree being processed.
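To sketch the idea (toy code with hypothetical names, and literal
constants standing in for real whole-file checksums): before deleting
anything, index the would-be-deleted destination files by checksum, and
satisfy any incoming file whose checksum matches with a local move or
link instead of a transfer:

#include <stdio.h>

struct entry { const char *path; unsigned long sum; };

/* destination files --delete would remove (checksums precomputed) */
static struct entry doomed[] = {
	{ "old/place/big.iso",   0x11111111UL },
	{ "old/place/notes.txt", 0x22222222UL },
};

/* source files that would otherwise be sent in full */
static struct entry wanted[] = {
	{ "new/place/big.iso",   0x11111111UL },
	{ "new/place/fresh.txt", 0x33333333UL },
};

int main(void)
{
	size_t i, j;

	for (i = 0; i < sizeof wanted / sizeof *wanted; i++) {
		struct entry *match = NULL;
		for (j = 0; j < sizeof doomed / sizeof *doomed; j++)
			if (doomed[j].sum == wanted[i].sum)
				match = &doomed[j];
		if (match)	/* content already present: move, don't send */
			printf("mv %s %s\n", match->path, wanted[i].path);
		else
			printf("send %s\n", wanted[i].path);
	}
	return 0;
}

A real implementation would of course confirm a match with the file size
plus a strong checksum before trusting it, for the same reason -c exists.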

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-