Re: using disk instead of tape

2006-09-12 Thread Ian Turner
On Friday 08 September 2006 17:03, you wrote:
> A few years ago I was doing a forensics security review for a client that
> had data that needed to be erased VERY reliably.  The determination was
> that because even IDE disks did remapping internally, it would be possible
> for previously written data to be inaccessible to a program writing random
> data over the whole disk several times.

Yes. Many drives provide a manufacturer-specific API that lets you force a 
low-level format, but at the portable level, there is no way to ensure this.

Cheers,

--Ian
-- 
Zmanda: Open Source Data Protection and Archiving.
http://www.zmanda.com


Re: using disk instead of tape

2006-09-09 Thread Josef Wolf
On Tue, Sep 05, 2006 at 05:17:31PM -0500, Phil Howard wrote:

> And you could still do a "bare metal" recovery as long as the partitions
> on disk were compatible (which is why, if I write such a driver in Linux
> I would use the MSDOS partition table format).

Some OSes screw up when the partition table is modified.  This is why fdisk
gives a warning.  Being paranoid, I always reboot after I modify the
partition table.

I don't think it is really worth the effort to implement such
functionality.  I bet in most installations compression/encryption/network
is the real bottleneck.


Re: using disk instead of tape

2006-09-08 Thread Phil Howard
On Fri, Sep 08, 2006 at 02:46:48PM +0200, Geert Uytterhoeven wrote:

| On Fri, 8 Sep 2006, Ronan KERYELL wrote:
| > Third, what about bad blocks on disk? How to skip them in a raw partition
| > if you do not have state-of-the-art disks that do block remapping for you
| > in your back-yard (such as SCSI)? Often FS do these tricks for you on
| > IDE disks for example.
| 
| These days IDE does that too.
| But if there are too many of them, you lose (same for SCSI).

A few years ago I was doing a forensics security review for a client that
had data that needed to be erased VERY reliably.  The determination was
that because even IDE disks did remapping internally, it would be possible
for previously written data to be inaccessible to a program writing random
data over the whole disk several times.  The only way to ensure that this
confidential data was destroyed was to grind the disk to dust, or at least
do so to the platters.  But modern IDE disks perhaps are indeed doing this.
I haven't had a bad sector on such a disk in years.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-08 Thread Ian Turner
On Friday 08 September 2006 07:08, Ronan KERYELL wrote:
> First I would say it is possible to mkfs the disk before each new usage to
> have clean data structures with less overhead (no fragmentation...).

Not really necessary; on any modern filesystem (and a few very old ones), 
emptying the filesystem will clear any fragmentation that might have 
appeared.

> Secondly you could choose a file system optimized for big files and
> write-ahead only. It's possible to change the parameters of the FS to push
> this behaviour even further (how many cylinders? block size? no logging on
> the data, no block reserve for fast allocation...).

Well, there's no such thing as write-ahead (the kernel will guess the data you 
will write? :o) but as for big files, the best thing you can do at the FS 
layer is to use a large block size and no data journaling. Setting reserved 
blocks to zero is a good idea, as is using O_DIRECT (as discussed elsewhere).

> Third, what about bad blocks on disk? How to skip them in a raw partition
> if you do not have state-of-the-art disks that do block remapping for you
> in your back-yard (such as SCSI)? Often FS do these tricks for you on
> IDE disks for example.

Irrelevant. All modern drives (IDE included) since MFM have done automatic 
internal remapping.

> Well, IMHO, I would vote for a FS solution except if I have a real
> gain... :-)

As would I.
-- 
Forums for Amanda discussion: http://forums.zmanda.com/


Re: using disk instead of tape

2006-09-08 Thread Geert Uytterhoeven
On Fri, 8 Sep 2006, Ronan KERYELL wrote:
> Third, what about bad blocks on disk? How to skip them in a raw partition
> if you do not have state-of-the-art disks that do block remapping for you
> in your back-yard (such as SCSI)? Often FS do these tricks for you on
> IDE disks for example.

These days IDE does that too.
But if there are too many of them, you lose (same for SCSI).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: using disk instead of tape

2006-09-08 Thread Ronan KERYELL
> On Tue, 5 Sep 2006 04:09:05 -0500, Phil Howard <[EMAIL PROTECTED]> said:

Phil> If tar can read from raw tape, it can read from raw disk.  I've
Phil> already done that several times for various things.  Bare metal
Phil> recovery will need at a minimum the tar or dump utility
Phil> depending on format used.

I've thought about this raw partition stuff and I'm a bit afraid like some
others on the list.

First I would say it is possible to mkfs the disk before each new usage to
have clean data structures with less overhead (no fragmentation...).

Secondly you could choose a file system optimized for big files and
write-ahead only. It's possible to change the parameters of the FS to push
this behaviour even further (how many cylinders? block size? no logging on
the data, no block reserve for fast allocation...).

Third, what about bad blocks on disk? How to skip them in a raw partition
if you do not have state-of-the-art disks that do block remapping for you
in your back-yard (such as SCSI)? Often FS do these tricks for you on
IDE disks for example.

Well, IMHO, I would vote for a FS solution except if I have a real
gain... :-)

-- 
  Ronan KERYELL   |\/  Tel:(+33|0) 2.29.00.14.15
  Département Informatique|/)  Fax:(+33|0) 2.29.00.12.82
  ENST Bretagne, CS 83818 KGSM:(+33|0) 6.13.14.37.66
  F-29238 PLOUZANÉ CEDEX  |\   E-mail: [EMAIL PROTECTED]
  FRANCE  | \  http://enstb.org/~keryell
   callto:ils.seconix.com/[EMAIL PROTECTED]



Re: Amanda vs. rsync vs. ... (was: Re: using disk instead of tape)

2006-09-06 Thread Geert Uytterhoeven
On Wed, 6 Sep 2006, Ian Turner wrote:
> On Wednesday 06 September 2006 04:23, Geert Uytterhoeven wrote:
> > So my ideal backup solution would be Amanda, with support for incrementally
> > storing backups at a remote location :-)
> 
> Well, Amanda does that, via incremental backups. What it doesn't do (because 
> of tool support) is incremental backups of individual files -- mostly because 
> we don't have (I'm not aware of) any tool that does that.

Except that from time to time you need a level 0, which is big. Switching to
pure-incremental doesn't help, since then you (a) need to keep the initial
level 0 forever and (b) restore will be painful since you have to go through all
incrementals.
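
To make the restore cost concrete, here is a minimal Python sketch. It
assumes the usual dump-level rule (a level N dump holds everything changed
since the most recent dump of a lower level). The name restore_chain and
the plain list of levels are illustrative assumptions, not anything Amanda
provides.

```python
def restore_chain(levels):
    """Return the indices of the dumps needed to restore the latest state.

    `levels` lists the dump level of each run, oldest first.
    """
    chain = []
    need_below = None  # next dump we pick must have a level below this
    for idx in range(len(levels) - 1, -1, -1):
        lvl = levels[idx]
        if need_below is None or lvl < need_below:
            chain.append(idx)
            need_below = lvl
            if lvl == 0:
                break
    chain.reverse()
    if not chain or levels[chain[0]] != 0:
        raise ValueError("no level 0 dump available to start the restore")
    return chain


# Pure-incremental scheme: every run is needed at restore time.
print(restore_chain([0, 1, 2, 3, 4, 5]))
# Always dumping at level 1: only the level 0 and the last level 1 are needed.
print(restore_chain([0, 1, 1, 1, 1]))
```

With ever-increasing levels the chain grows by one dump per run, which is
the painful restore described above; repeating level 1 keeps the chain
short but makes each dump bigger.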

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: Amanda vs. rsync vs. ... (was: Re: using disk instead of tape)

2006-09-06 Thread Ian Turner
On Wednesday 06 September 2006 04:23, Geert Uytterhoeven wrote:
> So my ideal backup solution would be Amanda, with support for incrementally
> storing backups at a remote location :-)

Well, Amanda does that, via incremental backups. What it doesn't do (because 
of tool support) is incremental backups of individual files -- mostly because 
we don't have (I'm not aware of) any tool that does that.

I am aware of how to create such a tool, however, and it's something we might 
do once Application API (or maybe Filter API; I'm not quite sure how to fit 
it in) lands.

Cheers,

--Ian
-- 
Wiki for Amanda documentation: http://wiki.zmanda.com/


Re: Amanda vs. rsync vs. ... (was: Re: using disk instead of tape)

2006-09-06 Thread Gene Heskett
On Wednesday 06 September 2006 04:23, Geert Uytterhoeven wrote:
>On Tue, 5 Sep 2006, Phil Howard wrote:
>> If you want all those benefits of restore, and don't mind having a disk
>> with a filesystem already on it, then why not use something like rsync
>> to make backups?  As long as you aren't working with over about a
>> million individual files, it works great.  It makes a replica of a
>> filesystem or multi-filesystem tree, and gives you direct access to
>> every individual file for restore purpose.  Use multiple disks to make
>> multiple backups. When backing up to a disk previously used, rsync
>> avoids the writing work for files not changed (according to matching
>> meta data, though this can be turned off).  And rsync works well over a
>> network via ssh.
>>
>> So I can't really understand your argument.  What you seem to
>> specifically want that dismisses raw disk might well be better served
>> with rsync instead of Amanda.  I might want Amanda, though, for huge
>> volume and speed.
>
>Now it starts to become interesting :-)
>
>This is actually what I've had in mind to post for a long time...
>First, let's say I use Amanda and vtapes to back up my home systems.
>
>I like Amanda, because it's simple to set up, robust, ease of recovery,
> ... However, storing backups offsite over the Internet (say, on a remote
> disk at a friend's place) is not an option, due to the monthly upload
> quota enforced by all ISPs here (in Belgium).
>
>I like rsync, since it only transfers what needs to be transferred. But it
>doesn't keep multiple days of backups and hard links can be tricky.

Writing a nearly identical rsync line for crontab, to be exec'd only on x 
day of the week, such that rsync uses a different directory on the raid 
for each (active) day of the week is one way to handle this problem.  
We've been doing that at the tv station for about 4 years now.
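
A rough Python sketch of that per-weekday scheme follows. The paths, the
function name rsync_command, and the choice of the rsync options -a and
--delete are assumptions for illustration; the actual crontab lines are
not shown in this thread.

```python
import datetime

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def rsync_command(source, backup_root, day):
    """Build the rsync argv that mirrors `source` into that weekday's directory."""
    target = f"{backup_root}/{DAYS[day.weekday()]}/"
    # Trailing slash on the source: copy the directory contents, not the directory.
    return ["rsync", "-a", "--delete", source.rstrip("/") + "/", target]


if __name__ == "__main__":
    # A single daily cron job could run this and pass the result to subprocess.run().
    print(rsync_command("/home", "/raid/backup", datetime.date.today()))
```

Each weekday directory is then a full mirror that is at most a week old,
at the cost of seven copies of the data on the raid.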

We've had to build a bigger raid of course, at least twice, starting at 
320GB but the last rebuild took it over the terabyte mark by quite a 
bit.

It's been very handy.  We can lose a drive in a very important machine, 
replace it, re-install the OS, then rsync its data from the raid, and have 
that machine back in service as if nothing ever happened in less than a 
day's elapsed time & with only an hour or 2 of actual, on the machine 
work.  And that's getting faster as gigabit cards and switches are being 
cycled in to replace the now aging 100base-T stuff.

>I tried rdiff-backup, which keeps reverse-incrementals, but it can take
> lots of memory on the client side (i.e. not suitable to backup old
> machines) and doesn't work well with hard links.
>
>I also use duplicity, which keeps reverse-incrementals and supports
> encryption and authentication (nice for offsite backups of my digital
> pictures on a big scratch disk at work :-), but it can take lots of
> space on $TMPDIR on the client side, and it doesn't support hard links.
>
>So my ideal backup solution would be Amanda, with support for
> incrementally storing backups at a remote location :-)
>
>In theory, it should be possible to write a tool to take the tar archives
> as created by Amanda and calculate differentials, and reassemble the tar
> archives at the other end of the network pipe, right? Or are there
> better solutions?
>
One idea might be to have another drive located remotely, set it up 
similarly to the vtape lashup amanda is using, with a pair of crontab 
entries, one to re-cycle the 'data' link on the remote drive in a round 
robin fashion, and then rsync /path/to/data to the remote's /path/to/data 
sometime later in the morning after amanda has finished.  I've thought of 
doing that from here to my shop's machine, but that mobo doesn't like 2 
drives on the same pata cable even if they are the same brand of drives.
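
The link recycling could be sketched in Python roughly as below. The slot
directory names and the helpers next_slot and point_data_link are
hypothetical; only the 'data' link name comes from the vtape setup
described here.

```python
import os


def next_slot(slots, current):
    """Round-robin: return the slot that follows `current`, wrapping at the end."""
    return slots[(slots.index(current) + 1) % len(slots)]


def point_data_link(vtape_root, slot_name):
    """Make <vtape_root>/data a symlink to `slot_name`, replacing any old link."""
    link = os.path.join(vtape_root, "data")
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(slot_name, tmp)  # relative target inside vtape_root
    os.replace(tmp, link)       # rename swaps the link itself in one step
```

Creating the new link under a temporary name and renaming it over the old
one means there is never a moment when 'data' is missing, which matters if
the rsync job and the rotation job can overlap.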

Of course, looking at the bigger picture, if a fire took this house, but 
left the shop standing, I'd have a hell of a lot more important problems 
than recovering this machine...

>Gr{oetje,eeting}s,
>
>  Geert
>
>--
>Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> [EMAIL PROTECTED]
>
>In personal conversations with technical people, I call myself a hacker.
> But when I'm talking to journalists I just say "programmer" or something
> like that. -- Linus Torvalds

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.


Amanda vs. rsync vs. ... (was: Re: using disk instead of tape)

2006-09-06 Thread Geert Uytterhoeven
On Tue, 5 Sep 2006, Phil Howard wrote:
> If you want all those benefits of restore, and don't mind having a disk
> with a filesystem already on it, then why not use something like rsync
> to make backups?  As long as you aren't working with over about a million
> individual files, it works great.  It makes a replica of a filesystem or
> multi-filesystem tree, and gives you direct access to every individual
> file for restore purpose.  Use multiple disks to make multiple backups.
> When backing up to a disk previously used, rsync avoids the writing work
> for files not changed (according to matching meta data, though this can
> be turned off).  And rsync works well over a network via ssh.
> 
> So I can't really understand your argument.  What you seem to specifically
> want that dismisses raw disk might well be better served with rsync instead
> of Amanda.  I might want Amanda, though, for huge volume and speed.

Now it starts to become interesting :-)

This is actually what I've had in mind to post for a long time...
First, let's say I use Amanda and vtapes to back up my home systems.

I like Amanda, because it's simple to set up, robust, ease of recovery, ...
However, storing backups offsite over the Internet (say, on a remote disk at a
friend's place) is not an option, due to the monthly upload quota enforced by
all ISPs here (in Belgium).

I like rsync, since it only transfers what needs to be transferred. But it
doesn't keep multiple days of backups and hard links can be tricky.

I tried rdiff-backup, which keeps reverse-incrementals, but it can take lots of
memory on the client side (i.e. not suitable to backup old machines) and
doesn't work well with hard links.

I also use duplicity, which keeps reverse-incrementals and supports encryption
and authentication (nice for offsite backups of my digital pictures on a big
scratch disk at work :-), but it can take lots of space on $TMPDIR on the
client side, and it doesn't support hard links.

So my ideal backup solution would be Amanda, with support for incrementally
storing backups at a remote location :-)

In theory, it should be possible to write a tool to take the tar archives as
created by Amanda and calculate differentials, and reassemble the tar archives
at the other end of the network pipe, right? Or are there better solutions?
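
As a rough illustration of what such a tool would have to do, here is a
small Python sketch of block-based differencing. It is a simplification:
it matches exact blocks with a naive sliding window, whereas rsync uses
rolling checksums, and the names make_delta and apply_delta are made up
for this example.

```python
def make_delta(old, new, block=4096):
    """Describe `new` as copies of blocks from `old` plus literal bytes."""
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)

    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        off = index.get(chunk) if len(chunk) == block else None
        if off is None:
            literal.append(new[pos])  # no match here: ship this byte as-is
            pos += 1
            continue
        if literal:
            ops.append(("data", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off, block))  # receiver already has this block
        pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old, ops):
    """Rebuild the new archive on the remote side from `old` and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Only the "data" operations have to cross the network. How well this works
on Amanda's archives depends on compression: a compressed tarball can
change almost everywhere after a small edit, so the delta may be close to
the full size.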

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Tue, Sep 05, 2006 at 12:23:08PM -0400, Ian Turner wrote:

| On Tuesday 05 September 2006 05:21, Phil Howard wrote:
| > On Mon, Sep 04, 2006 at 11:01:20PM -0400, Ian Turner wrote:
| > | On Saturday 02 September 2006 16:21, Phil Howard wrote:
| > | > It would not need to be separate for each OS.  The idea of using a
| > | > partition table isn't even the only approach.
| >
| > If all that is written is tar format, nothing more needs to be added.
|
| Ah, but if you ditch the partition table, then indeed more needs to be added.
| How else would you tell the end of one dump from the start of the next?

That would indeed be a limitation.  Using partitions would be better.  Not
doing so could still be an option for those that know they have no need to
do more than one dump per media.


| > I didn't keep any stats, or really do it scientifically.  Someone that
| > wants to should probably control for a lot of the variables that influence
| > it. But I do recall the speed improvement is about 25% to 30%.  I suspect
| > much of that is OS work bypassed with O_DIRECT.
|
| I suspect you incur a substantial performance penalty if other processes are
| using the disk concurrently, because then you only get one write() per
| elevator traversal.

The disk being used for backup would have to be dedicated.  You could get
away with doing it all entirely inside one partition of a non-dedicated
disk.  But I'm focused on the backups that go to external media, which can
be a real tape, if tape were a viable option, or to an external SATA disk
in a separate disk enclosure, plugged in as needed.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Tue, Sep 05, 2006 at 10:09:10AM -0600, Charles Curley wrote:

| On Tue, Sep 05, 2006 at 04:09:05AM -0500, Phil Howard wrote:
| > On Sat, Sep 02, 2006 at 05:42:01PM -0600, Charles Curley wrote:
| > 
| > | In addition, it would make bare metal recovery more difficult. If you
| > | back up to a file system any Linux live CD (finnix, knoppix...) can
| > | read, recovery is easy: the tools you need are already on the live
| > | CD. The tools include a suitable file system driver for the
| > | partition. Back up to a bare partition, and you would need special
| > | tricks or possibly special software to read it.
| > 
| > If tar can read from raw tape, it can read from raw disk.  I've already
| > done that several times for various things.  Bare metal recovery will
| > need at a minimum the tar or dump utility depending on format used.
| 
| If you write to raw disk (e.g. tar -cf /dev/sdc.. ), how do you get
| more than one tarball onto the partition? As far as I know, there is
| nothing analogous to the no rewind tape device for disk drives. Amanda
| is quite capable of generating multiple tarballs per backup. So you
| will need multiple partitions, each large enough to hold the largest
| possible tarball.

If what you refer to is concatenating tarballs, e.g. as on tape with no
"tape mark" between them, then they would be written into the same disk
partition.  But if you refer to having each tarball, if on tape as being
separate tape files, e.g. writing a tape mark between, then in disk that
would be in a new partition.

Think of it in terms of how a tape-emulation-on-disk driver in the OS
would do it.  It would give the process the device semantics of a tape.
So the process can do the ioctl() to write a tape mark when it ends a
tape file and wants to write the next one.  When the TEOD driver gets
a request to write a tape mark, it would actually update the partition
table entry for the partition it was just writing, to reflect exactly how
much was written.  Then it would add a 2nd partition, which for MSDOS style
partitions can be done by adding an "extended" entry in the previous
table pointing to the next sector after the 1st partition, and write a
new table there with a new partition entry (and a new dummy "extended"
entry sitting idle until a 3rd partition is needed after the 2nd tape
mark gets written).  When the process then proceeds to write the next
block of data, that data is written into the 2nd partition.  The TEOD
driver would remember the state of the emulated tape.  For example a
rewind operation would change the state to "file 1, offset 0".  An
operation to forward space 1 file would change the state to increment
the file number by 1 and reset the offset to 0.
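
The state tracking could look something like this Python sketch. It keeps
everything in memory and never touches a disk; the class and method names
are hypothetical, and a real driver would update the partition table where
this code only updates a list of sizes.

```python
class EmulatedTape:
    """Tracks "tape file" number, offset, and per-file sizes for a tape on disk."""

    def __init__(self):
        self.sizes = [0]   # bytes in each tape file (one partition per file)
        self.file_no = 1   # current tape file, 1-based
        self.offset = 0    # byte offset within the current tape file

    def write(self, nbytes):
        # As on tape, writing discards every file after the current one.
        del self.sizes[self.file_no:]
        self.offset += nbytes
        self.sizes[self.file_no - 1] = self.offset

    def write_mark(self):
        # Close the current file at its final size and start the next one.
        del self.sizes[self.file_no:]
        self.sizes[self.file_no - 1] = self.offset
        self.sizes.append(0)
        self.file_no += 1
        self.offset = 0

    def rewind(self):
        self.file_no, self.offset = 1, 0

    def forward_space_file(self, count=1):
        if self.file_no + count > len(self.sizes):
            raise EOFError("spaced past the last tape file")
        self.file_no += count
        self.offset = 0
```

For example, after writing two files, a rewind, one forward space, and a
new write, the second file is replaced and anything after it is gone,
which matches how a real tape behaves.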

If this were done as an OS driver, emulating a tape drive, Amanda would
not really know any difference (unless it does stuff like time how long
a rewind takes and go nuts because it took only a couple milliseconds).

And you could still do a "bare metal" recovery as long as the partitions
on disk were compatible (which is why, if I write such a driver in Linux
I would use the MSDOS partition table format).  You'd even have TWO ways
to do it.  If the OS driver is there during recovery, just access the
emulated tape device node as a tape device and proceed normally.  If the
OS driver is NOT there (because it was a module that has been lost and
needs to be recovered), then let tar read from disk partitions in the
order: 1,5,6,7,8,9,10... and so on, for each "tape file".
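
That ordering can be written down as a tiny Python helper. This assumes
the Linux convention that logical partitions inside an extended partition
are numbered from 5, and uses /dev/sdc purely as an example device name.

```python
def partition_for_tape_file(file_no):
    """Map a 1-based tape file number to its partition number: 1, 5, 6, 7, ..."""
    if file_no < 1:
        raise ValueError("tape files are numbered from 1")
    return 1 if file_no == 1 else file_no + 3


def restore_commands(disk, file_count):
    """Build one tar invocation per tape file, reading straight from the partition."""
    return [["tar", "-xf", f"{disk}{partition_for_tape_file(n)}"]
            for n in range(1, file_count + 1)]


if __name__ == "__main__":
    for argv in restore_commands("/dev/sdc", 4):
        print(" ".join(argv))
```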

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Tue, Sep 05, 2006 at 10:36:56AM -0400, Gene Heskett wrote:

| Not workable at all IMO.  You cannot just willy-nilly rewrite the partition 
| table if you don't want to lose ALL the data in the next higher partition 
| and all those above it.  If that was the only partition, and you were 

Have you written to tape file 1, especially when there is more data this
time than previously, and then later recovered from tape file 2?

The semantics of overwriting partitions on disk, vis-a-vis the issue of
an earlier partition growing over a later partition, are just the same
as tape files on a tape.  Emulating a tape on a raw disk would not be
hard at all with respect to handling this.


| using the disks as tapes, meaning that for 20 'tapes' you'd have to have 
| 20 disks in carriers, then I assume it could be made to work.  The only 
| place I might be able to see a speed advantage is where the individual 
| dle's were less than 1k in total size as you would be skipping the file 
| opening and closing housekeeping along with the allocation searches.  But 
| as I point out in another post, disk speed, at least for me, is not a 
| factor to consider as it's many times faster than some of the other 
| operations, like compression.  And my storage disk is slow, only a 7200 
| rpm'er.

Granted, disks do not come in "media changers".  I only need one
disk per backup cycle (400 GB each).  I'd use an external disk and plug
a new one in before the backup time, and unplug it when done.  Next time
a different disk is used.

I do need the speed.  And compression won't be a bottleneck for me because
I won't be using it (most of the data, MPEG/DV A/V files, is already about
as compressed as it ever will be, short of re-compressing to the latest
greatest MPEG4).

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Tue, Sep 05, 2006 at 10:21:36AM -0400, Gene Heskett wrote:

| And any perceived time saved advantages are lost by a factor of 20 or so 
| when software compression is in use.  Normal backups here are written at 
| 20-50 megabytes/second, but 'compress client best' on a 500 mhz K6 will be 
| slowed to about 50k/second or less for the compression phase.  Once the 
| compression is done, and it's in the holding disk, then the actual write is 
| at 20-50 megs/second.  In no way is the speed of the disk more than a 
| very very minor factor in the amount of time to do the backup here.
| 
| I personally fail to see the point of trying to bypass the filesystem as 
| being a speed bottleneck, it's only a percent or three of the total time 
| doing the backups here.  Estimates and compression are the two places to 
| look at when configuring for speed.  If the storage capacity is there, 
| leave the compression out.  However, I selectively use it here on some 
| dle's, particularly those that will compress to less than 10% of the 
| original size, and that does take time when /usr/src on either machine is 
| several gigabytes.
| 
| YMMV, I have maybe 50 gigs at any one time, whereas some may have a 
| terabyte or more, but that's my take on how relatively pointless (and 
| crippling to the basic premise of amanda) the proposed changes would be at 
| the end of the day.

I wouldn't be using compression.  I've found that when speed matters,
compression only gets in the way, big time.  At the pace disks are getting
bigger and bigger, compression becomes almost moot.  And most of my files
are already compressed.  One project I am considering this for would have
a few terabytes of files already compressed in MPEG and/or DV format.  So
I'd never use compression as the costs far outweigh the tiny advantage.

One big problem with a filesystem is the system itself.  It tries to cache
the data blocks and the system actually slows down because it steals pages
from other processes to accomplish that.  Writing such a massive amount of
data at one time is a big load on the system, which causes all processes
to suffer.  Writing to a raw device is different.  In BSD a specific raw
device node exists to bypass the caching.  In Linux, the O_DIRECT option
can be used when opening the device to achieve the same thing.  Writing
then goes directly to the disk and uses relatively little RAM and reduces
the amount of CPU needed, too.
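
A minimal Python sketch of an O_DIRECT write on Linux is below. O_DIRECT
needs the buffer and the transfer size aligned to the device block size,
so the data is zero-padded to whole blocks and copied through an anonymous
mmap, which is page-aligned. The 4096-byte block size and the function
names are assumptions, and short writes are not handled.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; must suit the target device


def pad_to_block(data, block=BLOCK):
    """Zero-pad `data` up to a whole number of blocks."""
    remainder = len(data) % block
    if remainder:
        data += b"\0" * (block - remainder)
    return data


def write_direct(path, data, block=BLOCK):
    """Write `data` to an existing device or file, bypassing the page cache."""
    padded = pad_to_block(data, block)
    buf = mmap.mmap(-1, block)  # anonymous mapping gives an aligned buffer
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)  # Linux-only flag
    try:
        for off in range(0, len(padded), block):
            buf.seek(0)
            buf.write(padded[off:off + block])
            os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
    return len(padded)
```

Zero padding at the end is harmless for tar, since a tar archive already
ends in blocks of zeros. A real writer would use a much larger buffer than
one block per write() call.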

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Tue, Sep 05, 2006 at 10:03:02AM -0400, Gene Heskett wrote:

| On Tuesday 05 September 2006 05:24, Geert Uytterhoeven wrote:
| >On Tue, 5 Sep 2006, Phil Howard wrote:
| >> On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
| >> | It certainly would destroy one of amanda's features,
| >> | the ability to easily recover backup data using
| >> | standard unix utilities without amanda software.
| >>
| >> How is that destroyed?
| >>
| >> Suppose you use tar format.  You can have tar read from tape directly,
| >> which is what I presume you mean for being able to recover outside of
| >> Amanda.  You can have tar read from disk partitions if the native
| >> partition scheme is used.
| >
| >At first I had the same reaction as you: it would work fine if you would
| > cycle your tapedev through the partitions.  However, then I realized a
| > tape can store multiple `files' sequentially, while a disk partition
| > can't (without hackery that would annihilate the easy recovery
| > again).
| >
| >So as long as you dump only one DLE, it would work fine. If you dump more
| > than one DLE, you need more logic.
| 
| I don't know how this conclusion was reached, but IMO it's wrong.
| One of the beauties of amanda is that bare metal recoveries can be done 
| with nothing more than dd, tar (or dump if that's what was used) and gzip.

And this could still be done with raw disk when using a compatible
(such as MSDOS) partitioning scheme, for the many OSes that support
MSDOS partitions (Linux, FreeBSD, NetBSD, OpenBSD, Solaris x86, at
least).


| It's far more trouble to locate a file you want on a sequential tape than it 
| is to locate it in a vtape.  The vtape itself is nothing more than a 
| subdir in a subdir in the filesystem of the hard drive.  Switching the 
| vtapes is as simple as replacing the link to the directory called data, 
| with a new link named data that points at the desired directory.

The raw disk would be like a raw tape, except that access to specific
"tape files" within would be much faster.  I don't claim that a raw
disk would have any of the benefits of backing up to a filesystem.


| for bare metal recovery, any of those files (in any order) can be accessed 
| with:
| #> tar xzf path-to-file-name-of-file
| 
| I've done it, it works, and it's a whole lot EASIER than trying to locate a 
| file on a tape by scanning the tape until it's finally found.  Hours 
| faster.  And note that dd wasn't used, it's not required since the files 
| are random access, not sequential as on a tape.

FYI, at least on most systems I've worked with (all of those listed above)
even dd is not literally required.  Tar can read directly from tape or file.
It can even read directly from disk partitions (I've done so numerous times).

If you want all those benefits of restore, and don't mind having a disk
with a filesystem already on it, then why not use something like rsync
to make backups?  As long as you aren't working with over about a million
individual files, it works great.  It makes a replica of a filesystem or
multi-filesystem tree, and gives you direct access to every individual
file for restore purpose.  Use multiple disks to make multiple backups.
When backing up to a disk previously used, rsync avoids the writing work
for files not changed (according to matching meta data, though this can
be turned off).  And rsync works well over a network via ssh.
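
The metadata comparison amounts to something like the following Python
sketch: treat a file as unchanged when size and modification time match.
This imitates the idea behind rsync's default quick check rather than
reproducing its code, and the function name needs_transfer is made up.

```python
import os


def needs_transfer(src, dst):
    """Return True if `dst` is missing or differs from `src` in size or mtime."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Whole-second comparison; file contents are never read.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

Because no file data is read, a repeat backup of a mostly unchanged tree
costs little more than walking the directory. The trade-off is that a file
altered without changing its size or timestamp would be missed, which is
why the check can be turned off.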

So I can't really understand your argument.  What you seem to specifically
want that dismisses raw disk might well be better served with rsync instead
of Amanda.  I might want Amanda, though, for huge volume and speed.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Tue, Sep 05, 2006 at 11:24:52AM +0200, Geert Uytterhoeven wrote:

| On Tue, 5 Sep 2006, Phil Howard wrote:
| > On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
| > | It certainly would destroy one of amanda's features,
| > | the ability to easily recover backup data using
| > | standard unix utilities without amanda software.
| > 
| > How is that destroyed?
| > 
| > Suppose you use tar format.  You can have tar read from tape directly,
| > which is what I presume you mean for being able to recover outside of
| > Amanda.  You can have tar read from disk partitions if the native
| > partition scheme is used.
| 
| At first I had the same reaction as you: it would work fine if you would cycle
| your tapedev through the partitions.  However, then I realized a tape can store
| multiple `files' sequentially, while a disk partition can't (without hackery
| that would annihilate the easy recovery again).

I beg to differ.

There are TWO ways to do this:

1.  As a driver inside Amanda
2.  As a driver inside the OS

Either way can do it exactly the same on disk.  It requires the disk to be
fully dedicated to the backup, which is what you'd do for a tape, too.

The "tape file" would be a partition on disk.  The two ways to implement it
would differ only in the mechanism of achieving it, but the end result would
be the same.

When writing first starts on the first "tape file", the first partition
is created on the disk and writing to it begins.  The size of that
partition would be updated periodically, and definitely updated at the end
of writing the first "tape file".  When writing the second "tape file", a
2nd partition is created.  If the MSDOS partition style is used (supported
by numerous operating systems) the best approach would be to use extended
chaining as that can create an "infinite" number of "tape files".
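
For reference, here is a hedged Python sketch of the on-disk pieces
involved in extended chaining. It packs a 16-byte MSDOS partition entry
and builds one extended boot record (EBR) sector. The CHS fields are left
zero for simplicity, type 0x83 is an arbitrary choice for the data
partition, and the function names are illustrative.

```python
import struct

TYPE_DATA = 0x83      # assumed type for the "tape file" partition
TYPE_EXTENDED = 0x05  # link to the next EBR in the chain


def pack_entry(ptype, lba_start, sectors, bootable=False):
    """Pack one partition table entry (CHS fields zeroed in this sketch)."""
    return struct.pack("<B3sB3sII", 0x80 if bootable else 0x00,
                       b"\0\0\0", ptype, b"\0\0\0", lba_start, sectors)


def ebr_sector(data_sectors, next_ebr_rel=None, next_span=0):
    """Build a 512-byte EBR: entry 1 is this tape file, entry 2 links onward.

    Entry 1's start is relative to this EBR; entry 2's start is relative
    to the beginning of the whole extended partition.
    """
    sector = bytearray(512)
    sector[446:462] = pack_entry(TYPE_DATA, 1, data_sectors)
    if next_ebr_rel is not None:
        sector[462:478] = pack_entry(TYPE_EXTENDED, next_ebr_rel, next_span)
    sector[510:512] = b"\x55\xaa"  # boot record signature
    return bytes(sector)
```

Writing a tape mark would then mean rewriting the current EBR with the
final sector count and a link entry, and placing a fresh EBR right after
the data just written.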

Whenever a file on tape is overwritten, that file, as well as all that
follow it, would be "gone".

This could be a device driver in the OS designed to emulate fixed block
tape on a whole disk.  The driver would remember which "tape file" it is
currently in, and handle the partitions accordingly.

This could be done in Amanda as a new raw disk driver with the logic to
effect the same thing as an OS tape-emulation driver would do.

As long as the OS supports the partition entries, which would be true for
a number of OSes, restore can be performed without Amanda AND without the
tape emulation driver in the OS, by having tar read from the respective
disk partition.
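
As a rough illustration of the restore side, here is a sketch that writes a
tar stream straight onto a stand-in for a partition and reads it back with
nothing but a tar reader.  The "partition" is an ordinary zero-filled file
and the function names are invented for the example; on a real system the
path would be the partition's device node.

```python
import io
import tarfile

PARTITION_SIZE = 1024 * 1024  # stand-in partition of 1 MiB (assumption)


def make_fake_partition(path):
    """Create a fixed-size file that plays the role of a raw partition."""
    with open(path, "wb") as f:
        f.write(b"\0" * PARTITION_SIZE)


def dump_to_partition(path, files):
    """Write files (a dict of name -> bytes) as a tar stream from offset 0."""
    with open(path, "r+b") as dev:  # r+b: do not truncate the "partition"
        with tarfile.open(fileobj=dev, mode="w|") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


def restore_from_partition(path):
    """Read the tar stream back; stops at tar's own end-of-archive marker."""
    restored = {}
    with open(path, "rb") as dev:
        with tarfile.open(fileobj=dev, mode="r|") as tar:
            for member in tar:
                restored[member.name] = tar.extractfile(member).read()
    return restored
```

The reader never needs to know how large the archive is, because the tar
end-of-archive blocks mark where the data stops inside the larger partition.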

It might also be possible to do this entirely in user space without mods
to Amanda through remote (network) tape access.


| So as long as you dump only one DLE, it would work fine. If you dump more than
| one DLE, you need more logic.

More logic is a trivial issue when writing a driver.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Ian Turner
On Tuesday 05 September 2006 05:21, Phil Howard wrote:
> On Mon, Sep 04, 2006 at 11:01:20PM -0400, Ian Turner wrote:
> | On Saturday 02 September 2006 16:21, Phil Howard wrote:
> | > It would not need to be separate for each OS.  The idea of using a
> | > partition table isn't even the only approach.
>
> If all that is written is tar format, nothing more needs to be added.

Ah, but if you ditch the partition table, then indeed more needs to be added. 
How else would you tell the end of one dump from the start of the next?

> I didn't keep any stats, or really do it scientifically.  Someone that
> wants to should probably control for a lot of the variables that influence
> it. But I do recall the speed improvement is about 25% to 30%.  I suspect
> much of that is OS work bypassed with O_DIRECT.

I suspect you incur a substantial performance penalty if other processes are 
using the disk concurrently, because then you only get one write() per 
elevator traversal.

Cheers,

--Ian
-- 
Forums for Amanda discussion: http://forums.zmanda.com/


Re: using disk instead of tape

2006-09-05 Thread Charles Curley
On Tue, Sep 05, 2006 at 04:09:05AM -0500, Phil Howard wrote:
> On Sat, Sep 02, 2006 at 05:42:01PM -0600, Charles Curley wrote:
> 
> | In addition, it would make bare metal recovery more difficult. If you
> | back up to a file system any Linux live CD (finnix, knoppix...) can
> | read, recovery is easy: the tools you need are already on the live
> | CD. The tools include a suitable file system driver for the
> | partition. Back up to a bare partition, and you would need special
> | tricks or possibly special software to read it.
> 
> If tar can read from raw tape, it can read from raw disk.  I've already
> done that several times for various things.  Bare metal recovery will
> need at a minimum the tar or dump utility depending on format used.

If you write to raw disk (e.g. tar -cf /dev/sdc.. ), how do you get
more than one tarball onto the partition? As far as I know, there is
nothing analogous to the no rewind tape device for disk drives. Amanda
is quite capable of generating multiple tarballs per backup. So you
will need multiple partitions, each large enough to hold the largest
possible tarball.

-- 

Charles Curley  /"\ASCII Ribbon Campaign
Looking for fine software   \ /Respect for open standards
and/or writing?  X No HTML/RTF in email
http://www.charlescurley.com/ \No M$ Word docs in email

Key fingerprint = CE5C 6645 A45A 64E4 94C0  809C FFF6 4C48 4ECD DFDB



Re: using disk instead of tape

2006-09-05 Thread Gene Heskett
On Tuesday 05 September 2006 10:10, Geert Uytterhoeven wrote:
>On Tue, 5 Sep 2006, Gene Heskett wrote:
>> On Tuesday 05 September 2006 05:24, Geert Uytterhoeven wrote:
>> >On Tue, 5 Sep 2006, Phil Howard wrote:
>> >> On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
>> >> | It certainly would destroy one of amanda's features,
>> >> | the ability to easily recover backup data using
>> >> | standard unix utilities without amanda software.
>> >>
>> >> How is that destroyed?
>> >>
>> >> Suppose you use tar format.  You can have tar read from tape
>> >> directly, which is what I presume you mean for being able to recover
>> >> outside of Amanda.  You can have tar read from disk partitions if
>> >> the native partition scheme is used.
>> >
>> >At first I had the same reaction as you: it would work fine if you
>> > would cycle your tapedev through the partitions.  However, then I
>> > realized a tape can store multiple `files' sequentially, while a disk
>> > partition can't (without hackery that would annihilate the easy
>> > recovery again).
>> >
>> >So as long as you dump only one DLE, it would work fine. If you dump
>> > more than one DLE, you need more logic.
>>
>> I don't know how this conclusion was reached, but IMO it's wrong.
>> One of the beauties of amanda is that bare metal recoveries can be done
>> with nothing more than dd, tar (or dump if that is what was used) and gzip.
>>
>> It's far more trouble to locate a file you want on a sequential tape
>> than it is to locate it in a vtape.  The vtape itself is nothing more
>> than a subdir in a subdir in the filesystem of the hard drive. 
>> Switching the vtapes is as simple as replacing the link to the
>> directory called data, with a new link named data that points at the
>> desired directory.
>
>Yes, that's true. But this discussion was about using raw partitions on a
> disk instead of files on a filesystem on a disk.

Not workable at all IMO.  You cannot just willy-nilly rewrite the partition 
table if you don't want to lose ALL the data in the next higher partition 
and all those above it.  If that was the only partition, and you were 
using the disks as tapes, meaning that for 20 'tapes' you'd have to have 
20 disks in carriers, then I assume it could be made to work.  The only 
place I might be able to see a speed advantage is where the individual 
DLEs were less than 1k in total size as you would be skipping the file 
opening and closing housekeeping along with the allocation searches.  But 
as I point out in another post, disk speed, at least for me, is not a 
factor to consider as it's many times faster than some of the other 
operations, like compression.  And my storage disk is slow, only a 7200 
rpm drive.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.


Re: using disk instead of tape

2006-09-05 Thread Gene Heskett
On Tuesday 05 September 2006 05:21, Phil Howard wrote:
>On Mon, Sep 04, 2006 at 11:01:20PM -0400, Ian Turner wrote:
>| On Saturday 02 September 2006 16:21, Phil Howard wrote:
>| > It would not need to be separate for each OS.  The idea of using a
>| > partition table isn't even the only approach.
>|
>| The tradeoff here is that if you don't use real partitions, then you
>| (again) need this tool for restore. At present the only thing you need
>| for restore is gzip and tar or dump. Even with raw partitions, that
>| would continue to be the case, but as soon as you introduce an
>| Amanda-specific blocking format, that would no longer be the case.
>| Performance advantages might make that worthwhile, but then again the
>| same effort applied elsewhere could probably yield equal improvements
>| without the sacrifice.
>
>If all that is written is tar format, nothing more needs to be added.
>The tar format can be handled as a stream, disregarding blocks (though
>I don't know if Amanda preserves that).  I do periodically write tar
>directly to disk partitions (and read it back).  I've also done this
>with DV format video, but that's another matter.
>
>| > FYI, I was benchmarking some disk writing for an unrelated purpose
>| > yesterday and found that in Linux 2.6 using the O_DIRECT option when
>| > opening a device to write on a disk raw (even a partition) results in
>| > much faster writing. Writing raw already beats writing through a
>| > filesystem. Raw with O_DIRECT is much faster than raw without.  If
>| > someone does decide to write a driver for raw disk support, I suggest
>| > having its implementation test for support for the O_DIRECT option,
>| > and use it where possible.  It does have some size, offset, and
>| > alignment requirements that vary by OS.
>|
>| This is an interesting idea, and certainly worth pursuing. I'd be
>| interested in seeing your data.
>
>I didn't keep any stats, or really do it scientifically.  Someone that
> wants to should probably control for a lot of the variables that
> influence it. But I do recall the speed improvement is about 25% to 30%.
>  I suspect much of that is OS work bypassed with O_DIRECT.

And any perceived time-saving advantages are lost by a factor of 20 or so 
when software compression is in use.  Normal backups here are written at 
20-50 megabytes/second, but 'compress client best' on a 500 MHz K6 will be 
slowed to about 50k/second or less for the compression phase.  Once the 
compression is done, and it's in the holding disk, then the actual write is 
at 20-50 megs/second.  In no way is the speed of the disk more than a 
very minor factor in the amount of time to do the backup here.
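
To put rough numbers on that, a back-of-the-envelope sketch using the rates
quoted in this thread.  The 1 GB dump size is an assumption picked for easy
arithmetic, and "25% to 30% faster" is read here as 1.25x write throughput.

```python
DUMP_BYTES = 1_000_000_000  # assumed dump size (1 GB), not from the thread
COMPRESS_RATE = 50_000      # bytes/s, "about 50k/second"
WRITE_RATE = 20_000_000     # bytes/s, low end of "20-50 megabytes/second"
DIRECT_SPEEDUP = 1.25       # assumed reading of the quoted 25% improvement


def backup_times(dump_bytes=DUMP_BYTES):
    """Return (compress_s, write_s, saved_s, saved_share_of_total)."""
    compress_s = dump_bytes / COMPRESS_RATE
    write_s = dump_bytes / WRITE_RATE
    saved_s = write_s - write_s / DIRECT_SPEEDUP
    return compress_s, write_s, saved_s, saved_s / (compress_s + write_s)
```

Under these assumptions compression takes 20,000 seconds and the disk write
50 seconds, so a faster raw write saves about 10 seconds, well under a tenth
of a percent of the total.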

I personally fail to see the point of trying to bypass the filesystem as 
being a speed bottleneck; it's only a percent or three of the total time 
doing the backups here.  Estimates and compression are the two places to 
look at when configuring for speed.  If the storage capacity is there, 
leave the compression out.  However, I selectively use it here on some 
DLEs, particularly those that will compress to less than 10% of the 
original size, and that does take time when /usr/src on either machine is 
several gigabytes.

YMMV, I have maybe 50 gigs at any one time, whereas some may have a 
terabyte or more, but that's my take on how relatively pointless (and 
crippling to the basic premise of amanda) the proposed changes would be at 
the end of the day.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.


Re: using disk instead of tape

2006-09-05 Thread Geert Uytterhoeven
On Tue, 5 Sep 2006, Gene Heskett wrote:
> On Tuesday 05 September 2006 05:24, Geert Uytterhoeven wrote:
> >On Tue, 5 Sep 2006, Phil Howard wrote:
> >> On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
> >> | It certainly would destroy one of amanda's features,
> >> | the ability to easily recover backup data using
> >> | standard unix utilities without amanda software.
> >>
> >> How is that destroyed?
> >>
> >> Suppose you use tar format.  You can have tar read from tape directly,
> >> which is what I presume you mean for being able to recover outside of
> >> Amanda.  You can have tar read from disk partitions if the native
> >> partition scheme is used.
> >
> >At first I had the same reaction as you: it would work fine if you would
> > cycle your tapedev through the partitions.  However, then I realized a
> > tape can store multiple `files' sequentially, while a disk partition
> > > can't (without hackery that would annihilate the easy recovery
> > again).
> >
> >So as long as you dump only one DLE, it would work fine. If you dump more
> > than one DLE, you need more logic.
> 
> I don't know how this conclusion was reached, but IMO it's wrong.
> One of the beauties of amanda is that bare metal recoveries can be done 
> with nothing more than dd, tar (or dump if that is what was used) and gzip.
> 
> It's far more trouble to locate a file you want on a sequential tape than it 
> is to locate it in a vtape.  The vtape itself is nothing more than a 
> subdir in a subdir in the filesystem of the hard drive.  Switching the 
> vtapes is as simple as replacing the link to the directory called data, 
> with a new link named data that points at the desired directory.

Yes, that's true. But this discussion was about using raw partitions on a disk
instead of files on a filesystem on a disk.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: using disk instead of tape

2006-09-05 Thread Gene Heskett
On Tuesday 05 September 2006 05:24, Geert Uytterhoeven wrote:
>On Tue, 5 Sep 2006, Phil Howard wrote:
>> On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
>> | It certainly would destroy one of amanda's features,
>> | the ability to easily recover backup data using
>> | standard unix utilities without amanda software.
>>
>> How is that destroyed?
>>
>> Suppose you use tar format.  You can have tar read from tape directly,
>> which is what I presume you mean for being able to recover outside of
>> Amanda.  You can have tar read from disk partitions if the native
>> partition scheme is used.
>
>At first I had the same reaction as you: it would work fine if you would
> cycle your tapedev through the partitions.  However, then I realized a
> tape can store multiple `files' sequentially, while a disk partition
> can't (without hackery that would annihilate the easy recovery
> again).
>
>So as long as you dump only one DLE, it would work fine. If you dump more
> than one DLE, you need more logic.

I don't know how this conclusion was reached, but IMO it's wrong.
One of the beauties of amanda is that bare metal recoveries can be done 
with nothing more than dd, tar (or dump if that is what was used) and gzip.

It's far more trouble to locate a file you want on a sequential tape than it 
is to locate it in a vtape.  The vtape itself is nothing more than a 
subdir in a subdir in the filesystem of the hard drive.  Switching the 
vtapes is as simple as replacing the link to the directory called data, 
with a new link named data that points at the desired directory.
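
The link switch can be sketched in a few lines.  This is only an
illustration: the function name and the slot directory names are
hypothetical, and the only thing taken from the description above is that
a link called data points at the directory currently in use.

```python
import os


def switch_vtape(changer_dir, slot_name):
    """Point changer_dir/data at the directory slot_name inside changer_dir."""
    link = os.path.join(changer_dir, "data")
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)          # clear a leftover from an interrupted switch
    os.symlink(slot_name, tmp)  # relative target, resolved within changer_dir
    os.replace(tmp, link)       # rename over the old link in a single step
```

Building the new link under a temporary name and renaming it over the old
one means there is never a moment when data does not exist.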

Here for instance is an ls of last nights run here at the Heskett 
Ranchette:

[EMAIL PROTECTED] tesseract-1.0]# ls /amandatapes/Dailys/data
0-Dailys-9
0.Dailys-9
1-coyote._var.2
1.coyote._var.2
2-coyote._usr_music.0
2.coyote._usr_music.0
3-coyote._usr_pix.0
3.coyote._usr_pix.0
4-coyote._usr_dlds-tgzs.0
4.coyote._usr_dlds-tgzs.0
5-gene._usr_src.1
5.gene._usr_src.1
6-gene._var.1
6.gene._var.1
7-gene._usr_local.1
7.gene._usr_local.1
8-gene._root.1
8.gene._root.1
9-gene._opt.1
9.gene._opt.1
00010-gene._usr_bin.1
00010.gene._usr_bin.1
00011-gene._lib.1
00011.gene._lib.1
00012-gene._home.1
00012.gene._home.1
00013-gene._etc.1
00013.gene._etc.1
00014-gene._boot.1
00014.gene._boot.1
00015-gene._bin.1
00015.gene._bin.1
00016-gene._sbin.1
00016.gene._sbin.1
00017-coyote._opt.0
00017.coyote._opt.0
00018-coyote._usr_share.0
00018.coyote._usr_share.0
00019-coyote._root.1
00019.coyote._root.1
00020-coyote._usr_dlds-misc.0
00020.coyote._usr_dlds-misc.0
00021-coyote._usr_movies.0
00021.coyote._usr_movies.0
00022-coyote._home.0
00022.coyote._home.0
00023-coyote._GenesAmandaHelper-0.5.2
00023.coyote._GenesAmandaHelper-0.5.2
00024-coyote._usr_src.1
00024.coyote._usr_src.1
00025-coyote._dos.1
00025.coyote._dos.1
00026-coyote._usr_local.2
00026.coyote._usr_local.2
00027-coyote._boot.1
00027.coyote._boot.1
00028-coyote._usr_dlds-rpms.1
00028.coyote._usr_dlds-rpms.1
00029-coyote._usr_dlds.1
00029.coyote._usr_dlds.1
00030-coyote._usr_include.1
00030.coyote._usr_include.1
00031-coyote._usr_lib.1
00031.coyote._usr_lib.1
00032-coyote._lib.1
00032.coyote._lib.1
00033-coyote._dev.1
00033.coyote._dev.1
00034-coyote._tmp.1
00034.coyote._tmp.1
00035-coyote._etc.1
00035.coyote._etc.1
00036-coyote._usr_X11R6.1
00036.coyote._usr_X11R6.1
00037-coyote._usr_i386-glibc21-linux.1
00037.coyote._usr_i386-glibc21-linux.1
00038-coyote._usr_bin.1
00038.coyote._usr_bin.1
00039-coyote._usr_libexec.1
00039.coyote._usr_libexec.1
00040-coyote._usr_kerberos.1
00040.coyote._usr_kerberos.1
00041-coyote._usr_games.1
00041.coyote._usr_games.1
00042-coyote._sbin.1
00042.coyote._sbin.1
00043-coyote._bin.1
00043.coyote._bin.1
00044-coyote._usr_sbin.1
00044.coyote._usr_sbin.1
00045-coyote._usr_man.1
00045.coyote._usr_man.1
00046-TAPEEND
00046.TAPEEND
configuration.tar
indices.tar

For bare metal recovery, any of those files (in any order) can be accessed 
with:
#> tar xzf path-to-file-name-of-file

I've done it, it works, and it's a whole lot EASIER than trying to locate a 
file on a tape by scanning the tape until it's finally found.  Hours 
faster.  And note that dd wasn't used; it's not required since the files 
are random access, not sequential as on a tape.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Mon, Sep 04, 2006 at 11:01:20PM -0400, Ian Turner wrote:

| On Saturday 02 September 2006 16:21, Phil Howard wrote:
| > It would not need to be separate for each OS.  The idea of using a
| > partition table isn't even the only approach.
| 
| The tradeoff here is that if you don't use real partitions, then you (again) 
| need this tool for restore. At present the only thing you need for restore is 
| gzip and tar or dump. Even with raw partitions, that would continue to be the 
| case, but as soon as you introduce an Amanda-specific blocking format, that 
| would no longer be the case. Performance advantages might make that 
| worthwhile, but then again the same effort applied elsewhere could probably 
| yield equal improvements without the sacrifice.

If all that is written is tar format, nothing more needs to be added.
The tar format can be handled as a stream, disregarding blocks (though
I don't know if Amanda preserves that).  I do periodically write tar
directly to disk partitions (and read it back).  I've also done this
with DV format video, but that's another matter.


| > FYI, I was benchmarking some disk writing for an unrelated purpose
| > yesterday and found that in Linux 2.6 using the O_DIRECT option when
| > opening a device to write on a disk raw (even a partition) results in much
| > faster writing. Writing raw already beats writing through a filesystem. 
| > Raw with O_DIRECT is much faster than raw without.  If someone does decide
| > to write a driver for raw disk support, I suggest having its implementation
| > test for support for the O_DIRECT option, and use it where possible.  It
| > does have some size, offset, and alignment requirements that vary by OS.
| 
| This is an interesting idea, and certainly worth pursuing. I'd be interested 
| in seeing your data.

I didn't keep any stats, or really do it scientifically.  Someone that wants
to should probably control for a lot of the variables that influence it.
But I do recall the speed improvement is about 25% to 30%.  I suspect much
of that is OS work bypassed with O_DIRECT.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Geert Uytterhoeven
On Tue, 5 Sep 2006, Phil Howard wrote:
> On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
> | It certainly would destroy one of amanda's features,
> | the ability to easily recover backup data using
> | standard unix utilities without amanda software.
> 
> How is that destroyed?
> 
> Suppose you use tar format.  You can have tar read from tape directly,
> which is what I presume you mean for being able to recover outside of
> Amanda.  You can have tar read from disk partitions if the native
> partition scheme is used.

At first I had the same reaction as you: it would work fine if you would cycle
your tapedev through the partitions.  However, then I realized a tape can store
multiple `files' sequentially, while a disk partition can't (without hackery
that would annihilate the easy recovery again).

So as long as you dump only one DLE, it would work fine. If you dump more than
one DLE, you need more logic.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Sat, Sep 02, 2006 at 05:42:01PM -0600, Charles Curley wrote:

| In addition, it would make bare metal recovery more difficult. If you
| back up to a file system any Linux live CD (finnix, knoppix...) can
| read, recovery is easy: the tools you need are already on the live
| CD. The tools include a suitable file system driver for the
| partition. Back up to a bare partition, and you would need special
| tricks or possibly special software to read it.

If tar can read from raw tape, it can read from raw disk.  I've already
done that several times for various things.  Bare metal recovery will
need at a minimum the tar or dump utility depending on format used.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-05 Thread Phil Howard
On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:

| Phil,
| what advantage(s) do you foresee in amanda's use of raw disk
| devices as opposed to files on the native filesystem?

The ability to bypass the page-cached I/O subsystem, to control the
performance impact on the system.


| It certainly would destroy one of amanda's features,
| the ability to easily recover backup data using
| standard unix utilities without amanda software.

How is that destroyed?

Suppose you use tar format.  You can have tar read from tape directly,
which is what I presume you mean for being able to recover outside of
Amanda.  You can have tar read from disk partitions if the native
partition scheme is used.


| I've not heard people on the list reporting poor performance
| in using the current scheme for saving backups on disk-based
| 'virtual tapes'.  If there are, I'd like to know about it.

How could they have compared if no one is writing to raw disk now?


| Given that the backups are coming from dump or tar, possibly
| over a network, possibly processed by compression and encryption
| software, it is unlikely that the final disk writing is a bottleneck.

Certainly there will be situations where other factors affect the speed.


| Perhaps additional features would be possible.  Like multiplexed
| "direct to tape" dumps without a holding disk.  The current scheme
| only allows a single dump direct to tape.  Multiplexed dumps have
| to go to a holding disk before being taped.

A potential project will have some very large data.  Each machine can
have its own external backup drives, so for each machine it can be seen
as a single dump.  It's just going to be large.  Probably 400G or 800G
per machine.  I'm just exploring all options and Amanda is one of them.
If I do go with a filesystem on the disks, I'll probably use rsync so
the disk is a replica and can simply substitute as is.

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-04 Thread Ian Turner
On Saturday 02 September 2006 16:21, Phil Howard wrote:
> It would not need to be separate for each OS.  The idea of using a
> partition table isn't even the only approach.

The tradeoff here is that if you don't use real partitions, then you (again) 
need this tool for restore. At present the only thing you need for restore is 
gzip and tar or dump. Even with raw partitions, that would continue to be the 
case, but as soon as you introduce an Amanda-specific blocking format, that 
would no longer be the case. Performance advantages might make that 
worthwhile, but then again the same effort applied elsewhere could probably 
yield equal improvements without the sacrifice.

> FYI, I was benchmarking some disk writing for an unrelated purpose
> yesterday and found that in Linux 2.6 using the O_DIRECT option when
> opening a device to write on a disk raw (even a partition) results in much
> faster writing. Writing raw already beats writing through a filesystem. 
> Raw with O_DIRECT is much faster than raw without.  If someone does decide
> to write a driver for raw disk support, I suggest having its implementation
> test for support for the O_DIRECT option, and use it where possible.  It
> does have some size, offset, and alignment requirements that vary by OS.

This is an interesting idea, and certainly worth pursuing. I'd be interested 
in seeing your data.

--Ian
-- 
Forums for Amanda discussion: http://forums.zmanda.com/


Re: using disk instead of tape

2006-09-02 Thread Charles Curley
On Sat, Sep 02, 2006 at 06:39:40PM -0400, Jon LaBadie wrote:
> On Sat, Sep 02, 2006 at 03:21:33PM -0500, Phil Howard wrote:
> > 
> > | That functionality (if it will be created) should IMHO be optional, 
> > 
> > Absolutely.  If you want to use it, specify in the configuration.  If you
> > don't want to use it, don't specify it.
> > 

> 
> Phil,
> what advantage(s) do you foresee in amanda's use of raw disk
> devices as opposed to files on the native filesystem?
> 
> It certainly would destroy one of amanda's features,
> the ability to easily recover backup data using
> standard unix utilities without amanda software.

In addition, it would make bare metal recovery more difficult. If you
back up to a file system any Linux live CD (finnix, knoppix...) can
read, recovery is easy: the tools you need are already on the live
CD. The tools include a suitable file system driver for the
partition. Back up to a bare partition, and you would need special
tricks or possibly special software to read it.

That alone loses my interest.

-- 

Charles Curley  /"\ASCII Ribbon Campaign
Looking for fine software   \ /Respect for open standards
and/or writing?  X No HTML/RTF in email
http://www.charlescurley.com/ \No M$ Word docs in email

Key fingerprint = CE5C 6645 A45A 64E4 94C0  809C FFF6 4C48 4ECD DFDB



Re: using disk instead of tape

2006-09-02 Thread Jon LaBadie
On Sat, Sep 02, 2006 at 03:21:33PM -0500, Phil Howard wrote:
> 
> | That functionality (if it will be created) should IMHO be optional, 
> 
> Absolutely.  If you want to use it, specify in the configuration.  If you
> don't want to use it, don't specify it.
> 
> 
> | considering people who aren't using removable disks but for example just 
> | one partition on their RAID. If I were one of such people, I wouldn't 
> | feel too comfortable about Amanda re-writing my server's partition table 
> | every day. It also seems to me that such functionality would need to be 
> | programmed separately for each OS - quite a bit of work.
> 
> Such partition table rewriting should only be done to a disk that is used
> exclusively for raw disk backups and for nothing else.  It should never be
> done on a disk used for other things (see alternative below).
> 
> It would not need to be separate for each OS.  The idea of using a partition
> table isn't even the only approach.  A simple header that indicates how many
> bytes or blocks the next segment of data has is sufficient.  In a way that
> is like a partition table.  But it doesn't need to be OS compatible unless
> the OS goes nuts if it can't see a partition table it recognizes (which is
> an issue you'd see with a new empty disk).  As long as the OS can always
> give you whole disk access, it's good to go.
> 
> An alternative is just to do it within a raw partition.  Use whatever scheme
> of partitioning the OS supports (manually partition it), then access each
> partition as an emulated whole tape.  Headers in front of each segment of
> data would separate the emulation of tape files.  This alternative could be
> chosen simply by specifying the partition device name rather than the whole
> disk device name.  The driver implementation would simply work with what is
> given to it, be that a whole disk or a partition of a disk.

Phil,
what advantage(s) do you foresee in amanda's use of raw disk
devices as opposed to files on the native filesystem?

It certainly would destroy one of amanda's features,
the ability to easily recover backup data using
standard unix utilities without amanda software.

I've not heard people on the list reporting poor performance
in using the current scheme for saving backups on disk-based
'virtual tapes'.  If there are, I'd like to know about it.
Given that the backups are coming from dump or tar, possibly
over a network, possibly processed by compression and encryption
software, it is unlikely that the final disk writing is a bottleneck.

Perhaps additional features would be possible.  Like multiplexed
"direct to tape" dumps without a holding disk.  The current scheme
only allows a single dump direct to tape.  Multiplexed dumps have
to go to a holding disk before being taped.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: using disk instead of tape

2006-09-02 Thread Phil Howard
On Sat, Sep 02, 2006 at 11:23:32AM +0300, Toomas Aas wrote:

| That functionality (if it will be created) should IMHO be optional, 

Absolutely.  If you want to use it, specify in the configuration.  If you
don't want to use it, don't specify it.


| considering people who aren't using removable disks but for example just 
| one partition on their RAID. If I were one of such people, I wouldn't 
| feel too comfortable about Amanda re-writing my server's partition table 
| every day. It also seems to me that such functionality would need to be 
| programmed separately for each OS - quite a bit of work.

Such partition table rewriting should only be done to a disk that is used
exclusively for raw disk backups and for nothing else.  It should never be
done on a disk used for other things (see alternative below).

It would not need to be separate for each OS.  The idea of using a partition
table isn't even the only approach.  A simple header that indicates how many
bytes or blocks the next segment of data has is sufficient.  In a way that
is like a partition table.  But it doesn't need to be OS compatible unless
the OS goes nuts if it can't see a partition table it recognizes (which is
an issue you'd see with a new empty disk).  As long as the OS can always
give you whole disk access, it's good to go.

An alternative is just to do it within a raw partition.  Use whatever scheme
of partitioning the OS supports (manually partition it), then access each
partition as an emulated whole tape.  Headers in front of each segment of
data would separate the emulation of tape files.  This alternative could be
chosen simply by specifying the partition device name rather than the whole
disk device name.  The driver implementation would simply work with what is
given to it, be that a whole disk or a partition of a disk.
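
A minimal sketch of the "simple header" idea described above, assuming a one-sector header in front of each segment that records the payload length. The magic string, the header layout, and the function names are all hypothetical and not part of Amanda. The header is written last, once the length is known, which is why the device must allow seeking back.

```python
import io
import struct

HEADER_SIZE = 512            # one sector, so payloads stay sector-aligned
HEADER_FMT = "<8sQ"          # hypothetical layout: 8-byte magic, 64-bit length
MAGIC = b"RAWSEG01"          # hypothetical marker, not an Amanda format


def write_segment(dev, chunks):
    """Append one emulated tape file at the current position of dev."""
    start = dev.tell()
    dev.write(bytes(HEADER_SIZE))          # placeholder until length is known
    length = 0
    for chunk in chunks:
        dev.write(chunk)
        length += len(chunk)
    dev.write(bytes(-length % HEADER_SIZE))  # pad payload to a sector boundary
    end = dev.tell()
    dev.seek(start)
    header = struct.pack(HEADER_FMT, MAGIC, length)
    dev.write(header.ljust(HEADER_SIZE, b"\0"))
    dev.seek(end)                          # ready for the next segment
    return length


def read_segments(dev):
    """Yield each payload in order, stopping at the first missing header."""
    while True:
        header = dev.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return
        magic, length = struct.unpack_from(HEADER_FMT, header)
        if magic != MAGIC:
            return
        data = dev.read(length)
        dev.seek(-length % HEADER_SIZE, 1)  # skip the padding
        yield data
```

The same code works on a whole-disk device or a partition device, since it only needs a seekable file object. On a reused disk, stale headers from an earlier run could follow the last segment, so a real driver would also need an explicit end marker.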

The reason I thought of using the DOS partition scheme was in implementing
it as a Linux kernel driver.  Within Amanda itself, the programmer could
choose any scheme for dividing up whatever disk the configuration names.

FYI, I was benchmarking some disk writing for an unrelated purpose yesterday
and found that in Linux 2.6 using the O_DIRECT option when opening a device
to write on a disk raw (even a partition) results in much faster writing.
Writing raw already beats writing through a filesystem.  Raw with O_DIRECT
is much faster than raw without.  If someone does decide to write a driver
for raw disk support, I suggest having its implementation test for support
for the O_DIRECT option, and use it where possible.  It does have some size,
offset, and alignment requirements that vary by OS.
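
A rough sketch of the "test for O_DIRECT and use it where possible" suggestion, assuming a 4096-byte alignment unit (the real requirement varies by OS and device) and treating EINVAL as the sign that O_DIRECT is unsupported for the target. The function names are hypothetical. An anonymous mmap is used only because it gives a page-aligned buffer.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment unit; the real requirement depends on OS and device


def open_for_write(path, allow_direct=True):
    """Return (fd, direct); direct is True only if O_DIRECT is in effect."""
    flag = getattr(os, "O_DIRECT", 0) if allow_direct else 0  # absent on some platforms
    if flag:
        try:
            return os.open(path, os.O_WRONLY | flag), True
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise  # a real error, not "O_DIRECT unsupported here"
    return os.open(path, os.O_WRONLY), False


def write_aligned(path, payload):
    """Write payload, zero-padded to a multiple of BLOCK, at offset 0.

    Returns True if the write went through O_DIRECT, False if it fell back.
    The sketch assumes a single write call transfers the whole buffer.
    """
    padded_len = max(BLOCK, -(-len(payload) // BLOCK) * BLOCK)
    buf = mmap.mmap(-1, padded_len)       # anonymous mapping, page-aligned
    buf[:len(payload)] = payload
    try:
        for allow_direct in (True, False):
            fd, direct = open_for_write(path, allow_direct)
            try:
                os.write(fd, buf)
                return direct
            except OSError as exc:
                # Some targets accept the flag at open time but reject the write.
                if not (direct and exc.errno == errno.EINVAL):
                    raise
            finally:
                os.close(fd)
    finally:
        buf.close()
```

The fallback path means the caller gets a working write either way and can log whether the faster mode was actually used.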

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: using disk instead of tape

2006-09-02 Thread Paul Bijnens

On 2006-09-01 22:21, Phil Howard wrote:

First nit: the subscription greeting for this mailing list said the archives
were at "ftp://ftp.amanda.org/pub/amanda/maillist-archives"; but they are not
there.  I could not find any elsewhere.  So I cannot look at past messages
to see if my question is answered.  It was not answered in the FAQ.


I use these archives:

http://marc.theaimsgroup.com/?l=amanda-users&w=2



What I would like to know is how Amanda handles backup to disk.  I did find
a "file driver".  I'm not sure if that is meant to be the "to disk" method
or not.  It certainly would have some complications, depending on how one
considered disks as equivalent to tapes.


Yes, see:

http://wiki.zmanda.com/index.php/File_driver


[...]

But I don't see any such "disk driver".  Am I overlooking something, or is
the "file driver" the only means?  If the latter is true, will AMANDA know
to mount attached disks as filesystems to access the "tape files"?


Amanda will not mount them.  But this is something that can be done
automatically by hal (if using Redhat or similar linux), or some
automounter.


--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***



Re: using disk instead of tape

2006-09-02 Thread Toomas Aas

Phil Howard wrote:


What I would like to know is how Amanda handles backup to disk.  I did find
a "file driver".  I'm not sure if that is meant to be the "to disk" method
or not.  It certainly would have some complications, depending on how one
considered disks as equivalent to tapes.


The file driver is indeed what is used to perform backup to disk.


The complication would be the steps involved in handling a disk.  I would
consider plugging the disk in (USB, Firewire, or eSATA) to be the rough
equivalent of inserting a tape into a manual tape drive.  The question is
what will AMANDA do with a disk that has merely been plugged in.  Can it
be configured to, or does it just understand that it needs to, mount the
disk?  What if the disk is new and not yet formatted?


AFAIK, Amanda has no built-in functionality to handle removable disks. I 
just wrote a little script for that purpose and it has worked quite 
reliably for the past two years.


Another note about comparing disk to tape from Amanda's POV - disks 
(removable or not) are often handled not as individual tapes but as a sort
of "virtual tape changer". The disk partition is divided into
subdirectories which Amanda handles as slots in a tape changer. I myself 
use two removable disks, each holding 5 "slots" and only change the disk 
once a week.
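
A small sketch of the "subdirectories as slots" layout described above. The `slotN` directory names and the `data` symlink that marks the loaded slot are assumptions for illustration; check what your changer script actually expects before relying on them.

```python
import os


def make_slots(root, count):
    """Create slot1..slotN under root (the mounted disk or partition)."""
    for n in range(1, count + 1):
        os.makedirs(os.path.join(root, f"slot{n}"), exist_ok=True)


def load_slot(root, n):
    """Point the 'data' symlink at slot n, replacing any previous link."""
    target = f"slot{n}"
    if not os.path.isdir(os.path.join(root, target)):
        raise FileNotFoundError(target)
    link = os.path.join(root, "data")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)     # relative link, so it survives remounting elsewhere
    os.replace(tmp, link)       # swap the link in one rename
```

With two disks of five slots each, `make_slots` would be run once per disk and `load_slot` once per backup run.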



What it seems this "file driver" probably does not do, which a "disk driver"
(if such a thing exists) could do, is handle the disk as a raw device.  It
could create a partition to be the equivalent of a tape file, and write the
dump/tar image directly to the partition sectors.  When done (or when it
knows exactly how many sectors there will be), it could update the partition
table to represent the exact size.  The next "tape file" could be written
after it and a partition table entry added for that partition/file.
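
To make the partition-per-tape-file idea concrete, here is a sketch that builds an MS-DOS (MBR) partition table sector in memory, one primary entry per dump image. The 0xDA type byte, the placeholder CHS values, and the function names are choices made for this illustration only. Nothing here writes to a disk.

```python
import struct

SECTOR = 512
TABLE_OFFSET = 446        # the four 16-byte primary entries start here
SIGNATURE = b"\x55\xaa"   # boot signature in bytes 510-511
NON_FS_DATA = 0xDA        # assumed type byte for "not a filesystem"


def pack_entry(start_lba, sectors, ptype=NON_FS_DATA):
    """Pack one 16-byte entry: status, CHS start, type, CHS end, LBA, count."""
    chs_placeholder = b"\xfe\xff\xff"   # CHS fields are not meaningful here
    return struct.pack("<B3sB3sII", 0x00, chs_placeholder, ptype,
                       chs_placeholder, start_lba, sectors)


def build_mbr(image_sizes, first_lba=1):
    """Return a 512-byte sector describing consecutive images (sizes in bytes)."""
    if len(image_sizes) > 4:
        raise ValueError("an MBR holds only four primary entries")
    sector = bytearray(SECTOR)
    lba = first_lba
    for i, size in enumerate(image_sizes):
        count = -(-size // SECTOR)      # round up to whole sectors
        offset = TABLE_OFFSET + 16 * i
        sector[offset:offset + 16] = pack_entry(lba, count)
        lba += count
    sector[510:512] = SIGNATURE
    return bytes(sector)
```

The four-entry limit shows one practical cost of this format: more than four "tape files" per disk would need extended partitions or a different table format.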


That functionality (if it will be created) should IMHO be optional, 
considering people who aren't using removable disks but for example just 
one partition on their RAID. If I were one of such people, I wouldn't 
feel too comfortable about Amanda re-writing my server's partition table 
every day. It also seems to me that such functionality would need to be 
programmed separately for each OS - quite a bit of work.


--
Toomas Aas