Re: [gentoo-user] checksumming files

2008-12-07 Thread Mick
On Friday 05 December 2008, Albert Hopkins wrote:
 On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
  Almost every time I split a large file >1G into say 200k chunks, then ftp
  it to a server and then:

 That's thousands of files!  Have you gone mad?!

Ha! small error in units . . . it is 200M (of course this is no disclaimer of 
me going/gone mad . . .)  I think the server drops the connection above 230M 
file uploads or something like that, so I tried 200M files and it seems to 
work.

   cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
 
  it fails.  Checking the split files in turn I often find one or two chunks
  that fail on their own md5 checks.  Despite that, the concatenated file
  often works (e.g. if it is a video file it'll play alright).

 Let me understand this. Are [1..7] the split files or the checksums of
 the split files?  

They are the split files which I concatenate into the complete file.

 If the former then 'md5sum -c completefile' will fail
 with "no properly formatted MD5 checksum lines found" or similar, due to
 the fact that completefile is not a list of checksums.  If the latter,
 then how are you generating [1..7]? If you are using the split(1)
 command to split the files and are not passing at least '-a 3' to it
 then your file is going to be truncated, due to the fact that the suffix
 length is too small to accommodate the thousands of files needed to
 split a 1GB+ file into 200k chunks. You should get an error like "split:
 output file suffixes exhausted".

 Maybe if you give the exact commands used I might understand this
 better.

 I have a feeling that this is not the most efficient method of file
 transfer.

split --verbose -b 200m big_file

tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ \
  xaa xab xac xad . . .

The above would fail after xaa was uploaded and about 1/3 or less of xab.  So, 
I split up the individual file upload:

tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xaa ; \
sleep 1m ; \
tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xab ; \
sleep ... ; etc.
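
A loop would save typing all that out by hand.  Just a sketch (untested),
assuming the chunks keep split's default xaa, xab, ... names:

for chunk in x??
do
    # one tnftp invocation per chunk, retried at 45 second intervals
    tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ "$chunk"
    # pause between chunks so the server doesn't drop the session
    sleep 1m
done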

Does this make sense?
-- 
Regards,
Mick


signature.asc
Description: This is a digitally signed message part.


Re: [gentoo-user] checksumming files

2008-12-07 Thread Albert Hopkins
On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:
 On Friday 05 December 2008, Albert Hopkins wrote:
  On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
   Almost every time I split a large file >1G into say 200k chunks, then ftp
   it to a server and then:
 
  That's thousands of files!  Have you gone mad?!
 
 Ha! small error in units . . . it is 200M (of course this is no disclaimer of 
 me going/gone mad . . .)  I think the server drops the connection above 230M 
 file uploads or something like that, so I tried 200M files and it seems to 
 work.
 
    cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
  
   it fails.  Checking the split files in turn I often find one or two chunks
   that fail on their own md5 checks.  Despite that, the concatenated file
   often works (e.g. if it is a video file it'll play alright).
 
  Let me understand this. Are [1..7] the split files or the checksums of
  the split files?  
 
 They are the split files which I concatenate into the complete file.

Well, unless you made another error in your OP, you are using md5sum
incorrectly.  When you use -c, md5sum expects a file that is a list of
files/checksums. For example

$ dd if=/dev/urandom of=bigfile bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 2.29361 s, 2.3 MB/s
$ md5sum bigfile > checksum # create checksum file
$ split -b1M bigfile 
$ rm bigfile 
$ cat xa* > bigfile
$ # This is correct
$ md5sum -c checksum 
bigfile: OK
$ # This is wrong!
$ md5sum -c bigfile 
md5sum: bigfile: no properly formatted MD5 checksum lines found
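
If you want to know exactly which chunk got corrupted in transfer, checksum
the chunks themselves and verify that list on the far side, e.g. (file names
made up, continuing the session above):

$ md5sum xa* > chunks.md5   # one checksum line per chunk
$ md5sum -c chunks.md5      # run this against the uploaded copies
xaa: OK
xab: OK
xac: OK
xad: OK
xae: OK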

[SNIP!]
  Maybe if you give the exact commands used I might understand this
  better.
 
  I have a feeling that this is not the most efficient method of file
  transfer.
 
 split --verbose -b 200m big_file
 
 tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ \
   xaa xab xac xad . . .
 
 The above would fail after xaa was uploaded and about 1/3 or less of xab.  So,
 I split up the individual file upload:
 
 tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xaa ; \
 sleep 1m ; \
 tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xab ; \
 sleep ... ; etc.
 
 Does this make sense?

Yes, but if you are truly using -c then it would make sense that you
could get a checksum error and yet the file still be OK.

Here's how I would do it.  I'm not saying you should do it this way.
I'd use rsync.  Rsync does file xfer and has checksumming built-in.  You say
you split because you get disconnected, right?  I'm not sure if rsync
handles re-connects, but you can write a loop so that if rsync fails you
continue where you left off:

status=30   # any non-zero value, so the loop runs at least once
until [ $status -eq 0 ]
do
    rsync --append-verify big_file server_name:/htdocs/directory_path/
    status=$?
done

No splitting/concatenating and no need to checksum.





Re: [gentoo-user] checksumming files

2008-12-07 Thread Mick
On Sunday 07 December 2008, Albert Hopkins wrote:
 On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:

  They are the split files which I concatenate into the complete file.

 Well, unless you made another error in your OP, you are using md5sum
 incorrectly.  When you use -c, md5sum expects a file that is a list of
 files/checksums. For example
[snip...]
 Yes, but if you are truly using -c then it would make sense that you
 could get a checksum error and yet the file still be OK.

Sorry, yes I used the md5sum -c command correctly with the corresponding 
checksum file for the big_file.

 Here's how I would do it.  I'm not saying you should do it this way.
 I'd use rsync.  Rsync does file xfer and has checksumming built-in.  You say
 you split because you get disconnected, right?  I'm not sure if rsync
 handles re-connects, but you can write a loop so that if rsync fails you
 continue where you left off:

 status=30
 until [ $status -eq 0 ]
 do
     rsync --append-verify big_file server_name:/htdocs/directory_path/
     status=$?
 done

 No splitting/concatenating and no need to checksum.

Wouldn't the server need to have rsyncd running to be able to do that?  Can I 
rsync to an ftp server?  Also, how would I pass username/passwd on the 
command line so that the upload can take place unattended?

Thank you for your replies.
-- 
Regards,
Mick


signature.asc
Description: This is a digitally signed message part.


Re: [gentoo-user] checksumming files

2008-12-07 Thread Neil Bothwick
On Sun, 7 Dec 2008 17:56:07 +0000, Mick wrote:

  rsync --append-verify big_file server_name:/htdocs/directory_path/ 
  status=$?

 Wouldn't the server need to have rsyncd running to be able to do that?
 Can I rsync to an ftp server?  Also, how would I pass username/passwd
 on the command line so that the upload can take place unattended?

You only need rsyncd running to use rsync::reponame-style connections. The
above command uses ssh to connect.
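
If you want it to run unattended, the usual trick is an ssh key instead of a
password on the command line.  A sketch, with names and paths assumed:

# one-time setup: create a key and install it on the server
ssh-keygen -t rsa
ssh-copy-id username@server_name

# afterwards rsync runs over ssh without prompting
rsync --append-verify big_file username@server_name:/htdocs/directory_path/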


-- 
Neil Bothwick

Veni, vermini, vomui
I came, I got ratted, I threw up


signature.asc
Description: PGP signature


Re: [gentoo-user] checksumming files

2008-12-06 Thread Dirk Heinrichs
On Friday, 5 December 2008 19:48:18, Mick wrote:
 On Thursday 04 December 2008, Heinrichs, Dirk (EXT-Capgemini -
 DE/Dusseldorf) wrote:
  Did you make sure the chunks are transferred in binary mode?

 Aha!! Since the split chunks were part of a video file I assumed that it
 would be binary - and I understand that the default type (for tnftp) is
 binary?

  BTW, most
  modern FTP clients have a resume option, so there's no need to split.

 Yes, tnftp has the 'reget' command but I can't find a 'reput', or 'resume'?
 It also has 'restart':
 [...]
 but I am not sure how this works exactly.  Would anyone be clued up on the
 intricacies of tnftp?

Unfortunately not, never heard of it before.

 Anything else I could try?

ncftp. This one also comes with the ncftpget and ncftpput command line
utilities. They use binary transfer by default and have resume capabilities.
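
From memory (check the man page), a resumable unattended upload would look
something like:

# -u/-p pass the credentials, -z resumes a partial upload if one exists
ncftpput -u username -p passwd -z server_name /htdocs/directory_path big_file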

HTH...

Dirk



Re: [gentoo-user] checksumming files

2008-12-05 Thread Mick
On Thursday 04 December 2008, Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf) 
wrote:

 Did you make sure the chunks are transferred in binary mode?

Aha!! Since the split chunks were part of a video file I assumed that it would 
be binary - and I understand that the default type (for tnftp) is binary?

There's more to it:

I use tnftp because it has an unattended feature which suits me nicely.  A 
string like:

sleep 90m ; tnftp -u ftp://username:passwd@server_address/htdocs/path \ 
files_to_upload

will log in after 90 minutes and upload the file(s) I want (not sure if/how I
can do this with vanilla ftp).

 BTW, most 
 modern FTP clients have a resume option, so there's no need to split.

Yes, tnftp has the 'reget' command but I can't find a 'reput', or 'resume'?  
It also has 'restart':
==
restart marker
Restart the immediately following get or put at the indicated
marker.  On UNIX systems, marker is usually a byte offset
into the file.
==

but I am not sure how this works exactly.  Would anyone be clued up on the 
intricacies of tnftp?
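
From that description I'd guess a resumed upload goes something like this,
if (say) the first 150M of the file already made it across (untested):

ftp> restart 157286400
ftp> put big_file

where 157286400 is just the byte offset (150*1024*1024) to continue from.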

Anything else I could try?
-- 
Regards,
Mick


signature.asc
Description: This is a digitally signed message part.


Re: [gentoo-user] checksumming files

2008-12-05 Thread Paul Hartman
On Thu, Dec 4, 2008 at 1:10 AM, Mick [EMAIL PROTECTED] wrote:
 Almost every time I split a large file >1G into say 200k chunks, then ftp it
 to a server and then:

  cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile

 it fails.  Checking the split files in turn I often find one or two chunks that
 fail on their own md5 checks.  Despite that, the concatenated file often works
 (e.g. if it is a video file it'll play alright).

 Can you explain this?  Should I be using a different check to verify the
 integrity of the ftp'd file?

Obviously something is going wrong... without knowing why that is, I
suggest you emerge par2cmdline and use it to create some recovery
blocks. That way you can repair/reassemble the pieces when they get to
the other side.
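
Something like this, if I remember the par2 syntax right:

# before uploading: create recovery data with 10% redundancy
par2 create -r10 big_file.par2 xa*

# on the far side: verify, and repair from the recovery blocks if needed
par2 verify big_file.par2
par2 repair big_file.par2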



Re: [gentoo-user] checksumming files

2008-12-05 Thread Albert Hopkins
On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
 Almost every time I split a large file >1G into say 200k chunks, then ftp it
 to a server and then:

That's thousands of files!  Have you gone mad?!

 
  cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile

 it fails.  Checking the split files in turn I often find one or two chunks that
 fail on their own md5 checks.  Despite that, the concatenated file often works
 (e.g. if it is a video file it'll play alright).

Let me understand this. Are [1..7] the split files or the checksums of
the split files?  If the former then 'md5sum -c completefile' will fail
with "no properly formatted MD5 checksum lines found" or similar, due to
the fact that completefile is not a list of checksums.  If the latter,
then how are you generating [1..7]? If you are using the split(1)
command to split the files and are not passing at least '-a 3' to it
then your file is going to be truncated, due to the fact that the suffix
length is too small to accommodate the thousands of files needed to
split a 1GB+ file into 200k chunks. You should get an error like "split:
output file suffixes exhausted".
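
For instance (names made up):

$ split -b 200k 1gb_file        # default suffix length 2: only 676 names
split: output file suffixes exhausted
$ split -a 3 -b 200k 1gb_file   # suffix length 3 allows 17576 chunks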

Maybe if you give the exact commands used I might understand this
better.

I have a feeling that this is not the most efficient method of file
transfer.




Re: [gentoo-user] checksumming files

2008-12-04 Thread Neil Bothwick
On Thu, 4 Dec 2008 07:10:06 +0000, Mick wrote:

 Despite that the concatenated file often works 
 (e.g. if it is a video file it'll play alright).
 
 Can you explain this?  Should I be using a different check to verify
 the integrity of the ftp'd file?

An MD5 check will fail if even one bit is changed, and a single changed bit
won't noticeably affect the playback of a video file. Try it with a large
compressed tarball and you'll notice the difference.
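
Easy enough to demonstrate: change a single byte in a copy of the file and
compare (file names made up):

$ cp video.avi copy.avi
$ printf 'X' | dd of=copy.avi bs=1 seek=1000000 conv=notrunc
$ md5sum video.avi copy.avi   # the two sums will differ
$ mplayer copy.avi            # will almost certainly still play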


-- 
Neil Bothwick

--T-A+G-L-I+N-E--+M-E-A+S-U-R+I-N-G+--G-A+U-G-E--


signature.asc
Description: PGP signature


Re: [gentoo-user] checksumming files

2008-12-03 Thread Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
On Thursday, 04.12.2008 at 07:10 +0000, ext Mick wrote:
 Almost every time I split a large file >1G into say 200k chunks, then ftp it
 to a server and then:
 
  cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
 
 it fails.  Checking the split files in turn I often find one or two chunks that
 fail on their own md5 checks.  Despite that, the concatenated file often works
 (e.g. if it is a video file it'll play alright).
 
 Can you explain this?  Should I be using a different check to verify the 
 integrity of the ftp'd file?

Did you make sure the chunks are transferred in binary mode? BTW, most
modern FTP clients have a resume option, so there's no need to split.
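
With a plain ftp client that's just (file name assumed):

ftp> binary
200 Type set to I.
ftp> put somefile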

HTH...

Dirk
-- 
Dirk Heinrichs  | Tel:  +49 (0)162 234 3408
Configuration Manager   | Fax:  +49 (0)211 47068 111
Capgemini Deutschland   | Mail: [EMAIL PROTECTED]
Wanheimerstraße 68  | Web:  http://www.capgemini.com
D-40468 Düsseldorf  | ICQ#: 110037733
GPG Public Key C2E467BB | Keyserver: www.keyserver.net


signature.asc
Description: This is a digitally signed message part.