Re: [gentoo-user] checksumming files
On Friday 05 December 2008, Albert Hopkins wrote:
> On Thu, 2008-12-04 at 07:10 +, Mick wrote:
> > Almost every time I split a large file >1G into say 200k chunks, then
> > ftp it to a server and then:
>
> That's thousands of files!  Have you gone mad?!

Ha! Small error in units . . . it is 200M (of course this is no disclaimer of me going/gone mad . . .) I think the server drops the connection above 230M file uploads or something like that, so I tried 200M files and it seems to work.

> > cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> >
> > It fails. Checking the split files in turn I often find one or two
> > chunks that fail on their own md5 checks. Despite that, the
> > concatenated file often works (e.g. if it is a video file it'll play
> > alright).
>
> Let me understand this. Are [1..7] the split files or the checksums of
> the split files?

They are the split files, which I concatenate into the complete file.

> If the former, then 'md5sum -c completefile' will fail with "no
> properly formatted MD5 checksum lines found" or similar, because
> completefile is not a list of checksums. If the latter, then how are
> you generating [1..7]? If you are using the split(1) command and are
> not passing at least -a 3 to it, then your file is going to be
> truncated, because the suffix length is too small to accommodate the
> thousands of files needed to split a 1GB+ file into 200k chunks. You
> should get an error like "split: output file suffixes exhausted".
>
> Maybe if you give the exact commands used I might understand this
> better. I have a feeling that this is not the most efficient method of
> file transfer.

split --verbose -b 2000 big_file

tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xaa xab xac xad . . .

The above would fail after xaa was uploaded and about 1/3 or less of xab. So, I split up the individual file upload:

tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xaa ; sleep 1m ; tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xab ; sleep ... ; etc.

Does this make sense?
--
Regards,
Mick
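The split-then-reassemble workflow Mick describes can be sketched end-to-end locally (a minimal sketch: filenames and sizes are illustrative, and the FTP upload step is only indicated by a comment, not performed):

```shell
# Stand-in for the workflow: checksum the whole file AND each chunk
# before uploading, so a bad chunk can be identified on the far side.
set -e
dir=$(mktemp -d)
cd "$dir"

# Create a small stand-in for big_file
dd if=/dev/urandom of=big_file bs=1k count=100 2>/dev/null

# Checksums for the whole file and for every chunk
md5sum big_file > big_file.md5
split -b 20k big_file chunk_
md5sum chunk_* > chunks.md5

# (The upload step would go here, e.g. one tnftp invocation per chunk,
#  transferring the chunks plus the two .md5 files.)

# Receiving side: verify each chunk first, then reassemble and verify the whole
md5sum -c chunks.md5
cat chunk_* > rebuilt_file
md5sum -c big_file.md5 || echo "reassembly does not match the original"
cmp -s big_file rebuilt_file && echo "rebuilt file is byte-identical"
```

Checking chunks.md5 before concatenating pinpoints which chunk was corrupted in transit, which is exactly the diagnosis Mick is doing by hand.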
Re: [gentoo-user] checksumming files
On Sun, 2008-12-07 at 15:39 +, Mick wrote:
> On Friday 05 December 2008, Albert Hopkins wrote:
> > On Thu, 2008-12-04 at 07:10 +, Mick wrote:
> > > Almost every time I split a large file >1G into say 200k chunks,
> > > then ftp it to a server and then:
> >
> > That's thousands of files!  Have you gone mad?!
>
> Ha! Small error in units . . . it is 200M (of course this is no
> disclaimer of me going/gone mad . . .) I think the server drops the
> connection above 230M file uploads or something like that, so I tried
> 200M files and it seems to work.
>
> > > cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> > >
> > > It fails. Checking the split files in turn I often find one or two
> > > chunks that fail on their own md5 checks. Despite that, the
> > > concatenated file often works (e.g. if it is a video file it'll
> > > play alright).
> >
> > Let me understand this. Are [1..7] the split files or the checksums
> > of the split files?
>
> They are the split files, which I concatenate into the complete file.

Well, unless you made another error in your OP, you are using md5sum incorrectly. When you use -c, md5sum expects a file that is a list of files/checksums. For example:

$ dd if=/dev/urandom of=bigfile bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 2.29361 s, 2.3 MB/s
$ md5sum bigfile > checksum   # create checksum file
$ split -b1M bigfile
$ rm bigfile
$ cat xa* > bigfile
$ # This is correct
$ md5sum -c checksum
bigfile: OK
$ # This is wrong!
$ md5sum -c bigfile
md5sum: bigfile: no properly formatted MD5 checksum lines found

[SNIP!]

> > Maybe if you give the exact commands used I might understand this
> > better. I have a feeling that this is not the most efficient method
> > of file transfer.
>
> split --verbose -b 2000 big_file
>
> tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xaa xab xac xad . . .
>
> The above would fail after xaa was uploaded and about 1/3 or less of
> xab. So, I split up the individual file upload:
>
> tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xaa ; sleep 1m ; tnftp -r 45 -u ftp://username:passwd@server_name/htdocs/directory_path/ xab ; sleep ... ; etc.
>
> Does this make sense?

Yes, but if you are truly using -c then it would make sense that you could get a checksum error but the file be ok.

Here's how I would do it. I'm not saying you should do it this way. I'd use rsync. Rsync does file xfer and has checksumming built-in. You say you split because you get disconnected, right? I'm not sure if rsync handles re-connects, but you can write a loop so that if rsync fails you continue where you left off:

status=30
until [ $status -eq 0 ] ; do
    rsync --append-verify big_file server_name:/htdocs/directory_path/
    status=$?
done

No splitting/concatenating and no need to checksum.
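The retry-until-success pattern in Albert's loop can be demonstrated without a server by substituting a stand-in command that fails a couple of times before succeeding (flaky_transfer is a hypothetical placeholder for the rsync invocation; the loop itself is the same shape):

```shell
# Generic "retry until exit status 0" loop, as in the rsync example.
# flaky_transfer stands in for:
#   rsync --append-verify big_file server_name:/htdocs/directory_path/
attempts=0
flaky_transfer() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]    # fails twice, succeeds on the third try
}

status=30                    # any non-zero value enters the loop
until [ "$status" -eq 0 ] ; do
    flaky_transfer
    status=$?
done
echo "succeeded after $attempts attempts"
```

With rsync, --append-verify makes each retry resume from where the previous attempt stopped and re-checksums the appended data, so the loop converges on a verified complete file rather than restarting from byte zero.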
Re: [gentoo-user] checksumming files
On Sunday 07 December 2008, Albert Hopkins wrote:
> On Sun, 2008-12-07 at 15:39 +, Mick wrote:
> > They are the split files, which I concatenate into the complete file.
>
> Well, unless you made another error in your OP, you are using md5sum
> incorrectly. When you use -c, md5sum expects a file that is a list of
> files/checksums. For example
> [snip...]
> Yes, but if you are truly using -c then it would make sense that you
> could get a checksum error but the file be ok.

Sorry, yes I used the md5sum -c command correctly, with the corresponding checksum file for the big_file.

> Here's how I would do it. I'm not saying you should do it this way.
> I'd use rsync. Rsync does file xfer and has checksumming built-in. You
> say you split because you get disconnected, right? I'm not sure if
> rsync handles re-connects, but you can write a loop so that if rsync
> fails you continue where you left off:
>
> status=30
> until [ $status -eq 0 ] ; do
>     rsync --append-verify big_file server_name:/htdocs/directory_path/
>     status=$?
> done
>
> No splitting/concatenating and no need to checksum.

Wouldn't the server need to have rsyncd running to be able to do that? Can I rsync to an ftp server? Also, how would I pass username/passwd on the command line so that the upload can take place unattended?

Thank you for your replies.
--
Regards,
Mick
Re: [gentoo-user] checksumming files
On Sun, 7 Dec 2008 17:56:07 +, Mick wrote:
> > rsync --append-verify big_file server_name:/htdocs/directory_path/
> > status=$?
>
> Wouldn't the server need to have rsyncd running to be able to do that?
> Can I rsync to an ftp server? Also, how would I pass username/passwd
> on the command line so that the upload can take place unattended?

You only need rsyncd running to use rsync::reponame type connections. The above command uses ssh to connect.

--
Neil Bothwick

Veni, vermini, vomui - I came, I got ratted, I threw up
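Since the rsync command runs over ssh, the usual answer to the unattended-upload question is key-based authentication rather than a password on the command line. A sketch under the assumption that the server accepts ssh logins (server_name, username and paths are the thread's placeholders; the key filename is arbitrary):

```shell
# One-time setup: create a key pair and install the public key on the server.
# An empty passphrase (-N "") lets the transfer run unattended, e.g. from cron.
ssh-keygen -t rsa -f ~/.ssh/id_rsa_upload -N ""
ssh-copy-id -i ~/.ssh/id_rsa_upload.pub username@server_name

# Thereafter the upload needs no interactive password:
rsync -e "ssh -i ~/.ssh/id_rsa_upload" --append-verify \
    big_file username@server_name:/htdocs/directory_path/
```

This only works if the host offers ssh; a plain FTP-only server (as Mick asks about) cannot be an rsync-over-ssh target.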
Re: [gentoo-user] checksumming files
On Friday, 5 December 2008 19:48:18, Mick wrote:
> On Thursday 04 December 2008, Heinrichs, Dirk (EXT-Capgemini -
> DE/Dusseldorf) wrote:
> > Did you make sure the chunks are transferred in binary mode?
>
> Aha!! Since the split chunks were part of a video file I assumed that
> it would be binary - and I understand that the default type (for
> tnftp) is binary?
>
> > BTW, most modern FTP clients have a resume option, so there's no
> > need to split.
>
> Yes, tnftp has the 'reget' command but I can't find a 'reput', or
> 'resume'? It also has 'restart': [...] but I am not sure how this
> works exactly. Would anyone be clued up on the intricacies of tnftp?

Unfortunately not, never heard of it before.

> Anything else I could try?

ncftp. This one also comes with the ncftpget and ncftpput command line utilities. They use binary transfer as default and have resume capabilities.

HTH...

Dirk
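A hedged sketch of the ncftpput route Dirk suggests; the server name, credentials and remote path are placeholders from earlier in the thread, and exact option behaviour should be confirmed against the ncftpput man page for the installed version:

```shell
# ncftpput takes credentials on the command line (handy for unattended runs)
# and attempts to resume a partially uploaded file with -z.
ncftpput -u username -p passwd -z \
    server_name /htdocs/directory_path big_file
```

Because resume works on the single big file, this removes the need for split/cat and the per-chunk md5 bookkeeping entirely.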
Re: [gentoo-user] checksumming files
On Thursday 04 December 2008, Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf) wrote:
> Did you make sure the chunks are transferred in binary mode?

Aha!! Since the split chunks were part of a video file I assumed that it would be binary - and I understand that the default type (for tnftp) is binary?

There's more to it: I use tnftp because it has an unattended feature which suits me nicely. A string like:

sleep 90m ; tnftp -u ftp://username:passwd@server_address/htdocs/path \
files_to_upload

will login after 90 minutes and upload the file(s) I want (not sure if/how I can do this with vanilla ftp).

> BTW, most modern FTP clients have a resume option, so there's no need
> to split.

Yes, tnftp has the 'reget' command but I can't find a 'reput', or 'resume'? It also has 'restart':

==
restart marker
        Restart the immediately following get or put at the indicated
        marker. On UNIX systems, marker is usually a byte offset into
        the file.
==

but I am not sure how this works exactly. Would anyone be clued up on the intricacies of tnftp? Anything else I could try?
--
Regards,
Mick
Re: [gentoo-user] checksumming files
On Thu, Dec 4, 2008 at 1:10 AM, Mick [EMAIL PROTECTED] wrote:
> Almost every time I split a large file >1G into say 200k chunks, then
> ftp it to a server and then:
>
> cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
>
> It fails. Checking the split files in turn I often find one or two
> chunks that fail on their own md5 checks. Despite that, the
> concatenated file often works (e.g. if it is a video file it'll play
> alright).
>
> Can you explain this? Should I be using a different check to verify
> the integrity of the ftp'd file?

Obviously something is going wrong... without knowing why that is, I suggest you emerge par2cmdline and use it to create some recovery blocks. That way you can repair/reassemble the pieces when they get to the other side.
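A sketch of the recovery-block workflow with par2cmdline, assuming the par2 utility is installed (the redundancy percentage and filenames are illustrative, not prescriptive):

```shell
# Before uploading: create recovery volumes covering ~10% of the file
par2 create -r10 big_file.par2 big_file

# Upload big_file together with the big_file*.par2 volumes.
# On the receiving side:
par2 verify big_file.par2    # checks big_file against the recovery set
par2 repair big_file.par2    # rebuilds damaged blocks if verification fails
```

Unlike a bare md5sum, which can only report that a chunk is bad, the par2 volumes carry enough redundancy to actually reconstruct corrupted data, as long as the damage is within the chosen redundancy level.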
Re: [gentoo-user] checksumming files
On Thu, 2008-12-04 at 07:10 +, Mick wrote:
> Almost every time I split a large file >1G into say 200k chunks, then
> ftp it to a server and then:

That's thousands of files!  Have you gone mad?!

> cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
>
> It fails. Checking the split files in turn I often find one or two
> chunks that fail on their own md5 checks. Despite that, the
> concatenated file often works (e.g. if it is a video file it'll play
> alright).

Let me understand this. Are [1..7] the split files or the checksums of the split files?

If the former, then 'md5sum -c completefile' will fail with "no properly formatted MD5 checksum lines found" or similar, because completefile is not a list of checksums.

If the latter, then how are you generating [1..7]? If you are using the split(1) command and are not passing at least -a 3 to it, then your file is going to be truncated, because the suffix length is too small to accommodate the thousands of files needed to split a 1GB+ file into 200k chunks. You should get an error like "split: output file suffixes exhausted".

Maybe if you give the exact commands used I might understand this better. I have a feeling that this is not the most efficient method of file transfer.
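The suffix-exhaustion point is easy to check with arithmetic: split's default two-letter suffixes give only 26^2 output names, far fewer than a >1 GB file needs at 200k per chunk (split versions without automatic suffix widening abort in exactly this situation):

```shell
# Chunks needed for a 1 GiB file at 200 KiB each, vs. names available
chunk_size=$((200 * 1024))
file_size=$((1024 * 1024 * 1024))
chunks=$(( (file_size + chunk_size - 1) / chunk_size ))  # ceiling division
names_a2=$((26 * 26))        # default suffix length 2: xaa .. xzz
names_a3=$((26 * 26 * 26))   # with -a 3: xaaa .. xzzz
echo "chunks needed: $chunks"
echo "names with -a 2: $names_a2, with -a 3: $names_a3"
```

5243 chunks against 676 default names is why Albert recommends at least -a 3 (17576 names) for this split.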
Re: [gentoo-user] checksumming files
On Thu, 4 Dec 2008 07:10:06 +, Mick wrote:
> Despite that the concatenated file often works (e.g. if it is a video
> file it'll play alright).
>
> Can you explain this? Should I be using a different check to verify
> the integrity of the ftp'd file?

An MD5 check will fail if even one bit is changed, which won't affect the playback of a video file. Try it with a large compressed tarball and you'll notice a difference.

--
Neil Bothwick

--T-A+G-L-I+N-E--+M-E-A+S-U-R+I-N-G+--G-A+U-G-E--
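Neil's point is straightforward to demonstrate: corrupting a single byte changes the MD5 sum even though virtually all of the file is intact (temporary filenames and the offset are illustrative):

```shell
# Two identical files, then flip one byte in the copy.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/a" bs=1k count=64 2>/dev/null
cp "$dir/a" "$dir/b"

# Overwrite the zero byte at offset 1000 with 0xFF, leaving the rest untouched
printf '\377' | dd of="$dir/b" bs=1 seek=1000 conv=notrunc 2>/dev/null

md5sum "$dir/a" "$dir/b"    # the two sums differ
```

A video player typically shrugs off one bad byte, but md5sum treats the file as a single bitstring, so any change at all fails the check.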
Re: [gentoo-user] checksumming files
On Thursday, 04.12.2008, 07:10 +, ext Mick wrote:
> Almost every time I split a large file >1G into say 200k chunks, then
> ftp it to a server and then:
>
> cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
>
> It fails. Checking the split files in turn I often find one or two
> chunks that fail on their own md5 checks. Despite that, the
> concatenated file often works (e.g. if it is a video file it'll play
> alright).
>
> Can you explain this? Should I be using a different check to verify
> the integrity of the ftp'd file?

Did you make sure the chunks are transferred in binary mode?

BTW, most modern FTP clients have a resume option, so there's no need to split.

HTH...

Dirk
--
Dirk Heinrichs          | Tel:  +49 (0)162 234 3408
Configuration Manager   | Fax:  +49 (0)211 47068 111
Capgemini Deutschland   | Mail: [EMAIL PROTECTED]
Wanheimerstraße 68      | Web:  http://www.capgemini.com
D-40468 Düsseldorf      | ICQ#: 110037733
GPG Public Key C2E467BB | Keyserver: www.keyserver.net