Re: question re tar

2022-09-23 Thread Charles Curley
On Fri, 23 Sep 2022 19:18:35 -0400
The Wanderer  wrote:

> I think the question was about a way/place/method to manually add such
> headers from within Gmail, so that they can be present even when
> replying to a message from within the digest, so that replies can be
> made correctly while subscribed only to the digest and not to the full
> mailing list.

Are those headers even in the digest for each email (not the digest
email itself)?

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: question re tar

2022-09-23 Thread The Wanderer
On 2022-09-23 at 16:24, Thomas Schmitt wrote:

> Hi,
> 
> jr wrote:
>
>> I [...] cannot find an obvious (any!) place where headers could be set [...]
>> I [...] hope that someone can/will supply Gmail specific instructions
> 
> The normal way to participate is to subscribe your mail address at
>   https://lists.debian.org/debian-user/
> or by sending a mail to
>   debian-user-requ...@lists.debian.org
> with
>   Subject: subscribe
> 
> Then you get the mails from the list delivered to your mailbox from where
> your mail program is supposed to be able to reply, adding the appropriate
> headers.

I think the question was about a way/place/method to manually add such
headers from within Gmail, so that they can be present even when
replying to a message from within the digest, so that replies can be
made correctly while subscribed only to the digest and not to the full
mailing list.

I would not be surprised if there were not any such thing. Gmail, like
Web-mail in general, does not have a good reputation when it comes to
exposing the technical details of E-mail properly.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw





Re: question re tar

2022-09-23 Thread Thomas Schmitt
Hi,

jr wrote:
> I [...] cannot find an obvious (any!) place where headers could be set [...]
> I [...] hope that someone can/will supply Gmail specific instructions

The normal way to participate is to subscribe your mail address at
  https://lists.debian.org/debian-user/
or by sending a mail to
  debian-user-requ...@lists.debian.org
with
  Subject: subscribe

Then you get the mails from the list delivered to your mailbox from where
your mail program is supposed to be able to reply, adding the appropriate
headers.


Have a nice day :)

Thomas



Re: question re tar

2022-09-22 Thread Max Nikulin

On 22/09/2022 14:00, jr wrote:
> On Wednesday, 21 September 2022 at 17:20:05 UTC+1, Markus Schönhaber wrote:
> > Could you please stop using a mail client that starts a new thread with
> > every message you send?
> > Please use something instead that really creates a reply when you are
> > replying to someone (i. e. something that sets the
> > In-Reply-To/References message headers accordingly).
>
> ouch.  I read the digest, and in that it appears as a single thread.
> and I've just checked 'linux.debian.user' on Google Groups, there too
> I see a single thread only.  cannot see "a new thread [started] with
> every message", sorry.


The Debian mailing list archive has a rare MHonArc configuration that
adds a reply-to-list action (usually only reply-to-sender is available),
and these mailto: links contain the proper In-Reply-To value:

https://lists.debian.org/debian-user/2022/09/msg00584.html

Even groups.google.com is unable to render a proper discussion tree; its
representation is a flat list.


Mail user agents may intentionally suppress threads based on heuristics 
to avoid combining unrelated discussions having subjects like "question" 
or "problem".




Re: question re tar

2022-09-22 Thread Charles Curley
On Thu, 22 Sep 2022 08:00:34 +0100
jr  wrote:

> ouch.  I read the digest,

That could be your problem. If you subscribed as a regular user,
rather than to the digests (or in addition to them), and replied to
those messages, you might solve the problem.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: question re tar

2022-09-22 Thread Tixy
On Thu, 2022-09-22 at 08:00 +0100, jr wrote:
[...]
> > [a reply that isn't one]
> > 
> > Could you please stop using a mail client that starts a new thread with
> > every message you send?
> > Please use something instead that really creates a reply when you are
> > replying to someone (i. e. something that sets the
> > In-Reply-To/References message headers accordingly).
> 
> ouch.  I read the digest, and in that it appears as a single thread.
> and I've just checked 'linux.debian.user' on Google Groups, there too
> I see a single thread only.  cannot see "a new thread [started] with
> every message", sorry.

The lists archive [1] shows your replies as ''
presumably because it is making a guess based on something like the
subject line.

For a lot (most?) of us, your messages appear as the start of a new
thread because they don't have the standard headers that any decent
email client would add when replying to an email.

[1] https://lists.debian.org/debian-user/2022/09/thrd2.html

-- 
Tixy



Re: question re tar

2022-09-22 Thread Thomas Schmitt
Hi,

jr wrote:
> I see a single thread only.  cannot see "a new thread [started] with
> every message", sorry.

Your mails lack headers like "In-Reply-To:" or "References:", which
should give the Message-ID of the mail you are answering and,
respectively, some or all Message-IDs of the thread. (My "References:"
only has the ID of the mail to which i answer.)

This is already defined in ye olde RFC 822, which was replaced by
   https://datatracker.ietf.org/doc/html/rfc2822
(itself since superseded by RFC 5322).

The headers of the mail in which Markus Schönhaber complained about
the broken threading show:

  References: 

  In-Reply-To: 

  Message-ID: <33ea0533-ae82-c939-1b91-77828f191...@list-post.mks-mail.de>

which correspond to the Message Id in your mail

  From: jr 
  Date: Wed, 21 Sep 2022 16:29:07 +0100
  Message-ID: 


A properly threaded reply to the mail of Markus Schönhaber would have
something like

  References: <33ea0533-ae82-c939-1b91-77828f191...@list-post.mks-mail.de>
  In-Reply-To: <33ea0533-ae82-c939-1b91-77828f191...@list-post.mks-mail.de>
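
A minimal sketch of how these two headers can be derived from the mail
being answered. The saved message, the addresses and the Message-IDs are
made-up placeholders, and folded (multi-line) headers are not handled:

```shell
# Hypothetical saved copy of the mail being answered (placeholder IDs).
orig=$(mktemp)
cat > "$orig" <<'EOF'
From: someone@example.org
Subject: Re: question re tar
Message-ID: <parent-2@example.org>
References: <root-1@example.org>

body text
EOF

# Read only the header block (everything before the first empty line).
msgid=$(sed -n '/^$/q; s/^Message-ID: *//p' "$orig")
refs=$(sed -n '/^$/q; s/^References: *//p' "$orig")

# In-Reply-To names the parent; References is the parent's References
# followed by the parent's Message-ID.
reply_headers=$(printf 'In-Reply-To: %s\nReferences: %s\n' \
    "$msgid" "${refs:+$refs }$msgid")
printf '%s\n' "$reply_headers"
rm -f "$orig"
```

A mail program that really creates a reply does this bookkeeping on its
own; the sketch only shows where the values come from.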


Have a nice day :)

Thomas



Re: question re tar

2022-09-22 Thread jr
On Wednesday, 21 September 2022 at 17:10:05 UTC+1, Greg Wooledge wrote:
> On Wed, Sep 21, 2022 at 04:29:07PM +0100, jr wrote:
> > $ locate /jr/ |
> > > grep -v -e /.cache/ -e /tmp/ |
> >
> > that is (one way) how the "ACTUAL CONTENTS" are arrived at.
> That listing almost certainly includes subdirectory names. Hence your
> issues.
> ...
> That's why you're getting duplicates.
> ...
> As I told you several posts ago, if you're feeding "find" output -- or
> in your case, "locate" output, which has the exact same characteristics --
> to tar, then you need to use GNU tar's "--no-recursion" option.

had some time last night to try stuff.  and yes, the '--no-recursion'
option appears to be the one I need.  thank you.


> Also, it boggles my mind that you don't have a SINGLE file with spaces
> in its name in your entire home directory. Apparently, you don't run
> Steam or Google Chrome.

correct, I do not play computer games (prefer the newspaper puzzle
pages), and only use a (Chrome) browser on a Chromebook.  file names
with spaces, like eg 'Screenshot 2022-09-22 07.39.14.png' would get
renamed when they get copied/moved to the SAMBA share.


> ...
> Finally, your backup, being based on "locate" rather than "find", is going
> to miss any files that were created after the most recent updatedb run.
> I find this to be an incredibly strange choice. Maybe your files just
> aren't that important to you. I can't guess why.

simple really.  on all but two machines, new files are created ..
infrequently, so updating the locate db only once per day is no
problem.  one machine runs the same cron job, just more often, and the
one I write on is for interweb access only, disposable if you will.


# -

On Wednesday, 21 September 2022 at 17:20:05 UTC+1, Markus Schönhaber wrote:
> 21.09.22, 17:29 +0200, jr wrote:
>
> [a reply that isn't one]
>
> Could you please stop using a mail client that starts a new thread with
> every message you send?
> Please use something instead that really creates a reply when you are
> replying to someone (i. e. something that sets the
> In-Reply-To/References message headers accordingly).

ouch.  I read the digest, and in that it appears as a single thread.
and I've just checked 'linux.debian.user' on Google Groups, there too
I see a single thread only.  cannot see "a new thread [started] with
every message", sorry.


# -

On Wednesday, 21 September 2022 at 17:50:04 UTC+1, Michael Stone wrote:
> ... The premise is false. There are actually multiple implementations of
> "locate" available in debian (and more, historically) so just saying
> "locate" doesn't describe the implementation very well. ...

good point.

(and thanks for also pointing out the '--no-recursion' solution)



Re: question re tar

2022-09-21 Thread Michael Stone

On Wed, Sep 21, 2022 at 04:29:07PM +0100, jr wrote:
> On Wednesday, 21 September 2022 at 13:10:05 UTC+1, Greg Wooledge wrote:
> > On Wed, Sep 21, 2022 at 12:31:58PM +0100, jr wrote:
> > > ...
> > > "What's in the file"
> > >
> > > file names, one per line. (and, before you ask, '\n' terminated lines)
> > This is not helpful. We want to see the ACTUAL CONTENTS so we can
> > look for DUPLICATES. How are you not understanding this?
>
> oh dear, UPPERCASE.  copied from my previous post:
>   $ locate /jr/ |
>   > grep -v -e /.cache/ -e /tmp/ |
>   > sed -e 's#/home/jr/##'
>
> that is (one way) how the "ACTUAL CONTENTS" are arrived at.
>
> and you may want to try to re-read my previous post, re locate,
> database(s), and "DUPLICATES".


The premise is false. There are actually multiple implementations of 
"locate" available in debian (and more, historically) so just saying 
"locate" doesn't describe the implementation very well. (I.e., different 
implementations of locate will output different results.) Also, once you 
start rewriting the output, the supposition that "databases" have 
handled duplicates goes right out the window.


In general it seems weird to depend on any locate for this sort of thing 
rather than using find, because the results won't reflect the current 
state of the system. 

In this case it seems like an especially bad choice to use locate, 
because that command will probably list each matching directory as well 
as the contents of the directory. e.g.:


/home/jr/dir1
/home/jr/dir1/file1
/home/jr/dir2
/home/jr/dir2/file2

Much better, if you want just a list of files, would be to use
`find /home/jr -type f`, which would output:


/home/jr/dir1/file1
/home/jr/dir2/file2

find tends to be the right tool for anything other than an interactive
file listing. It would be better yet to use
`find /home/jr -type f -print0`, which would output:


/home/jr/dir1/file1\0/home/jr/dir2/file2\0

[disclaimer: the rest of this assumes GNU tar; other tar implementations 
will have different behavior, capabilities, and options]


Which you could then use with tar's --null option and -T:

find /home/jr -type f -print0 | tar cf file.tar --null -T -

Alternatively, you could pass both filenames and directory names to tar, 
but add the --no-recursion flag to tar. Then tar wouldn't add the entire 
directory tree each time it sees a directory, followed by another copy 
of the files in each directory -- or, actually, a link to the original 
copy as an optimization of a strange request. Depending on your 
objectives this may be the better solution vs only sending filenames, if 
the permissions on the directories matter and should be preserved. 
(Though be aware that tar by default won't restore user/group if running 
as non-root, so you'd need to add --same-owner if that matters.)
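
A sketch of the null-terminated variant, assuming GNU find and GNU tar.
The scratch directory and the file names are invented for illustration;
one name contains a newline, which a one-name-per-line list could not
represent:

```shell
# Scratch tree with awkward names (all names are illustrative).
work=$(mktemp -d)
mkdir -p "$work/src/dir1" "$work/out"
touch "$work/src/dir1/plain" \
      "$work/src/dir1/with space" \
      "$work/src/dir1/$(printf 'new\nline')"

# Files only, NUL-terminated; --null applies to the -T that follows it.
(cd "$work/src" && find . -type f -print0 |
    tar cf "$work/files.tar" --null -T -)

# Unpack elsewhere and count what arrived, again NUL-safe.
tar xf "$work/files.tar" -C "$work/out"
restored=$(find "$work/out" -type f -print0 | tr -cd '\0' | wc -c)
echo "restored files: $restored"
```

Counting NUL bytes instead of lines keeps the check itself from being
fooled by the newline in the third name.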



> sure.  I'm talking about a working environment, not a play-skool
> situation where we .. make things up to pursue "hypothetical"s.
> (again, see previous post) while names like '* file3 *' could,
> conceivably, result from a poorly written command-line, they would be
> removed/renamed immediately, to quote your good self: "How are you not
> understanding this?"


He's fundamentally correct. Especially in a "working environment" it's
best to avoid constructions that are known to cause hard-to-diagnose
problems at unexpected times.


> many programs support include/exclude lists, 'rsync' comes to mind.
> the "nonsense" is quick + convenient on the command-line (don't know
> about you, but I have no problems differentiating between a "one off"
> command line and a to-be-used-frequently script, and adjust
> accordingly)


Essentially all of them accept null-terminated file lists these days, 
specifically to avoid issues with filename ambiguity. (And most of the 
locate implementations will generate that sort of output!)




Re: question re tar

2022-09-21 Thread Markus Schönhaber

21.09.22, 17:29 +0200, jr wrote:

[a reply that isn't one]

Could you please stop using a mail client that starts a new thread with 
every message you send?
Please use something instead that really creates a reply when you are 
replying to someone (i. e. something that sets the 
In-Reply-To/References message headers accordingly).


--
Regards
   mks



Re: question re tar

2022-09-21 Thread Greg Wooledge
On Wed, Sep 21, 2022 at 04:29:07PM +0100, jr wrote:
>   $ locate /jr/ |
>   > grep -v -e /.cache/ -e /tmp/ |
>   > sed -e 's#/home/jr/##'
> 
> that is (one way) how the "ACTUAL CONTENTS" are arrived at.

That listing almost certainly includes subdirectory names.  Hence your
issues.

You're feeding both a filename *and* its containing directory name to
tar.  Tar recurses into the directory whose name you have fed it, and
then also grabs the file whose name you have fed it.

That's why you're getting duplicates.

As I told you several posts ago, if you're feeding "find" output -- or
in your case, "locate" output, which has the exact same characteristics --
to tar, then you need to use GNU tar's "--no-recursion" option.

Also, it boggles my mind that you don't have a SINGLE file with spaces
in its name in your entire home directory.  Apparently, you don't run
Steam or Google Chrome.

Consider this my final warning, that any backup scheme which fails to
backup files with spaces in their names is going to leave you with an
incomplete backup some day.  That day might even be today.

Finally, your backup, being based on "locate" rather than "find", is going
to miss any files that were created after the most recent updatedb run.
I find this to be an incredibly strange choice.  Maybe your files just
aren't that important to you.  I can't guess why.



Re: question re tar

2022-09-21 Thread jr
On Wednesday, 21 September 2022 at 13:10:05 UTC+1, Greg Wooledge wrote:
> On Wed, Sep 21, 2022 at 12:31:58PM +0100, jr wrote:
> > ...
> > "What's in the file"
> >
> > file names, one per line. (and, before you ask, '\n' terminated lines)
> This is not helpful. We want to see the ACTUAL CONTENTS so we can
> look for DUPLICATES. How are you not understanding this?

oh dear, UPPERCASE.  copied from my previous post:
  $ locate /jr/ |
  > grep -v -e /.cache/ -e /tmp/ |
  > sed -e 's#/home/jr/##'

that is (one way) how the "ACTUAL CONTENTS" are arrived at.

and you may want to try to re-read my previous post, re locate,
database(s), and "DUPLICATES".


> > "your approach ... is utterly crap"
> >
> > charming.  (and yet, in spite of your .. low opinion, I do get by ;-))
> Do you need a DEMONSTRATION of how it is broken and wrong?
>
> unicorn:/tmp/x$ mkdir -p sub1/sub2
> unicorn:/tmp/x$ touch sub1/sub2/{file1,file2,'* file3 *'}
> unicorn:/tmp/x$ not_safe=$(find . -type f)
> unicorn:/tmp/x$ tar cf ../foo.tar $not_safe

sure.  I'm talking about a working environment, not a play-skool
situation where we .. make things up to pursue "hypothetical"s.
(again, see previous post) while names like '* file3 *' could,
conceivably, result from a poorly written command-line, they would be
removed/renamed immediately, to quote your good self: "How are you not
understanding this?"


> ...
> See how this is YET ANOTHER way we can reproduce your original symptoms?




> You "get by" because you've been getting LUCKY so far.

luck may be involved, who knows, but there's care and attention to
details, too.  (not that I think you'd want to know)


> Bash has array variables. You can use array variables to hold lists,
> safely, including all possible filenames.

they (arrays) have their uses.  for a simple list of something like
file names in a script, I'm happy to just use 'mktemp' and put the
list in the file.


> GNU tar has --files-from which you can use to point to a file that
> contains a list of filenames one per line. You can use that instead of
> your incredibly broken $(cat $unquoted_filename) nonsense.

many programs support include/exclude lists, 'rsync' comes to mind.
the "nonsense" is quick + convenient on the command-line (don't know
about you, but I have no problems differentiating between a "one off"
command line and a to-be-used-frequently script, and adjust
accordingly)


> But clearly you're not terribly invested in finding the answers to your
> problems, and I'm running out of patience with you, so... good luck.

the problem, in this case, as Tomas pointed out, may be the underlying
file system.
so, rather than SHOUTING and conjecturing and being .. brash, why
don't you "put up"?  try for yourself, on a Debian VM as provided on a
Chromebook, and _then_, perhaps, if you do find cause, you can start
denigrating me.  until then I'd say wear a Stetson, instead of a
too-small baseball cap ;-).



Re: question re tar

2022-09-21 Thread Greg Wooledge
On Wed, Sep 21, 2022 at 12:31:58PM +0100, jr wrote:
> > > $ tar -cvWf $arcname $fnames
> > >
> > > where $fnames initially was a list in a variable (this is preparing a
> > > shell script), then I switched to storing those in a file and
> > > $ tar -cvWf $arcname $(cat $fname_list_file)
> > So... what's in these variables? What's in the file pointed to by
> > the last variable?
> 
> "what's in these variables?"
> 
> $arcname := name of archive, eg '/tmp/220921_bkup.tar'.
> $fname_list_file := the name of a file much like the previous.
> 
> "What's in the file"
> 
> file names, one per line.  (and, before you ask, '\n' terminated lines)

This is not helpful.  We want to see the ACTUAL CONTENTS so we can
look for DUPLICATES.  How are you not understanding this?

> "your approach ... is utterly crap"
> 
> charming.  (and yet, in spite of your .. low opinion, I do get by ;-))

Do you need a DEMONSTRATION of how it is broken and wrong?

unicorn:/tmp/x$ mkdir -p sub1/sub2
unicorn:/tmp/x$ touch sub1/sub2/{file1,file2,'* file3 *'}
unicorn:/tmp/x$ not_safe=$(find . -type f)
unicorn:/tmp/x$ tar cf ../foo.tar $not_safe
tar: file3: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
unicorn:/tmp/x$ tar tvf ../foo.tar
-rw-r--r-- greg/greg 0 2022-09-21 07:54 ./sub1/sub2/file2
-rw-r--r-- greg/greg 0 2022-09-21 07:54 ./sub1/sub2/file1
hrw-r--r-- greg/greg 0 2022-09-21 07:54 ./sub1/sub2/file1 link to ./sub1/sub2/file1
hrw-r--r-- greg/greg 0 2022-09-21 07:54 ./sub1/sub2/file2 link to ./sub1/sub2/file2
-rw-r--r-- greg/greg 0 2022-09-21 07:54 ./sub1/sub2/* file3 *
drwxr-xr-x greg/greg 0 2022-09-21 07:54 sub1/
drwxr-xr-x greg/greg 0 2022-09-21 07:54 sub1/sub2/
hrw-r--r-- greg/greg 0 2022-09-21 07:54 sub1/sub2/file2 link to ./sub1/sub2/file2
hrw-r--r-- greg/greg 0 2022-09-21 07:54 sub1/sub2/file1 link to ./sub1/sub2/file1
hrw-r--r-- greg/greg 0 2022-09-21 07:54 sub1/sub2/* file3 * link to ./sub1/sub2/* file3 *

See how this is YET ANOTHER way we can reproduce your original symptoms?

You "get by" because you've been getting LUCKY so far.

Bash has array variables.  You can use array variables to hold lists,
safely, including all possible filenames.
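
A sketch of the array approach, assuming bash (arrays and process
substitution are not POSIX sh) together with GNU find and GNU tar. It
reuses the file names from the demonstration above inside a throwaway
directory:

```shell
#!/usr/bin/env bash
# Throwaway directory; the layout mirrors the demonstration (illustrative).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p x/sub1/sub2
touch x/sub1/sub2/file1 x/sub1/sub2/file2 'x/sub1/sub2/* file3 *'

# Collect names into an array, one element per file, reading NUL-terminated
# records so spaces and glob characters are never re-split or expanded.
files=()
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(cd x && find . -type f -print0)

# "${files[@]}" expands to exactly one argument per array element.
tar cf foo.tar -C x "${files[@]}"
members=$(tar tf foo.tar | wc -l)
echo "archived ${#files[@]} names, $members members"
```

Unlike the unquoted $not_safe expansion, the quoted array expansion hands
'* file3 *' to tar as a single argument.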

GNU tar has --files-from which you can use to point to a file that
contains a list of filenames one per line.  You can use that instead of
your incredibly broken $(cat $unquoted_filename) nonsense.
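
A sketch of the list-file approach, assuming GNU tar. The temporary list
is made with mktemp and all names are illustrative. It copes with spaces
and glob characters, though not with names that contain newlines:

```shell
work=$(mktemp -d)
mkdir -p "$work/x/sub1/sub2"
touch "$work/x/sub1/sub2/file1" "$work/x/sub1/sub2/* file3 *"

# One name per line; tar reads each line as a single name and applies
# no shell word splitting or globbing to it.
list=$(mktemp)
(cd "$work/x" && find . -type f) > "$list"

(cd "$work/x" && tar cf "$work/foo.tar" --files-from="$list")
listed=$(tar tf "$work/foo.tar" | wc -l)
echo "members in archive: $listed"
rm -f "$list"
```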

But clearly you're not terribly invested in finding the answers to your
problems, and I'm running out of patience with you, so... good luck.



Re: question re tar

2022-09-21 Thread jr
hi,

On Wednesday, 21 September 2022 at 01:40:05 UTC+1, Greg Wooledge wrote:
> On Tue, Sep 20, 2022 at 10:19:50PM +0100, jr wrote:
> > On Tuesday, 20 September 2022 at 12:30:05 UTC+1, Greg Wooledge wrote:
> > > ...
> > > With this new information, it occurs to me that perhaps the OP did
> > > something like this: ...
> > > unicorn:/tmp/x$ find . | tar cv --files-from=- -f ../foo.tar
> >
> > I prefer 'locate' to 'find'. and no guessing involved, as I wrote on
> > the 18th, the invocation was:
> > $ tar -cvWf $arcname $fnames
> >
> > where $fnames initially was a list in a variable (this is preparing a
> > shell script), then I switched to storing those in a file and
> > $ tar -cvWf $arcname $(cat $fname_list_file)
> So... what's in these variables? What's in the file pointed to by
> the last variable?

"what's in these variables?"

$arcname := name of archive, eg '/tmp/220921_bkup.tar'.
$fname_list_file := the name of a file much like the previous.

"What's in the file"

file names, one per line.  (and, before you ask, '\n' terminated lines)


> In general, your approach of "storing a list in a string variable using
> spaces and word splitting" is utterly crap and will NOT work with
> arbitrary filenames. Filenames can contain spaces and globbing
> characters.

fwiw, in my personal space(s) I have long since learned the value of
.. good hygiene, and neither space characters nor wildcards "allowed".
that leaves files with "exotic" names (some utf8); I have none at
present but am aware that there are mount options that would help in
such a situation.

"your approach ... is utterly crap"

charming.  (and yet, in spite of your .. low opinion, I do get by ;-))


> But that's probably not the main issue -- unless one of your filenames
> happens to contain an asterisk or two.

if a mis-typed command line should lead to such a name being created,
the file would be renamed, or removed, soonest.


> It's just as likely that your "list" has duplicates in it.

(reading comprehension, we ought to talk that topic sometime :-))

as I wrote, I use 'locate'.  that is database-backed[*], and, afaik,
there are no "issues" with this (mature) software.

[*] the advantage of using a database -- if done properly, there's no
duplication of _anything_.

btw, I find 'locate' also makes all sorts of ad-hoc stuff easy.  one
of the s/wares I use uses .inc files, so if I wanted to have a quick
look, I could (from any working directory) simply:
  $ less $(locate thing.inc)

and yes, it does require some knowledge of one's system and files, the
naming, etc, but you'll need that anyway.


> If you'd show the content of those variables (and the file) maybe we
> would know.

typically I'd use a pipeline of commands.  a 'locate', followed by a
'grep -v' with one or more expressions to filter out unwanted,
followed, depending on use, perhaps by a 'tr' if I don't need/want
newlines.  I may also get rid of a common prefix with an 'sed'.
example (typed, not copy/paste):
  $ locate /jr/ |
  > grep -v -e /.cache/ -e /tmp/ |
  > sed -e 's#/home/jr/##'

would give me a "nice" clean list of relative file names, to use in
some way, for instance to pass to tar.
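
A sketch of passing such a relative list to tar, assuming GNU find and
GNU tar. Here 'find' stands in for 'locate' so that nothing depends on a
locate database, and the directory layout is invented. Because the names
are relative, tar has to run from the directory they are relative to, and
--no-recursion keeps listed directories from pulling in their contents a
second time:

```shell
# Stand-in for a home directory (layout is illustrative).
base=$(mktemp -d)
mkdir -p "$base/dir1" "$base/.cache"
touch "$base/dir1/file1" "$base/.cache/junk"
archive=$(mktemp)

# Absolute names, filtered, then made relative by stripping the prefix.
# The resulting list contains both dir1 and dir1/file1.
find "$base" -mindepth 1 |
    grep -v -e '/\.cache' |
    sed -e "s#^$base/##" |
    (cd "$base" && tar cf "$archive" --no-recursion -T -)

members=$(tar tf "$archive" | wc -l)
echo "members in archive: $members"
```

Without --no-recursion the same list would store dir1/file1 twice, the
second time as a hard link to the first.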


and lastly, just because I find it .. interesting, meant to say that
one can get a good impression of an online stranger, in a situation
like this, when one looks at what gets snipped, and when, and or gets
passed over.  my impression now is that you're the type of person who
wears a MAGA baseball cap at the dinner table, am I close? :-)



Re: question re tar

2022-09-20 Thread Greg Wooledge
On Tue, Sep 20, 2022 at 10:19:50PM +0100, jr wrote:
> hi,
> 
> On Tuesday, 20 September 2022 at 12:30:05 UTC+1, Greg Wooledge wrote:
> > ...
> > With this new information, it occurs to me that perhaps the OP did
> > something like this: ...
> > unicorn:/tmp/x$ find . | tar cv --files-from=- -f ../foo.tar
> 
> I prefer 'locate' to 'find'.  and no guessing involved, as I wrote on
> the 18th, the invocation was:
>   $ tar -cvWf $arcname $fnames
> 
> where $fnames initially was a list in a variable (this is preparing a
> shell script), then I switched to storing those in a file and
>   $ tar -cvWf $arcname $(cat $fname_list_file)

So... what's in these variables?  What's in the file pointed to by
the last variable?

In general, your approach of "storing a list in a string variable using
spaces and word splitting" is utterly crap and will NOT work with
arbitrary filenames.  Filenames can contain spaces and globbing
characters.

But that's probably not the main issue -- unless one of your filenames
happens to contain an asterisk or two.

It's just as likely that your "list" has duplicates in it.

If you'd show the content of those variables (and the file) maybe we
would know.



Re: question re tar

2022-09-20 Thread jr
hi,

On Tuesday, 20 September 2022 at 12:30:05 UTC+1, Greg Wooledge wrote:
> ...
> With this new information, it occurs to me that perhaps the OP did
> something like this: ...
> unicorn:/tmp/x$ find . | tar cv --files-from=- -f ../foo.tar

I prefer 'locate' to 'find'.  and no guessing involved, as I wrote on
the 18th, the invocation was:
  $ tar -cvWf $arcname $fnames

where $fnames initially was a list in a variable (this is preparing a
shell script), then I switched to storing those in a file and
  $ tar -cvWf $arcname $(cat $fname_list_file)

> ...
> For kicks, I'll also give option #3, which is:
> 3) Use cpio or pax, which are designed to accept find's output. Not tar.

:-)

> ...
> Finally, I'll end by citing an earlier message:
> > What are the *exact* tar commands that you used, to create the archive,
> > and to get that partial listing that you gave?

ah, the partial listing, I omitted to mention that it was simply
clipped from a 'script' recorded file.

> There's a reason we ask that kind of question. If this message turns
> out to be an accurate diagnosis of the problem, then the fact that the
> OP refused to give us their exact tar commands led to a huge delay in
> getting their answer.

assume I'm the OP.  refused?  (I guess you could have missed the 18th
Sep 2100h post)  anyway, appreciate you digging into the details.
have learned (a little) about 'tar' (who'd have thought? :-))



Re: question re tar

2022-09-20 Thread Greg Wooledge
> It seems some files are present multiple times in your list.
> 
> echo text >file.txt
> tar cvWf test.tar file.txt file.txt
> tar tvf test.tar

Sorry, I deleted this message, and then had a thought a few minutes later,
so I'm quoting text from the mailing list archive.

With this new information, it occurs to me that perhaps the OP did
something like this:

unicorn:/tmp/x$ echo hello > file.txt
unicorn:/tmp/x$ ls -l
total 4
-rw-r--r-- 1 greg greg 6 Sep 20 07:05 file.txt
unicorn:/tmp/x$ find . | tar cv --files-from=- -f ../foo.tar
./
./file.txt
./file.txt
unicorn:/tmp/x$ tar tvf ../foo.tar
drwxr-xr-x greg/greg 0 2022-09-20 07:08 ./
-rw-r--r-- greg/greg 6 2022-09-20 07:05 ./file.txt
hrw-r--r-- greg/greg 0 2022-09-20 07:05 ./file.txt link to ./file.txt

Feeding an *unfiltered* "find ." listing to tar --files-from=- will
cause this behavior, because find supplies the directory names as well
as the file names.  tar will recurse into the directories (specified by
name), and then also grab the files (specified by name).

If you want to avoid that, you have two (well, more than two) choices:

1) Tell find not to supply directory names.  This means you won't archive
   any empty directories.

2) Tell tar not to recurse into named directories.

For #1 you'd have:

unicorn:/tmp/x$ find . -type f | tar cv --files-from=- -f ../bar.tar
./file.txt
unicorn:/tmp/x$ tar tvf ../bar.tar
-rw-r--r-- greg/greg 6 2022-09-20 07:05 ./file.txt

For #2 it would be:

unicorn:/tmp/x$ find . | tar cv --no-recursion --files-from=- -f ../baz.tar
./
./file.txt
unicorn:/tmp/x$ tar tvf ../baz.tar 
drwxr-xr-x greg/greg 0 2022-09-20 07:08 ./
-rw-r--r-- greg/greg 6 2022-09-20 07:05 ./file.txt

For kicks, I'll also give option #3, which is:

3) Use cpio or pax, which are designed to accept find's output.  Not tar.

But of course nobody will ever listen to that.  tar is *much* too popular,
to the point where most people don't even realize there are other choices.

Finally, I'll end by citing an earlier message:

> What are the *exact* tar commands that you used, to create the archive,
> and to get that partial listing that you gave?

There's a reason we ask that kind of question.  If this message turns
out to be an accurate diagnosis of the problem, then the fact that the
OP refused to give us their exact tar commands led to a huge delay in
getting their answer.



Re: question re tar

2022-09-20 Thread Thomas Schmitt
Hi,

Max Nikulin wrote:
> It seems some files are present multiple times in your list.
> tar cvWf test.tar file.txt file.txt

Well if it is that easy to create the situation, i can test what happens
on restoring the tarball:

  $ tar cvf test.tar x x
  x
  x
  $ rm x
  rm: remove regular file ‘x’? y
  $ tar xvf test.tar
  x
  x
  $ ls -1
  test.tar
  x

No error messages.

This is the behavior promised by
  https://www.gnu.org/software/tar/manual/html_node/multiple.html
for the case of later adding a newer version of the same file to the
tarball.


Have a nice day :)

Thomas



Re: question re tar

2022-09-20 Thread Max Nikulin

On 19/09/2022 02:37, jr wrote:
> when I saw the links and started investigating, I tried cat for the names, ie
>    $ tar -cvWf $arcname $(cat $fnames)
>
> adding one or two file names on the command line works as expected,
> supplying names from list and or file produces those links.

It seems some files are present multiple times in your list.

echo text >file.txt
tar cvWf test.tar file.txt file.txt
tar tvf test.tar

-rw-rw-r-- user/user   5 2022-09-20 17:43 file.txt
hrw-rw-r-- user/user   0 2022-09-20 17:43 file.txt link to file.txt



Re: question re tar

2022-09-19 Thread jr
On Monday, 19 September 2022 at 10:10:05 UTC+1, Thomas Schmitt wrote:
> ...
> But you could create a small ext filesystem in a file, mount it and make
> experiments with it.

oh, that's an excellent suggestion.  thanks.  will do that in the coming days.

> > the "machine" is a VM, pre-installed by Google, and it has more mounts
> > than a dog has fleas :-) (but '/' says it is on btrfs)
> I know that flea effect from ZFS on Solaris. It makes the mount command
> nearly unusable for information gathering.




On Monday, 19 September 2022 at 12:10:05 UTC+1, Greg Wooledge wrote:
> Since none of us can reproduce your archive, only you are in a position
> to test that. Doing it in a subdirectory of /tmp or /var/tmp ought to
> be harmless enough. You can just nuke that subdirectory when you're
> done with it.

should be safe enough, agree.  (and could then copy from tmpfs type fs
to the partition with /home.  decided to go with Thomas' idea.  thanks)



Re: question re tar

2022-09-19 Thread Greg Wooledge
On Mon, Sep 19, 2022 at 08:24:08AM +0100, jr wrote:
> _thank you_.   another question, if you don't mind: what will happen
> if I extract such an archive on a "normal" computer with ext3/4
> filesystems? (don't want to .. experiment with this)

Since none of us can reproduce your archive, only you are in a position
to test that.  Doing it in a subdirectory of /tmp or /var/tmp ought to
be harmless enough.  You can just nuke that subdirectory when you're
done with it.



Re: question re tar

2022-09-19 Thread Thomas Schmitt
Hi,

i wrote:
> > test/hardlinks/hardlink_x link to u/test/hardlinks/x

This came from editing my experiment output to remove unnecessary
local information. I forgot to remove that last "u/".


jr wrote:
> what will happen
> if I extract such an archive on a "normal" computer with ext3/4
> filesystems?

Interesting question. The identical names might cause problems.


> (don't want to .. experiment with this)

Since i have no experience with btrfs, i cannot create such a tarball.

But you could create a small ext filesystem in a file, mount it and make
experiments with it.
From the point of view of the Linux kernel, the unpacking activities to be
expected from tar should not be too exotic. I am quite sure that attempts
to link a file to itself have happened in the last 25+ years:

  $ ln x x
  ln: failed to create hard link ‘x’: File exists


> the "machine" is a VM, pre-installed by Google, and it has more mounts
> than a dog has fleas :-)  (but '/' says it is on btrfs)

I know that flea effect from ZFS on Solaris. It makes the mount command
nearly unusable for information gathering.


Have a nice day :)

Thomas



Re: question re tar

2022-09-19 Thread jr
hi,

On Sun, 18 Sept 2022 at 21:39, Thomas Schmitt  wrote:
> Will Mengarini wrote:
> > Note that the file-type character "h" (the leftmost character in your
> > second line of output) isn't documented ...
> The 'h' probably comes from {...}
> which converts tar file type LNKTYPE to 'h'.

thanks.  (yes, no documentation..)


> It's tar which does it by the (dev,ino) comparison in dump_hard_link().
> I have a test case from xorriso development: ...
>   $ tar cf - test/hardlinks | tar tvf -
>   drwxr-xr-x thomas/thomas 0 2009-05-18 19:57 test/hardlinks/
>   -rw-r--r-- thomas/thomas 42786 2008-11-14 09:44 test/hardlinks/x
>   hrw-r--r-- thomas/thomas 0 2008-11-14 09:44 test/hardlinks/hardlink_x link to u/test/hardlinks/x

_thank you_.   another question, if you don't mind: what will happen
if I extract such an archive on a "normal" computer with ext3/4
filesystems? (don't want to .. experiment with this)


> The question remains why jr's tar records two files with the same path
> as a pair of hardlinks. (I place my bet on btrfs snapshots.)

the "machine" is a VM, pre-installed by Google, and it has more mounts
than a dog has fleas :-)  (but '/' says it is on btrfs)



Re: question re tar

2022-09-18 Thread Thomas Schmitt
Hi,

Will Mengarini wrote:
> Note that the file-type character "h" (the leftmost character in your
> second line of output) isn't documented in
> ,

The 'h' probably comes from
  https://sources.debian.org/src/tar/1.34%2Bdfsg-1/src/list.c/#L1188
which converts tar file type LNKTYPE to 'h'.

This type is set when dump_hard_link() detects that there is already
a file in the archive with the same device and inode number:
  https://sources.debian.org/src/tar/1.34+dfsg-1/src/create.c/?hl=1519#L1483

The only caller of dump_hard_link() is dump_file0(), where it happens
unconditionally.

(Thanks go to https://codesearch.debian.net )


> I'm not aware that ext$i filesystems can distinguish hard
> links from original names.

It's tar which does it by the (dev,ino) comparison in dump_hard_link().
I have a test case from xorriso development:

  $ ls -l test/hardlinks
  total 88
  -rw-r--r-- 2 thomas thomas 42786 Nov 14  2008 hardlink_x
  -rw-r--r-- 2 thomas thomas 42786 Nov 14  2008 x

Let's see what tar does with it:

  $ tar cf - test/hardlinks | tar tvf -
  drwxr-xr-x thomas/thomas 0 2009-05-18 19:57 test/hardlinks/
  -rw-r--r-- thomas/thomas 42786 2008-11-14 09:44 test/hardlinks/x
  hrw-r--r-- thomas/thomas 0 2008-11-14 09:44 test/hardlinks/hardlink_x link to u/test/hardlinks/x

I assume that the quite random order in which readdir(3) lists the files
decides which of the two link siblings keeps type '-' and which becomes
'h' in the tarball.
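
The (dev,ino) relationship can be looked at directly. A small sketch,
assuming GNU stat and GNU tar; the scratch directory and file names are
made up for illustration. Naming the files explicitly fixes the order, so
the first name is stored as a regular file and the second as a hard link:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo text >x
ln x hardlink_x

# Both names report the same device:inode pair.
id_x=$(stat -c '%d:%i' x)
id_link=$(stat -c '%d:%i' hardlink_x)
echo "$id_x $id_link"

# tar stores the first name with its data and records the second
# name as a hard link to it.
listing=$(tar -cf - x hardlink_x | tar -tvf -)
printf '%s\n' "$listing"
```

Swapping the two names on the tar command line should swap which entry
gets type 'h'.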


The question remains why jr's tar records two files with the same path
as a pair of hardlinks. (I place my bet on btrfs snapshots.)


Have a nice day :)

Thomas



Re: question re tar

2022-09-18 Thread Will Mengarini
* jr  [22-09/18=Su 12:59 +0100]:
> When I create an archive with '-cvWf' I'm used to finding only the files
> specified, but every time I use 'tar' on this Debian, there is a "link" for
> each and every file.  Why?  eg:
> -rw--- jr/jr 256 2022-06-1  22:10 .config/pulse/cookie
> hrw--- jr/jr   0 2022-06-12 22:10 .config/pulse/cookie link to .config/pulse/cookie

Note that the file-type character "h" (the leftmost character in your
second line of output) isn't documented in
,
which is presumably the most recent documentation.  So I wonder
whether there's some new feature that's not being correctly handled.

I'm not aware that ext$i filesystems can distinguish hard
links from original names.  If some other filesystems can
do so (and if these are hard links), then it's possible
some subsystem is "helping" you by secretly creating these
links to enable data recovery after accidental deletion.



Re: question re tar

2022-09-18 Thread jr
hi,

> What kind of file system are the files sitting on?

btrfs.  the system is the pre-installed (on Chromebooks) Debian VM.

> What are the *exact* tar commands that you used, to create the archive,
> and to get that partial listing that you gave?

initially from a shell script, assembling a list of file names and
passing those, ie
  tar -cvWf $arcname $fnames

when I saw the links and started investigating, I tried cat for the names, ie
  $ tar -cvWf $arcname $(cat $fnames)

adding one or two file names on the command line works as expected,
supplying names from list and/or file produces those links.
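
One quick check on a name list like this is whether any name occurs more
than once. A minimal sketch, assuming one name per line; names.txt and the
file names in it are made up for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical list file in which b.txt appears twice.
printf '%s\n' a.txt b.txt c.txt b.txt >names.txt

# uniq -d prints one copy of every line that occurs more than once
# in its (sorted) input.
dupes=$(sort names.txt | uniq -d)
printf '%s\n' "$dupes"
```

An empty result would mean the list has no repeated names.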

(I only get the digest, sorry for delays)



Re: question re tar

2022-09-18 Thread Charles Curley
On Sun, 18 Sep 2022 12:59:21 +0100
jr  wrote:

> I hope someone can help me to make 'tar' "behave" as expected.  tia.
> 
> when I create an archive with '-cvWf' I'm used to finding only the
> files specified, but every time I use 'tar' on this Debian, there is
> a "link" for each and every file.  why?  eg:
> -rw--- jr/jr 256 2022-06-1  22:10 .config/pulse/cookie
> hrw--- jr/jr   0 2022-06-12 22:10 .config/pulse/cookie link to .config/pulse/cookie

Well, just for the halibut I did the same, and did not get the links.

This may be a silly question, but did you check to be sure that those
links aren't in the original file system? 'ls -al' should show any
symlinks, and a link count above 1 in its second column indicates hard
links.

charles@hawk:~/versioned/tle$ ll
total 40
drwxr-xr-x  3 charles charles 4096 Sep 14 11:57 ./
drwxr-xr-x 15 charles charles 4096 Aug  3 12:41 ../
drwxr-xr-x  8 charles charles 4096 Sep 14 11:59 .git/
-rwxr--r--  1 charles charles  374 Aug  1 13:48 install.sh*
-rwxr--r--  1 charles charles 1210 Jul 17 08:13 tle.compose.announce.sh*
-rwxr--r--  1 charles charles 1277 Jul  8 14:01 tledate.sh*
-rwxr--r--  1 charles charles 3742 Aug 23 14:29 tlenew.archive.index.sh*
-rwxr--r--  1 charles charles 4901 Sep 14 11:57 tlenew.page.sh*
-rwxr--r--  1 charles charles 3787 Sep 12 13:23 tlenew.week.sh*
charles@hawk:~/versioned/tle$ tar -cvWf ../test.tar *.sh
install.sh
tle.compose.announce.sh
tledate.sh
tlenew.archive.index.sh
tlenew.page.sh
tlenew.week.sh
Verify install.sh
Verify tle.compose.announce.sh
Verify tledate.sh
Verify tlenew.archive.index.sh
Verify tlenew.page.sh
Verify tlenew.week.sh
charles@hawk:~/versioned/tle$ tar tvf ../test.tar 
-rwxr--r-- charles/charles 374 2022-08-01 13:48 install.sh
-rwxr--r-- charles/charles 1210 2022-07-17 08:13 tle.compose.announce.sh
-rwxr--r-- charles/charles 1277 2022-07-08 14:01 tledate.sh
-rwxr--r-- charles/charles 3742 2022-08-23 14:29 tlenew.archive.index.sh
-rwxr--r-- charles/charles 4901 2022-09-14 11:57 tlenew.page.sh
-rwxr--r-- charles/charles 3787 2022-09-12 13:23 tlenew.week.sh
charles@hawk:~/versioned/tle$ tar --version
tar (GNU tar) 1.34
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
charles@hawk:~/versioned/tle$ 

Also on Debian 11 as updated. The underlying file system is ext4.


-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: question re tar

2022-09-18 Thread Greg Wooledge
On Sun, Sep 18, 2022 at 12:59:21PM +0100, jr wrote:
> when I create an archive with '-cvWf' I'm used to finding only the files
> specified, but every time I use 'tar' on this Debian, there is a "link" for
> each and every file.  why?  eg:
> -rw--- jr/jr 256 2022-06-1  22:10 .config/pulse/cookie
> hrw--- jr/jr   0 2022-06-12 22:10 .config/pulse/cookie link to .config/pulse/cookie

-W is new to me, but the man page says it just means "verify the
archive after writing it", so that doesn't sound relevant.

What kind of file system are the files sitting on?

What are the *exact* tar commands that you used, to create the archive,
and to get that partial listing that you gave?

I can't reproduce your issue on my system, from the information you
provided:

unicorn:~$ tar -cvWf foo.tar .cache/fontconfig/CACHEDIR.TAG 
.cache/fontconfig/CACHEDIR.TAG
Verify .cache/fontconfig/CACHEDIR.TAG
unicorn:~$ ls -l foo.tar
-rw-r--r-- 1 greg greg 10240 Sep 18 09:09 foo.tar
unicorn:~$ tar tvf foo.tar
-rw-r--r-- greg/greg   200 2015-04-22 12:26 .cache/fontconfig/CACHEDIR.TAG

> $ cat /etc/debian_version
> 11.5
> $ tar --version
> tar (GNU tar) 1.34
> Copyright (C) 2021 Free Software Foundation, Inc.
> [...]

Same versions here.  Whatever's causing it is coming from something else.