Re: [v2] introduction of content_id

2018-02-09 Thread Konstantin Ryabitsev
On 02/09/18 13:17, Eric Wong wrote:
> In addition to the git object_id (blob SHA-1) and Message-Id
> header; it seems necessary to introduce an in-between identifier
> for deduplicating which isn't as loose as Message-Id or as
> strict as object_id: content_id
>
> I think a hash of the following

[v2] introduction of content_id

2018-02-09 Thread Eric Wong
In addition to the git object_id (blob SHA-1) and Message-Id header; it seems necessary to introduce an in-between identifier for deduplicating which isn't as loose as Message-Id or as strict as object_id: content_id

I think a hash of the following raw headers + raw body will suffice:
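
The preview above is cut off before the header list, so the following is only a rough sketch of the idea, not the scheme actually proposed in the thread. It assumes a hypothetical header set (`ASSUMED_HEADERS`) and SHA-256 as the hash; the function name `content_id` is likewise illustrative. The point is that two copies of a message that differ only in transport headers get the same identifier, while a different body under the same Message-Id does not.

```python
import hashlib
from email import message_from_bytes

# ASSUMPTION: the real header list is truncated in the archive preview;
# this tuple is a hypothetical stand-in chosen for illustration only.
ASSUMED_HEADERS = ("Subject", "From", "Date", "To", "Cc")


def content_id(raw: bytes, headers=ASSUMED_HEADERS) -> str:
    """Hash selected headers plus the raw body of an RFC 822 message.

    SHA-256 is an assumption; the thread does not name a hash here.
    """
    # Normalize line endings so CRLF and LF copies hash identically.
    raw = raw.replace(b"\r\n", b"\n")
    msg = message_from_bytes(raw)
    digest = hashlib.sha256()

    # Feed only the chosen headers, in a fixed order, NUL-delimited so
    # that adjacent fields cannot run together ambiguously.
    for name in headers:
        for value in msg.get_all(name, []):
            digest.update(name.lower().encode("ascii") + b"\0")
            digest.update(str(value).encode("utf-8", "surrogateescape") + b"\0")

    # The raw body is everything after the first blank line.
    body = raw.partition(b"\n\n")[2]
    digest.update(b"body\0" + body)
    return digest.hexdigest()


if __name__ == "__main__":
    a = b"Received: from hop-a\nMessage-Id: <1@x>\nSubject: hi\n\nhello\n"
    b = b"Received: from hop-b\nMessage-Id: <1@x>\nSubject: hi\n\nhello\n"
    c = b"Received: from hop-a\nMessage-Id: <1@x>\nSubject: hi\n\nchanged\n"
    print(content_id(a) == content_id(b))  # same content, different blobs
    print(content_id(a) == content_id(c))  # same Message-Id, different body
```

In this sketch, messages `a` and `b` would have different git blob SHA-1s (their bytes differ) yet share one identifier, whereas `a` and `c` share a Message-Id but are told apart.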

[v2] one file to rule them all?

2018-02-09 Thread Eric Wong
Since 95acd5901491e4f333f5d2bbeed6fb5e6b53e07c ("searchmsg: add git object ID to doc_data") the need for having files stored in trees is reduced, since Xapian stores the git object_id and asks git to retrieve it without doing tree lookups. So, as long as git knows an object exists, it should be no

Re: [v2] introduction of content_id

2018-02-09 Thread Eric Wong
Konstantin Ryabitsev wrote:
> On 02/09/18 13:17, Eric Wong wrote:
> > In addition to the git object_id (blob SHA-1) and Message-Id
> > header; it seems necessary to introduce an in-between identifier
> > for deduplicating which isn't as loose as Message-Id or as
>