Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-29 Thread Benjamin Kaduk
On Thu, Nov 29, 2018 at 03:43:20PM +, Sara Dickinson wrote:
> 
> > On 24 Nov 2018, at 03:58, Benjamin Kaduk wrote:
> > 
> > On Thu, Nov 22, 2018 at 12:01:00PM +, Sara Dickinson wrote:
> >>> 
> >>> Section 7.4.1.1.1
> >>> 
> >>> Am I parsing the "query-response-hints" text correctly to say that a bit is
> >>> set in the bitmap if the corresponding field is recorded (if present) by
> >>> the collecting implementation?  The causality of "if the field is omitted
> >>> the bit is unset" goes in a direction that is not what I expected.
> >>> (Similarly for the other fields in this table.)
> >> 
> >> ekr picked up on the same point - as responded to him:
> >> 
> >> "The issue is that if the bit is set the field might still be missing 
> >> because although the configuration was set to collect it the data wasn’t 
> >> available to the encoder for some other reason. However, when the bit is 
> >> not set it means that the data will definitely not be present because the 
> >> collector is configured not to collect it. 
> >> 
> >> We do discuss this problem in section 6.2.1 - perhaps a reference in the 
> >> table back to that discussion is what is needed?”
> >> 
> >> Looking again I think a slight update to the text in 6.2.1 might help too:
> >> 
> >> OLD:
> >> “The Storage Parameters therefore also contains a Storage Hints item
> >>  which specifies which items the encoder of the file omits from the
> >>  stored data."
> >> 
> >> NEW: “The Storage Parameters therefore also contains a Storage Hints item
> >>  which specifies which items the encoder of the file omits from the
> >>  stored data and will therefore never be present. (This approach is taken 
> >> because a flag that indicated which items were included for collection would
> >> not guarantee that the item was present, only that it might be.) "
> > 
> > This text helps, but I think it is not quite what I was going after -- that
> > is, when I think of a "hint" that feels like something active and that
> > would be indicated by setting a bit to one.  In this design, the hints
> > about what are *omitted* are the bits that are *zero*, which is
> > counter-intuitive, at least to me.  So maybe we could say (in 7.4.1.1.1, in
> > addition to your suggested change in 6.2.1):
> > 
> > Hints indicating which "QueryResponse" fields are candidates for capture or
> > omitted, see section 7.6.  If a bit is unset, that field is omitted from
> > the capture.
> 
> Ah, ok I see the confusion now and yes - this text improves the draft - thank 
> you!
> 
> > 
> >> 
> >>> 
> >>> Section 7.4.2
> >>> 
> >>> Do we need a reference for "promiscuous mode”?
> >> 
> >> Promiscuous mode is discussed on the main PCAP manpage…. Hopefully a way
> >> will be found to address the question of a suitable reference format for
> >> PCAP material.
> >> 
> >>> 
> >>> Just to check: in "server-addresses", I just infer the IP version from the
> >>> length of the byte string?
> >> 
> >> As mentioned in the DISCUSS response, we probably need to make the 
> >> transport flags mandatory.
> >> 
> >>> 
> >>> Do we need to say more about where the vlan-ids identifiers are taken 
> >>> from?
> >> 
> >> Suggest: 
> >> 
> >> OLD: “ | vlan-ids | O | A | Array of identifiers (of type unsigned |
> >>  |  |   |   | integer) of VLANs selected for |
> >>  |  |   |   | collection. “
> >> 
> >> NEW: “ | vlan-ids | O | A | User specified array of identifiers (of type unsigned |
> >>  |  |   |   | integer) of VLANs [IEEE 802.1Q] selected for |
> >>  |  |   |   | collection. "
> > 
> > It seems likely to me that we want to say that the actual VLAN ID values
> > are only unique within an administrative domain.
> 
> OK - yes, makes sense.
> 
> > 
> >>> 
> >>> Is the "generator-id" string intended to only be human readable?  Only
> >>> within a specific (administrative) context?
> >> 
> >> The generator ID is intended only to identify the collecting
> >> application. Specifying that it is human-readable (if present) seems a
> >> good idea. Would this be sufficient?
> >> 
> >> OLD: "String identifying the collection method.”
> >> NEW: “User specified human-readable string identifying the collection 
> >> method."
> > 
> > Does "user-specified" mean that only the user is responsible for reading it
> > later (or would we want it to make sense even when the data is conveyed to
> > some other party)?
> 
> Yes - that’s correct. Maybe 'implementation specific' is better?

I think that's more explicit about what scope we should expect.
(But of course this is all in the non-blocking comment section, so your
judgment takes precedence over mine.)

> > If so, this would be enough to address my comment, but then Ben's
> > comment about internationalization concerns would come into play.
> 
> Sorry - I missed that comment - could you clarify? I’m not sure how I see 

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-29 Thread Sara Dickinson

> On 24 Nov 2018, at 03:58, Benjamin Kaduk wrote:
> 
> On Thu, Nov 22, 2018 at 12:01:00PM +, Sara Dickinson wrote:
>>> 
>>> Section 7.4.1.1.1
>>> 
>>> Am I parsing the "query-response-hints" text correctly to say that a bit is
>>> set in the bitmap if the corresponding field is recorded (if present) by
>>> the collecting implementation?  The causality of "if the field is omitted
>>> the bit is unset" goes in a direction that is not what I expected.
>>> (Similarly for the other fields in this table.)
>> 
>> ekr picked up on the same point - as responded to him:
>> 
>> "The issue is that if the bit is set the field might still be missing 
>> because although the configuration was set to collect it the data wasn’t 
>> available to the encoder for some other reason. However, when the bit is not 
>> set it means that the data will definitely not be present because the 
>> collector is configured not to collect it. 
>> 
>> We do discuss this problem in section 6.2.1 - perhaps a reference in the 
>> table back to that discussion is what is needed?”
>> 
>> Looking again I think a slight update to the text in 6.2.1 might help too:
>> 
>> OLD:
>> “The Storage Parameters therefore also contains a Storage Hints item
>>  which specifies which items the encoder of the file omits from the
>>  stored data."
>> 
>> NEW: “The Storage Parameters therefore also contains a Storage Hints item
>>  which specifies which items the encoder of the file omits from the
>>  stored data and will therefore never be present. (This approach is taken 
>> because a flag that indicated which items were included for collection would 
>> not guarantee that the item was present, only that it might be.) "
> 
> This text helps, but I think it is not quite what I was going after -- that
> is, when I think of a "hint" that feels like something active and that
> would be indicated by setting a bit to one.  In this design, the hints
> about what are *omitted* are the bits that are *zero*, which is
> counter-intuitive, at least to me.  So maybe we could say (in 7.4.1.1.1, in
> addition to your suggested change in 6.2.1):
> 
> Hints indicating which "QueryResponse" fields are candidates for capture or
> omitted, see section 7.6.  If a bit is unset, that field is omitted from
> the capture.

Ah, ok I see the confusion now and yes - this text improves the draft - thank 
you!
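
To make the agreed semantics concrete, here is a minimal sketch of how a reader of a capture file might interpret the hints bitmap. The bit positions and the names QueryResponseHints and field_status are hypothetical and chosen only for illustration; they are not taken from the draft. The point is the asymmetry: an unset bit means the field is never present, while a set bit only means the field was a candidate for capture.

```python
from enum import IntFlag


class QueryResponseHints(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    TIME_OFFSET = 1 << 0
    CLIENT_ADDRESS = 1 << 1
    CLIENT_PORT = 1 << 2
    TRANSACTION_ID = 1 << 3


def field_status(hints: QueryResponseHints, field: QueryResponseHints) -> str:
    """Describe what a reader can conclude about one field from the hints."""
    if hints & field:
        # Bit set: the collector was configured to record the field, but an
        # individual record may still lack it if the data was unavailable.
        return "candidate for capture (may still be absent from a record)"
    # Bit unset: the collector was configured not to record the field.
    return "omitted (never present)"


hints = QueryResponseHints.TIME_OFFSET | QueryResponseHints.CLIENT_PORT
for field in QueryResponseHints:
    print(field.name, "->", field_status(hints, field))
```

A reader can therefore skip any processing that depends on a field whose bit is unset, but still has to handle a missing value for a field whose bit is set.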

> 
>> 
>>> 
>>> Section 7.4.2
>>> 
>>> Do we need a reference for "promiscuous mode”?
>> 
>> Promiscuous mode is discussed on the main PCAP manpage…. Hopefully a way
>> will be found to address the question of a suitable reference format for
>> PCAP material.
>> 
>>> 
>>> Just to check: in "server-addresses", I just infer the IP version from the
>>> length of the byte string?
>> 
>> As mentioned in the DISCUSS response, we probably need to make the transport 
>> flags mandatory.
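
As an illustration of the "server-addresses" question, the sketch below infers the IP version purely from the length of the byte string. The helper name parse_full_address is hypothetical. It assumes complete addresses only; if an encoder were to store a shortened address (an assumption about the storage options, not something stated in this thread), the length would no longer identify the version, which is one reason an explicit transport flag is the safer choice.

```python
import ipaddress


def parse_full_address(raw: bytes):
    """Interpret a 4- or 16-byte string as an IPv4 or IPv6 address.

    Any other length is rejected, because the length alone then says
    nothing reliable about the IP version.
    """
    if len(raw) not in (4, 16):
        raise ValueError("length alone cannot identify the IP version")
    # ipaddress.ip_address accepts packed bytes of length 4 or 16.
    return ipaddress.ip_address(raw)


v4 = parse_full_address(bytes([192, 0, 2, 1]))
v6 = parse_full_address(bytes.fromhex("20010db8" + "00" * 11 + "01"))
print(v4, v4.version)
print(v6, v6.version)
```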
>> 
>>> 
>>> Do we need to say more about where the vlan-ids identifiers are taken from?
>> 
>> Suggest: 
>> 
>> OLD: “ | vlan-ids | O | A | Array of identifiers (of type unsigned |
>>  |  |   |   | integer) of VLANs selected for |
>>  |  |   |   | collection. “
>> 
>> NEW: “ | vlan-ids | O | A | User specified array of identifiers (of type unsigned |
>>  |  |   |   | integer) of VLANs [IEEE 802.1Q] selected for |
>>  |  |   |   | collection. "
> 
> It seems likely to me that we want to say that the actual VLAN ID values
> are only unique within an administrative domain.

OK - yes, makes sense.

> 
>>> 
>>> Is the "generator-id" string intended to only be human readable?  Only
>>> within a specific (administrative) context?
>> 
>> The generator ID is intended only to identify the collecting
>> application. Specifying that it is human-readable (if present) seems a
>> good idea. Would this be sufficient?
>> 
>> OLD: "String identifying the collection method.”
>> NEW: “User specified human-readable string identifying the collection 
>> method."
> 
> Does "user-specified" mean that only the user is responsible for reading it
> later (or would we want it to make sense even when the data is conveyed to
> some other party)?

Yes - that’s correct. Maybe 'implementation specific' is better?

> If so, this would be enough to address my comment, but then Ben's
> comment about internationalization concerns would come into play.

Sorry - I missed that comment - could you clarify? I’m not sure how I see this 
is any different to any other (unicode) text string used in CBOR?

> 
>>> 
>>> Section 7.5.1
>>> 
>>> Does "earliest-time" include leap seconds?
>> 
>> Thanks for noticing this…after digging into it…
>> 
>> The description specifies the number of seconds to be the
>> number of seconds since the POSIX epoch ("time_t"). POSIX requires that
>> leap seconds be omitted from reported time, and all days are defined as
>> having 86,400 seconds. This means that a POSIX timestamp can be
>> 

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-29 Thread Sara Dickinson


> On 24 Nov 2018, at 03:35, Benjamin Kaduk  wrote:
> 
> On Wed, Nov 21, 2018 at 01:53:09PM +, Sara Dickinson wrote:
>> 
>> 
>>> Begin forwarded message:
>>> 
>>> From: Benjamin Kaduk mailto:ka...@mit.edu>>
>>> Subject: Benjamin Kaduk's Discuss on 
>>> draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
>>> Date: 19 November 2018 at 00:28:19 GMT
>>> To: "The IESG" mailto:i...@ietf.org>>
>>> Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org 
>>> <mailto:draft-ietf-dnsop-dns-capture-for...@ietf.org>, Tim Wicinski 
>>> mailto:tjw.i...@gmail.com>>, dnsop-cha...@ietf.org 
>>> <mailto:dnsop-cha...@ietf.org>, tjw.i...@gmail.com 
>>> <mailto:tjw.i...@gmail.com>, dnsop@ietf.org <mailto:dnsop@ietf.org>
>>> Resent-From: mailto:alias-boun...@ietf.org>>
>>> Resent-To: j...@sinodun.com <mailto:j...@sinodun.com>, j...@sinodun.com 
>>> <mailto:j...@sinodun.com>, s...@sinodun.com <mailto:s...@sinodun.com>, 
>>> terry.mander...@icann.org <mailto:terry.mander...@icann.org>, 
>>> john.b...@icann.org <mailto:john.b...@icann.org>
>> 
>> Many thanks for the detailed review. 
>> 
>>> 
>>> --
>>> DISCUSS:
>>> --
>>> 
>>> It is pretty shocking to not see any discussion of the privacy
>>> considerations of storing data including client addresses (and ports)
>>> alongside DNS transactions, given how central DNS resolution is to user
>>> behavior on the web.  (Note that there are mentions of potentially
>>> anonymized data in Sections 6.2 and 6.2.3 which would presumably
>>> forward-reference the privacy considerations.)  Data normalization would
>>> probably also be mentioned in this section, since (e.g.) the case used for
>>> a query/response could be used in fingerprinting an implementation.
>> 
>> There has been extensive discussion of data storage risks and practices in 
>> two DPRIVE documents so I’d suggest the following changes in the first 
>> instance to address this:
> 
> This is exactly the sort of thing I was hoping to see, thank you!  I have
> just a couple tweaks to suggest, inline.
> 
>> New Privacy Considerations section:
>> “ Storage of DNS traffic by operators in PCAP and other formats is a long 
>> standing and widespread practice. Section 2.5 of 
>> draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet 
>> users of the storage of DNS traffic data in servers (recursive resolvers, 
>> authoritative and rogue server). 
>> 
>> Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those 
>> risks for data stored on recursive resolvers (but which could by extension 
>> apply to authoritative servers). These include data handling practices and 
>> methods for data minimisation, IP address pseudonymization and 
>> anonymization. Appendix B of that document presents an analysis of 7 
>> published anonymization processes. In addition RSSAC have recently published 
>> RSSAC040: "Recommendations on Anonymization Processes for Source IP 
>> Addresses Submitted for Future Analysis”[1].
>> 
>> The above analyses consider full data capture (e.g. using PCAP) as a
>> baseline for privacy considerations and therefore this format
>> specification introduces no new user privacy issues beyond those of full
>> data capture. It does provide mechanisms to selectively record only
> 
> I would say "beyond those of full data capture (which are quite severe)".
> That is, while the current state of affairs is a valid baseline for
> comparison, that does not absolve us of responsibility for analyzing the
> current state of affairs.  (To be clear,
> draft-bortzmeyer-dprive-rfc7626-bis is a fine place for the bulk of that
> analysis to live, but in this document we should not pretend that the
> current state of affairs is a good situation to be in.)
> 
>> certain fields at the time of data capture to improve user privacy and to
>> explicitly indicate that data is sampled and/or anonymised. It also
>> provides flags to indicate if data normalisation has been performed; data
>> normalisation increases user privacy by reducing the potential for
>> fingerprinting individuals however a trade-off is potentially reducing
> 
> I think "however" would be offset by commas on both sides.

Both these WFM - thanks.

And thanks for t

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-27 Thread Tony Finch
[ trim CC: list due to off-topic tangent ]

Brian Dickson  wrote:
>
> Doesn't UTC actually derive its time from TAI plus/minus leap seconds?

It's more complicated than that :-)

Strictly speaking, TAI is a paper clock, which is published as
retrospective corrections to national time lab reference clocks. In
practice what the general public has access to are time signals that trace
back to national versions of UTC, because those are the continuously
maintained reference timescales. GNSS time signals are an exception
because they mostly lack leap seconds, so their offset from TAI is fixed
to within some precision. But GPS time is only roughly TAI-19s.

> Why isn't it already available to use as a time zone?

Timezones on Unix have to be defined wrt POSIX time (because that's how
localtime() works), and POSIX time is a lossy representation of UTC, so
you can't get TAI that way without lossage. There were some experiments
defining TZ based on a TAI-ish non-standard time_t (the "right" aka wrong
timezones) but they aren't usable on a POSIX system. (But note the epoch
for the "right" timezones is 10s different from the SMPTE PTP epoch.
Sigh.)

Tony.
-- 
f.anthony.n.finch  http://dotat.at/
Fisher, German Bight: Southeasterly 5 to 7, increasing gale 8 later. Slight or
moderate, becoming rough or very rough later. Showers, rain later. Moderate or
good.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-26 Thread Brian Dickson
> On 27 Nov 2018, at 1:54 am, Tony Finch wrote:
> >
> > Richard Gibson wrote:
> >>
> >> I am currently going through a similar exercise in another context, and the
> >> best current text there explicitly characterizes the non-obvious day-based
> >> accounting of POSIX time.
> >
> > In general I think it's best to just refer to POSIX on this matter, and
> > not try to restate the definition. POSIX is very clear and explicit about
> > the day-based accounting of seconds.
> >
> > http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap04.html#tag_04_16
> >
> >> However, there may be C-DNS purposes that cannot tolerate such
> >> discontinuities, and they would presumably want to use a continuous monotonic
> >> timescale with a fixed offset from TAI (as is the case for e.g. GPS time).
> >
> > That's practically unobtanium on most systems :-) Even if you have PTP
> > there isn't a fixed PTP epoch, though the SMPTE profile defines it to be
> > equivalent to POSIX time plus the TAI-UTC value from the IERS / NIST
> > leapsecond tables.
>
IMHO, this (the SMPTE profile) makes the most sense.
The most important aspect is that it is continuous, and the second most
important is that it relates predictably to UTC.

Honestly, the whole "leap second" thing... don't get me started.

But, I think it would be fair to have "this thing" (what the formula
describes) be its own "time zone", i.e. that it be given a proper
representation everywhere that anything that uses some notion of "local
time", can refer to time using that as a TZ.

I'd suggest that C-DNS start using it, and parallel to that, propose the
reference timeframe be adopted by whatever WG or standards body manages
timezones and/or UTC.

Besides, isn't it kind of backwards?
Doesn't UTC actually derive its time from TAI plus/minus leap seconds?
Why isn't it already available to use as a time zone?
(Consults wikipedia on unix time, sees there is an IANA database. Curses
POSIX.)

Brian


> > Tony.
> > --
> > f.anthony.n.finch  http://dotat.at/
> > disperse power, foster diversity, and nurture creativity


Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-26 Thread Mark Andrews
Basically one needs to know if there is a leap second about to occur at the end
of the month and direction and if you are in a leap second. That can be encoded
in two bits.

00  no leap at end of UTC month
01  in additive leap second at end of UTC month
10  subtractive leap at end of UTC month
11  additive leap at end of UTC month

This allows you to detect when :59 (60th second) does not exist and to detect
when you are in :60 (61st second) provided your system provides access to the
information.

When calculating deltas across the end of a UTC month you use the information
from the earlier timestamp to compute the correction as there can be leap
seconds in two consecutive months.
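
A rough sketch of the two-bit encoding and the delta correction described above, under stated assumptions. The names LeapInfo and elapsed_seconds are hypothetical, and the caller is assumed to know whether the two timestamps straddle the end of a UTC month. The "in additive leap second" case is deliberately left unhandled, because the right correction depends on which POSIX value the system assigns to :60, and that is not specified here.

```python
from enum import IntEnum


class LeapInfo(IntEnum):
    NONE = 0b00                 # no leap at end of UTC month
    IN_ADDITIVE = 0b01          # currently inside an additive leap second
    SUBTRACTIVE_PENDING = 0b10  # subtractive leap at end of UTC month
    ADDITIVE_PENDING = 0b11     # additive leap at end of UTC month


def elapsed_seconds(earlier: int, earlier_info: LeapInfo,
                    later: int, crosses_month_end: bool) -> int:
    """Elapsed seconds between two POSIX timestamps, corrected with the
    leap information carried by the earlier timestamp."""
    delta = later - earlier
    if not crosses_month_end:
        return delta
    if earlier_info == LeapInfo.ADDITIVE_PENDING:
        return delta + 1  # an extra second (:60) was inserted in between
    if earlier_info == LeapInfo.SUBTRACTIVE_PENDING:
        return delta - 1  # second :59 never existed
    if earlier_info == LeapInfo.IN_ADDITIVE:
        # Depends on how the clock stamps the leap second itself.
        raise ValueError("correction inside a leap second is system-dependent")
    return delta


# 2016-12-31T23:59:59Z to 2017-01-01T00:00:00Z, across an inserted leap second.
print(elapsed_seconds(1483228799, LeapInfo.ADDITIVE_PENDING, 1483228800, True))
```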


> On 27 Nov 2018, at 1:54 am, Tony Finch  wrote:
> 
> Richard Gibson  wrote:
>> 
>> I am currently going through a similar exercise in another context, and the
>> best current text there explicitly characterizes the non-obvious day-based
>> accounting of POSIX time.
> 
> In general I think it's best to just refer to POSIX on this matter, and
> not try to restate the definition. POSIX is very clear and explicit about
> the day-based accounting of seconds.
> 
> http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap04.html#tag_04_16
> 
>> However, there may be C-DNS purposes that cannot tolerate such
>> discontinuities, and they would presumably want to use a continuous monotonic
>> timescale with a fixed offset from TAI (as is the case for e.g. GPS time).
> 
> That's practically unobtanium on most systems :-) Even if you have PTP
> there isn't a fixed PTP epoch, though the SMPTE profile defines it to be
> equivalent to POSIX time plus the TAI-UTC value from the IERS / NIST
> leapsecond tables.
> 
> Tony.
> -- 
> f.anthony.n.finch  http://dotat.at/
> disperse power, foster diversity, and nurture creativity
> 

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org



Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-26 Thread Tony Finch
Richard Gibson  wrote:
>
> I am currently going through a similar exercise in another context, and the
> best current text there explicitly characterizes the non-obvious day-based
> accounting of POSIX time.

In general I think it's best to just refer to POSIX on this matter, and
not try to restate the definition. POSIX is very clear and explicit about
the day-based accounting of seconds.

http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap04.html#tag_04_16

> However, there may be C-DNS purposes that cannot tolerate such
> discontinuities, and they would presumably want to use a continuous monotonic
> timescale with a fixed offset from TAI (as is the case for e.g. GPS time).

That's practically unobtanium on most systems :-) Even if you have PTP
there isn't a fixed PTP epoch, though the SMPTE profile defines it to be
equivalent to POSIX time plus the TAI-UTC value from the IERS / NIST
leapsecond tables.
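
For illustration, the "POSIX time plus the TAI-UTC value" rule can be sketched as a table lookup. The table below is abbreviated to the two most recent entries (TAI-UTC has been 36 s since 2015-07-01 and 37 s since 2017-01-01); a real implementation would load the full IERS / NIST list. The function name posix_to_tai_seconds is hypothetical.

```python
import calendar

# (POSIX time from which the offset applies, TAI-UTC in seconds).
# Abbreviated: only the two most recent entries are listed here.
TAI_MINUS_UTC = [
    (calendar.timegm((2015, 7, 1, 0, 0, 0)), 36),
    (calendar.timegm((2017, 1, 1, 0, 0, 0)), 37),
]


def posix_to_tai_seconds(posix_time: int) -> int:
    """Return POSIX time plus the TAI-UTC value in force at that time."""
    offset = None
    for effective_from, tai_minus_utc in TAI_MINUS_UTC:
        if posix_time >= effective_from:
            offset = tai_minus_utc
    if offset is None:
        raise ValueError("timestamp predates the abbreviated table")
    return posix_time + offset


before = calendar.timegm((2016, 12, 31, 23, 59, 59))
after = calendar.timegm((2017, 1, 1, 0, 0, 0))
# POSIX sees 1 second between these instants; the continuous count sees 2,
# because the leap second 23:59:60 lies in between.
print(after - before, posix_to_tai_seconds(after) - posix_to_tai_seconds(before))
```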

Tony.
-- 
f.anthony.n.finch  http://dotat.at/
disperse power, foster diversity, and nurture creativity



Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-23 Thread Benjamin Kaduk
On Thu, Nov 22, 2018 at 12:01:00PM +, Sara Dickinson wrote:
> 
> > Begin forwarded message:
> > 
> > From: Benjamin Kaduk mailto:ka...@mit.edu>>
> > Subject: Benjamin Kaduk's Discuss on 
> > draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> > Date: 19 November 2018 at 00:28:19 GMT
> > To: "The IESG" mailto:i...@ietf.org>>
> > Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org 
> > <mailto:draft-ietf-dnsop-dns-capture-for...@ietf.org>, Tim Wicinski 
> > mailto:tjw.i...@gmail.com>>, dnsop-cha...@ietf.org 
> > <mailto:dnsop-cha...@ietf.org>, tjw.i...@gmail.com 
> > <mailto:tjw.i...@gmail.com>,  dnsop@ietf.org <mailto:dnsop@ietf.org>
> > Resent-From: mailto:alias-boun...@ietf.org>>
> > Resent-To: j...@sinodun.com <mailto:j...@sinodun.com>, j...@sinodun.com 
> > <mailto:j...@sinodun.com>, s...@sinodun.com <mailto:s...@sinodun.com>, 
> > terry.mander...@icann.org <mailto:terry.mander...@icann.org>, 
> > john.b...@icann.org <mailto:john.b...@icann.org>
> > 
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-dnsop-dns-capture-format-08: Discuss
> 
> To follow up on items not addressed in our previous email.
> 
> > --
> > DISCUSS:
> > --
> > 
> > There are also a couple of fields whose semantics don't seem to be
> > sufficiently well specified for a proposed-standard document, such as
> > vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
> > of them are probably only going to have locally relevant semantics, but we
> > should be explicit about when that's the case.)
> 
> We have addressed the specific fields mentioned here in the comments below 
> related to each of them.
> 
> > 
> > 
> > --
> > COMMENT:
> > --
> > 
> > Section 2
> > 
> > Please consider using the RFC 8174 version of the BCP 14 boilerplate.
> 
> Yes - will replace.
> 
> > 
> > Section 3
> > 
> >   Because of these considerations, a major factor in the design of the
> >   format is minimal storage size of the capture files.
> > 
> > maybe "storage and transmission”?
> 
> Sure.
> 
> > 
> > Section 6
> > 
> > In Figure 2, the Query name is marked as "(q)" (only present if there is a
> > query), but the running text in Section 4 (bullet 1) says that the Question
> > section from the response can be used as an identifying QNAME if there is a
> > response with no corresponding query.  Am I misexpanding QNAME here, or is
> > there a disagreement between these two parts of the text?  In particular, I
> > do not see a part of Figure 2 that would correspond to a Question section
> > in the response, given the various "(q)"/"(r)" markings.
> 
> Good spot - you are correct this is an error in the diagram and it should 
> read 'Query name' with no qualifier. 

Oh good, I was worried that I was just confusing myself, so that's
reassuring to know.

> > 
> > Section 6.2.2
> > 
> >   Messages with OPCODES known to the recording application but not
> >   listed in the Storage Parameters are discarded (regardless of whether
> >   they are malformed or not).
> > 
> > (Do we need to say anything that the "discarded" is only w.r.t. the capture
> > process, and not meant to imply that DNS queries would not get a normal
> > response?)
> 
> Suggest: “Messages with OPCODES known to the recording application but not
>   listed in the Storage Parameters are discarded by the recording application 
>   during C-DNS capture (regardless of whether they are malformed or not)."

That sounds good (and to be clear, when I asked the question I wasn't sure
if the answer would just be "no").

> > 
> > Section 6.2.4
> > 
> > Please consider using IPv6 examples, per
> > https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ 
> > <https://www.iab.org/2016/11/07/iab-statement-on-ipv6/> .
> 
> Yes - will add an IPv6 example.
> 
> > 
> > Section 7.2
> > 
> >   o  The column T gives the CBOR data type of the item.
> > 
> >  *  U - Unsigned integer
> > 
> >  *  I - Signed integer
> > 
> > This is venturing a bit far from my normal area 

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-23 Thread Benjamin Kaduk
On Wed, Nov 21, 2018 at 01:53:09PM +, Sara Dickinson wrote:
> 
> 
> > Begin forwarded message:
> > 
> > From: Benjamin Kaduk mailto:ka...@mit.edu>>
> > Subject: Benjamin Kaduk's Discuss on 
> > draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> > Date: 19 November 2018 at 00:28:19 GMT
> > To: "The IESG" mailto:i...@ietf.org>>
> > Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org 
> > <mailto:draft-ietf-dnsop-dns-capture-for...@ietf.org>, Tim Wicinski 
> > mailto:tjw.i...@gmail.com>>, dnsop-cha...@ietf.org 
> > <mailto:dnsop-cha...@ietf.org>, tjw.i...@gmail.com 
> > <mailto:tjw.i...@gmail.com>, dnsop@ietf.org <mailto:dnsop@ietf.org>
> > Resent-From: mailto:alias-boun...@ietf.org>>
> > Resent-To: j...@sinodun.com <mailto:j...@sinodun.com>, j...@sinodun.com 
> > <mailto:j...@sinodun.com>, s...@sinodun.com <mailto:s...@sinodun.com>, 
> > terry.mander...@icann.org <mailto:terry.mander...@icann.org>, 
> > john.b...@icann.org <mailto:john.b...@icann.org>
> 
> Many thanks for the detailed review. 
> 
> > 
> > --
> > DISCUSS:
> > --
> > 
> > It is pretty shocking to not see any discussion of the privacy
> > considerations of storing data including client addresses (and ports)
> > alongside DNS transactions, given how central DNS resolution is to user
> > behavior on the web.  (Note that there are mentions of potentially
> > anonymized data in Sections 6.2 and 6.2.3 which would presumably
> > forward-reference the privacy considerations.)  Data normalization would
> > probably also be mentioned in this section, since (e.g.) the case used for
> > a query/response could be used in fingerprinting an implementation.
> 
> There has been extensive discussion of data storage risks and practices in 
> two DPRIVE documents so I’d suggest the following changes in the first 
> instance to address this:

This is exactly the sort of thing I was hoping to see, thank you!  I have
just a couple tweaks to suggest, inline.

> New Privacy Considerations section:
> “ Storage of DNS traffic by operators in PCAP and other formats is a long 
> standing and widespread practice. Section 2.5 of 
> draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet 
> users of the storage of DNS traffic data in servers (recursive resolvers, 
> authoritative and rogue server). 
> 
> Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those 
> risks for data stored on recursive resolvers (but which could by extension 
> apply to authoritative servers). These include data handling practices and 
> methods for data minimisation, IP address pseudonymization and anonymization. 
> Appendix B of that document presents an analysis of 7 published anonymization 
> processes. In addition RSSAC have recently published RSSAC040: " 
> Recommendations on Anonymization Processes for Source IP Addresses Submitted 
> for Future Analysis”[1].
> 
> The above analyses consider full data capture (e.g. using PCAP) as a
> baseline for privacy considerations and therefore this format
> specification introduces no new user privacy issues beyond those of full
> data capture. It does provide mechanisms to selectively record only

I would say "beyond those of full data capture (which are quite severe)".
That is, while the current state of affairs is a valid baseline for
comparison, that does not absolve us of responsibility for analyzing the
current state of affairs.  (To be clear,
draft-bortzmeyer-dprive-rfc7626-bis is a fine place for the bulk of that
analysis to live, but in this document we should not pretend that the
current state of affairs is a good situation to be in.)

> certain fields at the time of data capture to improve user privacy and to
> explicitly indicate that data is sampled and/or anonymised. It also
> provides flags to indicate if data normalisation has been performed; data
> normalisation increases user privacy by reducing the potential for
> fingerprinting individuals however a trade-off is potentially reducing

I think "however" would be offset by commas on both sides.

> the capacity to identify attack traffic via query name signatures.
> Operators should carefully consider their operational requirements and
> privacy policies and SHOULD capture at source the minimum user data
> required to meet their needs“
> 
> [1] https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf 
> <https://www.icann.org/en/system/files/files/rssac-040-07aug18

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-23 Thread Richard Gibson

On timekeeping:

On 11/22/18 07:01, Sara Dickinson wrote:

Section 7.5.1

Does "earliest-time" include leap seconds?


Thanks for noticing this…after digging into it…

The description specifies the number of seconds to be the
number of seconds since the POSIX epoch ("time_t"). POSIX requires that
leap seconds be omitted from reported time, and all days are defined as
having 86,400 seconds. This means that a POSIX timestamp can be
ambiguous and refer to either of the last 2 seconds of a day containing
a leap second (who knew time could stand still in POSIX world - aargh?!)

However, libpcap (for example) can only provide POSIX timestamps for
packets as far as we can see…

Do you think we should just document this as a limitation or do you have
another option in mind?


I am currently going through a similar exercise in another context, and 
the best current text there explicitly characterizes the non-obvious 
day-based accounting of POSIX time. In this context, it would look 
something like


   A time value that is a multiple of 86,400 (i.e., is equal to 86,400
   × /d/ for some integer /d/) represents the instant at the start of
   the UTC day that follows the 1970-01-01T00:00Z epoch by /d/ whole
   UTC days. Every other finite time value /t/ is defined relative to
   the greatest preceding time value /s/ that is such a multiple, and
   represents the instant that occurs within the same UTC day as /s/
   but follows it by /t/ − /s/ seconds.

   Time values do not account for UTC leap seconds—there are no time
   values representing instants within positive leap seconds, but there
   are time values representing instants removed from the UTC timeline
   by negative leap seconds. However, the definition of time values
   nonetheless yields piecewise alignment with UTC, discontinuities
   only at leap second boundaries, and zero difference outside of leap
   seconds.
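
As an illustration only (this is not text or code from the draft), the
day-based accounting described above can be sketched in Python. The names
split_time_value and to_utc are hypothetical, and the sketch assumes integer
time values; it shows that the instants on either side of the positive leap
second at the end of 2016-12-31 get consecutive time values, so the leap
second itself has none.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_time_value(t: int) -> tuple[int, int]:
    """Return (d, t - s): whole UTC days since the epoch and seconds into that day.

    s = d * 86,400 is the greatest multiple of 86,400 not exceeding t.
    """
    return divmod(t, SECONDS_PER_DAY)


def to_utc(t: int) -> datetime:
    """Map a POSIX time value to a UTC calendar instant (leap seconds ignored)."""
    d, offset = split_time_value(t)
    return EPOCH + timedelta(days=d, seconds=offset)


# 2016-12-31 ended with a positive leap second (23:59:60 UTC).
before = 1483228799          # 2016-12-31T23:59:59Z
after = before + 1           # 2017-01-01T00:00:00Z, a multiple of 86,400
print(to_utc(before))        # 2016-12-31 23:59:59+00:00
print(to_utc(after))         # 2017-01-01 00:00:00+00:00
```

No integer lies between the two values, which is the "no time values
representing instants within positive leap seconds" case.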

However, there may be C-DNS purposes that cannot tolerate such 
discontinuities, and they would presumably want to use a continuous 
monotonic timescale with a fixed offset from TAI (as is the case for 
e.g. GPS time). It would be nice to have a field in StorageParameters 
defining the time scale and therefore proper interpretation of 
Timestamps, defaulting to UTC-approximating POSIX but also accommodating 
unadjusted seconds counting.




Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-22 Thread Sara Dickinson

> Begin forwarded message:
> 
> From: Benjamin Kaduk <ka...@mit.edu>
> Subject: Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: 
> (with DISCUSS and COMMENT)
> Date: 19 November 2018 at 00:28:19 GMT
> To: "The IESG" <i...@ietf.org>
> Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org, Tim Wicinski 
> <tjw.i...@gmail.com>, dnsop-cha...@ietf.org, tjw.i...@gmail.com, 
> dnsop@ietf.org
> Resent-From: <alias-boun...@ietf.org>
> Resent-To: j...@sinodun.com, j...@sinodun.com, s...@sinodun.com, 
> terry.mander...@icann.org, john.b...@icann.org
> 
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-dnsop-dns-capture-format-08: Discuss

To follow up on items not addressed in our previous email.

> --
> DISCUSS:
> --
> 
> There are also a couple of fields whose semantics don't seem to be
> sufficiently well specified for a proposed-standard document, such as
> vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
> of them are probably only going to have locally relevant semantics, but we
> should be explicit about when that's the case.)

We have addressed the specific fields mentioned here in the comments below 
related to each of them.

> 
> 
> --
> COMMENT:
> --
> 
> Section 2
> 
> Please consider using the RFC 8174 version of the BCP 14 boilerplate.

Yes - will replace.

> 
> Section 3
> 
>   Because of these considerations, a major factor in the design of the
>   format is minimal storage size of the capture files.
> 
> maybe "storage and transmission"?

Sure.

> 
> Section 6
> 
> In Figure 2, the Query name is marked as "(q)" (only present if there is a
> query), but the running text in Section 4 (bullet 1) says that the Question
> section from the response can be used as an identifying QNAME if there is a
> response with no corresponding query.  Am I misexpanding QNAME here, or is
> there a disagreement between these two parts of the text?  In particular, I
> do not see a part of Figure 2 that would correspond to a Question section
> in the response, given the various "(q)"/"(r)" markings.

Good spot - you are correct this is an error in the diagram and it should read 
'Query name' with no qualifier. 

> 
> Section 6.2.2
> 
>   Messages with OPCODES known to the recording application but not
>   listed in the Storage Parameters are discarded (regardless of whether
>   they are malformed or not).
> 
> (Do we need to say anything that the "discarded" is only w.r.t. the capture
> process, and not meant to imply that DNS queries would not get a normal
> response?)

Suggest: “Messages with OPCODES known to the recording application but not
  listed in the Storage Parameters are discarded by the recording application 
  during C-DNS capture (regardless of whether they are malformed or not).”

> 
> Section 6.2.4
> 
> Please consider using IPv6 examples, per
> https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .

Yes - will add an IPv6 example.

> 
> Section 7.2
> 
>   o  The column T gives the CBOR data type of the item.
> 
>  *  U - Unsigned integer
> 
>  *  I - Signed integer
> 
> This is venturing a bit far from my normal area of expertise, but my
> understanding is that CBOR native major types are only provided for
> unsigned integer and negative integer, with "signed integer" being an
> abstraction at a slightly higher layer that needs to be managed in the
> application.  Do we need to add any clarifying text here or will the
> meaning be clear to the reader?

CDDL happily talks about uint and int types, but we think this might
indeed be a useful clarification to implementers. We suggest:

OLD: "* I - Signed integer"
NEW: "* I - Signed integer (i.e. CBOR unsigned or negative integer)"
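
To make the distinction concrete, here is a minimal Python sketch (not taken
from the draft or from any C-DNS implementation) of how a signed application
integer maps onto the two CBOR integer major types: major type 0 for unsigned
integers and major type 1 for negative integers, which carry the value
-1 - n. The function name cbor_encode_int is hypothetical.

```python
import struct


def cbor_encode_int(n: int) -> bytes:
    """Encode a Python int as a CBOR unsigned (major 0) or negative (major 1) integer."""
    major = 0 if n >= 0 else 1
    arg = n if n >= 0 else -1 - n   # negative integers encode -1 - n
    if arg < 24:
        # Small arguments are stored directly in the initial byte.
        return bytes([(major << 5) | arg])
    # Larger arguments follow the initial byte in 1, 2, 4 or 8 big-endian bytes.
    for info, fmt, limit in (
        (24, ">B", 1 << 8),
        (25, ">H", 1 << 16),
        (26, ">I", 1 << 32),
        (27, ">Q", 1 << 64),
    ):
        if arg < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("integer does not fit in a 64-bit CBOR argument")


print(cbor_encode_int(10).hex())     # 0a
print(cbor_encode_int(-1).hex())     # 20
print(cbor_encode_int(1000).hex())   # 1903e8
print(cbor_encode_int(-1000).hex())  # 3903e7
```

A decoder reverses the mapping, so an application-level "signed integer"
field has to accept either major type.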

> 
> Section 7.4
> 
> Should probably forward-reference section 8 for the format version numbers'
> semantics.

Yes, will do. 

> 
> Section 7.4.1.1
> 
> We should w

Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-21 Thread Sara Dickinson


> Begin forwarded message:
> 
> From: Benjamin Kaduk <ka...@mit.edu>
> Subject: Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: 
> (with DISCUSS and COMMENT)
> Date: 19 November 2018 at 00:28:19 GMT
> To: "The IESG" <i...@ietf.org>
> Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org, Tim Wicinski 
> <tjw.i...@gmail.com>, dnsop-cha...@ietf.org, tjw.i...@gmail.com, 
> dnsop@ietf.org
> Resent-From: <alias-boun...@ietf.org>
> Resent-To: j...@sinodun.com, j...@sinodun.com, s...@sinodun.com, 
> terry.mander...@icann.org, john.b...@icann.org

Many thanks for the detailed review. 

> 
> --
> DISCUSS:
> --
> 
> It is pretty shocking to not see any discussion of the privacy
> considerations of storing data including client addresses (and ports)
> alongside DNS transactions, given how central DNS resolution is to user
> behavior on the web.  (Note that there are mentions of potentially
> anonymized data in Sections 6.2 and 6.2.3 which would presumably
> forward-reference the privacy considerations.)  Data normalization would
> probably also be mentioned in this section, since (e.g.) the case used for
> a query/response could be used in fingerprinting an implementation.

There has been extensive discussion of data storage risks and practices in two 
DPRIVE documents so I’d suggest the following changes in the first instance to 
address this:

New Privacy Considerations section:
“Storage of DNS traffic by operators in PCAP and other formats is a 
long-standing and widespread practice. Section 2.5 of 
draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet 
users of the storage of DNS traffic data in servers (recursive resolvers, 
authoritative and rogue servers). 

Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those 
risks for data stored on recursive resolvers (but which could by extension 
apply to authoritative servers). These include data handling practices and 
methods for data minimisation, IP address pseudonymization and anonymization. 
Appendix B of that document presents an analysis of 7 published anonymization 
processes. In addition, RSSAC have recently published RSSAC040: “Recommendations 
on Anonymization Processes for Source IP Addresses Submitted for Future 
Analysis” [1].

The above analyses consider full data capture (e.g. using PCAP) as a baseline 
for privacy considerations and therefore this format specification introduces 
no new user privacy issues beyond those of full data capture. It does provide 
mechanisms to selectively record only certain fields at the time of data 
capture to improve user privacy and to explicitly indicate that data is sampled 
and/or anonymised. It also provides flags to indicate if data normalisation has 
been performed; data normalisation increases user privacy by reducing the 
potential for fingerprinting individuals however a trade-off is potentially 
reducing the capacity to identify attack traffic via query name signatures. 
Operators should carefully consider their operational requirements and privacy 
policies and SHOULD capture at source the minimum user data required to meet 
their needs.”

[1] https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf


As noted, there are a few other places we can also highlight the privacy 
aspects:

Introduction:
OLD: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in 
practice for packet captures, but these file formats can contain a great deal 
of additional  information that is not directly pertinent to DNS traffic 
analysis  and thus unnecessarily increases the capture file size.”

NEW: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in 
practice for packet captures, but these file formats can contain a great deal 
of additional information that is not directly pertinent to DNS traffic 
analysis and thus unnecessarily increases the capture file size. Additionally, 
these tools and formats typically have no filter mechanism to selectively record 
only certain fields at capture time, requiring post-processing for 
anonymisation or pseudonymisation of data to protect user privacy.”

Section 4, bullet point 2:

OLD: “Different users will have different requirements
  for data to be available for analysis.  Users

[DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

2018-11-18 Thread Benjamin Kaduk
Benjamin Kaduk has entered the following ballot position for
draft-ietf-dnsop-dns-capture-format-08: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-capture-format/



--
DISCUSS:
--

It is pretty shocking to not see any discussion of the privacy
considerations of storing data including client addresses (and ports)
alongside DNS transactions, given how central DNS resolution is to user
behavior on the web.  (Note that there are mentions of potentially
anonymized data in Sections 6.2 and 6.2.3 which would presumably
forward-reference the privacy considerations.)  Data normalization would
probably also be mentioned in this section, since (e.g.) the case used for
a query/response could be used in fingerprinting an implementation.

I'm also concerned about the policy/procedure for allocating/extending the
various bitfields and similar potential extension points in the data
structures.  Section 8 covers the major/minor versioning semantics with
respect to new map keys and new maps, but not addition of new bits within
existing (uint) bitmaps.  Given the usage of the CDDL .bits constraint,
it's not really clear that an IANA registry is the right tool to use, but I
think some indication of the expected way to allocate new bits is in order,
whether it's "a future standards-track document that updates this document"
or otherwise.  (I've noted many, but not all, instances of such bitmaps in
my COMMENT section.)

There are also a couple of fields whose semantics don't seem to be
sufficiently well specified for a proposed-standard document, such as
vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
of them are probably only going to have locally relevant semantics, but we
should be explicit about when that's the case.)

If I'm reading things correctly that the IP address type is inferred from
the bytestring length, then I think we need to enforce a restriction on the
address prefix length(s) to allow for that inference to be unambiguous
(noting that we only have the *byte* length of the address fields at our
disposal for disambiguation, and not the more precise bit-length).
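
A rough Python sketch of the concern, under a storage model that is an
assumption here rather than something quoted from the draft: each address is
truncated to its configured prefix length and stored in the smallest whole
number of bytes, and a reader infers the family from that byte count alone.
The helper names stored_length and is_unambiguous are hypothetical.

```python
def stored_length(prefix_bits: int) -> int:
    """Bytes needed to hold an address truncated to prefix_bits (assumed model)."""
    return (prefix_bits + 7) // 8


def is_unambiguous(ipv4_prefix_bits: int = 32, ipv6_prefix_bits: int = 128) -> bool:
    """True if the stored byte length alone distinguishes IPv4 from IPv6."""
    return stored_length(ipv4_prefix_bits) != stored_length(ipv6_prefix_bits)


print(is_unambiguous(32, 128))  # True: 4 bytes vs 16 bytes
print(is_unambiguous(24, 64))   # True: 3 bytes vs 8 bytes
print(is_unambiguous(32, 32))   # False: both occupy 4 bytes
print(is_unambiguous(32, 25))   # False: 25 bits still rounds up to 4 bytes
```

The last case shows why the bit length is not enough: differing prefix
lengths can round to the same byte count, so a restriction would have to be
phrased in terms of stored bytes.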


--
COMMENT:
--

Section 2

Please consider using the RFC 8174 version of the BCP 14 boilerplate.

Section 3

   Because of these considerations, a major factor in the design of the
   format is minimal storage size of the capture files.

maybe "storage and transmission"?

Section 6

In Figure 2, the Query name is marked as "(q)" (only present if there is a
query), but the running text in Section 4 (bullet 1) says that the Question
section from the response can be used as an identifying QNAME if there is a
response with no corresponding query.  Am I misexpanding QNAME here, or is
there a disagreement between these two parts of the text?  In particular, I
do not see a part of Figure 2 that would correspond to a Question section
in the response, given the various "(q)"/"(r)" markings.

Section 6.2.2

   Messages with OPCODES known to the recording application but not
   listed in the Storage Parameters are discarded (regardless of whether
   they are malformed or not).

(Do we need to say anything that the "discarded" is only w.r.t. the capture
process, and not meant to imply that DNS queries would not get a normal
response?)

Section 6.2.4

Please consider using IPv6 examples, per
https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .

Section 7.2

   o  The column T gives the CBOR data type of the item.

  *  U - Unsigned integer

  *  I - Signed integer

This is venturing a bit far from my normal area of expertise, but my
understanding is that CBOR native major types are only provided for
unsigned integer and negative integer, with "signed integer" being an
abstraction at a slightly higher layer that needs to be managed in the
application.  Do we need to add any clarifying text here or will the
meaning be clear to the reader?

Section 7.4

Should probably forward-reference section 8 for the format version numbers'
semantics.

Section 7.4.1.1

We should reference the IANA registries by name for any of these fields
(e.g., opcodes, rr-types, etc.).  (Also in Section 7.5.3.1, etc.)

Are the storage flags going to be allocated in sequence by updating
standards-track documents, or some other mechanism?  (Is a registry
necessary?)

For the various address