Re: [tor-bugs] #33061 [Metrics/CollecTor]: archived bandwidth scanner files lack explicit source attribution

2020-01-30 Tor Bug Tracker & Wiki
#33061: archived bandwidth scanner files lack explicit source attribution
---------------------------------+----------------------------------
 Reporter:  starlight            |          Owner:  metrics-team
     Type:  enhancement          |         Status:  needs_information
 Priority:  Medium               |      Milestone:
Component:  Metrics/CollecTor    |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+----------------------------------

Comment (by starlight):

 Replying to [comment:4 karsten]:
 > Replying to [comment:3 starlight]:
 > > To clarify further:  Each bandwidth scanner has a unique perspective
 of available bandwidth capacities in the network.  Associating documents
 in time series tied to individual scanners is critical to making sense of
 the data.
 >
 > True. What you'll have to do is combine bandwidth files with votes to
 extract meaningful results.

 I agree that combining votes and bandwidth documents is useful, but I find
 significant value in bandwidth scanner documents alone, provided the source
 scanners are attributed.

 > ... it's also not trivial or maybe not even possible for CollecTor to
 include this information in bandwidth files while archiving them.

 I'm curious why; I have no difficulty with attribution here. The
 scanner-to-authority correlation may not be the big-picture design, but it
 is the practical reality to date.

 >
 > Note that combining descriptors is not unusual for an analysis. Right
 now I'm combining consensuses, votes, server descriptors, and extra-infos
 for another, unrelated analysis. Sometimes it's simply necessary to
 combine data from different data sources; in the bandwidth files case from
 bandwidth scanners and directory authorities using bandwidth scanner data.

 No disagreement that some forms of analysis are fine, or even better,
 without the source.

 I managed to write a Perl script that successfully attributes scanner
 sources for the gaps filled from CollecTor, and I'm willing to make the
 results available.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #33061 [Metrics/CollecTor]: archived bandwidth scanner files lack explicit source attribution

2020-01-30 Tor Bug Tracker & Wiki

Comment (by karsten):

 Replying to [comment:3 starlight]:
 > To clarify further:  Each bandwidth scanner has a unique perspective of
 available bandwidth capacities in the network.  Associating documents in
 time series tied to individual scanners is critical to making sense of the
 data.

 True. What you'll have to do is combine bandwidth files with votes to
 extract meaningful results. This is certainly more work than getting
 source information from bandwidth files directly. But it's also not
 trivial or maybe not even possible for CollecTor to include this
 information in bandwidth files while archiving them. That's why it needs
 to happen at the analysis stage right now.

 Note that combining descriptors is not unusual for an analysis. Right now
 I'm combining consensuses, votes, server descriptors, and extra-infos for
 another, unrelated analysis. Sometimes it's simply necessary to combine
 data from different data sources; in the bandwidth files case from
 bandwidth scanners and directory authorities using bandwidth scanner data.

 Maybe we cannot decide this right now. Maybe we first need to experience
 how painful it would be to analyze bandwidth files when we include that
 data somewhere.

Re: [tor-bugs] #33061 [Metrics/CollecTor]: archived bandwidth scanner files lack explicit source attribution

2020-01-28 Tor Bug Tracker & Wiki

Comment (by starlight):

 To clarify further:  Each bandwidth scanner has a unique perspective of
 available bandwidth capacities in the network.  Associating documents in
 time series tied to individual scanners is critical to making sense of the
 data.
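Once each document carries a scanner label, keeping one time series per scanner is simple. The sketch below assumes the bandwidth file format in which the first line is a Unix timestamp in seconds; the `(scanner, bytes)` input pairs and the function names are hypothetical and would come from whatever attribution step is used.

```python
from collections import defaultdict
from datetime import datetime, timezone


def file_timestamp(data: bytes) -> datetime:
    # Assumed format: the first line of a bandwidth file is Unix epoch seconds.
    first_line = data.split(b"\n", 1)[0].strip()
    return datetime.fromtimestamp(int(first_line.decode("ascii")), tz=timezone.utc)


def per_scanner_series(documents):
    """documents: iterable of (scanner label, bandwidth file bytes).

    Returns a dict mapping each scanner label to its UTC timestamps,
    oldest first, so each scanner's view stays in its own series.
    """
    series = defaultdict(list)
    for scanner, data in documents:
        series[scanner].append(file_timestamp(data))
    return {scanner: sorted(times) for scanner, times in series.items()}
```

Keeping the series apart, rather than merging all files by time, preserves the per-scanner perspective described above.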

Re: [tor-bugs] #33061 [Metrics/CollecTor]: archived bandwidth scanner files lack explicit source attribution

2020-01-27 Tor Bug Tracker & Wiki

Comment (by starlight):

 I have conflated bandwidth scanners with bandwidth authorities in my
 thinking and in this ticket, but what matters is attributing each document
 to the bandwidth scanner that produced it. The documents originate in the
 bandwidth scanners, and, as you say, so far each bandwidth scanner has
 generally been associated with one authority. I have thought of the new
 mechanism mainly as a standardized way of making the information
 available, in contrast to the previous ad-hoc web-server hosting.

Re: [tor-bugs] #33061 [Metrics/CollecTor]: archived bandwidth scanner files lack explicit source attribution

2020-01-27 Tor Bug Tracker & Wiki
Changes (by karsten):

 * cc: metrics-team (added)
 * status:  new => needs_information
 * type:  defect => enhancement


Comment:

 This was a deliberate design decision back when we added bandwidth files
 to CollecTor. Bandwidth scanners and the files they generate are not tied
 to directory authorities, except that ''usually'' one bandwidth file is
 used by one directory authority. But a bandwidth file may never be used,
 or may be used by more than one directory authority. The only way to be
 certain that a directory authority used a bandwidth file is to look at its
 vote and find the bandwidth file reference in there.
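A minimal sketch of that vote lookup in Python follows. It assumes the dir-spec vote format: the authority's nickname is the second field of its `dir-source` line, and a `bandwidth-file-digest` line carries `sha256=` followed by the base64 SHA-256 digest of the bandwidth file with trailing `=` characters omitted. The function names are hypothetical. Because a file may be used by no authority or by several, each file maps to a set of authority nicknames rather than to a single one.

```python
from __future__ import annotations

import base64
import hashlib
from collections import defaultdict


def vote_digest(bandwidth_file: bytes) -> str:
    # Base64 SHA-256 with trailing '=' removed (assumed to match the vote encoding).
    raw = hashlib.sha256(bandwidth_file).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def referenced_digests(vote_text: str) -> tuple[str | None, set[str]]:
    # Return the voting authority's nickname and the sha256 digests its vote cites.
    nickname = None
    digests = set()
    for line in vote_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "dir-source" and len(fields) > 1 and nickname is None:
            nickname = fields[1]
        elif fields[0] == "bandwidth-file-digest":
            for item in fields[1:]:
                algorithm, _, digest = item.partition("=")
                if algorithm == "sha256":
                    digests.add(digest.rstrip("="))
    return nickname, digests


def attribute_files(votes: list[str], files: dict[str, bytes]) -> dict[str, set[str]]:
    # Map each file name to the authorities whose votes cite its digest.
    # Empty set: no vote referenced the file. Several names: the file was
    # used by more than one authority.
    users_by_digest: dict[str, set[str]] = defaultdict(set)
    for vote in votes:
        nickname, digests = referenced_digests(vote)
        if nickname is None:
            continue
        for digest in digests:
            users_by_digest[digest].add(nickname)
    return {name: set(users_by_digest.get(vote_digest(data), ()))
            for name, data in files.items()}
```

In practice the votes would come from CollecTor's archived votes covering the same period as the bandwidth files.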

 I'll leave this open as an enhancement and in needs_information for our
 next team meeting to decide whether we want to question this design
 decision.

[tor-bugs] #33061 [Metrics/CollecTor]: archived bandwidth scanner files lack explicit source attribution

2020-01-26 Tor Bug Tracker & Wiki
#33061: archived bandwidth scanner files lack explicit source attribution
---------------------------------+----------------------------------
 Reporter:  starlight            |          Owner:  metrics-team
     Type:  defect               |         Status:  new
 Priority:  Medium               |      Component:  Metrics/CollecTor
  Version:                       |       Severity:  Normal
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+----------------------------------
 Current files in

 https://collector.torproject.org/archive/relay-descriptors/bandwidths/
 https://collector.torproject.org/recent/relay-descriptors/bandwidths/

 lack any indication of which bandwidth scanner generated them. (Files
 archived from Tom's collection are attributed.)

 I collect these files here and abandoned an attempt to fill a gap because
 of this issue. Ad-hoc logic to bin them may be possible but is not
 trivial. I can provide attribution by SHA-256 digest for most of them if
 the file naming is improved.
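Given such a digest list, binning the archived files could look roughly like the sketch below. Assumptions: the attribution map (hypothetical here) is keyed by the lowercase hex SHA-256 of each file's contents, the same form `sha256sum` prints; digests are computed from file contents rather than taken from CollecTor's file names; and `bin_by_scanner` is an illustrative name. Files whose digest is not in the map land in an `unattributed` bin instead of being guessed.

```python
import hashlib
from collections import defaultdict


def sha256_hex(data: bytes) -> str:
    # Hex SHA-256 of the file contents, the same form sha256sum prints.
    return hashlib.sha256(data).hexdigest()


def bin_by_scanner(files, attribution):
    """Group bandwidth files by the scanner that produced them.

    files: dict mapping file name -> file contents (bytes).
    attribution: hypothetical dict mapping hex SHA-256 digest -> scanner label.
    Returns a dict mapping scanner label -> sorted file names; files with an
    unknown digest go under "unattributed".
    """
    bins = defaultdict(list)
    for name, data in files.items():
        bins[attribution.get(sha256_hex(data), "unattributed")].append(name)
    return {scanner: sorted(names) for scanner, names in bins.items()}
```

Over a local mirror, `files` could be built with something like `{p.name: p.read_bytes() for p in Path("bandwidths").iterdir()}` (using `pathlib.Path`), where the directory name is only an example.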

 original ticket #21378
