Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-31 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  closed
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:  fixed
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by karsten):

 * status:  needs_review => closed
 * resolution:   => fixed


Comment:

 Local tests pass! I squashed all commits related to this change, updated
 the change log (to reflect that we're not sanitizing logs anymore),
 rebased to master, and pushed. I'll run some more tests with files from
 webstats.tp.o, but I'll open new tickets in case I run into any bugs.
 Closing. Thanks! :)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-30 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Replying to [comment:71 iwakeh]:
 > Replying to [comment:70 iwakeh]:
 > > Lines like `0.0.0.0 - - [22/Jan/2018:00:00:00 +] "GET
 /collector/archive HTTP/1.1" 301 -` should be valid (cf. comments 50, 51
 on #22428).
 >
 > Please review [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/commit/?h=task-22983-5 this commit] on the current task branch for
 adding this feature.

 Commit 9b4516f looks good!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-30 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Replying to [comment:70 iwakeh]:
 > Lines like `0.0.0.0 - - [22/Jan/2018:00:00:00 +] "GET
 /collector/archive HTTP/1.1" 301 -` should be valid (cf. comments 50, 51
 on #22428).

 Please review [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/commit/?h=task-22983-5 this commit] on the current task branch for
 adding this feature.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-29 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Lines like `0.0.0.0 - - [22/Jan/2018:00:00:00 +] "GET
 /collector/archive HTTP/1.1" 301 -` should be valid (cf. comments 50, 51
 on #22428).

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-22 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Commit e99b3e5 looks good. Moving the sanitizing code to CollecTor sounds
 reasonable, too.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-22 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by iwakeh):

 * status:  needs_revision => needs_review


Comment:

 Please review [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/log/?h=task-22983-5 the most recent commit] on the current task
 branch.

 The revision was necessary because the specification turned the reference
 date into a reference interval.  This change is also the reason for moving
 the sanitization logic into CollecTor's webstats module.

 'WebServerAccessLogLine' can later be a model for #23046.  Thus, there is
 no interface added yet.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-17 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by iwakeh):

 * status:  merge_ready => needs_revision


Comment:

 The internal sanitization implementation will be adapted to enable
 processing in CollecTor's webstats module according to the revised spec.

 Setting this ticket to needs revision to mark the upcoming changes.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-12 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  merge_ready
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by karsten):

 * status:  needs_review => merge_ready


Comment:

 Looks good!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-12 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by iwakeh):

 * status:  needs_revision => needs_review


Comment:

 Please review [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/commit/?h=task-22983-5 my task-22983-5 branch] (based on the final
 branch above) with one additional commit that adapts to using underscore
 instead of a separator in log file names (published by collector).

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-12 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by iwakeh):

 * status:  merge_ready => needs_revision


Comment:

 I will post anther branch with an adaption to the current spec that is
 currently missing.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-11 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  merge_ready
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by karsten):

 * status:  needs_review => merge_ready


Comment:

 Great! Let's squash and merge when CollecTor changes are done. Yay!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-11 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 I'd say this is merge ready now.  All tests pass, your changes are
 fine, and the current state of CollecTor's webstats development branch
 works with this metrics-lib version.

 Maybe, wait with the merge to master until the development changes for
 CollecTor are completed in case something needs to be altered here (which
 is very unlikely) or a bug is discovered during CollecTor development?

 Anyway, this is a huge step forward :-)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-11 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by karsten):

 * status:  needs_revision => needs_review


Comment:

 Please find [https://gitweb.torproject.org/user/karsten/metrics-
 lib.git/commit/?h=task-22983-5=06e1ea68ae3fef4da93583c736b6bc08c54a2367
 commit 06e1ea6 in my task-22983-5 branch] with a couple of changes to your
 task-22983-4 branch. I hope I didn't break anything. Please check!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-10 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by karsten):

 * status:  needs_review => needs_revision


Comment:

 I'll make the discussed two changes and a few more and will have something
 for you to review some time tomorrow!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-10 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Replying to [comment:58 karsten]:
 > ...
 > > You only need to look at the latest four commits.  The others are
 reviewed already (comment:47).
 >
 > Oh. That information would have saved some time this morning. That also
 explains why I have just one remark on the earlier commits and a couple
 more on the first four!

 Hrmm, in my request for review I did mention:

... The additional commits should address all topics from the comments
 above starting at comment:47. ...

 Maybe, too subtle for 'please re-read the comments starting at comment:47
 unless you memorized them earlier' ;-)

 >
 > ...
 > There are two types of changes here:
 >
 >  1. Changes in the review process that require knowing the branch under
 review. Ideally, we'd use "fixup" or "squash" commits for all commits made
 during the review process. Alternatively, we can agree that commits will
 be squashed prior to merge, which works for me, too.
 >  2. Changes in the merge process that only require knowing master before
 the merge. Somebody who didn't follow the review process, including our
 future selfs, should see changes where we're not going forth and back
 throughout commit series to achieve something that we could do in one
 step.

 Sounds ok.

 >
 >  ...
 > Anyway, here's what I found in your branch:
 >
 >  - `AccessLogLineSanVal` contains an explicit mention of `"FTP"`. Does
 this mean we're considering FTP log lines to be valid? Why FTP in
 particular and no other protocols? Or should we take that out and restrict
 ourselves to HTTP? Not feeling strongly about this one.

 Please also refer to comments 50 to 52.  The main issue was to not only
 declare lines as valid that resemble sanitized lines, but be more general.
 Thus, allowing different protocols could be part of that, but I don't
 insist.
 Which log lines should be considered valid?  How much can log lines differ
 from sanitized log lines and still be considered valid?  The current code
 only sketches one possibility, e.g. allowing different protocols, various
 ip addresses, and other request types.  Before making any more code
 changes there should be at least a list of valid and invalid lines or a
 description of either state.

 >  - Further down below we're comparing `referenceDate` to `extractedDate`
 and discarding lines where the two don't match. Not sure if this is
 problematic, but I'm at least flagging it here as potential issue. If this
 is a non-issue, feel free to ignore this comment.

 The variable `extractedDate` refers to the date extracted from the log
 line and reference date is the log files date (see
 [https://metrics.torproject.org/web-server-logs.html#n-re-assembling-log-
 files spec section 4.3]).  As discarding is only performed during
 sanitization this is ok.  The lines would be considered valid.

 >  - A few lines further down we're formatting a date using
 `DateTimeFormatter.ofPattern()`. Is that expensive? Would we be able to
 create the formatter just once and re-use it here? Not trying to optimize
 prematurely, but trying to avoid shooting ourselves in the foot.

 Makes sense. (1)

 >  - Logging an error in case of catching a `Throwable` might become very
 noisy. After all, we're processing third-party input here, and we can
 typically proceed without parsing that line. The debug level seemed like
 the better choice here.

 Oh, that's an oversight. (2)

 >  - In `WebServerAccessLog`, what exactly is the log date? Is it the date
 of contained requests, is it the date when the file was written to disk,
 or what is it? This is still unclear in the docs.

 [https://gitweb.torproject.org/user/iwakeh/metrics-
 
lib.git/tree/src/main/java/org/torproject/descriptor/WebServerAccessLog.java?h=task-22983-4=5dbbeb0ac38728f09718c80e295e8524d92d2e68#n19
 Javadoc] says:
 `Returns the date of the log as LocalDate.`
 and the [https://metrics.torproject.org/web-server-logs.html#n-re-
 assembling-log-files spec 4.3] says: `Rewritten log lines are re-assembled
 into sanitized log files based on physical host, virtual host, and request
 start date.`

 Where would a clarification be needed?

 >
 > When these remaining issues are clearer to me, do you mind me editing
 your branch and preparing one or more "fixup"/"squash" commits? There are
 still a few things that I'd like to document differently or where I'd like
 

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-10 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Replying to [comment:57 iwakeh]:
 > Replying to [comment:56 karsten]:
 > > I started looking at your branch, but it's a pretty big diff again, so
 that'll take some time.
 >
 > You only need to look at the latest four commits.  The others are
 reviewed already (comment:47).

 Oh. That information would have saved some time this morning. That also
 explains why I have just one remark on the earlier commits and a couple
 more on the first four!

 > > Regarding Git history, we're going to squash all these commits (except
 for unrelated ones like adding a space or package-info) into a single
 commit that adds interfaces, implementations, and tests, right? I'm
 asking, because you marked some commits as "fixup!", but not all of them.
 Or do you have a separation in mind? If so, which?
 > >
 >
 > Maybe, keep those that you find easier to understand the changes.  I
 used 'fixup' for making clear that a change is related to an earlier
 review discussion.

 There are two types of changes here:

  1. Changes in the review process that require knowing the branch under
 review. Ideally, we'd use "fixup" or "squash" commits for all commits made
 during the review process. Alternatively, we can agree that commits will
 be squashed prior to merge, which works for me, too.
  2. Changes in the merge process that only require knowing master before
 the merge. Somebody who didn't follow the review process, including our
 future selfs, should see changes where we're not going forth and back
 throughout commit series to achieve something that we could do in one
 step.

 > > By the way, what was the reason for rebasing your branch? It would
 have been a tad bit easier to stay on the same branch until we're done
 with the review process. Just saying for future reviews.
 >
 > You mentioned that above already ;-)
 > (I think, last year I did the rebase because it seemed better to work on
 a branch close to master.)

 Maybe I ran into that rebase today when fetching from your repository and
 seeing this branch to be force-updated. But it's good to know I mentioned
 this before. With a few more samples you'll soon be able to compute my
 attention span. :)

 Anyway, here's what I found in your branch:

  - `AccessLogLineSanVal` contains an explicit mention of `"FTP"`. Does
 this mean we're considering FTP log lines to be valid? Why FTP in
 particular and no other protocols? Or should we take that out and restrict
 ourselves to HTTP? Not feeling strongly about this one.
  - Further down below we're comparing `referenceDate` to `extractedDate`
 and discarding lines where the two don't match. Not sure if this is
 problematic, but I'm at least flagging it here as potential issue. If this
 is a non-issue, feel free to ignore this comment.
  - A few lines further down we're formatting a date using
 `DateTimeFormatter.ofPattern()`. Is that expensive? Would we be able to
 create the formatter just once and re-use it here? Not trying to optimize
 prematurely, but trying to avoid shooting ourselves in the foot.
  - Logging an error in case of catching a `Throwable` might become very
 noisy. After all, we're processing third-party input here, and we can
 typically proceed without parsing that line. The debug level seemed like
 the better choice here.
  - In `WebServerAccessLog`, what exactly is the log date? Is it the date
 of contained requests, is it the date when the file was written to disk,
 or what is it? This is still unclear in the docs.

 When these remaining issues are clearer to me, do you mind me editing your
 branch and preparing one or more "fixup"/"squash" commits? There are still
 a few things that I'd like to document differently or where I'd like to
 make the new code more similar to existing code (in the good sense). You
 could still go through them and talk me out of making those changes. Sound
 okay?

 Oh, and here's another remark for future reviews: let's try to reduce the
 time for reviewing and for revising a branch to a few days. Neither
 reviewing nor revising is fun, but it's even less fun when doing it
 several weeks later. I know this is a special case with the holiday break,
 and I'm not blaming you any more than I'm blaming myself. Just an idea to
 make reviews a little more fun. :)

--
Ticket URL: 

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-10 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Replying to [comment:56 karsten]:
 > I started looking at your branch, but it's a pretty big diff again, so
 that'll take some time.

 You only need to look at the latest four commits.  The others are reviewed
 already (comment:47).


 >
 > Regarding Git history, we're going to squash all these commits (except
 for unrelated ones like adding a space or package-info) into a single
 commit that adds interfaces, implementations, and tests, right? I'm
 asking, because you marked some commits as "fixup!", but not all of them.
 Or do you have a separation in mind? If so, which?
 >

 Maybe, keep those that you find easier to understand the changes.  I used
 'fixup' for making clear that a change is related to an earlier review
 discussion.

 > By the way, what was the reason for rebasing your branch? It would have
 been a tad bit easier to stay on the same branch until we're done with the
 review process. Just saying for future reviews.

 You mentioned that above already ;-)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-10 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 I started looking at your branch, but it's a pretty big diff again, so
 that'll take some time.

 Regarding Git history, we're going to squash all these commits (except for
 unrelated ones like adding a space or package-info) into a single commit
 that adds interfaces, implementations, and tests, right? I'm asking,
 because you marked some commits as "fixup!", but not all of them. Or do
 you have a separation in mind? If so, which?

 By the way, what was the reason for rebasing your branch? It would have
 been a tad bit easier to stay on the same branch until we're done with the
 review process. Just saying for future reviews.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2018-01-09 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  High |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2018 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by iwakeh):

 * priority:  Medium => High
 * status:  needs_revision => needs_review


Comment:

 Please review the top commits on
 [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/log/?h=task-22983-4 this branch].

 The additional commits should address all topics from the comments above
 starting at comment:47.  Sanitization and validation are now separated (of
 course only valid lines can be cleaned at all and invalid ones have to be
 discarded).  (The range of what log lines are considered valid
 (independent of them being discarded during sanitization) is kind of
 arbitrary.  Thus, for having a stricter or laxer validation it is best to
 provide test log line examples and how they should be treated.)

 Setting to high, as this is the basis for CollecTor's webstat module in
 #22428.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-11-22 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Replying to [comment:52 karsten]:
 > Hmm, why are you concerned about providing a general-purpose log parser?
 It doesn't claim to be that, so if it doesn't parse FTP lines correctly,
 so what. It's supposed to be a tool that helps with parsing sanitized log
 lines. If it doesn't support parsing something in logs that never occurs
 in sanitized logs, we can fix that if we want, but we don't have to.

 This is the distinction I'm concerned about.  So, all settled.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-11-22 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Hmm, why are you concerned about providing a general-purpose log parser?
 It doesn't claim to be that, so if it doesn't parse FTP lines correctly,
 so what. It's supposed to be a tool that helps with parsing sanitized log
 lines. If it doesn't support parsing something in logs that never occurs
 in sanitized logs, we can fix that if we want, but we don't have to. I'm
 just saying that we shouldn't make it overly restrictive by rejecting
 lines that it could easily handle.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-11-22 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Replying to [comment:50 karsten]:
 > Agreed with all points above except one:

 Fine.

 >
 > ...
 > Does that mean we should provide a general-purpose log parser? Probably
 not.

 This is my main concern here.

 > ... In the parser we don't have to provide getters for fields that we
 don't care about, like user-agent string. But we should be prepared to
 find request methods GET, HEAD, POST, or really anything else in log lines
 we're given.

 Maybe, the general rule here could be that the parser/validator accepts
 the format in a backward compatible way, i.e., log lines that have been
 valid at some point will be parsed/valid in future?  This would enable
 parsing of HEAD from your example above and also accommodate changes life
 accepting FTP in future with the benefit of having a defined pattern of
 valid log lines that is more restrictive than a general log parser.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-11-22 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Agreed with all points above except one:

 When ''parsing'' sanitized log lines metrics-lib should not reject log
 lines that it would discard when ''sanitizing'' original log lines.

 It's not the job of the ''parser'' to ensure that its input is properly
 sanitized or to do some sort of post-sanitizing. Of course it needs to
 perform some basic format verifications to perform its job. But dropping
 lines because the sanitizer would drop them seems out of place.

 Imagine a hypothetical situation where we decide at some point that HEAD
 requests are too sensitive and we take them out in the parser. However,
 previously sanitized logs would still contain them, including archives
 that people keep locally and that we can't update. If somebody then takes
 a recent metrics-lib version to parse their data, they'd suddenly don't
 get the HEAD lines anymore. That would be rather confusing.

 I think sanitizing and parsing should be separate things. In this case,
 discarding lines because of certain field contents should be left to the
 sanitizer.

 Does that mean we should provide a general-purpose log parser? Probably
 not. In the parser we don't have to provide getters for fields that we
 don't care about, like user-agent string. But we should be prepared to
 find request methods GET, HEAD, POST, or really anything else in log lines
 we're given.

 Does that make sense, or am I overlooking something?

 (By the way, it's a good thing that we're keeping the spec unchanged with
 regard to IP addresses not starting with `0.0.0.`. I think it would have
 been pretty bad to just rewrite the first three octets to `0.0.0` and keep
 the fourth unchanged. Not very privacy-preserving.)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-11-21 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Clarification regarding the pattern question:

 The pattern start `^((?:\\d{1,3}\\.){3}\\d{1,3}) ...` results in one group
 {{{
   ^((?:\\d{1,3}\\.){3}\\d{1,3})
^--^
 }}}
 The `?:` prevents sub-groups.

 The pattern is used for extracting the various parts of a log line and
 these are further checked.
 The code follows the spec
 {{{
 if (!res.ip.startsWith("0.0.0.")) {
   res.ip = "0.0.0.0";
 }
 }}}

 [https://metrics.torproject.org/web-server-logs.html#n-rewriting-matching-
 lines Spec]:
 {{{
  If the remote hostname starts with "0.0.0.", it is kept unchanged,
 otherwise it's rewritten to "0.0.0.0".
 }}}

 (cf.[https://gitweb.torproject.org/user/iwakeh/metrics-
 
lib.git/tree/src/test/java/org/torproject/descriptor/log/AccessLogLineSanValTest.java?h=task-22983-4=629ef152be1fd2f5a00d203b614fc01e946c518d
 test cases])

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-11-21 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Thanks for hanging in and working through all!

 I try to shorten my reply and simply leave out all points of agreement
 withou any further comment.  Feel free to raise them again if necessary.

 Replying to [comment:47 karsten]:
 > = Variable names

 I copied the comments about variable names to ticket #24370, which is for
 defining some naming guidelines all over metrics.  Let's continue the
 discussion there.

 > = Validation vs. sanitization
 > I'm still confused what validation means in this context.

 Metrics-lib provides web log parsing of sanitized logs as available on
 CollecTor.  When parsing such a log file lines need to be validated, i.e.,
 metrics-lib verifies that these are standard sanitized log lines.
 Metrics-lib does not supply a general log parser.

 Sanitization is only supplied as internal metrics-lib feature and used by
 CollecTor to sanitize the logs to be published.

 > Is a line containing a POST request a valid line, or one that uses FTP
 as protocol or that returns HTTP 404 as status code. It's okay that
 CollecTor skips these lines as part of the sanitizing process. But that
 doesn't make them invalid.

 From the point of a consumer that expects sanitized log lines the POST and
 FTP lines are invalid.

 >
 > I'm also a bit uncleear if the separate validate and sanitize steps have
 a negative impact on performance. In theory, it should be sufficient to
 touch each line once. But I could be convinced that we're trading
 performance for better design, if this is the case.

 Touching the lines once is only a concern during sanitization.  The
 current code only operates once on each line.  Simply parsing santized
 logs only needs the validation part anyway.

 >
 > However, I'd really want us to be clear what it means for a line to be
 valid!

 Yes!

 > ...
 > One thing we should do is document whether `private byte[] logBytes`
 might be compressed or not. We have been discussing that many times now,
 and I'm deeply confused already what we're doing there.

 Already addressed in the respective javadoc comments.  All decompressed.

 > = 629ef152be1fd2f5a00d203b614fc01e946c518d Tweak pattern for logline
 validation.
 > One question about the pattern: Does the `?:` in
 `"^((?:\\d{1,3}\\.){3}\\d{1,3}) ..."` mean that we're now ignoring the
 first 3 octets regardless of whether they're `0.0.0.` or not? That would
 not follow the specification ...

 I will re-check that with the new spec we have from #23243.

 > = e24dda11613f340da3fbd6f1a93ba07d857f0b16 Remove getLogDateMillis
 and move getLogDate to WebServerAccessLog.
 >
 > I wonder why there's no `getLogDateMillis()` anymore. But I can be
 convinced that users should just extract millisecond from `LocalDate` if
 they need them. Is that the plan? If so, green light.

 Yep.

 >  6a6e93a2fd8b3f3b912ce9a258f4b32069c18ef8 (iwakeh/task-22983-4) Set
 dev version.
 > Let's revert this commit. ...

 I reverted this, but instead would like to add the '2.1.1-dev' commit.
 (Mainly for CollecTor #22428 implementation and testing.)

 Next steps:

 The latest changes in #23243 also makes a revision necessary.  I'll use
 the current branch here to add the changes.
 So far the only open topic is your question about validation
 The above answer might have resolved it a bit.  Maybe, it'll also be more
 clear in the upcoming commit(s) for the spec changes.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-10-18 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Heh, that's indeed a huge comment! And this one is going to be huge, too.
 However, rather than splitting it into multiple smaller comment I'll try
 to keep it as organized as I can by using the output of `git log --reverse
 --pretty=oneline origin/master..iwakeh/task-22983-4` as section headers in
 this ~~little essay~~comment.

  1232ff767375993f65a4e78d3a9383b23681925b Added a space.

 This was commit c6c805c before your rebase to master.

 In the future, please avoid rebasing the branch under review if you can,
 because it changes all commit IDs and makes it harder to refer to commits
 here. In most cases I'll be able to do the rebase as part of merging to
 master just fine.

 Other than that, the explanation in your previous comment makes sense to
 me and this commit is good to go in.

  4434316a769ea6403d2872073bc76f10a2822cf2 Added package-info.

 This was commit 3b426bf before your rebase to master.

 We agree on this one plus the fixup commit below, so it's good to go in.

  04997b1d9b3a73d2724a2e9d39a144895ded460f Add new descriptor type for
 web server access logs.

 This was commit 7822a27 before your rebase to master.

 It's the commit that most of the subsequent fixup and squash commits will
 be squashed into. I'll comment on your feedback under the fixup commits
 below.

  444b55709bef99b1f63a20e3595c1439aaa68775 Add tests for log descriptor
 functionality.

 This was commit 929e265 before your rebase to master.

 This one still needs review. That review was so far blocking on my
 question regarding validation vs. sanitization. I'll get to this in the
 next round after that question is resolved. So, still '''needs review'''!

  4b3f27e93761b172dd3ee102f89c66059bd3da7a squash! Added package-info.

 This was commit 1bb7a43 before your rebase to master.

 You say that you agree with these tweaks, so I'll take that as a go.

  be986cc88e66cb1df9772e65421462ca0570e190 fixup! Add new descriptor
 type for web server access logs.

 This was commit e4d6e90 before your rebase to master.

 You say that you're fine with these changes and have more tweaks in your
 branch. I'll assume this one is good to be merged then, and I'll comment
 on your subsequent commits further down below.

  3de717268c3d27ad096cf669fc73a52f049acb59 fixup! Add new descriptor
 type for web server access logs.

 This was commit 664f540 before your rebase to master.

 This commit brought up a few topics that we need to resolve before moving
 on.

 = Variable names

 Thanks for writing down your thinking about variable names in the given
 detail. It helps a lot, not only for this ticket but also as a guide for
 future tickets. Let's go through the examples:

  - Leaving out the somewhat redundant "descriptor" from
 `rawDescriptorBytes` and `descriptorFile` is fine with me. It's indeed
 obvious what's meant here.

  - I quite strongly disagree with `isValid(line)` as a name for a method
 that takes a line, tries to validate it, and returns whether that was
 successful. To me, `isX()` is the name for a getter, not the name for a
 method that does something with a given parameter. If this were a `Line`
 class with a method `isValid()` that returns whether the line is valid or
 not, that would be something different. But that's not the case here. For
 another example, consider `File.delete()` which deletes the file and
 returns whether that was successful. We wouldn't argue about renaming that
 method to `isDeleted()` just because it returns `true` or `false`. As a
 general rule I'd say that the name of a method that performs something
 should be the verb describing the action, whereas `is` is typically
 reserved for getters. Ah, and in this case it's up to the documentation to
 say what `validate(line)` returns, though that's relatively obvious.

  - Leaving out "line" in `sanitizeLine(line)` and friends is okay, too.

  - You're probably right about keeping `logBytes` rather than
 `rawDescriptorBytes`.

 = `LogDescriptor` interface

 I agree with keeping `LogDescriptor`. But let me explain my earlier
 hesitation:

  - I'm not a big fan of marker interfaces. :) However, it seems that we'll
 soon add a `LogDescriptor.LogLine` subinterface, and that is (to me) a
 good reason to keep the `LogDescriptor` interface.

  - In the future we might create an 

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-10-10 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 Replying to [comment:44 karsten]:
 > Alright, here's what I found:

 Thanks for the detailed review!

 >
 > Commit c6c805c looks trivially correct. It comes a bit out of nowhere,
 so may you could add a remark to the commit message of future commits like
 this saying why you're making this change now. Like, did it violate a
 checkstyle rule? And was this the only place that was lacking a space? In
 any case, ready to merge this one.

 The misterious space results from a rule we don't have for spaces between
 cast operator and variable.  I made the usage in this file consistent by
 adding the space as [https://gitweb.torproject.org/user/iwakeh/metrics-
 
lib.git/tree/src/main/java/org/torproject/descriptor/impl/KeyValueMap.java?h=task-22983-3=6b8de66d171dd85d8244a7d326ca436ce68e0d78#n57
 the earlier cast] did have one.  But, we should add a checkstyle rule to
 make it consistent everywhere; I don't have a preference with or without
 space is fine.

 >
 > Commit 3b426bf contains a couple of code style violations: the summary
 fragment should describe the package and not be a warning; the summary
 fragment is not supposed to contain markup like ; there needs to be an
 empty line before the next ; and the line starting with
 "descriptors." is wrongly indented. I'm providing some tweaks in a
 fixup commit. Please review and tweak further as needed.

 These seem fine.  Thanks for fixing them and rewording the javadoc.

 >
 > Commit 7822a27 has several issues that require several fixup commits
 and/or discussion:
 >  - My commit e4d6e90 in my task-22983-4 branch contains a bunch of
 whitespace fixes and other trivial tweaks which I believe are mostly
 oversights. (Unless they are not, in which case see the next item.)

 I think the javadoc reformulations are fine.  Some tweaks are in my new
 branch.
 I also agree to the other few tweaks to the code or came up with something
 else, which I added in tiny fixup commits to make reviewing easier
 (hopefully)

 >  - My commit 664f540 is a mix of suggestions and tweaks. Some of the
 tweaks are an attempt to make your code fit better into existing code,
 which is either based on our style guide or on implicit conventions in
 existing code for which we don't have a written style guide yet. Please
 review that commit carefully. If you believe that some of these are non-
 issues, let's discuss that. I'm not saying that anything or anyone is
 wrong, I'm just trying to keep (heh, or make?) the overall code base as
 readable as possible and at the same time make future review processes
 even smoother.

 Ok, things should match the old code base, but at the same time evolve to
 the better, or be modernized.
 I admid that I have a knack for terse variable names (just two letters
 seem verbose sometimes ;-) but you're right in some cases, thanks for
 fixing.
 I'd say there are also very many non-issues here, which I know are
 different in the existing older code base, but seem to make the code less
 readable to me, when I started to become familiar with the old codebase.
 For example, there are almost line filling variable names (not in this
 patch) that differ only in one to five letters.  And there is the word
 'Descriptor' all over, which often feels like cluttering when reading the
 code for the first time.  The following is an example:
 (all following diffs are from the 664f540 commit)

 {{{
 -  public static List parse(byte[] raw, File file)
 ...
 +  public static List parse(byte[] rawDescriptorBytes,
 +  File descriptorFile) ...
 }}}

 Maybe, 'raw' alone is too terse, but 'rawBytes' seems fine whereas in
 'rawDescriptorBytes' the word part 'Descriptor' overwhelms the important
 information.  The method 'parse' receives raw bytes and tries to find a
 descriptor.

 Here some other examples from the current patch round (we could
 recycle them for the guide lines, so I try to be verbose):

 Renaming of isValid, here:
 {{{
 -public boolean isValid(String line);
 +public boolean validateLine(String line);
 }}}

 makes the code less readable.  For example:

 {{{
 -  -> null != line && !line.isEmpty() && !validator.isValid(line))
   ...
 +  -> null != line && !line.isEmpty() &&
 !validator.validateLine(line))
 }}}

 `validateLine` doesn't say that the result of the validation is returned
 (as a 

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-10-06 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 While the topics raised in the above comment are worked on, I think it is
 save to review commit 929e265 b/c especially with regard to the test data
 and to check that the tests confirm to the specification.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-10-05 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_revision
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by karsten):

 * status:  needs_review => needs_revision


Comment:

 Alright, here's what I found:

 Commit c6c805c looks trivially correct. It comes a bit out of nowhere, so
 may you could add a remark to the commit message of future commits like
 this saying why you're making this change now. Like, did it violate a
 checkstyle rule? And was this the only place that was lacking a space? In
 any case, ready to merge this one.

 Commit 3b426bf contains a couple of code style violations: the summary
 fragment should describe the package and not be a warning; the summary
 fragment is not supposed to contain markup like ; there needs to be an
 empty line before the next ; and the line starting with
 "descriptors." is wrongly indented. I'm providing some tweaks in a
 fixup commit. Please review and tweak further as needed.

 Commit 7822a27 has several issues that require several fixup commits
 and/or discussion:
  - My commit e4d6e90 in my task-22983-4 branch contains a bunch of
 whitespace fixes and other trivial tweaks which I believe are mostly
 oversights. (Unless they are not, in which case see the next item.)
  - My commit 664f540 is a mix of suggestions and tweaks. Some of the
 tweaks are an attempt to make your code fit better into existing code,
 which is either based on our style guide or on implicit conventions in
 existing code for which we don't have a written style guide yet. Please
 review that commit carefully. If you believe that some of these are non-
 issues, let's discuss that. I'm not saying that anything or anyone is
 wrong, I'm just trying to keep (heh, or make?) the overall code base as
 readable as possible and at the same time make future review processes
 even smoother.
  - I wonder if we should take out the `LogDescriptor` interface for
 several reasons: there is just one subinterface extending that, so YAGNI;
 the only fields/methods that it adds are the log date, and it's unclear if
 future logs will cover a single date or have entirely different
 requirements. Looks like premature interface hierarchy optimization to me.
  - I'm unclear whether this new code will cover both original and
 sanitized web server logs. Possibly related to that question, why is
 validation separated from sanitization, and is validation performed before
 sanitization or not? Can you clarify?
  - I may have further comments based on your answers to my
 questions/remarks above.

 I haven't reviewed commit 929e265 in detail yet.

 When you make changes, please make them as fixup commits on top of my
 branch. Thanks!

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-10-04 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 FWIW, I'm currently reviewing this branch.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-10-03 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_review
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:  metrics-2017 |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---
Changes (by iwakeh):

 * status:  needs_information => needs_review


Comment:

 Please review [https://gitweb.torproject.org/user/iwakeh/metrics-
 lib.git/log?h=task-22983-3 this task branch], which is based on the
 current master and already uses java 8 features.

 The implementation changed quite a bit as it could build on the web server
 log specification (#23243).  That's why I didn't continue on the earlier
 branch,  because I think the code is easier to review in total here.
 This dev version of metrics-lib was also system-tested as part of
 CollecTor's new module, i.e., used for sync and local imports there.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs

2017-09-21 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_information
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:   |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by iwakeh):

 repeating comment:30 from #22428:

 "... A bunch of real logfiles for testing would be really great! Roughly
 the two week amount of successive logs should be sufficient to encountered
 odd edge cases and have a good testing ground. These also will help
 testing the metrics-lib part #22983 well.
 Could you provide these?"

 So, this doesn't drown in the recent flood of trac changes ;-)

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs (was: add a descriptor interface and implementation for web-logs)

2017-09-20 Thread Tor Bug Tracker & Wiki
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-+---
 Reporter:  iwakeh   |  Owner:  metrics-team
 Type:  enhancement  | Status:  needs_information
 Priority:  Medium   |  Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |Version:
 Severity:  Normal   | Resolution:
 Keywords:   |  Actual Points:
Parent ID:   | Points:
 Reviewer:   |Sponsor:
-+---

Comment (by karsten):

 Tweak summary.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs