Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-19 Thread Tor Bug Tracker & Wiki
#28877: Paginate large controller commands like 'GETINFO desc/all-recent'
--------------------------+--------------------------
 Reporter:  wagon         |          Owner:  (none)
     Type:  defect        |         Status:  assigned
 Priority:  Medium        |      Milestone:
Component:  Core Tor/Tor  |        Version:
 Severity:  Normal        |     Resolution:
 Keywords:                |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+--------------------------

Comment (by wagon):

 > Do you have a SSD?

 > I suspect this might be a matter of disk iops.
 No, I don't have an SSD. However, the disk cache should help (further
 invocations of the same commands are faster than the first ones, which I
 think is because of the disk cache). I cannot currently run an ideal test
 with the whole system residing in RAM, but I copied the Nyx/Stem directory
 to /tmp (which resides in RAM) and compared it with my initial tests above.
 I didn't find any difference. The shell is still at least two times faster
 than `tor-prompt`.

 > Again, this command is reading fourteen megabytes of data from disk then
 dumping it on the socket.
 I thought this data was read from the memory of the tor process, i.e. from
 RAM. Are you sure Tor reads from the hard drive each time it needs to
 provide this data?

 > get ran in low resource environments (arduino and such) where such
 commands can easily block the control connection for tens of seconds to
 minutes.
 Do you think it also explains the crash of the
 `tor-prompt --run 'GETINFO desc/all-recent' >/dev/null` command?

 If a pipe is used, the shell approach also breaks (this is the reason I had
 to use descriptors in `test.sh`). For example, the success of the command
 {{{
 ( echo AUTHENTICATE \"pass\" ; echo GETINFO ns/all ; echo QUIT ) | nc
 127.0.0.1 9051
 }}}
 depends on the time of invocation, whether the output is redirected to a
 file or to `/dev/null`, the weather, and so on. If I redirect the output of
 this command to a file in RAM, I see that it stops after roughly 20k lines
 are printed. However, it works fine with commands that produce short output.

 > these commands are no-gos if distributing an application more broadly.
 I agree. Let us wait and see what the Tor core people say.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-19 Thread Tor Bug Tracker & Wiki

Comment (by atagar):

 > Hello, atagar. Thank you for the explanation, now it is more clear for
 me.

 Thanks wagon! These are interesting stats.

 > Could you explain why it is a problem?

 Do you have an SSD? When I run on my newish laptop I get similar results...

 {{{
 atagar@morrigan:~$ time tor-prompt --run 'GETINFO ns/all' 1>/dev/null

 real    0m0.236s
 user    0m0.108s
 sys     0m0.045s
 }}}

 ... but when I run on my admittedly ancient PC it's not so rosy...

 {{{
 atagar@odin:~$ time tor-prompt --run 'GETINFO ns/all' 1>/dev/null

 real    0m1.466s
 user    0m0.240s
 sys     0m0.116s
 }}}

 I suspect this might be a matter of disk IOPS. Again, this command is
 reading fourteen megabytes of data from disk then dumping it on the
 socket. Stem (and by extension Nyx) get run in low resource environments
 (Arduino and such) where such commands can easily block the control
 connection for tens of seconds to minutes.

 You're absolutely right that in your case (and most people's) these are
 fine, but these commands are no-gos if distributing an application more
 broadly.

 > People always say that shell is too slow and inconvenient, while python
 is really fast and convenient. However...

 These are interesting numbers. I should profile our controller code with
 this input to figure out where the time's being spent (I made several
 optimizations for the Stem 1.7 release, but there's probably more room for
 improvement).


Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-19 Thread Tor Bug Tracker & Wiki

Comment (by wagon):

 Hello, atagar. Thank you for the explanation, it is clearer to me now.

 > saturates the control connection, preventing further commands and events
 from being transmitted.
 What does it actually saturate? Indeed, in some setups I did see this
 problem, but `tor-prompt` doesn't have it. Is it something internal to TCP
 or the socket that is saturated? Can that limit be increased by changing
 some system setting?

 > but I've been cautioned that any direct use of tor's data directory is a
 bad idea.
 I agree. Furthermore, you cannot use tor's data directory if you are not
 running Nyx as the `debian-tor` user. If your user is just a member of the
 `debian-tor` group (this is the recommended setup, see discussion in
 #25890), you still cannot read files in that directory because of #28356.

 > We need some way of breaking up these responses. Pagination probably
 isn't a good fit so ideas welcome.

 > The above is a long time problem I've had with tor's 'get all
 descriptor' commands
 Could you explain why it is a problem? Yes, it may block the controller for
 a few seconds if the user types this command in the Nyx interpreter, but
 only in that case. This blocking would happen anyway because printing the
 output to a terminal always takes some time. If you need this output
 internally in some Stem or Nyx functions, you can get it almost
 immediately. In addition, you don't need to run such commands very often.
 Maybe you need to run them only once, when Nyx starts (the response can
 then be cached for some time).
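
 The caching idea can be sketched roughly as follows. This is only an
 illustration, not how Nyx or Stem work: `CachedQuery`, its `fetch` callable
 and the `clock` parameter are all hypothetical names, and the time-to-live
 value is arbitrary.

```python
import time


class CachedQuery:
    """Cache the result of an expensive query for a fixed time (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that performs the slow query
        self._ttl = ttl_seconds      # how long a cached response stays valid
        self._clock = clock          # injectable so the behaviour can be tested
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: run the slow query once and remember it.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

 A controller could wrap its bulk descriptor query in such an object, so
 that the large response is requested once and reused until it expires.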

 To be clear, let us check exact numbers. To be sure we are not wrong here
 because of specifics in implementation of Nyx or Stem, we can use the
 following simple script `test.sh` for these tests:
 {{{
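 #!/bin/bash -e

 # Command to send, e.g. 'GETINFO ns/all'
 cmd="$*"
 pass="ControlPortPassword"

 # Send one command, print the reply up to the final "250 OK", then quit.
 function test_tor() {
   echo "$1" >&3
   sed "/^250 OK\r$/q" <&3
   echo QUIT >&3
   exec 3<&-
 }

 # Open a read/write TCP connection to the ControlPort on descriptor 3.
 exec 3<>/dev/tcp/127.0.0.1/9051
 echo AUTHENTICATE \"$pass\" >&3
 read -u 3   # consume the reply to AUTHENTICATE
 test_tor "$cmd"
 }}}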
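
 For comparison, the same approach (read the reply up to its final line
 before sending `QUIT`) could look roughly like this in plain Python without
 Stem. It is a minimal sketch under assumptions: `read_reply` and `query` are
 hypothetical names, the port and password handling mirror `test.sh`, and
 authentication failures are not checked. Unlike the `sed` pattern, it skips
 over multi-line data blocks, which the control protocol terminates with a
 line containing a single `.`.

```python
import io
import socket


def read_reply(stream):
    """Read one control-port reply from a binary file-like object."""
    lines = []
    in_data = False
    while True:
        raw = stream.readline()
        if not raw:
            raise EOFError("control connection closed mid-reply")
        line = raw.rstrip(b"\r\n")
        lines.append(line)
        if in_data:
            # Inside a "250+" data block: only a lone "." ends it.
            if line == b".":
                in_data = False
            continue
        if line[3:4] == b"+":
            in_data = True          # a data block starts on the next line
        elif line[3:4] == b" ":
            return lines            # "NNN <text>" is the final line of a reply


def query(command, password, host="127.0.0.1", port=9051):
    """Authenticate, send one command, and return its reply lines."""
    with socket.create_connection((host, port)) as sock:
        reader = sock.makefile("rb")
        sock.sendall(('AUTHENTICATE "%s"\r\n' % password).encode("ascii"))
        read_reply(reader)          # reply to AUTHENTICATE (not validated here)
        sock.sendall((command + "\r\n").encode("ascii"))
        reply = read_reply(reader)
        sock.sendall(b"QUIT\r\n")   # only sent after the full reply was read
        return reply


# Offline demonstration with canned bytes instead of a live ControlPort.
canned = io.BytesIO(b"250+ns/all=\r\nr relay1\r\n.\r\n250 OK\r\n")
print(len(read_reply(canned)))
```

 With a running tor, `query('GETINFO ns/all', 'ControlPortPassword')` would
 be the equivalent of `./test.sh 'GETINFO ns/all'`.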
 Now the fun begins. Let us check a simple, fast command:

 {{{
 $ time tor-prompt --run 'GETINFO version' 1>/dev/null
   0.14s user 0.04s system 77% cpu 0.227 total
 $ time ./test.sh 'GETINFO version' 1>/dev/null
   0.00s user 0.00s system 36% cpu 0.011 total
 }}}

 Compare that with heavier output such as `ns/all`:
 {{{
 $ time tor-prompt --run 'GETINFO ns/all' 1>/dev/null
   0.28s user 0.07s system 58% cpu 0.591 total
 $ time ./test.sh 'GETINFO ns/all' 1>/dev/null
   0.02s user 0.00s system  8% cpu 0.230 total
 }}}

 Now check it with the most resource consuming command, `desc/all-recent`:
 {{{
 $ time tor-prompt --run 'GETINFO desc/all-recent' 1>/dev/null
 Traceback (most recent call last):
   File "/path/to/stem/tor-prompt", line 8, in <module>
     stem.interpreter.main()
   File "/path/to/stem/stem/interpreter/__init__.py", line 151, in main
     interpreter.run_command(args.run_cmd, print_response = True)
   File "/path/to/stem/stem/util/conf.py", line 289, in wrapped
     return func(*args, config = config, **kwargs)
   File "/path/to/stem/stem/interpreter/commands.py", line 381, in run_command
     print(output)
 UnicodeEncodeError: 'ascii' codec can't encode character u'\u021b' in position 1237805: ordinal not in range(128)

 $ time ./test.sh 'GETINFO desc/all-recent' 1>/dev/null
   0.21s user 0.03s system 60% cpu 0.391 total
 }}}
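
 The traceback above suggests that `print` tried to encode the output as
 ASCII once stdout was redirected. One way around that kind of failure, shown
 here only as a sketch and not as the fix Stem adopted, is to encode
 explicitly and write bytes. `write_text` is a hypothetical helper name.

```python
import io


def write_text(binary_stream, text, encoding="utf-8"):
    """Write text to a binary stream without raising on unencodable characters."""
    # backslashreplace turns anything the codec cannot represent into an
    # escape sequence instead of raising UnicodeEncodeError.
    binary_stream.write(text.encode(encoding, errors="backslashreplace"))
    binary_stream.write(b"\n")


sample = u"nickname \u021b"     # contains the character from the traceback

# Strict ASCII encoding fails in the same way as the traceback.
try:
    sample.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict ascii fails:", exc.reason)

# Explicit encoding succeeds, whatever the terminal or redirect target is.
buffer = io.BytesIO()
write_text(buffer, sample)
```

 In Python 3 the same helper could be called as
 `write_text(sys.stdout.buffer, output)`.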

 People always say that shell is too slow and inconvenient, while Python is
 really fast and convenient. However, as you see, simple shell is 20-30
 times faster than Python on simple commands, about 2-3 times faster than
 Python on heavy commands, and gets the many megabytes of `desc/all-recent`
 output in less than half a second. Old UNIX style still rocks. Should you
 have chosen Bash or some other shell instead of Python for writing Stem?
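
 As a quick check of those ratios, the wall-clock totals reported above can
 be divided directly. The numbers are copied from the `time` output in this
 comment; `speedup` is just an illustrative helper name. `desc/all-recent` is
 left out because the `tor-prompt` run crashed and produced no timing.

```python
# (tor-prompt total, test.sh total) in seconds, taken from the runs above.
timings = {
    "GETINFO version": (0.227, 0.011),
    "GETINFO ns/all": (0.591, 0.230),
}


def speedup(python_seconds, shell_seconds):
    """How many times faster the shell run was than the tor-prompt run."""
    return python_seconds / shell_seconds


for command, (python_s, shell_s) in timings.items():
    print("%s: %.1fx" % (command, speedup(python_s, shell_s)))
# GETINFO version: 20.6x
# GETINFO ns/all: 2.6x
```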

 If I don't redirect output to `/dev/null`, the terminal becomes a
 bottleneck for these commands. It takes about 5-6 seconds to print the
 `desc/all-recent` output to a terminal. The Nyx interpreter is even slower
 than `tor-prompt` (I think it is due to curses); it takes about 15 seconds
 to print this output. If I don't redirect output to `/dev/null`, the last
 `tor-prompt` command doesn't crash.

 Thus, I still doubt we need to ask the tor network people to do something
 about it. As you see, well-written tools work fast with the current
 `ControlPort` implementation.

 > Feel free to file a separate ticket with the 'nyx 

Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-18 Thread Tor Bug Tracker & Wiki
Changes (by atagar):

 * status:  new => assigned
 * owner:  atagar => (none)



Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-18 Thread Tor Bug Tracker & Wiki
#28877: Paginate large controller commands like 'GETINFO desc/all-recent'
--------------------------+--------------------------
 Reporter:  wagon         |          Owner:  atagar
     Type:  defect        |         Status:  new
 Priority:  Medium        |      Milestone:
Component:  Core Tor/Tor  |        Version:
 Severity:  Normal        |     Resolution:
 Keywords:                |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+--------------------------
Changes (by atagar):

 * status:  assigned => new



Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-18 Thread Tor Bug Tracker & Wiki
#28877: Paginate large controller commands like 'GETINFO desc/all-recent'
--------------------------+--------------------------
 Reporter:  wagon         |          Owner:  atagar
     Type:  defect        |         Status:  assigned
 Priority:  Medium        |      Milestone:
Component:  Core Tor/Tor  |        Version:
 Severity:  Normal        |     Resolution:
 Keywords:                |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+--------------------------

Comment (by atagar):

 > so how it can be "paginated"?

 Hi wagon. A paginated API is one in which you can receive batches of a
 limited size. The caller then makes a series of calls to get the full
 listing. Ignoring the control interface for a minute, paginated interfaces
 look like...

 {{{
 First Request: {
   first_index: 0,
   size: 5,
 }

 First Response: [
   ,
   ,
   ,
   ,
   ,
 ]

 Second Request: {
   first_index: 5,
   size: 5,
 }

 Second Response: [
   ,
   ,
 ]
 }}}

 The caller sees its second request received only two descriptors (rather
 than the five requested) so it knows it has received them all.
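
 The caller's side of such an interface can be sketched as a simple loop.
 This is only an illustration of the general pattern: `fetch_all` and
 `fetch_page` are hypothetical names, and tor's control port offers no such
 call today.

```python
def fetch_all(fetch_page, page_size=5):
    """Collect every item by requesting fixed-size pages until one comes back short."""
    items = []
    first_index = 0
    while True:
        page = fetch_page(first_index, page_size)
        items.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return items
        first_index += len(page)


# Stand-in data source with seven entries, matching the example above.
descriptors = ["descriptor %d" % i for i in range(1, 8)]


def fake_fetch_page(first_index, size):
    return descriptors[first_index:first_index + size]


print(len(fetch_all(fake_fetch_page)))
```

 Note that when the total happens to be an exact multiple of the page size,
 this loop needs one extra request, which returns an empty page.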

 The reason for a paginated API is to divide the fourteen megabyte GETINFO
 response controllers presently get into a series of bite sized responses.
 A massive GETINFO response like this saturates the control connection,
 preventing further commands and events from being transmitted.

 In fact, Nyx used to avoid commands like 'GETINFO ns/all' entirely in
 favor of reading cached descriptors from tor's data directory. This was
 far faster and avoids blocking the control socket (effectively all the
 command does is echo the file), but I've been cautioned that any direct
 use of tor's data directory is a bad idea.

 I'm not overly married to the idea of a paginated API. I'd be delighted to
 chat with the network team about design ideas, but as a first step let's be
 clear about the problem we're trying to address: **controllers need the
 ability to break up multi-megabyte responses into smaller replies so we
 avoid saturating the control connection.**

 Pagination might be a poor fit. In particular...

 * GETINFO commands are not designed to take keyword arguments. We could
 hack this together with positional arguments (**GETINFO desc/batch/0/5**
 then **GETINFO desc/batch/5/5** for the example above), but needless to
 say... ick.

 * Concurrency. If tor downloads new descriptors while we're in the middle
 of iterating over its prior ones the caller will conclude with an
 incorrect enumeration. Usually this would be dealt with by a consensus id
 argument so the caller can specify the set of descriptors it's iterating
 over but this isn't really how tor is designed.
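
 Both concerns can be illustrated with a small simulation. Everything here
 is hypothetical: `GETINFO desc/batch/...` is the made-up positional form
 from the first bullet, not a real command, and the list stands in for tor's
 descriptor store.

```python
def batch_command(first_index, size):
    """Build the hypothetical positional form of a paginated GETINFO."""
    return "GETINFO desc/batch/%d/%d" % (first_index, size)


def paginate_with_churn(store, page_size, mutate_after_first_page):
    """Page through `store`, letting it change once after the first page."""
    seen = []
    first_index = 0
    first_page = True
    while True:
        page = store[first_index:first_index + page_size]
        seen.extend(page)
        if len(page) < page_size:
            return seen
        first_index += page_size
        if first_page:
            # Simulates tor downloading a new descriptor mid-iteration.
            mutate_after_first_page(store)
            first_page = False


store = ["A", "B", "C", "D", "E", "F", "G"]
result = paginate_with_churn(store, 5, lambda s: s.insert(0, "NEW"))
print(batch_command(0, 5), batch_command(5, 5))
print(result)
```

 In this run the caller sees `E` twice and never sees `NEW`, because the
 indexes shifted between the two requests. That is the incorrect enumeration
 the second bullet describes.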

 So TL;DR: We need some way of breaking up these responses. Pagination
 probably isn't a good fit so ideas welcome.

 > Maybe you can. I started from finding a source of this problem.

 I suspect we're talking about two different things. The above is a long
 time problem I've had with tor's 'get all descriptor' commands ('GETINFO
 desc/all-recent', 'GETINFO md/all', and 'GETINFO ns/all'). I'm tackling
 that topic in this ticket because that's the problem you cited originally
 ("GETINFO desc/all-recent returns very huge listing which interpreter
 cannot manage properly.").

 Nyx actually **avoids** making that particular query because doing so
 would temporarily hose the control connection in the way you describe. I
 just took a look and unless I'm missing something I'm not finding any
 'GETINFO desc/all-recent' calls in nyx.

 I suspect your initial hypothesis about the reason Nyx is freezing is
 inaccurate. Feel free to file a **separate** ticket with the 'nyx --debug'
 output when Nyx freezes so I can see what's up. But I'd like to hijack
 this ticket to brainstorm our long term plan for these bulky GETINFO
 commands since they've been a long time pain point for me.


Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent'

2018-12-17 Thread Tor Bug Tracker & Wiki
#28877: Paginate large controller commands like 'GETINFO desc/all-recent'
--------------------------+--------------------------
 Reporter:  wagon         |          Owner:  atagar
     Type:  defect        |         Status:  assigned
 Priority:  Medium        |      Milestone:
Component:  Core Tor/Tor  |        Version:
 Severity:  Normal        |     Resolution:
 Keywords:                |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+--------------------------

Comment (by wagon):

 > The usual way of dealing with this is to have a paginated API where the
 caller can provide a starting index and number of responses. This would
 let Nyx fetch smaller batches without hosing the control connection.
 I'm not sure I understand what you mean. There are other commands which
 return descriptors by id or by name, like `desc/id/FINGERPRINT`, and the
 output of these commands is small. However, the purpose of
 `desc/all-recent` is to get the full content of the file with descriptors,
 so how can it be "paginated"? Do you mean at some internal TCP transport
 level?

 > Unfortunately without that there's not much I can do on my end.
 Maybe you can. I started by looking for the source of this problem.
 Initially I thought it might be on the Tor side. I checked
 [[https://stem.torproject.org/faq.html#i-m-using-cookie-authentication|your instructions]]
 for `telnet`, but `telnet` works fine with `desc/all-recent` when invoked
 from an interactive shell. It returns the whole output without any problem.
 The same is true for `netcat` and `socat` if they are used to connect to
 `ControlPort`.

 However, if any of these tools are not used interactively, but are called
 inside some wrappers or functions, they all fail. They return some part of
 the `desc/all-recent` output (a different part each time), after which
 the connection hangs. I looked for a way to solve this problem and found
 one. A language as powerful as Python probably also has some way to solve
 this problem.


Re: [tor-bugs] #28877 [Core Tor/Tor]: Paginate large controller commands like 'GETINFO desc/all-recent' (was: Command desc/all-recent slows down or craches Nyx)

2018-12-17 Thread Tor Bug Tracker & Wiki
#28877: Paginate large controller commands like 'GETINFO desc/all-recent'
--------------------------+--------------------------
 Reporter:  wagon         |          Owner:  atagar
     Type:  defect        |         Status:  assigned
 Priority:  Medium        |      Milestone:
Component:  Core Tor/Tor  |        Version:
 Severity:  Normal        |     Resolution:
 Keywords:                |  Actual Points:
Parent ID:                |         Points:
 Reviewer:                |        Sponsor:
--------------------------+--------------------------
Changes (by atagar):

 * component:  Core Tor/Nyx => Core Tor/Tor


Comment:

 Hi wagon, I'm gonna toss this over to tor and repurpose it to be for
 paginated GETINFO commands. You're right that commands like
 'GETINFO desc/all-recent' provide megabytes of data.

 These GETINFO commands were created in tor's early days when the consensus
 was pretty small, but nowadays the response of this method is massive. The
 usual way of dealing with this is to have a paginated API where the caller
 can provide a starting index and number of responses. This would let Nyx
 fetch smaller batches without hosing the control connection.

 Unfortunately without that there's not much I can do on my end.
