> If I understand you correctly, you basically state that their
> data does not match the underlying set.

I erred in threading the link as part of that reply; it was meant for
the prior talk on what a casual surfer/reporter might see and surmise
about hidden services. The paper... is good. But maybe not the whole
picture the casual reader of it might believe it to be either.

> they used public *and* private onions

Given current onion discovery mechanics as in the paper, public vs.
private is defined largely by the access restrictions operators put up,
not by whether they posted the address somewhere or not.
(descriptor-cookie and stealth auth may have more to say about that.)

Revisiting categorization...

- They find 44% which some might call 'bad', at least for such broad
  definitions of bad as 'adult, drugs, counterfeit, weapons'. That
  means there is 56% good, for those definitions. Of that 44% bad...
  a - They don't break down the 17% 'adult' further into what is
      commonly divided as legal/good/commercial (18+) and illegal/bad
      (underage).
  b - If you surf around, a large amount of the other 27% appear to
      actually be one-page scams and trolls of various kinds. So other
      than being in their own category of bogus, they're not really
      the genuine 'original bad'.
- The paper is focused on the mechanics and meta level rather than a
  deeper editorial content analysis of onionspace. That's its chosen
  scope and not a fault. (Opportunity exists for someone to do that
  analysis project.)
- However they fail to list any of the 'good' non
  index/search/host/wiki services in their top 547 popularity list.
  Inline they mentioned some by name (wikileaks, strongbox, etc), by
  type (politics, games, libraries), and should have seen other
  obvious ones that would be likely to rank (the popular general
  social and discussion services can see well over 100
  posts/users/connections a day [compare that to their lowest request
  counts]). A big table of only bad/neutral doesn't present fairly.
  Though the paper doesn't read as having an anti good-HS bias, the
  reason for this table omission is unknown.
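
To make the arithmetic in points a and b concrete, here is a minimal
sketch. Only the 17% and 27% figures come from the discussion above.
The function name and the two fraction parameters are hypothetical, and
the paper gives no values for them, so the inputs in the second call
are purely illustrative and not measurements.

```python
# Shares quoted above, in percent of categorized services.
ADULT = 17.0      # 'adult', not split by the paper into legal/illegal
OTHER_BAD = 27.0  # the remaining 'bad' categories (drugs, counterfeit, weapons)


def genuinely_bad_share(legal_adult_frac, bogus_frac):
    """Return the 'bad' share (percent) left after reclassification.

    legal_adult_frac: hypothetical fraction of 'adult' that is legal (18+).
    bogus_frac: hypothetical fraction of the other 27% that is scams/trolls.
    Both are assumptions supplied by the caller, not figures from the paper.
    """
    for frac in (legal_adult_frac, bogus_frac):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fractions must be in [0, 1]")
    return ADULT * (1.0 - legal_adult_frac) + OTHER_BAD * (1.0 - bogus_frac)


# With nothing reclassified, the headline 44% is recovered.
print(genuinely_bad_share(0.0, 0.0))  # 44.0

# Purely illustrative inputs: half of each group reclassified.
print(genuinely_bad_share(0.5, 0.5))  # 22.0
```

The point is only that the headline 44% is an upper bound that shrinks
with whatever fractions a deeper content analysis would actually find.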

Using the paper to claim 'the vast majority of hidden services are not
socially laudable' would be difficult. (Especially considering that
what a lot of people complain about is just free speech that they
don't happen to like/understand and can't easily censor. Big range
between words like illegal, laudable, protected, activism and so
forth.) The last paragraph and a third of their conclusion is spot on.

As to unseen services...

- When all you have is a TCP port serving http, unless a forward URL
  list is published by the admin or users somewhere, what is
  effectively your 'GET / request to an IP address' cannot possibly
  discover all the virtual hosting and pages any given onion may serve
  within.
- Missing is specific mention of many services/protocols we know for a
  fact are online in onionspace such as sftp, nntp, xmpp, imap, smtp,
  torrent, telnet, onioncat, etc. (Though perhaps generalized by 'less
  than 50 each of those ... found 495 unique ports total,')
- No real mention is made of pairs for which they could identify the
  protocol but could not access further due to various user-facing
  authentication methods. (The '5% ssh' may be a partial start there.)
- There were 15k onions (and their would-be ports) which were offline
  during the scans. So the purpose behind their existence remained
  unknown. Same for the 1k previously active onion:port pairs offline
  during the content phase.
- The scale may not have been large or long enough to collect a full
  snapshot of the entire/intermittent onion space. They discuss scale
  factors in their former paper; here there is little. Then only 62%
  of onions collected were portscanned. And of those, scanning only
  reached 87% coverage. Presumably ports 1-65535 were scanned, but
  only 'full scan' is stated.

Regardless of whether you're just manually browsing around the surface,
using the best crawler indexes, or using the methods in the paper...
there's still more riding on top of anonymity networks than you think,
and more than your probe will ever reveal or be granted access to
"see". Just something to keep in mind when covering and balancing what
is out there.

ps: The likely reason they're seeing clients requesting nonexistent
descriptors is that there are other projects polling other
[old/defaced] onion lists. (one answer to pg 3 pp 2).

-- 
tor-talk mailing list - [email protected]
To unsubscribe or change other settings go to
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
