Re: [uknof] SentryPeer: A distributed peer to peer list of bad IP addresses and phone numbers collected via a SIP Honeypot

Gavin Henry Thu, 25 Nov 2021 13:07:51 -0800

>> Working on the API and web UI next, then the p2p part of it. Feel free
>> to submit any feature requests or have a play :-)
>


Hi Matthew,

Thank you very much for your reply and time spent thinking about all
of the below. Much appreciated!

> P2P sounds ripe for abuse by bad actors... A few scenarios:

That's correct. I think the authz/authn issues have already been
solved in other places. I'm thinking about things like signing up on
StackOverflow or Reddit and what you can do the first time without any
reputation etc. Similar to email. I was chatting to Justin Richer
(https://www.se-radio.net/2019/08/episode-376-justin-richer-on-api-security-with-oauth-2/)
about this last month:

"I took a look at the peer project and it sounds interesting. A lot
like BitTorrent’s protocol, but with the sharing at a higher level, it
seems? So it might be worthwhile researching into how graph networks
like that determine trustworthiness of nodes. Most of them have a kind
of distributed consensus state that gets reached after some time, and
so there’s no client authentication needed within the network itself
because the clients will be identified by some ephemeral key and
trusted based on actions instead of a pre-registration.

Still, there are a few different efforts that are dealing with
bridging registration type questions in the OAuth and related spaces.
OAuth 2 assumes clients all have client IDs and they’re
pre-registered. The Dynamic Registration spec (RFC7591) allows that
registration to happen programmatically as a discrete pre-step, but it
also allows the client to present a signed assertion (the software
statement) that helps the client claim that it is legitimate. An
extension to OpenID Connect recently introduced the idea of the client
sending a “registration” object with the initial request to the AS, to
provide a drive-by registration in a single step. The client would get
a client ID out the other end if it’s successful. I haven’t seen this
applied in practice anywhere yet. The OpenID SIOP group has been
discussing overloading the Client ID parameter itself to contain
semantic information allowing the client to send an identifier that
the AS could use to fetch client registration information. This
subverts the idea of the client ID as understood by most
implementations (it’s now client-supplied and meaningful instead of
AS-supplied and opaque to the client). The frontrunner here is using
DIDs and DID documents to convey stuff, but that’s mostly because
that’s the tech this crowd currently likes a lot.

In GNAP we’ve inverted the registration requirement a bit — the
protocol’s set up to assume that you’re coming in with no previous
registration, so you can send any client information necessary during
the initial request, and that initial request always happens the same
way regardless of how the interactions and other next steps go. But
there’s an optimization for cases when you *do* have a pre-registered
client, so that you can send the ID instead of the client info itself.

I’m not sure how much of that actually applies to what you’re working
on, based on my very limited understanding of what you’re doing, but I
hope it’s helpful. Good luck with the project!"

> 1. You only get the list if you provide a list of your own. Therefore, 
> someone adds some random IPs into a list, then knows what the state of the 
> network is, and as soon as the IP they're using appears on the list, they 
> stop using it until it drops back off.

True. The IP address harvesting is one thing, but stage two when they
actively try to make phone calls will always happen as it's too
lucrative not to. That's the data I'm also interested in getting and
sharing. Folks that run the nodes will be able to add their own phone
number allocations, and I'm thinking about using the various RIR feeds,
RPKI etc. Again, I think this is a solved problem; I just need to
find the right place to look.

> 2. IPv6 means presumably blocking /64s at a time rather than individual 
> addresses, I don't know if privacy addressing etc is a thing in the telephony 
> market, where addresses rotate after a while?

Not sure yet.
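
If per-/64 blocking does turn out to be the right granularity, the
aggregation itself is simple. A minimal sketch, assuming Python's
standard ipaddress module; block_key is a hypothetical helper name, not
anything in SentryPeer:

```python
import ipaddress


def block_key(addr: str) -> str:
    """Return the prefix to block: the /64 for IPv6, the single host for IPv4."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    if ip.version == 6:
        # strict=False masks off the host bits instead of raising an error
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ipaddress.ip_network(f"{ip}/32"))


if __name__ == "__main__":
    print(block_key("2001:db8:1:2:aaaa:bbbb:cccc:dddd"))
    print(block_key("192.0.2.10"))
```

Two privacy addresses in the same /64 map to the same key, so a
rotating source would still match. Whether that is too broad is the
open question.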

> 3. CGNAT means you might affect more than you intended, and the problem will 
> only get worse over time.

How is this currently handled with an infected PC behind CGNAT? Is
that a solved problem?

> 4. If the source IP is just a compromised device, you've booted that person 
> (who may be an entire office) off SIP for a week or more, even if they fix 
> the issue.

You don't need to block them, but depending on what the ITSP wants to
do, they could get limited service etc.

> Additionally, from a feature POV:
>
> 1. BGP sounds like a needless over-complication. Surely just some iptables 
> (realistically: nftables) hooks would do?

Both. Depends on how you run your nodes. The BGP part I just like the
thought of and want to explore.
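
For the nftables route, one way it could look is a node turning its
list into "add element" lines for nft -f. This is only a sketch: the
table name "sentrypeer" and the set names "bad_v4" / "bad_v6" are made
up, and it assumes those sets already exist with types ipv4_addr and
ipv6_addr:

```python
import ipaddress


def nft_add_elements(addrs, table="sentrypeer", v4_set="bad_v4", v6_set="bad_v6"):
    """Render nft 'add element' lines, one per address family.

    The table and set names are hypothetical and must already exist.
    """
    by_set = {v4_set: [], v6_set: []}
    for a in addrs:
        ip = ipaddress.ip_address(a)  # rejects anything that is not an IP
        by_set[v4_set if ip.version == 4 else v6_set].append(str(ip))
    return [
        f"add element inet {table} {name} {{ {', '.join(members)} }}"
        for name, members in by_set.items()
        if members  # skip a family with nothing to add
    ]


if __name__ == "__main__":
    for line in nft_add_elements(["192.0.2.10", "2001:db8::1", "198.51.100.7"]):
        print(line)
```

Validating through ipaddress first means nothing from a peer ends up in
the ruleset unless it parses as an address.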

> 2. A user is never going to pay for all data collected if it's available via 
> P2P, and if it isn't all on P2P, then why would anyone use the P2P version? 
> Not to mention it's once again a GDPR minefield.

I think IP addresses and GDPR is a solved problem?

> 3. "Small binary size for IoT usage" -- presumably this is going either on 
> your voice gateway or being scraped from logs, it's way out of scope for IoT?

Maybe some form of it lives in a device/gateway. I was looking at
Juniper JET and MQTT-type things for the data sharing part:
https://www.juniper.net/documentation/us/en/software/junos/jet-developer/index.html

I'm just thinking about devices like the RIPE Atlas probes and small
devices that can just sit doing this - https://atlas.ripe.net/probes/

> Might I suggest just implementing a DNSBL or similar? Would be a lot simpler, 
> allows for local caching, and it's very easy to extend -- allowing AXFR/IXFR 
> if you wanted users to be able to scrape the entire list, or just with a 
> pointer to an HTTP(S) URL that the zone can be downloaded. You can even parse 
> submitted data and maybe even do a probe of your own or correlate with other 
> submitted reports so that you only implement when multiple submissions from 
> different locations report the same thing. Sure, it's not the distributed 
> content hosting model you're looking for, but otherwise there's no stopping 
> it from being abused.
>
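
To make the two pieces of that suggestion concrete, here is a minimal
sketch: building the DNSBL query name for an address, and only listing
an address once several distinct reporters agree. The zone
"dnsbl.example.org", the function names and the threshold of 2 are all
assumptions for illustration:

```python
import ipaddress
from collections import defaultdict

ZONE = "dnsbl.example.org"  # hypothetical zone name


def dnsbl_name(addr, zone=ZONE):
    """Build the DNSBL query name: reversed octets (or nibbles) plus the zone."""
    ip = ipaddress.ip_address(addr)
    # reverse_pointer is e.g. "10.2.0.192.in-addr.arpa"; drop the two arpa labels
    reversed_labels = ip.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


def corroborated(reports, min_reporters=2):
    """Return addresses reported by at least min_reporters distinct reporters.

    reports is an iterable of (reporter_id, address) pairs.
    """
    seen = defaultdict(set)
    for reporter, addr in reports:
        seen[addr].add(reporter)  # repeat reports from one node count once
    return sorted(a for a, who in seen.items() if len(who) >= min_reporters)


if __name__ == "__main__":
    print(dnsbl_name("192.0.2.10"))
    print(corroborated([("a", "192.0.2.10"), ("b", "192.0.2.10"),
                        ("a", "198.51.100.7"), ("a", "198.51.100.7")]))
```

Counting distinct reporters rather than raw reports is what stops a
single node from getting an address listed by itself.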

This is a great idea and worth exploring. It might be useful for the
bootstrapping/authz/authn part of the P2P. I'm really, really trying
to not have to run anything, but everyone I speak to says the problem
just gets pushed further up:

"SentryPeer looks cool. I guess its p2p discovery mechanism primarily
must work over WAN while Zyre emphasizes LAN (via UDP broadcast) and
it works well there. Zyre can instead connect via TCP to some "well
known" zgossip servers. This works on the WAN if the zgossip servers
are accessible by the peer. Of course, it also adds some amount of
unwanted centralization."

More here:

https://github.com/zeromq/zyre/issues/701#issuecomment-947808963

> I used to run a couple of nodes in the PGP keyserver ring (aka "SKS") and 
> it's amazing what things people will do to either be a nuisance or to show 
> how "smart" they are. I would *strongly* recommend speaking with more 
> operators first.

I'm doing the p2p/sharing last after the API and web UI. I'm going to
enjoy this problem!

> Just my two pence worth. Maybe I'm wrong -- I've not done any SIP for a 
> decade, and certainly not at your scale.

I'm very grateful. Thank you.
