Re: [sidr] Master thesis - RPKI

Demian Rosenkranz Tue, 14 Jan 2014 07:50:23 -0800

Thank you for the detailed answer! I see that some points are unclear because I did not describe the particular problems in detail. Unfortunately, I'm writing my thesis in German, but I hope my comments clear it up. Besides the classification, I will try to find ways to detect when a problem occurs. I have been searching for anomalies in the repository behavior for several months. I hope/think this helps a lot to find reasonable ways of identification.

The primary goal is to identify the problems, not to find the guilty party. Finding the culprit wouldn't help the RP/RO react to a problem.


My comments below.

On 13.01.2014 18:20, Murphy, Sandra wrote:

(Speaking as regular ol' member)

As some of you know, I'm writing my master's thesis about RPKI at
Deutsche Telekom (Rüdiger Volk). In particular, I try to identify the
"problems (attack, misconfiguration, ...)" of using RPKI as a relying
party/resource owner and try to find ways to identify if such a
"problem" arises (e.g. competing ROAs).


Sounds like fun.

Yes, it's an interesting topic, and I've spent a lot of time reading and understanding all the drafts/RFCs :-)

Furthermore, I want to give
proposals on how to proceed if a "problem" arises. To sum it up, an RP
should use the RPKI as a reliable tool to improve its routing. I hope
this thesis helps to reduce the concerns of some RPs about using RPKI for
securing inter-domain routing.


Advice on how to proceed in the case of errors would be very interesting.

Besides an understanding of what problems can happen, the advice on how to proceed should be the added value for an RP. I hope this helps some deployments :-)

The following classification lists the groups of problems I've
identified including a short description. If possible, I used terms
which are used in SIDR drafts/RFCs. It would be great to get some
feedback to this classification. I guess most of you prefer textual
description, so I tried to represent it in textual form. Additionally,
you can find a jpg attached.


Some comments below:

Classification of "problems"
1. Incorrect representation of RPs/ROs INRs at "RPKI layer":  The
initial transformation of RP/Resource Owner INRs as RPKI objects is not
correct.


You mean X issues a certificate for resources Y to some customer/member and
the resources Y weren't actually held by that customer/member?

So this is X's error?

Yes, this would belong to that "problem" category.

More in detail:

I would say there are three (technical) layers which are of interest for a relying party/resource owner (RO) regarding inter-domain routing:

1: RP/RO layer: How does an RP/RO want to see its own INRs at the routing layer and the RPKI layer? Is there a correct mapping? This is kind of a semantic layer...

2: RPKI layer: Describes the permissions to use an INR (in the form of a cryptographic object).

3: Routing layer: That's the actual inter-domain routing (BGP).

Every layer has its own challenges/problems. "Incorrect representation of RPs/ROs INRs at RPKI layer" means that there is an incorrect mapping and so a wrong semantic representation.

There could also be a problem with a wrong representation at the routing layer (e.g. wrong route announcements because of a wrong router configuration), but the routing layer is not part of my thesis. I'm focusing on problems regarding the tool RPKI.


Are you working on just the categorization, or are you going to propose methods
of detecting these problems?

I'm going to try to find ways of detecting the identified problems with the given public information. And if there is no way with the given information, I would like to propose extensions to the existing "information structure". But only in theory. Unfortunately, there is not enough time to implement it.

2. Incorrect/untrustworthy/suspicious RPKI information
2.1 Object related
2.1.1 Competing Attack: Router certificate: ASN competes with existing
router certificate;


Could you say what you mean by "ASN competes"?

A router might belong to an organization that uses more than one ASN.
The AS migration case is one particular case of that happening.

So I'd say a router might have multiple router certificates with different ASNs.

So I'm not sure what "ASN competes" means.

OK, I should change the description :). I mean the case where a router certificate with AS number X exists and another entity comes up with a router certificate with the same AS number X but doesn't own this AS number. Of course there are cases in which this is OK, but here I'm talking about an attack, misconfiguration... Here it would be great if a distinction between intentional and unintentional were possible.

   Other Objects: IP-Range competes with existing
objects (In my opinion, certificates can also compete with other certs
because of their X.509 extensions. Of course, at the end the ROA causes
the problem but it could be kind of an early warning system if competing
certificates are identified --> Should a doctor try to take care of the
cause or just try to allay the pain?).


Again, what do you mean by "competes"?

A prefix holder might authorize more than one ASN to advertise the prefix:
It might authorize upstream providers to announce for it.
It might hold and use multiple ASNs on a regular basis.
There might be AS migration taking place.

And there's the case that a provider has a ROA for its aggregate, has issued
prefixes to its customer and allowed the customer to multi-home with that prefix,
which means the customer might be issuing ROAs for a more specific prefix
to a different ASN.

So there are reasons why there might be multiple ROAs for the same
prefix to different ASNs.

Same as above. I mean attacks, misconfigurations... The classification is explained in detail in my documentation for a better understanding, but this is unfortunately in German.
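A minimal sketch of how candidates for "competing" ROAs could be surfaced, assuming the ROAs are already available as (prefix, maxLength, ASN) tuples. The tuple format and the function name are hypothetical. The result is only a list of candidates: multi-homing, upstream authorization or AS migration produce the same pattern legitimately, so any hit still needs a policy decision.

```python
import ipaddress
from itertools import combinations

def competing_candidates(roas):
    """Return pairs of ROAs whose prefixes overlap but whose origin ASNs differ.

    `roas` is an iterable of (prefix, max_length, asn) tuples (hypothetical format).
    """
    parsed = [(ipaddress.ip_network(p), ml, asn) for p, ml, asn in roas]
    hits = []
    for (net_a, _, asn_a), (net_b, _, asn_b) in combinations(parsed, 2):
        if net_a.version != net_b.version:
            continue  # IPv4 and IPv6 prefixes never overlap
        if asn_a != asn_b and net_a.overlaps(net_b):
            hits.append(((str(net_a), asn_a), (str(net_b), asn_b)))
    return hits

# Example data using documentation prefixes and ASNs.
roas = [
    ("192.0.2.0/24", 24, 64496),
    ("192.0.2.0/25", 25, 64511),    # more specific, different ASN
    ("198.51.100.0/24", 24, 64496),
]
candidates = competing_candidates(roas)
```

The pairwise comparison is quadratic and only meant for illustration; a real dataset would probably need a prefix tree.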

2.1.2 Whacked Objects: Object transition which results in a route
transition from valid to missing or invalid.


You mean a failure to correctly handle the timing involved in changing the
state of a route advertisement and getting the ROAs in place before the
route advertisement occurs?  Or a failure to correctly handle the refresh
of certificates so that ROAs do not expire?

Not only that. I mean any action which affects an RPKI object in a "bad" way. In the suspenders draft it's explained as:

   "Any object in the RPKI can become invalid or inaccessible (to RPs)
   via various actions by CAs and/or publication point maintainers along
   the certificate path from the object's EE certificate to a trust
   anchor (TA).  Any action that causes an object to become invalid or
   inaccessible is termed "whacking"."

In the end, what matters is the impact the actions have on inter-domain routing. Therefore I chose this description: "...which results in a route transition from valid to missing or invalid." I guess this was the unclear part?
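A rough sketch of how such a transition could be observed from the RP side, assuming two snapshots of validated ROA payloads (before and after a repository update) and a list of routes of interest. The classification follows the origin validation rules of RFC 6811, where "not-found" corresponds to what is called "missing" above. The data format and function names are assumptions for illustration.

```python
import ipaddress

def validate(route_prefix, origin_asn, vrps):
    """RFC 6811-style origin validation: 'valid', 'invalid' or 'not-found'.

    `vrps` is a list of (prefix, max_length, asn) tuples (hypothetical format).
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for prefix, max_length, asn in vrps:
        vrp = ipaddress.ip_network(prefix)
        if vrp.version != route.version or not route.subnet_of(vrp):
            continue  # this payload does not cover the route
        covered = True
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"

def whacked_routes(routes, old_vrps, new_vrps):
    """Routes that were valid under old_vrps but are no longer valid under new_vrps."""
    result = {}
    for prefix, asn in routes:
        if validate(prefix, asn, old_vrps) == "valid":
            state = validate(prefix, asn, new_vrps)
            if state != "valid":
                result[(prefix, asn)] = state
    return result

old_vrps = [
    ("192.0.2.0/24", 24, 64496),
    ("198.51.100.0/24", 24, 64497),
    ("198.51.100.0/24", 24, 64500),
]
new_vrps = [("198.51.100.0/24", 24, 64500)]
routes = [("192.0.2.0/24", 64496), ("198.51.100.0/24", 64497)]
changes = whacked_routes(routes, old_vrps, new_vrps)
```

This only reports that a transition happened. It cannot tell whether the cause was an expired certificate, a revocation, a deleted object or an attack.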


(Again, are you going to work on the categorization, or are you going to
propose means of identifying these errors?  I think I ask because I'm not
sure how you would tell the difference between a failure and deliberate
intention.)

Yes, the distinction between failure and deliberate intention is not easy for a program to detect, if it is possible at all. I think the intention of whoever caused it is secondary. The RP/RO has to be able to detect that a problem has arisen in order to handle it in the best way.

2.1.3 Non-compliance: Non-compliance of RPKI objects can cause bad
behavior of RP software. Weak alg./key length could result in a
downgrade attack. (There are syntactical checks, but for the sake of
completeness and because of possible implementation mistakes it's also
included)


Implementation errors are a problem everywhere, but you're hypothesizing
two errors here: a compliance error in issuing an object and an error in
checking objects for compliance.  This is probably more likely if both
implementations have the same source.

Or maybe you are talking about error in issuing a certificate causes the
implementation that is validating the certificate to fail (i.e., crash).

As seen in the past, syntactic/semantic inconsistency of certificates causes a lot of problems in the "usual" PKI world. This is also a potential problem for an RP. The RP software has to interpret the certificates and objects. Checking for compliance with the detailed standards helps, in my opinion, to reduce such problems. Of course, such checks are already included in the three current RP software packages. As I said, it's for the sake of completeness (academic constraint :-)) and I would say it's an important point for reducing failures.

2.1.4 Expiring object: Check if objects are almost expired/forgotten.
Can cause unwanted routing behavior.


That's another timing issue, and somewhat related to the Whacked Objects
case.  Right?

Yes. Could be merged with "whacked objects".
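A small sketch of such an expiry check, assuming the notAfter times of the objects have already been extracted from the certificates. The mapping format, the file names and the seven-day warning window are assumptions, not taken from any RP software.

```python
from datetime import datetime, timedelta, timezone

def expiring_soon(objects, now, warn_window=timedelta(days=7)):
    """Split objects into already expired and expiring within warn_window.

    `objects` maps an object name to its notAfter time (timezone-aware).
    """
    expired, soon = [], []
    for name, not_after in sorted(objects.items()):
        if not_after <= now:
            expired.append(name)
        elif not_after - now <= warn_window:
            soon.append(name)
    return expired, soon

now = datetime(2014, 1, 14, tzinfo=timezone.utc)
objects = {
    "example-a.roa": datetime(2014, 1, 10, tzinfo=timezone.utc),
    "example-b.roa": datetime(2014, 1, 17, tzinfo=timezone.utc),
    "example-c.cer": datetime(2014, 6, 1, tzinfo=timezone.utc),
}
expired, soon = expiring_soon(objects, now)
```

A sensible warning window probably depends on how often the issuing CA normally renews its objects.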

2.2 System related
2.2.1 Replay attack: A whole old dataset could replace a newer one and
could be still valid.


That's one reason for the manifests.  If you manage to come up with a
scenario where replay occurred but was undetected by the manifest, it would
be very interesting.

For example, it is possible if there are changes on the repository: the (now) old manifest has to be put on the CRL as long as it has not expired. An attacker could use the old dataset for a replay attack, and until the manifest expires the RP would see a valid dataset. Of course, the validity is limited by the update period of the manifest, but even if the old dataset has expired, it depends on the local policy of a relying party how to handle this situation. I guess most of them are not very strict.
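One possible RP-side check, sketched under the assumption that the RP remembers the last manifest it accepted for each publication point. The manifestNumber, thisUpdate and nextUpdate fields are defined by the manifest profile (RFC 6486); the class and function names are hypothetical. This only helps an RP that has already seen the newer manifest. An RP fetching for the first time has nothing to compare against.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ManifestInfo:
    number: int            # manifestNumber
    this_update: datetime  # thisUpdate
    next_update: datetime  # nextUpdate

def check_manifest(previous, fetched, now):
    """Return a list of warnings for a newly fetched manifest.

    `previous` is the last accepted manifest for this publication point, or None.
    """
    warnings = []
    if fetched.next_update <= now:
        warnings.append("stale: nextUpdate has passed")
    if previous is not None:
        if fetched.number < previous.number:
            warnings.append("possible replay: manifestNumber went backwards")
        elif fetched.this_update < previous.this_update:
            warnings.append("possible replay: thisUpdate went backwards")
    return warnings

utc = timezone.utc
previous = ManifestInfo(42, datetime(2014, 1, 13, tzinfo=utc), datetime(2014, 1, 15, tzinfo=utc))
replayed = ManifestInfo(41, datetime(2014, 1, 12, tzinfo=utc), datetime(2014, 1, 15, tzinfo=utc))
now = datetime(2014, 1, 14, tzinfo=utc)
replay_warnings = check_manifest(previous, replayed, now)
```

How to react to a warning (keep the cached data, reject the update, alert an operator) remains a local policy decision.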

2.2.2 Short lifetime attack: Recurring ROA sets with short lifetimes
could overload the RP software because of its cryptographic checks.


I think that might depend on whether the RP software is set up to re-sync with
the repositories on a periodic basis or on events like expiration.

Another interesting question is whether short lifetimes would cause churn in
the routing space.

I agree, it would depend on the configuration of the local cache. Furthermore, it would depend on the tier at which the objects are located in the RPKI hierarchy, the hardware the RP software runs on, ... At this point, I'm not sure how CPU-intensive the checks are, but the fact of the matter is that the local cache processes the objects on the repository. So it's at least theoretically possible.

2.2.3 Incomplete amount of objects: Incompleteness of RPKI objects could
affect the global routing behavior.


Are you talking about objects that are somehow deleted from the repository?
That should be something the manifest should detect, so again, if you come
up with a scenario, that would be interesting.

Are you talking about the "missing" case of route validity?  Different RPs
might respond differently to that, and the difference could have interesting
effects on routing - but that's true of any difference in local routing policy.

Are you talking about the "clueless  customer" case - a provider produces a
ROA for its aggregate before there are ROAs for its customers who are
advertising more specifics.  This is indeed a recognized case - there's
guidance in the origin-ops (see page 5) draft with obligations (MUST)
to providers about this.  Which is not to say it won't occur.

I mean the first case. It would at least be possible through a replay attack, and because of the missing authentication/confidentiality/integrity, a MITM attack is possible. An attacker could hold back a whole dataset belonging to a CA cert (the CA cert including all data regarding this cert on the repository). As I understand the data structure, an RP wouldn't recognize that case?!

2.2.4 Repository availability: A DoS attack could affect the availability
of the repository.


Anything that affects the availability of the repository would be a problem -
DoS attack, power outage, site unreachable, etc., or even internal RP issues.
Certainly there are operational means to ameliorate the impact. Which is not to
say that it is not a problem.  If your focus is on the effect on RPs, there's no
way for the RP to know the cause, so coming up with recommendations of how
to proceed could be tricky.

The availability of the repository is important for an RP to get updates, and a lack of it could result in expired objects. Identifying the reason (attack, ...) is not my primary intention. Identifying that the problem has arisen is important to avoid e.g. expired objects.

--Sandy, speaking as regular ol' member


Kind regards

Demian
