Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Tue, Jun 20, 2017 at 10:22:17PM -0700, Andy Lutomirski wrote:
> PCID is a "process context ID" -- it's what other architectures call
> an address space ID. Every non-global TLB entry is tagged with a
> PCID, only TLB entries that match the currently selected PCID are
> used, and we can switch PGDs without flushing the TLB. x86's
> PCID is 12 bits.
>
> This is an unorthodox approach to using PCID. x86's PCID is far too
> short to uniquely identify a process, and we can't even really
> uniquely identify a running process because there are monster
> systems with over 4096 CPUs. To make matters worse, past attempts
> to use all 12 PCID bits have resulted in slowdowns instead of
> speedups.
>
> This patch uses PCID differently. We use a PCID to identify a
> recently-used mm on a per-cpu basis. An mm has no fixed PCID
> binding at all; instead, we give it a fresh PCID each time it's
> loaded except in cases where we want to preserve the TLB, in which
> case we reuse a recent value.
>
> This seems to save about 100ns on context switches between mms.

"... with my microbenchmark of ping-ponging." :)

> Signed-off-by: Andy Lutomirski
> ---
>  arch/x86/include/asm/mmu_context.h     |  3 ++
>  arch/x86/include/asm/processor-flags.h |  2 +
>  arch/x86/include/asm/tlbflush.h        | 18 +++-
>  arch/x86/mm/init.c                     |  1 +
>  arch/x86/mm/tlb.c                      | 82 ++
>  5 files changed, 86 insertions(+), 20 deletions(-)

...

> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 57b305e13c4c..a9a5aa6f45f7 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -82,6 +82,12 @@ static inline u64 bump_mm_tlb_gen(struct mm_struct *mm)
>  #define __flush_tlb_single(addr) __native_flush_tlb_single(addr)
>  #endif
>
> +/*
> + * 6 because 6 should be plenty and struct tlb_state will fit in
> + * two cache lines.
> + */
> +#define NR_DYNAMIC_ASIDS 6

	TLB_NR_DYN_ASIDS

Properly prefixed, I guess.

The rest later, when you're done experimenting. :)

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Thu, 22 Jun 2017, Andy Lutomirski wrote:
> On Thu, Jun 22, 2017 at 2:22 PM, Thomas Gleixner wrote:
> > On Thu, 22 Jun 2017, Andy Lutomirski wrote:
> >> On Thu, Jun 22, 2017 at 5:21 AM, Thomas Gleixner wrote:
> >> > Now one other optimization which should be trivial to add is to keep the 4
> >> > asid context entries in cpu_tlbstate and cache the last asid in thread
> >> > info. If that's still valid then use it otherwise unconditionally get a new
> >> > one. That avoids the whole loop machinery and thread info is cache hot in
> >> > the context switch anyway. Delta patch on top of your version below.
> >>
> >> I'm not sure I understand. If an mm has ASID 0 on CPU 0 and ASID 1 on
> >> CPU 1 and a thread in that mm bounces back and forth between those
> >> CPUs, won't your patch cause it to flush every time?
> >
> > Yeah, I was too focussed on the non migratory case, where two tasks from
> > different processes play rapid ping pong. That's what I was looking at for
> > various reasons.
> >
> > There the cached asid really helps by avoiding the loop completely, but
> > yes, the search needs to be done for the bouncing between CPUs case.
> >
> > So maybe a combo of those might be interesting.
>
> I'm not too worried about optimizing away the loop. It's a loop over
> four or six things that are all in cachelines that we need anyway. I
> suspect that we'll never be able to see it in any microbenchmark, let
> alone real application.

Fair enough.
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Thu, Jun 22, 2017 at 2:22 PM, Thomas Gleixner wrote:
> On Thu, 22 Jun 2017, Andy Lutomirski wrote:
>> On Thu, Jun 22, 2017 at 5:21 AM, Thomas Gleixner wrote:
>> > Now one other optimization which should be trivial to add is to keep the 4
>> > asid context entries in cpu_tlbstate and cache the last asid in thread
>> > info. If that's still valid then use it otherwise unconditionally get a new
>> > one. That avoids the whole loop machinery and thread info is cache hot in
>> > the context switch anyway. Delta patch on top of your version below.
>>
>> I'm not sure I understand. If an mm has ASID 0 on CPU 0 and ASID 1 on
>> CPU 1 and a thread in that mm bounces back and forth between those
>> CPUs, won't your patch cause it to flush every time?
>
> Yeah, I was too focussed on the non migratory case, where two tasks from
> different processes play rapid ping pong. That's what I was looking at for
> various reasons.
>
> There the cached asid really helps by avoiding the loop completely, but
> yes, the search needs to be done for the bouncing between CPUs case.
>
> So maybe a combo of those might be interesting.

I'm not too worried about optimizing away the loop. It's a loop over
four or six things that are all in cachelines that we need anyway. I
suspect that we'll never be able to see it in any microbenchmark, let
alone real application.
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Thu, 22 Jun 2017, Andy Lutomirski wrote:
> On Thu, Jun 22, 2017 at 5:21 AM, Thomas Gleixner wrote:
> > Now one other optimization which should be trivial to add is to keep the 4
> > asid context entries in cpu_tlbstate and cache the last asid in thread
> > info. If that's still valid then use it otherwise unconditionally get a new
> > one. That avoids the whole loop machinery and thread info is cache hot in
> > the context switch anyway. Delta patch on top of your version below.
>
> I'm not sure I understand. If an mm has ASID 0 on CPU 0 and ASID 1 on
> CPU 1 and a thread in that mm bounces back and forth between those
> CPUs, won't your patch cause it to flush every time?

Yeah, I was too focussed on the non migratory case, where two tasks from
different processes play rapid ping pong. That's what I was looking at for
various reasons.

There the cached asid really helps by avoiding the loop completely, but
yes, the search needs to be done for the bouncing between CPUs case.

So maybe a combo of those might be interesting.

Thanks,

	tglx
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Thu, Jun 22, 2017 at 5:21 AM, Thomas Gleixner wrote:
> On Wed, 21 Jun 2017, Andy Lutomirski wrote:
>> On Wed, Jun 21, 2017 at 6:38 AM, Thomas Gleixner wrote:
>> > That requires a conditional branch
>> >
>> > 	if (asid >= NR_DYNAMIC_ASIDS) {
>> > 		asid = 0;
>> > 	}
>> >
>> > The question is whether 4 IDs would be sufficient which trades the branch
>> > for a mask operation. Or you go for 8 and spend another cache line.
>>
>> Interesting. I'm inclined to either leave it at 6 or reduce it to 4
>> for now and to optimize later.
>
> :)
>
>> > Hmm. So this loop needs to be taken unconditionally even if the task stays
>> > on the same CPU. And of course the number of dynamic IDs has to be short in
>> > order to makes this loop suck performance wise.
>> >
>> > Something like the completely disfunctional below might be worthwhile to
>> > explore. At least arch/x86/mm/ compiles :)
>> >
>> > It gets rid of the loop search and lifts the limit of dynamic ids by
>> > trading it with a percpu variable in mm_context_t.
>>
>> That would work, but it would take a lot more memory on large systems
>> with lots of processes, and I'd also be concerned that we might run
>> out of dynamic percpu space.
>
> Yeah, did not think about the dynamic percpu space.
>
>> How about a different idea: make the percpu data structure look like a
>> 4-way set associative cache. The ctxs array could be, say, 1024
>> entries long without using crazy amounts of memory. We'd divide it
>> into 256 buckets, so you'd index it like ctxs[4*bucket + slot]. For
>> each mm, we choose a random bucket (from 0 through 255), and then we'd
>> just loop over the four slots in the bucket in choose_asid(). This
>> would require very slightly more arithmetic (I'd guess only one or two
>> cycles, though) but, critically, wouldn't touch any more cachelines.
>>
>> The downside of both of these approaches over the one in this patch is
>> that the chance that the percpu cacheline we need is not in the cache
>> is quite a bit higher since it's potentially a different cacheline for
>> each mm. It would probably still be a win because avoiding the flush
>> is really quite valuable.
>>
>> What do you think? The added code would be tiny.
>
> That might be worth a try.
>
> Now one other optimization which should be trivial to add is to keep the 4
> asid context entries in cpu_tlbstate and cache the last asid in thread
> info. If that's still valid then use it otherwise unconditionally get a new
> one. That avoids the whole loop machinery and thread info is cache hot in
> the context switch anyway. Delta patch on top of your version below.

I'm not sure I understand. If an mm has ASID 0 on CPU 0 and ASID 1 on
CPU 1 and a thread in that mm bounces back and forth between those
CPUs, won't your patch cause it to flush every time?

--Andy
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Thu, Jun 22, 2017 at 9:09 AM, Nadav Amit wrote:
> Andy Lutomirski wrote:
>
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -812,6 +812,7 @@ void __init zone_sizes_init(void)
>>
>>  DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
>>  	.loaded_mm = &init_mm,
>> +	.next_asid = 1,
>
> I think this is a leftover from a previous version of the patches, no? It
> does not seem necessary and may be confusing (ctx_id 0 is reserved, but not
> asid 0).

Hmm. It's no longer needed for correctness, but init_mm still lands in
slot 0, and it seems friendly to avoid immediately stomping on it.
Admittedly, this won't make any practical difference since it'll only
happen once per cpu.

> Other than that, if you want, you can put for the entire series:
>
> Reviewed-by: Nadav Amit

Thanks!
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
Andy Lutomirski wrote:

> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -812,6 +812,7 @@ void __init zone_sizes_init(void)
>
>  DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
>  	.loaded_mm = &init_mm,
> +	.next_asid = 1,

I think this is a leftover from a previous version of the patches, no? It
does not seem necessary and may be confusing (ctx_id 0 is reserved, but not
asid 0).

Other than that, if you want, you can put for the entire series:

Reviewed-by: Nadav Amit
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Wed, 21 Jun 2017, Andy Lutomirski wrote:
> On Wed, Jun 21, 2017 at 6:38 AM, Thomas Gleixner wrote:
> > That requires a conditional branch
> >
> > 	if (asid >= NR_DYNAMIC_ASIDS) {
> > 		asid = 0;
> > 	}
> >
> > The question is whether 4 IDs would be sufficient which trades the branch
> > for a mask operation. Or you go for 8 and spend another cache line.
>
> Interesting. I'm inclined to either leave it at 6 or reduce it to 4
> for now and to optimize later.

:)

> > Hmm. So this loop needs to be taken unconditionally even if the task stays
> > on the same CPU. And of course the number of dynamic IDs has to be short in
> > order to makes this loop suck performance wise.
> >
> > Something like the completely disfunctional below might be worthwhile to
> > explore. At least arch/x86/mm/ compiles :)
> >
> > It gets rid of the loop search and lifts the limit of dynamic ids by
> > trading it with a percpu variable in mm_context_t.
>
> That would work, but it would take a lot more memory on large systems
> with lots of processes, and I'd also be concerned that we might run
> out of dynamic percpu space.

Yeah, did not think about the dynamic percpu space.

> How about a different idea: make the percpu data structure look like a
> 4-way set associative cache. The ctxs array could be, say, 1024
> entries long without using crazy amounts of memory. We'd divide it
> into 256 buckets, so you'd index it like ctxs[4*bucket + slot]. For
> each mm, we choose a random bucket (from 0 through 255), and then we'd
> just loop over the four slots in the bucket in choose_asid(). This
> would require very slightly more arithmetic (I'd guess only one or two
> cycles, though) but, critically, wouldn't touch any more cachelines.
>
> The downside of both of these approaches over the one in this patch is
> that the chance that the percpu cacheline we need is not in the cache
> is quite a bit higher since it's potentially a different cacheline for
> each mm. It would probably still be a win because avoiding the flush
> is really quite valuable.
>
> What do you think? The added code would be tiny.

That might be worth a try.

Now one other optimization which should be trivial to add is to keep the 4
asid context entries in cpu_tlbstate and cache the last asid in thread
info. If that's still valid then use it otherwise unconditionally get a new
one. That avoids the whole loop machinery and thread info is cache hot in
the context switch anyway. Delta patch on top of your version below.

> (P.S. Why doesn't random_p32() try arch_random_int()?)

Could you please ask questions which do not require crystalballs for
answering?

Thanks,

	tglx

8<---
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -159,8 +159,16 @@ static inline void destroy_context(struc
 extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		      struct task_struct *tsk);
 
-extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
-			       struct task_struct *tsk);
+extern void __switch_mm_irqs_off(struct mm_struct *prev,
+				 struct mm_struct *next, u32 *last_asid);
+
+static inline void switch_mm_irqs_off(struct mm_struct *prev,
+				      struct mm_struct *next,
+				      struct task_struct *tsk)
+{
+	__switch_mm_irqs_off(prev, next, &tsk->thread_info.asid);
+}
+
 #define switch_mm_irqs_off switch_mm_irqs_off
 
 #define activate_mm(prev, next)	\
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -54,6 +54,7 @@ struct task_struct;
 
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
+	u32			asid;
 };
 
 #define INIT_THREAD_INFO(tsk)			\
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -83,10 +83,13 @@ static inline u64 bump_mm_tlb_gen(struct
 #endif
 
 /*
- * 6 because 6 should be plenty and struct tlb_state will fit in
- * two cache lines.
+ * NR_DYNAMIC_ASIDS must be a power of 2. 4 makes tlb_state fit into two
+ * cache lines.
  */
-#define NR_DYNAMIC_ASIDS	6
+#define NR_DYNAMIC_ASIDS_BITS	2
+#define NR_DYNAMIC_ASIDS	(1U << NR_DYNAMIC_ASIDS_BITS)
+#define DYNAMIC_ASIDS_MASK	(NR_DYNAMIC_ASIDS - 1)
+#define ASID_NEEDS_FLUSH	(1U << 16)
 
 struct tlb_context {
 	u64 ctx_id;
@@ -102,7 +105,8 @@ struct tlb_state {
 	 */
 	struct mm_struct *loaded_mm;
 	u16 loaded_mm_asid;
-	u16 next_asid;
+	u16 curr_asid;
+	u32 notask_asid;
 
 	/*
 	 * Access to this CR4 shadow and to H/W CR4 is protected by
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -812,7 +812,7 @@ void __init zone_sizes_init(void)
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Wed, Jun 21, 2017 at 6:38 AM, Thomas Gleixner wrote:
> On Tue, 20 Jun 2017, Andy Lutomirski wrote:
>> This patch uses PCID differently. We use a PCID to identify a
>> recently-used mm on a per-cpu basis. An mm has no fixed PCID
>> binding at all; instead, we give it a fresh PCID each time it's
>> loaded except in cases where we want to preserve the TLB, in which
>> case we reuse a recent value.
>>
>> This seems to save about 100ns on context switches between mms.
>
> Depending on the work load I assume. For a CPU switching between a large
> number of processes consecutively it won't make a change. In fact it will
> be slower due to the extra few cycles required for rotating the asid, but I
> doubt that this can be measured.

True. I suspect this can be improved -- see below.

>> +/*
>> + * 6 because 6 should be plenty and struct tlb_state will fit in
>> + * two cache lines.
>> + */
>> +#define NR_DYNAMIC_ASIDS 6
>
> That requires a conditional branch
>
> 	if (asid >= NR_DYNAMIC_ASIDS) {
> 		asid = 0;
> 	}
>
> The question is whether 4 IDs would be sufficient which trades the branch
> for a mask operation. Or you go for 8 and spend another cache line.

Interesting. I'm inclined to either leave it at 6 or reduce it to 4
for now and to optimize later.

>> 	atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
>>
>> +static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
>> +			    u16 *new_asid, bool *need_flush)
>> +{
>> +	u16 asid;
>> +
>> +	if (!static_cpu_has(X86_FEATURE_PCID)) {
>> +		*new_asid = 0;
>> +		*need_flush = true;
>> +		return;
>> +	}
>> +
>> +	for (asid = 0; asid < NR_DYNAMIC_ASIDS; asid++) {
>> +		if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
>> +		    next->context.ctx_id)
>> +			continue;
>> +
>> +		*new_asid = asid;
>> +		*need_flush = (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) <
>> +			       next_tlb_gen);
>> +		return;
>> +	}
>
> Hmm. So this loop needs to be taken unconditionally even if the task stays
> on the same CPU. And of course the number of dynamic IDs has to be short in
> order to makes this loop suck performance wise.
>
> Something like the completely disfunctional below might be worthwhile to
> explore. At least arch/x86/mm/ compiles :)
>
> It gets rid of the loop search and lifts the limit of dynamic ids by
> trading it with a percpu variable in mm_context_t.

That would work, but it would take a lot more memory on large systems
with lots of processes, and I'd also be concerned that we might run
out of dynamic percpu space.

How about a different idea: make the percpu data structure look like a
4-way set associative cache. The ctxs array could be, say, 1024
entries long without using crazy amounts of memory. We'd divide it
into 256 buckets, so you'd index it like ctxs[4*bucket + slot]. For
each mm, we choose a random bucket (from 0 through 255), and then we'd
just loop over the four slots in the bucket in choose_asid(). This
would require very slightly more arithmetic (I'd guess only one or two
cycles, though) but, critically, wouldn't touch any more cachelines.

The downside of both of these approaches over the one in this patch is
that the chance that the percpu cacheline we need is not in the cache
is quite a bit higher since it's potentially a different cacheline for
each mm. It would probably still be a win because avoiding the flush
is really quite valuable.

What do you think? The added code would be tiny.

(P.S. Why doesn't random_p32() try arch_random_int()?)

--Andy
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Wed, 21 Jun 2017, Thomas Gleixner wrote:
> > +	for (asid = 0; asid < NR_DYNAMIC_ASIDS; asid++) {
> > +		if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
> > +		    next->context.ctx_id)
> > +			continue;
> > +
> > +		*new_asid = asid;
> > +		*need_flush = (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) <
> > +			       next_tlb_gen);
> > +		return;
> > +	}
>
> Hmm. So this loop needs to be taken unconditionally even if the task stays
> on the same CPU. And of course the number of dynamic IDs has to be short in
> order to makes this loop suck performance wise.

  ... not suck ...
Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID
On Tue, 20 Jun 2017, Andy Lutomirski wrote:
> This patch uses PCID differently. We use a PCID to identify a
> recently-used mm on a per-cpu basis. An mm has no fixed PCID
> binding at all; instead, we give it a fresh PCID each time it's
> loaded except in cases where we want to preserve the TLB, in which
> case we reuse a recent value.
>
> This seems to save about 100ns on context switches between mms.

Depending on the work load I assume. For a CPU switching between a large
number of processes consecutively it won't make a change. In fact it will
be slower due to the extra few cycles required for rotating the asid, but I
doubt that this can be measured.

> +/*
> + * 6 because 6 should be plenty and struct tlb_state will fit in
> + * two cache lines.
> + */
> +#define NR_DYNAMIC_ASIDS 6

That requires a conditional branch

	if (asid >= NR_DYNAMIC_ASIDS) {
		asid = 0;
	}

The question is whether 4 IDs would be sufficient which trades the branch
for a mask operation. Or you go for 8 and spend another cache line.

> 	atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
>
> +static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
> +			    u16 *new_asid, bool *need_flush)
> +{
> +	u16 asid;
> +
> +	if (!static_cpu_has(X86_FEATURE_PCID)) {
> +		*new_asid = 0;
> +		*need_flush = true;
> +		return;
> +	}
> +
> +	for (asid = 0; asid < NR_DYNAMIC_ASIDS; asid++) {
> +		if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
> +		    next->context.ctx_id)
> +			continue;
> +
> +		*new_asid = asid;
> +		*need_flush = (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) <
> +			       next_tlb_gen);
> +		return;
> +	}

Hmm. So this loop needs to be taken unconditionally even if the task stays
on the same CPU. And of course the number of dynamic IDs has to be short in
order to makes this loop suck performance wise.

Something like the completely disfunctional below might be worthwhile to
explore. At least arch/x86/mm/ compiles :)

It gets rid of the loop search and lifts the limit of dynamic ids by
trading it with a percpu variable in mm_context_t.

Thanks,

	tglx

8<
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -25,6 +25,8 @@ typedef struct {
 	 */
 	atomic64_t tlb_gen;
 
+	u32 __percpu	*asids;
+
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 	struct ldt_struct *ldt;
 #endif
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -156,11 +156,23 @@ static inline void destroy_context(struc
 	destroy_context_ldt(mm);
 }
 
-extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
-		      struct task_struct *tsk);
+extern void __switch_mm(struct mm_struct *prev, struct mm_struct *next);
+
+static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+			     struct task_struct *tsk)
+{
+	__switch_mm(prev, next);
+}
+
+extern void __switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next);
+
+static inline void switch_mm_irqs_off(struct mm_struct *prev,
+				      struct mm_struct *next,
+				      struct task_struct *tsk)
+{
+	__switch_mm_irqs_off(prev, next);
+}
 
-extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
-			       struct task_struct *tsk);
 #define switch_mm_irqs_off switch_mm_irqs_off
 
 #define activate_mm(prev, next)	\
@@ -299,6 +311,9 @@ static inline unsigned long __get_curren
 {
 	unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd);
 
+	if (static_cpu_has(X86_FEATURE_PCID))
+		cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || !in_atomic());
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -35,6 +35,7 @@
 /* Mask off the address space ID bits. */
 #define CR3_ADDR_MASK	0x7FFFFFFFFFFFF000ull
 #define CR3_PCID_MASK	0xFFFull
+#define CR3_NOFLUSH	(1UL << 63)
 #else
 /*
  * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -42,6 +43,7 @@
  */
 #define CR3_ADDR_MASK	0xFFFFFFFFull
 #define CR3_PCID_MASK	0ull
+#define CR3_NOFLUSH	0
 #endif
 
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -82,6 +82,15 @@ static inline u64 bump_mm_tlb_gen(struct
 #define __flush_tlb_single(addr) __native_flush_tlb_single(addr)
 #endif
 
+/*
+ * NR_DYNAMIC_ASIDS must be a power of 2. 4 makes tlb_state fit into two
+ * cache lines.
+ */
+#define NR_DYNAMIC_ASIDS_BITS	2
+#define NR_DYNAMIC_ASIDS	(1U << NR_DYNAMIC_ASIDS_B