Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-07-17 Thread Alain Michaud
Hi Marcel,

On Fri, Jul 17, 2020 at 2:51 AM Marcel Holtmann  wrote:
>
> Hi Alain,
>
> > >>> There is a possibility that an ACL packet is received before we
> > >>> receive the HCI connect event for the corresponding handle. If this
> > >>> happens, we discard the ACL packet.
> > >>>
> > >>> Rather than just ignoring them, this patch provides a queue for
> > >>> incoming ACL packet without a handle. The queue is processed when
> > >>> receiving a HCI connection event. If 2 seconds elapsed without
> > >>> receiving the HCI connection event, assume something bad happened
> > >>> and discard the queued packet.
> > >>>
> > >>> Signed-off-by: Archie Pusaka 
> > >>> Reviewed-by: Abhishek Pandit-Subedi 
> > >>
> > >> so two things up front. I want to hide this behind a 
> > >> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. 
> > >> Frankly if this kind of out-of-order happens on UART or SDIO transports, 
> > >> then something is obviously going wrong. I have no plan to fix up after 
> > >> a fully serialized transport.
> > >>
> > >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want 
> > >> this off by default. You can enable it via an experimental setting. The 
> > >> reason here is that we have to make it really hard and fail as often as 
> > >> possible so that hardware manufactures and spec writers realize that 
> > >> something is fundamentally broken here.
> > I don't have any objection to making this explicit enable to non serialized 
> > transports.  However, I do wonder what the intention is around making this 
> > off by default.  We already know there is a race condition between the 
> > interupt and bulk endpoints over USB, so this can and does happen.  
> > Hardware manufaturers can't relly do much about this other than trying to 
> > pull the interupt endpoint more often, but that's only a workaround, it 
> > can't avoid it all together.
> >
> > IMO, this seems like a legitimate fix at the host level and I don't see any 
> > obvious benefits to hide this fix under an experimental feature and make it 
> > more difficult for the customers and system integrators to discover.
>
> the problem is that this is not a fix. It is papering over a hole and at best 
> a workaround with both eyes closed and hoping for the best. I am not looking 
> forward for the first security researcher to figure out that they have a 
> chance to inject an unencrypted packet since we are waiting 2 seconds for the 
> USB transport to get its act together.
I don't think this is the right characterization but I agree, 2
seconds would be too long, it would ideally be no longer than the USB
polling interval diff.

>
> In addition, I think that Luiz attempt to align with the poll intervals 
> inside the USB transport directly is a cleaner and more self-contained 
> approach. It also reduces the window of opportunity for any attacker since we 
> actually align the USB transport specific intervals with each other.
I'll have to look at Luiz's patch and think through if this really
eliminates the problem.  If may indeed be a more practical approach to
this problem.

>
> Regards
>
> Marcel
>


Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-07-17 Thread Marcel Holtmann
Hi Alain,

> >>> There is a possibility that an ACL packet is received before we
> >>> receive the HCI connect event for the corresponding handle. If this
> >>> happens, we discard the ACL packet.
> >>> 
> >>> Rather than just ignoring them, this patch provides a queue for
> >>> incoming ACL packet without a handle. The queue is processed when
> >>> receiving a HCI connection event. If 2 seconds elapsed without
> >>> receiving the HCI connection event, assume something bad happened
> >>> and discard the queued packet.
> >>> 
> >>> Signed-off-by: Archie Pusaka 
> >>> Reviewed-by: Abhishek Pandit-Subedi 
> >> 
> >> so two things up front. I want to hide this behind a 
> >> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. 
> >> Frankly if this kind of out-of-order happens on UART or SDIO transports, 
> >> then something is obviously going wrong. I have no plan to fix up after a 
> >> fully serialized transport.
> >> 
> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this 
> >> off by default. You can enable it via an experimental setting. The reason 
> >> here is that we have to make it really hard and fail as often as possible 
> >> so that hardware manufactures and spec writers realize that something is 
> >> fundamentally broken here.
> I don't have any objection to making this explicit enable to non serialized 
> transports.  However, I do wonder what the intention is around making this 
> off by default.  We already know there is a race condition between the 
> interupt and bulk endpoints over USB, so this can and does happen.  Hardware 
> manufaturers can't relly do much about this other than trying to pull the 
> interupt endpoint more often, but that's only a workaround, it can't avoid it 
> all together.
> 
> IMO, this seems like a legitimate fix at the host level and I don't see any 
> obvious benefits to hide this fix under an experimental feature and make it 
> more difficult for the customers and system integrators to discover.

the problem is that this is not a fix. It is papering over a hole and at best a 
workaround with both eyes closed and hoping for the best. I am not looking 
forward for the first security researcher to figure out that they have a chance 
to inject an unencrypted packet since we are waiting 2 seconds for the USB 
transport to get its act together.

In addition, I think that Luiz attempt to align with the poll intervals inside 
the USB transport directly is a cleaner and more self-contained approach. It 
also reduces the window of opportunity for any attacker since we actually align 
the USB transport specific intervals with each other.

Regards

Marcel



Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-07-15 Thread Luiz Augusto von Dentz
Hi Alain,

On Wed, Jul 15, 2020 at 8:10 AM Alain Michaud  wrote:
>
> Resending in plain text.

I've sent a RFC to work out the ordering, that should work out for any
race where it process ACL before Events during a polling interval
(1ms) so I hope that is enough to catch all these races, if that is
not perhaps we could make the interval configurable.

>
> On Wed, Jul 15, 2020 at 9:56 AM Alain Michaud  wrote:
> >
> > Hi Marcel,
> >
> > Sorry, just got around to this.
> >
> > On Tue, Jun 30, 2020 at 2:55 AM Marcel Holtmann  wrote:
> >>
> >> Hi Archie,
> >>
> >> >>> There is a possibility that an ACL packet is received before we
> >> >>> receive the HCI connect event for the corresponding handle. If this
> >> >>> happens, we discard the ACL packet.
> >> >>>
> >> >>> Rather than just ignoring them, this patch provides a queue for
> >> >>> incoming ACL packet without a handle. The queue is processed when
> >> >>> receiving a HCI connection event. If 2 seconds elapsed without
> >> >>> receiving the HCI connection event, assume something bad happened
> >> >>> and discard the queued packet.
> >> >>>
> >> >>> Signed-off-by: Archie Pusaka 
> >> >>> Reviewed-by: Abhishek Pandit-Subedi 
> >> >>
> >> >> so two things up front. I want to hide this behind a 
> >> >> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. 
> >> >> Frankly if this kind of out-of-order happens on UART or SDIO 
> >> >> transports, then something is obviously going wrong. I have no plan to 
> >> >> fix up after a fully serialized transport.
> >> >>
> >> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want 
> >> >> this off by default. You can enable it via an experimental setting. The 
> >> >> reason here is that we have to make it really hard and fail as often as 
> >> >> possible so that hardware manufactures and spec writers realize that 
> >> >> something is fundamentally broken here.
> >
> > I don't have any objection to making this explicit enable to non serialized 
> > transports.  However, I do wonder what the intention is around making this 
> > off by default.  We already know there is a race condition between the 
> > interupt and bulk endpoints over USB, so this can and does happen.  
> > Hardware manufaturers can't relly do much about this other than trying to 
> > pull the interupt endpoint more often, but that's only a workaround, it 
> > can't avoid it all together.
> >
> > IMO, this seems like a legitimate fix at the host level and I don't see any 
> > obvious benefits to hide this fix under an experimental feature and make it 
> > more difficult for the customers and system integrators to discover.
> >
> >>
> >> >>
> >> >> I have no problem in running the code and complaining loudly in case 
> >> >> the quirk has been set. Just injecting the packets can only happen if 
> >> >> bluetoothd explicitly enabled it.
> >> >
> >> > Got it.
> >> >
> >> >>
> >> >>
> >> >>>
> >> >>> ---
> >> >>>
> >> >>> include/net/bluetooth/hci_core.h |  8 +++
> >> >>> net/bluetooth/hci_core.c | 84 +---
> >> >>> net/bluetooth/hci_event.c|  2 +
> >> >>> 3 files changed, 88 insertions(+), 6 deletions(-)
> >> >>>
> >> >>> diff --git a/include/net/bluetooth/hci_core.h 
> >> >>> b/include/net/bluetooth/hci_core.h
> >> >>> index 836dc997ff94..b69ecdd0d15a 100644
> >> >>> --- a/include/net/bluetooth/hci_core.h
> >> >>> +++ b/include/net/bluetooth/hci_core.h
> >> >>> @@ -270,6 +270,9 @@ struct adv_monitor {
> >> >>> /* Default authenticated payload timeout 30s */
> >> >>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
> >> >>>
> >> >>> +/* Time to keep ACL packets without a corresponding handle queued 
> >> >>> (2s) */
> >> >>> +#define PENDING_ACL_TIMEOUT  msecs_to_jiffies(2000)
> >> >>> +
> >> >>
> >> >> Do we have some btmon traces with timestamps. Isn’t a second enough? 
> >> >> Actually 2 seconds is an awful long time.
> >> >
> >> > When this happens in the test lab, the HCI connect event is about
> >> > 0.002 second behind the first ACL packet. We can change this if
> >> > required.
> >> >
> >> >>
> >> >>> struct amp_assoc {
> >> >>>  __u16   len;
> >> >>>  __u16   offset;
> >> >>> @@ -538,6 +541,9 @@ struct hci_dev {
> >> >>>  struct delayed_work rpa_expired;
> >> >>>  bdaddr_trpa;
> >> >>>
> >> >>> + struct delayed_work remove_pending_acl;
> >> >>> + struct sk_buff_head pending_acl_q;
> >> >>> +
> >> >>
> >> >> can we name this ooo_q and move it to the other queues in this struct. 
> >> >> Unless we want to add a Kconfig option around it, we don’t need to keep 
> >> >> it here.
> >> >
> >> > Ack.
> >> >
> >> >>
> >> >>> #if IS_ENABLED(CONFIG_BT_LEDS)
> >> >>>  struct led_trigger  *power_led;
> >> >>> #endif
> >> >>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, 
> >> >>> __le16 ediv, __le64 rand,
> >> >>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
> >> >>> 

Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-07-15 Thread Alain Michaud
Resending in plain text.


On Wed, Jul 15, 2020 at 9:56 AM Alain Michaud  wrote:
>
> Hi Marcel,
>
> Sorry, just got around to this.
>
> On Tue, Jun 30, 2020 at 2:55 AM Marcel Holtmann  wrote:
>>
>> Hi Archie,
>>
>> >>> There is a possibility that an ACL packet is received before we
>> >>> receive the HCI connect event for the corresponding handle. If this
>> >>> happens, we discard the ACL packet.
>> >>>
>> >>> Rather than just ignoring them, this patch provides a queue for
>> >>> incoming ACL packet without a handle. The queue is processed when
>> >>> receiving a HCI connection event. If 2 seconds elapsed without
>> >>> receiving the HCI connection event, assume something bad happened
>> >>> and discard the queued packet.
>> >>>
>> >>> Signed-off-by: Archie Pusaka 
>> >>> Reviewed-by: Abhishek Pandit-Subedi 
>> >>
>> >> so two things up front. I want to hide this behind a 
>> >> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. 
>> >> Frankly if this kind of out-of-order happens on UART or SDIO transports, 
>> >> then something is obviously going wrong. I have no plan to fix up after a 
>> >> fully serialized transport.
>> >>
>> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want 
>> >> this off by default. You can enable it via an experimental setting. The 
>> >> reason here is that we have to make it really hard and fail as often as 
>> >> possible so that hardware manufactures and spec writers realize that 
>> >> something is fundamentally broken here.
>
> I don't have any objection to making this explicit enable to non serialized 
> transports.  However, I do wonder what the intention is around making this 
> off by default.  We already know there is a race condition between the 
> interupt and bulk endpoints over USB, so this can and does happen.  Hardware 
> manufaturers can't relly do much about this other than trying to pull the 
> interupt endpoint more often, but that's only a workaround, it can't avoid it 
> all together.
>
> IMO, this seems like a legitimate fix at the host level and I don't see any 
> obvious benefits to hide this fix under an experimental feature and make it 
> more difficult for the customers and system integrators to discover.
>
>>
>> >>
>> >> I have no problem in running the code and complaining loudly in case the 
>> >> quirk has been set. Just injecting the packets can only happen if 
>> >> bluetoothd explicitly enabled it.
>> >
>> > Got it.
>> >
>> >>
>> >>
>> >>>
>> >>> ---
>> >>>
>> >>> include/net/bluetooth/hci_core.h |  8 +++
>> >>> net/bluetooth/hci_core.c | 84 +---
>> >>> net/bluetooth/hci_event.c|  2 +
>> >>> 3 files changed, 88 insertions(+), 6 deletions(-)
>> >>>
>> >>> diff --git a/include/net/bluetooth/hci_core.h 
>> >>> b/include/net/bluetooth/hci_core.h
>> >>> index 836dc997ff94..b69ecdd0d15a 100644
>> >>> --- a/include/net/bluetooth/hci_core.h
>> >>> +++ b/include/net/bluetooth/hci_core.h
>> >>> @@ -270,6 +270,9 @@ struct adv_monitor {
>> >>> /* Default authenticated payload timeout 30s */
>> >>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
>> >>>
>> >>> +/* Time to keep ACL packets without a corresponding handle queued (2s) 
>> >>> */
>> >>> +#define PENDING_ACL_TIMEOUT  msecs_to_jiffies(2000)
>> >>> +
>> >>
>> >> Do we have some btmon traces with timestamps. Isn’t a second enough? 
>> >> Actually 2 seconds is an awful long time.
>> >
>> > When this happens in the test lab, the HCI connect event is about
>> > 0.002 second behind the first ACL packet. We can change this if
>> > required.
>> >
>> >>
>> >>> struct amp_assoc {
>> >>>  __u16   len;
>> >>>  __u16   offset;
>> >>> @@ -538,6 +541,9 @@ struct hci_dev {
>> >>>  struct delayed_work rpa_expired;
>> >>>  bdaddr_trpa;
>> >>>
>> >>> + struct delayed_work remove_pending_acl;
>> >>> + struct sk_buff_head pending_acl_q;
>> >>> +
>> >>
>> >> can we name this ooo_q and move it to the other queues in this struct. 
>> >> Unless we want to add a Kconfig option around it, we don’t need to keep 
>> >> it here.
>> >
>> > Ack.
>> >
>> >>
>> >>> #if IS_ENABLED(CONFIG_BT_LEDS)
>> >>>  struct led_trigger  *power_led;
>> >>> #endif
>> >>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, 
>> >>> __le16 ediv, __le64 rand,
>> >>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
>> >>> u8 *bdaddr_type);
>> >>>
>> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn 
>> >>> *conn);
>> >>> +
>> >>> #define SCO_AIRMODE_MASK   0x0003
>> >>> #define SCO_AIRMODE_CVSD   0x
>> >>> #define SCO_AIRMODE_TRANSP 0x0003
>> >>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >>> index 7959b851cc63..30780242c267 100644
>> >>> --- a/net/bluetooth/hci_core.c
>> >>> +++ b/net/bluetooth/hci_core.c
>> >>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)

Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-06-30 Thread Marcel Holtmann
Hi Archie,

>>> There is a possibility that an ACL packet is received before we
>>> receive the HCI connect event for the corresponding handle. If this
>>> happens, we discard the ACL packet.
>>> 
>>> Rather than just ignoring them, this patch provides a queue for
>>> incoming ACL packet without a handle. The queue is processed when
>>> receiving a HCI connection event. If 2 seconds elapsed without
>>> receiving the HCI connection event, assume something bad happened
>>> and discard the queued packet.
>>> 
>>> Signed-off-by: Archie Pusaka 
>>> Reviewed-by: Abhishek Pandit-Subedi 
>> 
>> so two things up front. I want to hide this behind a 
>> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly 
>> if this kind of out-of-order happens on UART or SDIO transports, then 
>> something is obviously going wrong. I have no plan to fix up after a fully 
>> serialized transport.
>> 
>> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this 
>> off by default. You can enable it via an experimental setting. The reason 
>> here is that we have to make it really hard and fail as often as possible so 
>> that hardware manufactures and spec writers realize that something is 
>> fundamentally broken here.
>> 
>> I have no problem in running the code and complaining loudly in case the 
>> quirk has been set. Just injecting the packets can only happen if bluetoothd 
>> explicitly enabled it.
> 
> Got it.
> 
>> 
>> 
>>> 
>>> ---
>>> 
>>> include/net/bluetooth/hci_core.h |  8 +++
>>> net/bluetooth/hci_core.c | 84 +---
>>> net/bluetooth/hci_event.c|  2 +
>>> 3 files changed, 88 insertions(+), 6 deletions(-)
>>> 
>>> diff --git a/include/net/bluetooth/hci_core.h 
>>> b/include/net/bluetooth/hci_core.h
>>> index 836dc997ff94..b69ecdd0d15a 100644
>>> --- a/include/net/bluetooth/hci_core.h
>>> +++ b/include/net/bluetooth/hci_core.h
>>> @@ -270,6 +270,9 @@ struct adv_monitor {
>>> /* Default authenticated payload timeout 30s */
>>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
>>> 
>>> +/* Time to keep ACL packets without a corresponding handle queued (2s) */
>>> +#define PENDING_ACL_TIMEOUT  msecs_to_jiffies(2000)
>>> +
>> 
>> Do we have some btmon traces with timestamps. Isn’t a second enough? 
>> Actually 2 seconds is an awful long time.
> 
> When this happens in the test lab, the HCI connect event is about
> 0.002 second behind the first ACL packet. We can change this if
> required.
> 
>> 
>>> struct amp_assoc {
>>>  __u16   len;
>>>  __u16   offset;
>>> @@ -538,6 +541,9 @@ struct hci_dev {
>>>  struct delayed_work rpa_expired;
>>>  bdaddr_trpa;
>>> 
>>> + struct delayed_work remove_pending_acl;
>>> + struct sk_buff_head pending_acl_q;
>>> +
>> 
>> can we name this ooo_q and move it to the other queues in this struct. 
>> Unless we want to add a Kconfig option around it, we don’t need to keep it 
>> here.
> 
> Ack.
> 
>> 
>>> #if IS_ENABLED(CONFIG_BT_LEDS)
>>>  struct led_trigger  *power_led;
>>> #endif
>>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 
>>> ediv, __le64 rand,
>>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
>>> u8 *bdaddr_type);
>>> 
>>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
>>> +
>>> #define SCO_AIRMODE_MASK   0x0003
>>> #define SCO_AIRMODE_CVSD   0x
>>> #define SCO_AIRMODE_TRANSP 0x0003
>>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>>> index 7959b851cc63..30780242c267 100644
>>> --- a/net/bluetooth/hci_core.c
>>> +++ b/net/bluetooth/hci_core.c
>>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
>>>  skb_queue_purge(>rx_q);
>>>  skb_queue_purge(>cmd_q);
>>>  skb_queue_purge(>raw_q);
>>> + skb_queue_purge(>pending_acl_q);
>>> 
>>>  /* Drop last sent command */
>>>  if (hdev->sent_cmd) {
>>> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct 
>>> notifier_block *nb, unsigned long action,
>>>  return NOTIFY_STOP;
>>> }
>>> 
>>> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
>>> +{
>>> + skb_queue_tail(>pending_acl_q, skb);
>>> +
>>> + queue_delayed_work(hdev->workqueue, >remove_pending_acl,
>>> +PENDING_ACL_TIMEOUT);
>>> +}
>>> +
>>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
>>> +{
>>> + struct sk_buff *skb, *tmp;
>>> + struct hci_acl_hdr *hdr;
>>> + u16 handle, flags;
>>> + bool reset_timer = false;
>>> +
>>> + skb_queue_walk_safe(>pending_acl_q, skb, tmp) {
>>> + hdr = (struct hci_acl_hdr *)skb->data;
>>> + handle = __le16_to_cpu(hdr->handle);
>>> + flags  = hci_flags(handle);
>>> + handle = hci_handle(handle);
>>> +
>>> + if (handle != conn->handle)
>>> + 

Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-06-30 Thread Archie Pusaka
Hi Marcel,

On Mon, 29 Jun 2020 at 14:40, Marcel Holtmann  wrote:
>
> Hi Archie,
>
> > There is a possibility that an ACL packet is received before we
> > receive the HCI connect event for the corresponding handle. If this
> > happens, we discard the ACL packet.
> >
> > Rather than just ignoring them, this patch provides a queue for
> > incoming ACL packet without a handle. The queue is processed when
> > receiving a HCI connection event. If 2 seconds elapsed without
> > receiving the HCI connection event, assume something bad happened
> > and discard the queued packet.
> >
> > Signed-off-by: Archie Pusaka 
> > Reviewed-by: Abhishek Pandit-Subedi 
>
> so two things up front. I want to hide this behind a 
> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly 
> if this kind of out-of-order happens on UART or SDIO transports, then 
> something is obviously going wrong. I have no plan to fix up after a fully 
> serialized transport.
>
> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this 
> off by default. You can enable it via an experimental setting. The reason 
> here is that we have to make it really hard and fail as often as possible so 
> that hardware manufactures and spec writers realize that something is 
> fundamentally broken here.
>
> I have no problem in running the code and complaining loudly in case the 
> quirk has been set. Just injecting the packets can only happen if bluetoothd 
> explicitly enabled it.

Got it.

>
>
> >
> > ---
> >
> > include/net/bluetooth/hci_core.h |  8 +++
> > net/bluetooth/hci_core.c | 84 +---
> > net/bluetooth/hci_event.c|  2 +
> > 3 files changed, 88 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/net/bluetooth/hci_core.h 
> > b/include/net/bluetooth/hci_core.h
> > index 836dc997ff94..b69ecdd0d15a 100644
> > --- a/include/net/bluetooth/hci_core.h
> > +++ b/include/net/bluetooth/hci_core.h
> > @@ -270,6 +270,9 @@ struct adv_monitor {
> > /* Default authenticated payload timeout 30s */
> > #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
> >
> > +/* Time to keep ACL packets without a corresponding handle queued (2s) */
> > +#define PENDING_ACL_TIMEOUT  msecs_to_jiffies(2000)
> > +
>
> Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 
> 2 seconds is an awful long time.

When this happens in the test lab, the HCI connect event is about
0.002 second behind the first ACL packet. We can change this if
required.

>
> > struct amp_assoc {
> >   __u16   len;
> >   __u16   offset;
> > @@ -538,6 +541,9 @@ struct hci_dev {
> >   struct delayed_work rpa_expired;
> >   bdaddr_trpa;
> >
> > + struct delayed_work remove_pending_acl;
> > + struct sk_buff_head pending_acl_q;
> > +
>
> can we name this ooo_q and move it to the other queues in this struct. Unless 
> we want to add a Kconfig option around it, we don’t need to keep it here.

Ack.

>
> > #if IS_ENABLED(CONFIG_BT_LEDS)
> >   struct led_trigger  *power_led;
> > #endif
> > @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 
> > ediv, __le64 rand,
> > void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
> >  u8 *bdaddr_type);
> >
> > +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
> > +
> > #define SCO_AIRMODE_MASK   0x0003
> > #define SCO_AIRMODE_CVSD   0x
> > #define SCO_AIRMODE_TRANSP 0x0003
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 7959b851cc63..30780242c267 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >   skb_queue_purge(>rx_q);
> >   skb_queue_purge(>cmd_q);
> >   skb_queue_purge(>raw_q);
> > + skb_queue_purge(>pending_acl_q);
> >
> >   /* Drop last sent command */
> >   if (hdev->sent_cmd) {
> > @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct 
> > notifier_block *nb, unsigned long action,
> >   return NOTIFY_STOP;
> > }
> >
> > +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
> > +{
> > + skb_queue_tail(>pending_acl_q, skb);
> > +
> > + queue_delayed_work(hdev->workqueue, >remove_pending_acl,
> > +PENDING_ACL_TIMEOUT);
> > +}
> > +
> > +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
> > +{
> > + struct sk_buff *skb, *tmp;
> > + struct hci_acl_hdr *hdr;
> > + u16 handle, flags;
> > + bool reset_timer = false;
> > +
> > + skb_queue_walk_safe(>pending_acl_q, skb, tmp) {
> > + hdr = (struct hci_acl_hdr *)skb->data;
> > + handle = __le16_to_cpu(hdr->handle);
> > + flags  = hci_flags(handle);
> > + handle = hci_handle(handle);
> > +
> > + if (handle != conn->handle)
> > + 

Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-06-29 Thread Marcel Holtmann
Hi Archie,

> There is a possibility that an ACL packet is received before we
> receive the HCI connect event for the corresponding handle. If this
> happens, we discard the ACL packet.
> 
> Rather than just ignoring them, this patch provides a queue for
> incoming ACL packet without a handle. The queue is processed when
> receiving a HCI connection event. If 2 seconds elapsed without
> receiving the HCI connection event, assume something bad happened
> and discard the queued packet.
> 
> Signed-off-by: Archie Pusaka 
> Reviewed-by: Abhishek Pandit-Subedi 

so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL 
that a transport driver has to set first. Frankly if this kind of out-of-order 
happens on UART or SDIO transports, then something is obviously going wrong. I 
have no plan to fix up after a fully serialized transport.

Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off 
by default. You can enable it via an experimental setting. The reason here is 
that we have to make it really hard and fail as often as possible so that 
hardware manufactures and spec writers realize that something is fundamentally 
broken here.

I have no problem in running the code and complaining loudly in case the quirk 
has been set. Just injecting the packets can only happen if bluetoothd 
explicitly enabled it.

> 
> ---
> 
> include/net/bluetooth/hci_core.h |  8 +++
> net/bluetooth/hci_core.c | 84 +---
> net/bluetooth/hci_event.c|  2 +
> 3 files changed, 88 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/bluetooth/hci_core.h 
> b/include/net/bluetooth/hci_core.h
> index 836dc997ff94..b69ecdd0d15a 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -270,6 +270,9 @@ struct adv_monitor {
> /* Default authenticated payload timeout 30s */
> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
> 
> +/* Time to keep ACL packets without a corresponding handle queued (2s) */
> +#define PENDING_ACL_TIMEOUT  msecs_to_jiffies(2000)
> +

Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 2 
seconds is an awful long time.

> struct amp_assoc {
>   __u16   len;
>   __u16   offset;
> @@ -538,6 +541,9 @@ struct hci_dev {
>   struct delayed_work rpa_expired;
>   bdaddr_trpa;
> 
> + struct delayed_work remove_pending_acl;
> + struct sk_buff_head pending_acl_q;
> +

can we name this ooo_q and move it to the other queues in this struct. Unless 
we want to add a Kconfig option around it, we don’t need to keep it here.

> #if IS_ENABLED(CONFIG_BT_LEDS)
>   struct led_trigger  *power_led;
> #endif
> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 
> ediv, __le64 rand,
> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
>  u8 *bdaddr_type);
> 
> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
> +
> #define SCO_AIRMODE_MASK   0x0003
> #define SCO_AIRMODE_CVSD   0x
> #define SCO_AIRMODE_TRANSP 0x0003
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 7959b851cc63..30780242c267 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
>   skb_queue_purge(>rx_q);
>   skb_queue_purge(>cmd_q);
>   skb_queue_purge(>raw_q);
> + skb_queue_purge(>pending_acl_q);
> 
>   /* Drop last sent command */
>   if (hdev->sent_cmd) {
> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block 
> *nb, unsigned long action,
>   return NOTIFY_STOP;
> }
> 
> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
> +{
> + skb_queue_tail(>pending_acl_q, skb);
> +
> + queue_delayed_work(hdev->workqueue, >remove_pending_acl,
> +PENDING_ACL_TIMEOUT);
> +}
> +
> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
> +{
> + struct sk_buff *skb, *tmp;
> + struct hci_acl_hdr *hdr;
> + u16 handle, flags;
> + bool reset_timer = false;
> +
> + skb_queue_walk_safe(>pending_acl_q, skb, tmp) {
> + hdr = (struct hci_acl_hdr *)skb->data;
> + handle = __le16_to_cpu(hdr->handle);
> + flags  = hci_flags(handle);
> + handle = hci_handle(handle);
> +
> + if (handle != conn->handle)
> + continue;
> +
> + __skb_unlink(skb, >pending_acl_q);
> + skb_pull(skb, HCI_ACL_HDR_SIZE);
> +
> + l2cap_recv_acldata(conn, skb, flags);
> + reset_timer = true;
> + }
> +
> + if (reset_timer)
> + mod_delayed_work(hdev->workqueue, >remove_pending_acl,
> +  PENDING_ACL_TIMEOUT);
> +}
> +
> +/* Remove the oldest pending ACL, and all pending ACLs 

[RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found

2020-06-27 Thread Archie Pusaka
From: Archie Pusaka 

There is a possibility that an ACL packet is received before we
receive the HCI connect event for the corresponding handle. If this
happens, we discard the ACL packet.

Rather than just ignoring them, this patch provides a queue for
incoming ACL packet without a handle. The queue is processed when
receiving a HCI connection event. If 2 seconds elapsed without
receiving the HCI connection event, assume something bad happened
and discard the queued packet.

Signed-off-by: Archie Pusaka 
Reviewed-by: Abhishek Pandit-Subedi 

---

 include/net/bluetooth/hci_core.h |  8 +++
 net/bluetooth/hci_core.c | 84 +---
 net/bluetooth/hci_event.c|  2 +
 3 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 836dc997ff94..b69ecdd0d15a 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -270,6 +270,9 @@ struct adv_monitor {
 /* Default authenticated payload timeout 30s */
 #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
 
+/* Time to keep ACL packets without a corresponding handle queued (2s) */
+#define PENDING_ACL_TIMEOUTmsecs_to_jiffies(2000)
+
 struct amp_assoc {
__u16   len;
__u16   offset;
@@ -538,6 +541,9 @@ struct hci_dev {
struct delayed_work rpa_expired;
bdaddr_trpa;
 
+   struct delayed_work remove_pending_acl;
+   struct sk_buff_head pending_acl_q;
+
 #if IS_ENABLED(CONFIG_BT_LEDS)
struct led_trigger  *power_led;
 #endif
@@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, 
__le64 rand,
 void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
   u8 *bdaddr_type);
 
+void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
+
 #define SCO_AIRMODE_MASK   0x0003
 #define SCO_AIRMODE_CVSD   0x
 #define SCO_AIRMODE_TRANSP 0x0003
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 7959b851cc63..30780242c267 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
skb_queue_purge(>rx_q);
skb_queue_purge(>cmd_q);
skb_queue_purge(>raw_q);
+   skb_queue_purge(>pending_acl_q);
 
/* Drop last sent command */
if (hdev->sent_cmd) {
@@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block 
*nb, unsigned long action,
return NOTIFY_STOP;
 }
 
+static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
+{
+   skb_queue_tail(>pending_acl_q, skb);
+
+   queue_delayed_work(hdev->workqueue, >remove_pending_acl,
+  PENDING_ACL_TIMEOUT);
+}
+
+void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
+{
+   struct sk_buff *skb, *tmp;
+   struct hci_acl_hdr *hdr;
+   u16 handle, flags;
+   bool reset_timer = false;
+
+   skb_queue_walk_safe(>pending_acl_q, skb, tmp) {
+   hdr = (struct hci_acl_hdr *)skb->data;
+   handle = __le16_to_cpu(hdr->handle);
+   flags  = hci_flags(handle);
+   handle = hci_handle(handle);
+
+   if (handle != conn->handle)
+   continue;
+
+   __skb_unlink(skb, >pending_acl_q);
+   skb_pull(skb, HCI_ACL_HDR_SIZE);
+
+   l2cap_recv_acldata(conn, skb, flags);
+   reset_timer = true;
+   }
+
+   if (reset_timer)
+   mod_delayed_work(hdev->workqueue, >remove_pending_acl,
+PENDING_ACL_TIMEOUT);
+}
+
+/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
+static void hci_remove_pending_acl(struct work_struct *work)
+{
+   struct hci_dev *hdev;
+   struct sk_buff *skb, *tmp;
+   struct hci_acl_hdr *hdr;
+   u16 handle, oldest_handle;
+
+   hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
+   skb = skb_dequeue(>pending_acl_q);
+
+   if (!skb)
+   return;
+
+   hdr = (struct hci_acl_hdr *)skb->data;
+   oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
+   kfree_skb(skb);
+
+   bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
+  oldest_handle);
+
+   skb_queue_walk_safe(>pending_acl_q, skb, tmp) {
+   hdr = (struct hci_acl_hdr *)skb->data;
+   handle = hci_handle(__le16_to_cpu(hdr->handle));
+
+   if (handle == oldest_handle) {
+   __skb_unlink(skb, >pending_acl_q);
+   kfree_skb(skb);
+   }
+   }
+
+   if (!skb_queue_empty(>pending_acl_q))
+   queue_delayed_work(hdev->workqueue, >remove_pending_acl,
+  PENDING_ACL_TIMEOUT);
+}
+
 /* Alloc HCI device