Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-07 Thread Dongli Zhang



On 4/6/21 7:20 PM, Jason Wang wrote:
> 
> On 2021/4/7 7:27 AM, Dongli Zhang wrote:
>>> This will answer your question that "Can it bypass the masking?".
>>>
>>> For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the eventfd
>>> cannot bypass masking, because masking unregisters the eventfd. A write to
>>> it does not take effect.
>>>
>>> However, it is possible to bypass masking for vhost-net, because vhost-net
>>> registers a dedicated masked_notifier eventfd in order to mask the irq. A
>>> write to the original eventfd still takes effect.
>>>
>>> We may leave the user to decide whether to write to 'masked_notifier' or
>>> original 'guest_notifier' for vhost-net.
>> My mistake here. A write to masked_notifier will always be masked :(
> 
> 
> Only when there's no bug in the qemu.
> 
> 
>>
>> If the interface is at the EventNotifier level, we do not care whether the
>> EventNotifier is masked or not. It just provides a way to write to the
>> EventNotifier.
> 
> 
> Yes.
> 
> 
>>
>> Dumping the MSI-X table for both virtio and vfio will help confirm whether the
>> vector is masked.
> 
> 
> That would be helpful as well. It's probably better to extend the "info pci"
> command.
> 
> Thanks

I will see whether to add this to "info pci" (by introducing a new argument to
"info pci") or to introduce a new command.

About the EventNotifier, I will classify each one as a guest notifier or a host
notifier, so that it will be much easier for the user to tell whether the
eventfd is for injecting an IRQ or for kicking the doorbell.

Thank you very much for all suggestions!

Dongli Zhang

> 
> 
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
> 
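For readers unfamiliar with the mechanism discussed above, here is a minimal
sketch of what "writing to an eventfd" means at the Linux level. It is not
QEMU's EventNotifier code; the helper names notify_eventfd() and
drain_eventfd() are hypothetical and only illustrate the counter semantics of
eventfd(2). Whether such a write reaches the guest still depends on whether
the eventfd is registered, which is the masking question in this thread.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: signal an eventfd by adding 1 to its counter. */
static int notify_eventfd(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: read and reset the counter. Returns 0 when the
 * read fails, e.g. when nothing is pending on a non-blocking eventfd.
 */
static uint64_t drain_eventfd(int fd)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

Two notifications followed by one drain yield a count of 2, because an eventfd
accumulates writes until a reader consumes them.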



Re: [PATCH 1/6] qdev: introduce qapi/hmp command for kick/call event

2021-04-07 Thread Dongli Zhang



On 4/7/21 6:40 AM, Eduardo Habkost wrote:
> On Thu, Mar 25, 2021 at 10:44:28PM -0700, Dongli Zhang wrote:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>  
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>> Signed-off-by: Dongli Zhang 
> [...]
>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>> index 605d57287a..c7795d4ba5 100644
>> --- a/include/monitor/hmp.h
>> +++ b/include/monitor/hmp.h
>> @@ -129,5 +129,6 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_break(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_seek(Monitor *mon, const QDict *qdict);
>> +void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict);
>>  
>>  #endif
>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>> index b83178220b..711c4a297a 100644
>> --- a/qapi/qdev.json
>> +++ b/qapi/qdev.json
>> @@ -124,3 +124,33 @@
>>  ##
>>  { 'event': 'DEVICE_DELETED',
>>'data': { '*device': 'str', 'path': 'str' } }
>> +
>> +##
>> +# @x-debug-device-event:
>> +#
>> +# Generate device event for a specific device queue
>> +#
>> +# @dev: device path
>> +#
>> +# @event: event (e.g., kick or call) to trigger
> 
> Any specific reason to not use an enum here?
> 
> In addition to making the QAPI schema and documentation more
> descriptive, it would save you the work of manually defining the
> DEVICE_EVENT_* constants and implementing get_device_event().

Thank you very much for the suggestion!

I will use enum in json file.

Dongli Zhang

> 
> 
>> +#
>> +# @queue: queue id
>> +#
>> +# Returns: Nothing on success
>> +#
>> +# Since: 6.1
>> +#
>> +# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
>> +#send notification to QEMU/vhost while the 'call' event is to
>> +#interrupt VM on purpose.
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "x-debug-device-event",
>> +#  "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
>> +# "queue": 1 } }
>> +# <- { "return": {} }
>> +#
>> +##
>> +{ 'command': 'x-debug-device-event',
>> +  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }
> [...]
> 
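To make the enum suggestion concrete, the sketch below shows in plain C the
kind of hand-written string-to-constant mapping that declaring 'event' as an
enum in the schema would make unnecessary. All names here (DebugDeviceEvent,
debug_device_event_parse) are hypothetical and are not the identifiers from
the patch or the ones a QAPI generator would emit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an enum that a schema generator would emit. */
typedef enum {
    DEBUG_DEVICE_EVENT_KICK,
    DEBUG_DEVICE_EVENT_CALL,
    DEBUG_DEVICE_EVENT__MAX
} DebugDeviceEvent;

static const char *const debug_device_event_names[DEBUG_DEVICE_EVENT__MAX] = {
    [DEBUG_DEVICE_EVENT_KICK] = "kick",
    [DEBUG_DEVICE_EVENT_CALL] = "call",
};

/* Returns the matching enum value, or -1 for an unknown event name. */
static int debug_device_event_parse(const char *name)
{
    for (int i = 0; i < DEBUG_DEVICE_EVENT__MAX; i++) {
        if (strcmp(name, debug_device_event_names[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

With a string argument, every command handler has to carry a parser like this
and reject unknown names itself; with an enum in the schema, invalid values
are rejected before the handler runs.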



RE: [PATCH v5 05/10] Add a function named packet_new_nocopy for COLO.

2021-04-07 Thread Zhang, Chen



> -----Original Message-----
> From: Rao, Lei 
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com;
> pbonz...@redhat.com; lukasstra...@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei 
> Subject: [PATCH v5 05/10] Add a function named packet_new_nocopy for
> COLO.
> 
> From: "Rao, Lei" 
> 
> Use the packet_new_nocopy instead of packet_new in the filter-rewriter
> module. There will be one less memory copy in the processing of each
> network packet.
> 
> Signed-off-by: Lei Rao 
> ---
>  net/colo.c| 23 +++
>  net/colo.h|  1 +
>  net/filter-rewriter.c |  3 +--
>  3 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/net/colo.c b/net/colo.c
> index ef00609..58106a8 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -174,6 +174,29 @@ Packet *packet_new(const void *data, int size, int
> vnet_hdr_len)
>  return pkt;
>  }
> 
> +/*
> + * packet_new_nocopy will not copy data, so the caller can't release
> + * the data. And it will be released in packet_destroy.
> + */
> +Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len) {
> +Packet *pkt = g_slice_new(Packet);

We can use g_slice_new0() here to avoid the "pkt->xxx = 0" assignments.
The original packet_new() should get the same cleanup.

Thanks
Chen

> +
> +pkt->data = data;
> +pkt->size = size;
> +pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +pkt->vnet_hdr_len = vnet_hdr_len;
> +pkt->tcp_seq = 0;
> +pkt->tcp_ack = 0;
> +pkt->seq_end = 0;
> +pkt->header_size = 0;
> +pkt->payload_size = 0;
> +pkt->offset = 0;
> +pkt->flags = 0;
> +
> +return pkt;
> +}
> +
>  void packet_destroy(void *opaque, void *user_data)  {
>  Packet *pkt = opaque;
> diff --git a/net/colo.h b/net/colo.h
> index 573ab91..d91cd24 100644
> --- a/net/colo.h
> +++ b/net/colo.h
> @@ -101,6 +101,7 @@ bool connection_has_tracked(GHashTable
> *connection_track_table,
>  ConnectionKey *key);  void
> connection_hashtable_reset(GHashTable *connection_track_table);  Packet
> *packet_new(const void *data, int size, int vnet_hdr_len);
> +Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len);
>  void packet_destroy(void *opaque, void *user_data);  void
> packet_destroy_partial(void *opaque, void *user_data);
> 
> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index 
> 10fe393..cb3a96c
> 100644
> --- a/net/filter-rewriter.c
> +++ b/net/filter-rewriter.c
> @@ -270,8 +270,7 @@ static ssize_t
> colo_rewriter_receive_iov(NetFilterState *nf,
>  vnet_hdr_len = nf->netdev->vnet_hdr_len;
>  }
> 
> -pkt = packet_new(buf, size, vnet_hdr_len);
> -g_free(buf);
> +pkt = packet_new_nocopy(buf, size, vnet_hdr_len);
> 
>  /*
>   * if we get tcp packet
> --
> 1.8.3.1
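The two ideas in this review (zero-filled allocation and taking ownership of
the buffer instead of copying it) can be sketched in plain C as follows. This
is an illustration under assumptions, not the COLO code: DemoPacket is a
hypothetical, simplified structure, and calloc() stands in for GLib's
g_slice_new0(), which likewise returns zero-filled memory.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for QEMU's Packet structure. */
typedef struct DemoPacket {
    void *data;
    int size;
    int vnet_hdr_len;
    unsigned int tcp_seq;
    unsigned int tcp_ack;
    int flags;
} DemoPacket;

/*
 * Takes ownership of 'data' instead of copying it. calloc() zero-fills the
 * structure, so only the non-zero fields need an explicit assignment.
 */
static DemoPacket *demo_packet_new_nocopy(void *data, int size,
                                          int vnet_hdr_len)
{
    DemoPacket *pkt = calloc(1, sizeof(*pkt));

    if (!pkt) {
        return NULL;
    }
    pkt->data = data;
    pkt->size = size;
    pkt->vnet_hdr_len = vnet_hdr_len;
    return pkt;
}

/* Frees the packet together with the buffer it took ownership of. */
static void demo_packet_destroy(DemoPacket *pkt)
{
    if (pkt) {
        free(pkt->data);
        free(pkt);
    }
}
```

The caller must not free the buffer after handing it over; the destroy
function releases it, which mirrors the ownership rule stated in the patch
comment.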




RE: [PATCH v5 04/10] Remove migrate_set_block_enabled in checkpoint

2021-04-07 Thread Zhang, Chen



> -----Original Message-----
> From: Rao, Lei 
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com;
> pbonz...@redhat.com; lukasstra...@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei 
> Subject: [PATCH v5 04/10] Remove migrate_set_block_enabled in
> checkpoint
> 
> From: "Rao, Lei" 
> 
> We can detect disk migration in migrate_prepare. If disk migration is enabled
> in COLO mode, we can directly report an error, and there is no need to
> disable block migration at every checkpoint.
> 
> Signed-off-by: Lei Rao 
> Signed-off-by: Zhang Chen 
> Reviewed-by: Li Zhijian 

Reviewed-by: Zhang Chen 

> ---
>  migration/colo.c  | 6 --
>  migration/migration.c | 4 
>  2 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index de27662..1aaf316
> 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -435,12 +435,6 @@ static int
> colo_do_checkpoint_transaction(MigrationState *s,
>  if (failover_get_state() != FAILOVER_STATUS_NONE) {
>  goto out;
>  }
> -
> -/* Disable block migration */
> -migrate_set_block_enabled(false, &local_err);
> -if (local_err) {
> -goto out;
> -}
>  qemu_mutex_lock_iothread();
> 
>  #ifdef CONFIG_REPLICATION
> diff --git a/migration/migration.c b/migration/migration.c index
> ca8b97b..4578f22 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2219,6 +2219,10 @@ static bool migrate_prepare(MigrationState *s,
> bool blk, bool blk_inc,
>  }
> 
>  if (blk || blk_inc) {
> +if (migrate_colo_enabled()) {
> +error_setg(errp, "No disk migration is required in COLO mode");
> +return false;
> +}
>  if (migrate_use_block() || migrate_use_block_incremental()) {
>  error_setg(errp, "Command options are incompatible with "
> "current migration capabilities");
> --
> 1.8.3.1




RE: [PATCH v5 03/10] Optimize the function of filter_send

2021-04-07 Thread Zhang, Chen



> -----Original Message-----
> From: Rao, Lei 
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com;
> pbonz...@redhat.com; lukasstra...@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei 
> Subject: [PATCH v5 03/10] Optimize the function of filter_send
> 
> From: "Rao, Lei" 
> 
> The iov_size has already been calculated in filter_send(), so we can return
> the size directly. This way, there is no need to repeat the calculation in
> filter_redirector_receive_iov().
> 
> Signed-off-by: Lei Rao 
> Reviewed-by: Li Zhijian 

Reviewed-by: Zhang Chen 

> ---
>  net/filter-mirror.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/filter-mirror.c b/net/filter-mirror.c index f8e6500..f20240c
> 100644
> --- a/net/filter-mirror.c
> +++ b/net/filter-mirror.c
> @@ -88,7 +88,7 @@ static int filter_send(MirrorState *s,
>  goto err;
>  }
> 
> -return 0;
> +return size;
> 
>  err:
>  return ret < 0 ? ret : -EIO;
> @@ -159,7 +159,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState
> *nf,
>  int ret;
> 
>  ret = filter_send(s, iov, iovcnt);
> -if (ret) {
> +if (ret < 0) {
>  error_report("filter mirror send failed(%s)", strerror(-ret));
>  }
> 
> @@ -182,10 +182,10 @@ static ssize_t
> filter_redirector_receive_iov(NetFilterState *nf,
> 
>  if (qemu_chr_fe_backend_connected(&s->chr_out)) {
>  ret = filter_send(s, iov, iovcnt);
> -if (ret) {
> +if (ret < 0) {
>  error_report("filter redirector send failed(%s)", 
> strerror(-ret));
>  }
> -return iov_size(iov, iovcnt);
> +return ret;
>  } else {
>  return 0;
>  }
> --
> 1.8.3.1
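The calling convention this patch moves to (byte count on success, negative
errno on failure) can be sketched as below. The names demo_iov_size() and
demo_send() are hypothetical, and the 'fail' parameter only simulates a
transport error; this is not the filter-mirror code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sum of the lengths of all iovec entries. */
static size_t demo_iov_size(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;

    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/*
 * Hypothetical sender: returns the number of bytes handled on success or a
 * negative errno value on failure. 'fail' simulates a transport error.
 */
static ssize_t demo_send(const struct iovec *iov, int iovcnt, int fail)
{
    size_t size = demo_iov_size(iov, iovcnt);

    if (fail) {
        return -EIO;
    }
    return (ssize_t)size;
}
```

Once success returns a positive size, callers have to test "ret < 0" rather
than "ret", which is why both receive functions change their check in the
diff above.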




RE: [PATCH v5 02/10] Fix the qemu crash when guest shutdown during checkpoint

2021-04-07 Thread Zhang, Chen



> -----Original Message-----
> From: Rao, Lei 
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com;
> pbonz...@redhat.com; lukasstra...@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei 
> Subject: [PATCH v5 02/10] Fix the qemu crash when guest shutdown during
> checkpoint
> 
> From: "Rao, Lei" 
> 
> This patch fixes the following:
> qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
> Aborted (core dumped)
> 
> Signed-off-by: Lei Rao 
> Reviewed-by: Li Zhijian 

Reviewed-by: Zhang Chen 

> ---
>  softmmu/runstate.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c index
> ce8977c..1564057 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -126,6 +126,7 @@ static const RunStateTransition
> runstate_transitions_def[] = {
>  { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
> 
>  { RUN_STATE_COLO, RUN_STATE_RUNNING },
> +{ RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
> 
>  { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
>  { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
> --
> 1.8.3.1
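The fix works because run state changes are validated against a table of
allowed (from, to) pairs, and a pair that is missing from the table is
rejected. A minimal sketch of that pattern follows; the DemoRunState values
and the three table entries are a hypothetical subset chosen for
illustration, not QEMU's full transition table.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DEMO_STATE_RUNNING,
    DEMO_STATE_COLO,
    DEMO_STATE_SHUTDOWN
} DemoRunState;

typedef struct {
    DemoRunState from;
    DemoRunState to;
} DemoTransition;

/* Hypothetical subset of allowed transitions. */
static const DemoTransition demo_transitions[] = {
    { DEMO_STATE_RUNNING, DEMO_STATE_COLO },
    { DEMO_STATE_COLO, DEMO_STATE_RUNNING },
    { DEMO_STATE_COLO, DEMO_STATE_SHUTDOWN },
};

/* A transition is valid only if its (from, to) pair is listed. */
static bool demo_transition_allowed(DemoRunState from, DemoRunState to)
{
    size_t n = sizeof(demo_transitions) / sizeof(demo_transitions[0]);

    for (size_t i = 0; i < n; i++) {
        if (demo_transitions[i].from == from &&
            demo_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

Removing the third entry from the table would make the colo to shutdown
request invalid again, which corresponds to the abort quoted in the commit
message.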




RE: [PATCH v5 01/10] Remove some duplicate trace code.

2021-04-07 Thread Zhang, Chen



> -----Original Message-----
> From: Rao, Lei 
> Sent: Thursday, April 1, 2021 3:47 PM
> To: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com;
> pbonz...@redhat.com; lukasstra...@web.de
> Cc: qemu-devel@nongnu.org; Rao, Lei 
> Subject: [PATCH v5 01/10] Remove some duplicate trace code.
> 
> From: "Rao, Lei" 
> 
> There is the same trace code in the colo_compare_packet_payload.
> 
> Signed-off-by: Lei Rao 
> Reviewed-by: Li Zhijian 

Reviewed-by: Zhang Chen 

Thanks
Chen

> ---
>  net/colo-compare.c | 13 -
>  1 file changed, 13 deletions(-)
> 
> diff --git a/net/colo-compare.c b/net/colo-compare.c index 9d1ad99..c142c08
> 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -590,19 +590,6 @@ static int colo_packet_compare_other(Packet *spkt,
> Packet *ppkt)
>  uint16_t offset = ppkt->vnet_hdr_len;
> 
>  trace_colo_compare_main("compare other");
> -if (trace_event_get_state_backends(TRACE_COLO_COMPARE_IP_INFO))
> {
> -char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
> -
> -strcpy(pri_ip_src, inet_ntoa(ppkt->ip->ip_src));
> -strcpy(pri_ip_dst, inet_ntoa(ppkt->ip->ip_dst));
> -strcpy(sec_ip_src, inet_ntoa(spkt->ip->ip_src));
> -strcpy(sec_ip_dst, inet_ntoa(spkt->ip->ip_dst));
> -
> -trace_colo_compare_ip_info(ppkt->size, pri_ip_src,
> -   pri_ip_dst, spkt->size,
> -   sec_ip_src, sec_ip_dst);
> -}
> -
>  if (ppkt->size != spkt->size) {
>  trace_colo_compare_main("Other: payload size of packets are
> different");
>  return -1;
> --
> 1.8.3.1




Re: [PATCH v4 3/3] ppc: Enable 2nd DAWR support on p10

2021-04-07 Thread Ravi Bangoria




> > > +static void cap_dawr1_apply(SpaprMachineState *spapr, uint8_t val,
> > > +   Error **errp)
> > > +{
> > > +ERRP_GUARD();
> > > +if (!val) {
> > > +return; /* Disable by default */
> > > +}
> > > +
> > > +if (tcg_enabled()) {
> > > +error_setg(errp, "DAWR1 not supported in TCG.");
> > > +error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
> > > +} else if (kvm_enabled()) {
> > > +if (!kvmppc_has_cap_dawr1()) {
> > > +error_setg(errp, "DAWR1 not supported by KVM.");
> > > +error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
> > > +} else if (kvmppc_set_cap_dawr1(val) < 0) {
> > > +error_setg(errp, "DAWR1 not supported by KVM.");
> >
> > Well... technically KVM does support DAWR1 but something went wrong when
> > trying to enable it. In case you need to repost, maybe change the error
> > message in this path, e.g. like in cap_nested_kvm_hv_apply().
>
> This won't be going in until 6.1 anyway, so please do update the
> message.

Sure. Will post v5 with updated message.

> I'd probably prefer to actually wait until the 6.1 tree opens to apply
> this, rather than pre-queueing it in ppc-for-6.1, because there's a
> fairly good chance the header update patch will conflict with someone
> else's during the 6.1 merge flurry.


No worries.

Thanks Greg, David for the review.
Ravi



RE: [PATCH V4 3/7] qapi/net: Add new QMP command for COLO passthrough

2021-04-07 Thread Zhang, Chen



> -----Original Message-----
> From: Markus Armbruster 
> Sent: Tuesday, April 6, 2021 4:01 PM
> To: Zhang, Chen 
> Cc: Lukas Straub ; Li Zhijian
> ; Jason Wang ; qemu-
> dev ; Dr. David Alan Gilbert
> ; Zhang Chen 
> Subject: Re: [PATCH V4 3/7] qapi/net: Add new QMP command for COLO
> passthrough
> 
> "Zhang, Chen"  writes:
> 
> >> -----Original Message-----
> >> From: Qemu-devel On Behalf Of Markus Armbruster
> >> Sent: Tuesday, March 23, 2021 5:58 PM
> >> To: Zhang, Chen 
> >> Cc: Lukas Straub ; Li Zhijian
> >> ; Jason Wang ; qemu-
> >> dev ; Dr. David Alan Gilbert
> >> ; Zhang Chen 
> >> Subject: Re: [PATCH V4 3/7] qapi/net: Add new QMP command for COLO
> >> passthrough
> >>
> >> "Zhang, Chen"  writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Markus Armbruster 
> >> [...]
> >> >> Now let's look at colo-passthrough-del.  I figure it is for
> >> >> deleting the kind of things colo-passthrough-add adds.
> >> >>
> >> >
> >> > Yes.
> >> >
> >> >> What exactly is deleted?  The thing created with the exact same
> >> arguments?
> >> >>
> >> >
> >> > It deletes the rule from the module's private bypass list.
> >> > When the user inputs a rule, colo-passthrough-del finds the
> >> > specific module by the object ID, then deletes the rule.
> >> >
> >> >> This would be unusual.  Commonly, FOO-add and FOO-del both take a
> >> >> string ID argument.  The FOO created by FOO-add remembers its ID,
> >> >> and FOO-del deletes by ID.
> >> >
> >> > The ID is not for the rule itself; it just identifies the modules
> >> > (tagged by ID) affected by the rule.
> >>
> >> I'm not sure I understand.
> >>
> >> If you're pointing out that existing colo-passthrough-del parameter
> >> @id is not suitable for use as unique rule ID: you can always add
> >> another parameter that is suitable.
> >
> > Sorry to have missed this mail.
> >
> > For example:
> > The VM is running with filter-mirror (object id==0),
> > filter-redirector (object id==1) and colo-compare (object id==2). We use
> > colo-passthrough-add/del to add/delete a rule with an ID; if the ID==2, the
> > rule affects only colo-compare.
> > The filter-mirror and filter-redirector are unaffected by the add/del.
> 
> I think you're trying to explain existing parameter @id.  The point I was
> trying to make is unrelated to this parameter, except by name collision.
> 
> My point is: our existing "delete" operations select the object to be deleted
> by some unique name that is assigned by the "add" operation.
> The unique name is a property of the object.  The property name is often,
> but not always "id".
> 
> Examples:
> 
> device_add argument "id" sets the device's unique name.
> device_del argument "id" selects the device to delete by its name.
> 
> blockdev-add argument "node-name" sets the block backend device's
> unique name.
> blockdev-del argument "node-name" selects the block backend device
> to delete by its name.
> 
> Is there any particular reason why deletion of your kind of object can't work
> the same way?

The current command can work this way. It seems that the name "ID" can be
misunderstood.
The id=object0 is OK here. I will change "id" to "object-name".
Thank you for clarifying the comments.

Thanks   
Chen
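The convention Markus describes (the "add" operation assigns a unique name,
and the "delete" operation selects by that name) can be sketched in plain C
as follows. The DemoRuleTable type and the demo_rule_* functions are
hypothetical and unrelated to the actual colo-passthrough implementation;
they only illustrate the add-by-name, delete-by-name pairing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_RULES 8
#define DEMO_NAME_LEN 32

typedef struct {
    char name[DEMO_NAME_LEN];
    bool used;
} DemoRule;

typedef struct {
    DemoRule rules[DEMO_MAX_RULES];
} DemoRuleTable;

/* Looks up a rule by its unique name; returns NULL if it does not exist. */
static DemoRule *demo_rule_find(DemoRuleTable *t, const char *name)
{
    for (int i = 0; i < DEMO_MAX_RULES; i++) {
        if (t->rules[i].used && strcmp(t->rules[i].name, name) == 0) {
            return &t->rules[i];
        }
    }
    return NULL;
}

/* Fails if the name is too long, already taken, or the table is full. */
static bool demo_rule_add(DemoRuleTable *t, const char *name)
{
    if (strlen(name) >= DEMO_NAME_LEN || demo_rule_find(t, name)) {
        return false;
    }
    for (int i = 0; i < DEMO_MAX_RULES; i++) {
        if (!t->rules[i].used) {
            strcpy(t->rules[i].name, name);
            t->rules[i].used = true;
            return true;
        }
    }
    return false;
}

/* Deletes exactly the rule that was added under 'name'. */
static bool demo_rule_del(DemoRuleTable *t, const char *name)
{
    DemoRule *rule = demo_rule_find(t, name);

    if (!rule) {
        return false;
    }
    rule->used = false;
    return true;
}
```

Because the name is a property of the rule itself, deletion needs no other
arguments and cannot accidentally match a different rule.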




Re: [for-6.0 PATCH 0/3] ppc: e500: Bump ppce500 u-boot to v2021.04

2021-04-07 Thread David Gibson
On Thu, Apr 08, 2021 at 11:07:22AM +0800, Bin Meng wrote:
> Hi David,
> 
> On Thu, Apr 8, 2021 at 10:39 AM David Gibson
>  wrote:
> >
> > On Tue, Apr 06, 2021 at 04:15:10PM +0800, Bin Meng wrote:
> > > This series bumps the u-boot.e500 to v2021.04, which fixed a long
> > > overdue broken pci issue caused by QEMU changes since Nov 2014.
> > >
> > > While we are here, add a reST documentation for the ppce500 machine.
> > >
> > > Please pull the full contents (binary) from 
> > > https://github.com/lbmeng/qemu/
> > > ppc branch.
> >
> > This is much too late to go into ppc-for-6.0, but I'm happy to queue it
> > for 6.1.
> 
> I think this should go into 6.0 because it is a bug fix for the long
> overdue broken pci support in the U-Boot binary that QEMU ships.

No.  If we were early in the hard freeze, then maybe.  But this
certainly isn't a regression - it's been broken for 6+ years, which
means we don't have a case to put it in rc3.

> 
> > However, I'm not sure which branch from your site I need to pull in.
> >
> 
> It's the ppc branch, as I mentioned in this cover letter.

Sorry, I missed that.  I've now merged these patches into ppc-for-6.1.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [for-6.0 PATCH 0/3] ppc: e500: Bump ppce500 u-boot to v2021.04

2021-04-07 Thread Bin Meng
Hi David,

On Thu, Apr 8, 2021 at 10:39 AM David Gibson
 wrote:
>
> On Tue, Apr 06, 2021 at 04:15:10PM +0800, Bin Meng wrote:
> > This series bumps the u-boot.e500 to v2021.04, which fixed a long
> > overdue broken pci issue caused by QEMU changes since Nov 2014.
> >
> > While we are here, add a reST documentation for the ppce500 machine.
> >
> > Please pull the full contents (binary) from https://github.com/lbmeng/qemu/
> > ppc branch.
>
> This is much too late to go into ppc-for-6.0, but I'm happy to queue it
> for 6.1.

I think this should go into 6.0 because it is a bug fix for the long
overdue broken pci support in the U-Boot binary that QEMU ships.

> However, I'm not sure which branch from your site I need to pull in.
>

It's the ppc branch, as I mentioned in this cover letter.

> >
> >
> > Bin Meng (3):
> >   roms/Makefile: Update ppce500 u-boot build directory name
> >   roms/u-boot: Bump ppce500 u-boot to v2021.04 to fix broken pci support
> >   docs/system: ppc: Add documentation for ppce500 machine
> >

Regards,
Bin



Re: [PATCH v3] ppc/spapr: Add support for implement support for H_SCM_HEALTH

2021-04-07 Thread David Gibson
On Fri, Apr 02, 2021 at 03:51:28PM +0530, Vaibhav Jain wrote:
> Add support for H_SCM_HEALTH hcall described at [1] for spapr
> nvdimms. This enables guest to detect the 'unarmed' status of a
> specific spapr nvdimm identified by its DRC and, if it's unarmed, mark
> the region backed by the nvdimm as read-only.
> 
> The patch adds h_scm_health() to handle the H_SCM_HEALTH hcall which
> returns two 64-bit bitmaps (health bitmap, health bitmap mask) derived
> from 'struct nvdimm->unarmed' member.
> 
> Linux kernel side changes to enable handling of 'unarmed' nvdimms for
> ppc64 are proposed at [2].
> 
> References:
> [1] "Hypercall Op-codes (hcalls)"
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n220
> [2] "powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe"
> 
> https://lore.kernel.org/linux-nvdimm/20210329113103.476760-1-vaib...@linux.ibm.com/
> 
> Signed-off-by: Vaibhav Jain 

Applied to ppc-for-6.1, thanks.

> ---
> Changelog
> 
> v3:
> * Switched to PPC_BIT macro for definitions of the health bits. [ Greg, David ]
> * Updated h_scm_health() to use a const uint64_t to denote supported
>   bits in 'hbitmap_mask'.
> * Fixed an error check for drc->dev to return H_PARAMETER in case nvdimm
>   is not yet plugged in [ Greg ]
> * Fixed a wrong error check for ensuring drc and drc-type are correct
>   [ Greg ]
> 
> v2:
> * Added a check for drc->dev to ensure that the dimm is plugged in
>   when servicing H_SCM_HEALTH. [ Shiva ]
> * Instead of accessing the 'nvdimm->unarmed' member directly use the
>   object_property_get_bool accessor to fetch it. [ Shiva ]
> * Update the usage of PAPR_PMEM_UNARMED* macros [ Greg ]
> * Updated patch description reference#1 to point appropriate section
>   in the documentation. [ Greg ]
> ---
>  hw/ppc/spapr_nvdimm.c  | 36 
>  include/hw/ppc/spapr.h |  3 ++-
>  2 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index b46c36917c..252204e25f 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -31,6 +31,10 @@
>  #include "qemu/range.h"
>  #include "hw/ppc/spapr_numa.h"
>  
> +/* DIMM health bitmap bitmap indicators. Taken from kernel's papr_scm.c */
> +/* SCM device is unable to persist memory contents */
> +#define PAPR_PMEM_UNARMED PPC_BIT(0)
> +
>  bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> uint64_t size, Error **errp)
>  {
> @@ -467,6 +471,37 @@ static target_ulong h_scm_unbind_all(PowerPCCPU *cpu, 
> SpaprMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> + target_ulong opcode, target_ulong *args)
> +{
> +
> +NVDIMMDevice *nvdimm;
> +uint64_t hbitmap = 0;
> +uint32_t drc_index = args[0];
> +SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +const uint64_t hbitmap_mask = PAPR_PMEM_UNARMED;
> +
> +
> +/* Ensure that the drc is valid & is valid PMEM dimm and is plugged in */
> +if (!drc || !drc->dev ||
> +spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +return H_PARAMETER;
> +}
> +
> +nvdimm = NVDIMM(drc->dev);
> +
> +/* Update if the nvdimm is unarmed and send its status via health bitmaps */
> +if (object_property_get_bool(OBJECT(nvdimm), NVDIMM_UNARMED_PROP, NULL)) {
> +hbitmap |= PAPR_PMEM_UNARMED;
> +}
> +
> +/* Update the out args with health bitmap/mask */
> +args[0] = hbitmap;
> +args[1] = hbitmap_mask;
> +
> +return H_SUCCESS;
> +}
> +
>  static void spapr_scm_register_types(void)
>  {
>  /* qemu/scm specific hcalls */
> @@ -475,6 +510,7 @@ static void spapr_scm_register_types(void)
>  spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
>  spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
>  spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
> +spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
>  }
>  
>  type_init(spapr_scm_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 47cebaf3ac..6e1eafb05d 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -538,8 +538,9 @@ struct SpaprMachineState {
>  #define H_SCM_BIND_MEM  0x3EC
>  #define H_SCM_UNBIND_MEM0x3F0
>  #define H_SCM_UNBIND_ALL0x3FC
> +#define H_SCM_HEALTH0x400
>  
> -#define MAX_HCALL_OPCODEH_SCM_UNBIND_ALL
> +#define MAX_HCALL_OPCODEH_SCM_HEALTH
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
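As a small illustration of the two return values described in the commit
message, the sketch below builds a health bitmap and its mask from an
'unarmed' flag. It assumes IBM (MSB-0) bit numbering for the PPC_BIT-style
macro, where bit 0 is the most significant bit of a 64-bit word; the DEMO_*
macros and demo_scm_health() are hypothetical names, not the QEMU ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IBM bit numbering: bit 0 is the most significant bit. */
#define DEMO_PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* The device is unable to persist memory contents. */
#define DEMO_PMEM_UNARMED DEMO_PPC_BIT(0)

/*
 * Fills in the health bitmap and the mask of bits that are valid in it.
 * Only the 'unarmed' indicator is supported in this sketch.
 */
static void demo_scm_health(bool unarmed, uint64_t *bitmap, uint64_t *mask)
{
    *bitmap = unarmed ? DEMO_PMEM_UNARMED : 0;
    *mask = DEMO_PMEM_UNARMED;
}
```

The mask tells the guest which bits of the bitmap carry meaning, so a zero
bit inside the mask reads as "armed" rather than "unknown".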

Re: [PATCH v4 3/3] ppc: Enable 2nd DAWR support on p10

2021-04-07 Thread David Gibson
On Wed, Apr 07, 2021 at 10:10:41AM +0200, Greg Kurz wrote:
> On Tue,  6 Apr 2021 11:08:33 +0530
> Ravi Bangoria  wrote:
> 
> > As per the PAPR, bit 0 of byte 64 in pa-features property indicates
> > availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
> > DAWR is present, otherwise not. Use KVM_CAP_PPC_DAWR1 capability to
> > find whether kvm supports 2nd DAWR or not. If it's supported, allow
> > user to set the pa-feature bit in guest DT using cap-dawr1 machine
> > capability. Though, watchpoint on powerpc TCG guest is not supported
> > and thus 2nd DAWR is not enabled for TCG mode.
> > 
> > Signed-off-by: Ravi Bangoria 
> > ---
> >  hw/ppc/spapr.c  |  7 ++-
> >  hw/ppc/spapr_caps.c | 32 
> >  include/hw/ppc/spapr.h  |  6 +-
> >  target/ppc/cpu.h|  2 ++
> >  target/ppc/kvm.c| 12 
> >  target/ppc/kvm_ppc.h| 12 
> >  target/ppc/translate_init.c.inc | 15 +++
> >  7 files changed, 84 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 73a06df3b1..6317fad973 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -238,7 +238,7 @@ static void spapr_dt_pa_features(SpaprMachineState 
> > *spapr,
> >  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
> >  /* 54: DecFP, 56: DecI, 58: SHA */
> >  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
> > -/* 60: NM atomic, 62: RNG */
> > +/* 60: NM atomic, 62: RNG, 64: DAWR1 (ISA 3.1) */
> >  0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
> >  };
> >  uint8_t *pa_features = NULL;
> > @@ -279,6 +279,9 @@ static void spapr_dt_pa_features(SpaprMachineState 
> > *spapr,
> >   * in pa-features. So hide it from them. */
> >  pa_features[40 + 2] &= ~0x80; /* Radix MMU */
> >  }
> > +if (spapr_get_cap(spapr, SPAPR_CAP_DAWR1)) {
> > +pa_features[66] |= 0x80;
> > +}
> >  
> >  _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, 
> > pa_size)));
> >  }
> > @@ -2003,6 +2006,7 @@ static const VMStateDescription vmstate_spapr = {
> > >  &vmstate_spapr_cap_ccf_assist,
> > >  &vmstate_spapr_cap_fwnmi,
> > >  &vmstate_spapr_fwnmi,
> > > +&vmstate_spapr_cap_dawr1,
> >  NULL
> >  }
> >  };
> > @@ -4542,6 +4546,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> > void *data)
> >  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
> >  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
> >  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> > +smc->default_caps.caps[SPAPR_CAP_DAWR1] = SPAPR_CAP_OFF;
> >  spapr_caps_add_properties(smc);
> > >  smc->irq = &spapr_irq_dual;
> >  smc->dr_phb_enabled = true;
> > diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> > index 9ea7ddd1e9..b2770f73c5 100644
> > --- a/hw/ppc/spapr_caps.c
> > +++ b/hw/ppc/spapr_caps.c
> > @@ -523,6 +523,28 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, 
> > uint8_t val,
> >  }
> >  }
> >  
> > +static void cap_dawr1_apply(SpaprMachineState *spapr, uint8_t val,
> > +   Error **errp)
> > +{
> > +ERRP_GUARD();
> > +if (!val) {
> > +return; /* Disable by default */
> > +}
> > +
> > +if (tcg_enabled()) {
> > +error_setg(errp, "DAWR1 not supported in TCG.");
> > +error_append_hint(errp, "Try appending -machine cap-dawr1=off\n");
> > +} else if (kvm_enabled()) {
> > +if (!kvmppc_has_cap_dawr1()) {
> > +error_setg(errp, "DAWR1 not supported by KVM.");
> > +error_append_hint(errp, "Try appending -machine 
> > cap-dawr1=off\n");
> > +} else if (kvmppc_set_cap_dawr1(val) < 0) {
> > +error_setg(errp, "DAWR1 not supported by KVM.");
> 
> Well... technically KVM does support DAWR1 but something went wrong when
> trying to enable it. In case you need to repost, maybe change the error
> message in this path, e.g. like in cap_nested_kvm_hv_apply().

This won't be going in until 6.1 anyway, so please do update the
message.

I'd probably prefer to actually wait until the 6.1 tree opens to apply
this, rather than pre-queueing it in ppc-for-6.1, because there's a
fairly good chance the header update patch will conflict with someone
else's during the 6.1 merge flurry.

> 
> Apart from that, LGTM.
> 
> Reviewed-by: Greg Kurz 
> 
> > +error_append_hint(errp, "Try appending -machine 
> > cap-dawr1=off\n");
> > +}
> > +}
> > +}
> > +
> >  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >  [SPAPR_CAP_HTM] = {
> >  .name = "htm",
> > @@ -631,6 +653,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >  .type = "bool",
> >  .apply = cap_fwnmi_apply,
> >  },
> > +[SPAPR_CAP_DAWR1] = {
> > +.name = "dawr1",
> > +.description = 

Re: [PATCH-for-6.0] hw/ppc/mac_newworld: Restrict RAM to 2 GiB

2021-04-07 Thread David Gibson
On Wed, Apr 07, 2021 at 03:44:35PM +0200, Philippe Mathieu-Daudé wrote:
> On 4/7/21 3:11 PM, Mark Cave-Ayland wrote:
> > On 06/04/2021 09:48, Philippe Mathieu-Daudé wrote:
> > 
> >> On Mac99 and newer machines, the Uninorth PCI host bridge maps
> >> the PCI hole region at 2GiB, so the RAM area beyond 2GiB is not
> >> accessible by the CPU. Restrict the memory to 2GiB to avoid
> >> problems such as the one reported in the buglink.
> >>
> >> Buglink: https://bugs.launchpad.net/qemu/+bug/1922391
> >> Reported-by: Håvard Eidnes 
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >>   hw/ppc/mac_newworld.c | 4 
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> >> index 21759628466..d88b38e9258 100644
> >> --- a/hw/ppc/mac_newworld.c
> >> +++ b/hw/ppc/mac_newworld.c
> >> @@ -157,6 +157,10 @@ static void ppc_core99_init(MachineState *machine)
> >>   }
> >>     /* allocate RAM */
> >> +    if (machine->ram_size > 2 * GiB) {
> >> +    error_report("RAM size more than 2 GiB is not supported");
> >> +    exit(1);
> >> +    }
> >>   memory_region_add_subregion(get_system_memory(), 0, machine->ram);
> >>     /* allocate and load firmware ROM */
> > 
> > I think the patch is correct, however I'm fairly sure that the default
> > g3beige machine also has the PCI hole located at 0x80000000 so the same
> > problem exists there too.
> > 
> > Also are you keen to get this merged for 6.0? It doesn't seem to solve a
> > security issue/release blocker and I'm sure the current behaviour has
> > been like this for a long time...
> 
> No problem. I wanted to revisit this bug anyway: I realized during the
> night that, while this patch makes QEMU exit cleanly, it hides the bug,
> which is likely in TYPE_MACIO_IDE (I haven't tried Håvard's full
> reproducer).

Ah, given the comments above, I've pulled this out of ppc-for-6.0 and
moved it to ppc-for-6.1.
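
For anyone skimming the thread, the check being discussed boils down to a
bound on the configured RAM size. Below is a minimal standalone sketch,
assuming only what the patch description states (the PCI hole starts at
2 GiB). The names RAM_LIMIT_BYTES and mac99_ram_size_ok are hypothetical
and are not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2 GiB: where the patch description says the PCI hole begins. */
#define RAM_LIMIT_BYTES (2ULL * 1024 * 1024 * 1024)

/* The limit is the same address as the start of the hole. */
static_assert(RAM_LIMIT_BYTES == 0x80000000ULL, "limit must be 2 GiB");

/* Hypothetical helper: true if all of RAM fits below the PCI hole. */
static bool mac99_ram_size_ok(uint64_t ram_size)
{
    return ram_size <= RAM_LIMIT_BYTES;
}
```

The patch itself reports an error and exits when the size is above the
limit; the sketch only isolates the comparison.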

> 
> Regards,
> 
> Phil.
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [for-6.0 PATCH 0/3] ppc: e500: Bump ppce500 u-boot to v2021.04

2021-04-07 Thread David Gibson
On Tue, Apr 06, 2021 at 04:15:10PM +0800, Bin Meng wrote:
> This series bumps the u-boot.e500 to v2021.04, which fixed a long
> overdue broken pci issue caused by QEMU changes since Nov 2014.
> 
> While we are here, add a reST documentation for the ppce500 machine.
> 
> Please pull the full contents (binary) from https://github.com/lbmeng/qemu/
> ppc branch.

This is much too late to go into ppc-for-6.0, but I'm happy to queue it
for 6.1.  However, I'm not sure which branch from your site I need to
pull in.

> 
> 
> Bin Meng (3):
>   roms/Makefile: Update ppce500 u-boot build directory name
>   roms/u-boot: Bump ppce500 u-boot to v2021.04 to fix broken pci support
>   docs/system: ppc: Add documentation for ppce500 machine
> 
>  docs/system/ppc/ppce500.rst | 156 
>  docs/system/target-ppc.rst  |   1 +
>  pc-bios/u-boot.e500 | Bin 349148 -> 406920 bytes
>  roms/Makefile   |   8 +-
>  roms/u-boot |   2 +-
>  5 files changed, 162 insertions(+), 5 deletions(-)
>  create mode 100644 docs/system/ppc/ppce500.rst
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH-for-6.0] hw/ppc/mac_newworld: Restrict RAM to 2 GiB

2021-04-07 Thread David Gibson
On Tue, Apr 06, 2021 at 10:48:42AM +0200, Philippe Mathieu-Daudé wrote:
> On Mac99 and newer machines, the Uninorth PCI host bridge maps
> the PCI hole region at 2GiB, so the RAM area beyond 2GiB is not
> accessible by the CPU. Restrict the memory to 2GiB to avoid
> problems such as the one reported in the buglink.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1922391
> Reported-by: Håvard Eidnes 
> Signed-off-by: Philippe Mathieu-Daudé 

Simple, and a bugfix.  Applied to ppc-for-6.0.

> ---
>  hw/ppc/mac_newworld.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index 21759628466..d88b38e9258 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -157,6 +157,10 @@ static void ppc_core99_init(MachineState *machine)
>  }
>  
>  /* allocate RAM */
> +if (machine->ram_size > 2 * GiB) {
> +error_report("RAM size more than 2 GiB is not supported");
> +exit(1);
> +}
>  memory_region_add_subregion(get_system_memory(), 0, machine->ram);
>  
>  /* allocate and load firmware ROM */

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH v3 22/26] Hexagon (target/hexagon) circular addressing

2021-04-07 Thread Taylor Simpson
The following instructions are added
L2_loadrub_pci  Rd32 = memub(Rx32++#s4:0:circ(Mu2))
L2_loadrb_pci   Rd32 = memb(Rx32++#s4:0:circ(Mu2))
L2_loadruh_pci  Rd32 = memuh(Rx32++#s4:1:circ(Mu2))
L2_loadrh_pci   Rd32 = memh(Rx32++#s4:1:circ(Mu2))
L2_loadri_pci   Rd32 = memw(Rx32++#s4:2:circ(Mu2))
L2_loadrd_pci   Rdd32 = memd(Rx32++#s4:3:circ(Mu2))
S2_storerb_pci  memb(Rx32++#s4:0:circ(Mu2)) = Rt32
S2_storerh_pci  memh(Rx32++#s4:1:circ(Mu2)) = Rt32
S2_storerf_pci  memh(Rx32++#s4:1:circ(Mu2)) = Rt.H32
S2_storeri_pci  memw(Rx32++#s4:2:circ(Mu2)) = Rt32
S2_storerd_pci  memd(Rx32++#s4:3:circ(Mu2)) = Rtt32
S2_storerbnew_pci   memb(Rx32++#s4:0:circ(Mu2)) = Nt8.new
S2_storerhnew_pci   memh(Rx32++#s4:1:circ(Mu2)) = Nt8.new
S2_storerinew_pci   memw(Rx32++#s4:2:circ(Mu2)) = Nt8.new
L2_loadrub_pcr  Rd32 = memub(Rx32++I:circ(Mu2))
L2_loadrb_pcr   Rd32 = memb(Rx32++I:circ(Mu2))
L2_loadruh_pcr  Rd32 = memuh(Rx32++I:circ(Mu2))
L2_loadrh_pcr   Rd32 = memh(Rx32++I:circ(Mu2))
L2_loadri_pcr   Rd32 = memw(Rx32++I:circ(Mu2))
L2_loadrd_pcr   Rdd32 = memd(Rx32++I:circ(Mu2))
S2_storerb_pcr  memb(Rx32++I:circ(Mu2)) = Rt32
S2_storerh_pcr  memh(Rx32++I:circ(Mu2)) = Rt32
S2_storerf_pcr  memh(Rx32++I:circ(Mu2)) = Rt.H32
S2_storeri_pcr  memw(Rx32++I:circ(Mu2)) = Rt32
S2_storerd_pcr  memd(Rx32++I:circ(Mu2)) = Rtt32
S2_storerbnew_pcr   memb(Rx32++I:circ(Mu2)) = Nt8.new
S2_storerhnew_pcr   memh(Rx32++I:circ(Mu2)) = Nt8.new
S2_storerinew_pcr   memw(Rx32++I:circ(Mu2)) = Nt8.new

Test cases in tests/tcg/hexagon/circ.c
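
As a rough illustration of the post-increment circular modes listed above:
the access uses the old pointer value, then the pointer is advanced and
wrapped back into the buffer. The sketch below is a simplified model, not
the fcircadd helper from this series. It assumes the buffer start and
length are passed in directly (in the patch they are derived from the CS
and M registers) and that the offset is no larger than the buffer length,
so one wrap is enough. The name circ_add is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a circular pointer update.
 * The buffer occupies [start, start + length).
 */
static uint32_t circ_add(uint32_t ptr, int32_t offset,
                         uint32_t start, uint32_t length)
{
    uint32_t end = start + length;
    uint32_t next = ptr + (uint32_t)offset;   /* modulo 2^32 arithmetic */

    assert(length > 0);
    if (next >= end) {
        next -= length;     /* ran off the top: wrap to the bottom */
    } else if (next < start) {
        next += length;     /* ran off the bottom: wrap to the top */
    }
    return next;
}
```

For a 16-byte buffer at 0x1000, stepping by 4 from 0x100c lands back on
0x1000, and stepping by -4 from 0x1000 lands on 0x100c.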

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  | 112 +++-
 target/hexagon/genptr.c   | 100 +++
 target/hexagon/imported/encode_pp.def |  10 +
 target/hexagon/imported/ldst.idef |   4 +
 target/hexagon/imported/macros.def|  26 ++
 target/hexagon/macros.h   |  96 +++
 target/hexagon/op_helper.c|  36 +--
 tests/tcg/hexagon/Makefile.target |   2 +
 tests/tcg/hexagon/circ.c  | 486 ++
 9 files changed, 849 insertions(+), 23 deletions(-)
 create mode 100644 tests/tcg/hexagon/circ.c

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 6bc578d..25c228c 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -38,6 +38,8 @@
  * _ap   absolute set  r0 = memw(r1=##variable)
  * _pr   post increment register   r0 = memw(r1++m1)
  * _pi   post increment immediate  r0 = memb(r1++#1)
+ * _pci  post increment circular immediate r0 = memw(r1++#4:circ(m0))
+ * _pcr  post increment circular register  r0 = memw(r1++I:circ(m0))
  */
 
 /* Macros for complex addressing modes */
@@ -56,7 +58,22 @@
 fEA_REG(RxV); \
 fPM_I(RxV, siV); \
 } while (0)
-
+#define GET_EA_pci \
+do { \
+TCGv tcgv_siV = tcg_const_tl(siV); \
+tcg_gen_mov_tl(EA, RxV); \
+gen_helper_fcircadd(RxV, RxV, tcgv_siV, MuV, \
+hex_gpr[HEX_REG_CS0 + MuN]); \
+tcg_temp_free(tcgv_siV); \
+} while (0)
+#define GET_EA_pcr(SHIFT) \
+do { \
+TCGv ireg = tcg_temp_new(); \
+tcg_gen_mov_tl(EA, RxV); \
+gen_read_ireg(ireg, MuV, (SHIFT)); \
+gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
+tcg_temp_free(ireg); \
+} while (0)
 
 /* Instructions with multiple definitions */
 #define fGEN_TCG_LOAD_AP(RES, SIZE, SIGN) \
@@ -80,6 +97,36 @@
 #define fGEN_TCG_L4_loadrd_ap(SHORTCODE) \
 fGEN_TCG_LOAD_AP(RddV, 8, u)
 
+#define fGEN_TCG_L2_loadrub_pci(SHORTCODE)SHORTCODE
+#define fGEN_TCG_L2_loadrb_pci(SHORTCODE) SHORTCODE
+#define fGEN_TCG_L2_loadruh_pci(SHORTCODE)SHORTCODE
+#define fGEN_TCG_L2_loadrh_pci(SHORTCODE) SHORTCODE
+#define fGEN_TCG_L2_loadri_pci(SHORTCODE) SHORTCODE
+#define fGEN_TCG_L2_loadrd_pci(SHORTCODE) SHORTCODE
+
+#define fGEN_TCG_LOAD_pcr(SHIFT, LOAD) \
+do { \
+TCGv ireg = tcg_temp_new(); \
+tcg_gen_mov_tl(EA, RxV); \
+gen_read_ireg(ireg, MuV, SHIFT); \
+gen_helper_fcircadd(RxV, RxV, ireg, MuV, hex_gpr[HEX_REG_CS0 + MuN]); \
+LOAD; \
+tcg_temp_free(ireg); \
+} while (0)
+
+#define fGEN_TCG_L2_loadrub_pcr(SHORTCODE) \
+  fGEN_TCG_LOAD_pcr(0, fLOAD(1, 1, u, EA, RdV))
+#define fGEN_TCG_L2_loadrb_pcr(SHORTCODE) \
+  fGEN_TCG_LOAD_pcr(0, fLOAD(1, 1, s, EA, RdV))
+#define fGEN_TCG_L2_loadruh_pcr(SHORTCODE) \
+  fGEN_TCG_LOAD_pcr(1, fLOAD(1, 2, u, EA, RdV))
+#define fGEN_TCG_L2_loadrh_pcr(SHORTCODE) \
+  fGEN_TCG_LOAD_pcr(1, fLOAD(1, 2, s, EA, RdV))
+#define fGEN_TCG_L2_loadri_pcr(SHORTCODE) \
+

[PATCH v3 19/26] Hexagon (target/hexagon) add A5_ACS (vacsh)

2021-04-07 Thread Taylor Simpson
Rxx32,Pe4 = vacsh(Rss32, Rtt32)
Add compare and select elements of two vectors

Test cases in tests/tcg/hexagon/multi_result.c
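
To make the semantics easier to follow, here is a small standalone model
of the operation, written from the alu.idef definition in this patch. It
is only a sketch: vacsh_model and sat16 are hypothetical names, the
predicate is returned as a plain byte, and the overflow (OVF) status bit
that fSATH can set is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) {
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* Hypothetical model: returns the new Rxx, stores the predicate in *pred. */
static uint64_t vacsh_model(uint64_t rxx, uint64_t rss, uint64_t rtt,
                            uint8_t *pred)
{
    uint64_t result = 0;
    uint8_t p = 0;

    assert(pred != NULL);
    for (int i = 0; i < 4; i++) {
        int32_t xv = (int16_t)(rxx >> (i * 16));
        int32_t sv = (int16_t)(rss >> (i * 16));
        int32_t tv = (int16_t)(rtt >> (i * 16));

        xv = xv + tv;
        sv = sv - tv;
        if (xv > sv) {
            p |= (uint8_t)(0x3u << (i * 2));  /* two predicate bits per lane */
        }
        result |= (uint64_t)(uint16_t)sat16(xv > sv ? xv : sv) << (i * 16);
    }
    *pred = p;
    return result;
}
```

Each 16-bit lane therefore records both the larger of the two candidates
and which candidate won.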

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  |  5 +++
 target/hexagon/helper.h   |  2 +
 target/hexagon/imported/alu.idef  | 19 ++
 target/hexagon/imported/encode_pp.def |  1 +
 target/hexagon/op_helper.c| 33 +
 tests/tcg/hexagon/multi_result.c  | 69 +++
 6 files changed, 129 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index d78e7b8..93310c5 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -199,6 +199,11 @@
  * Mathematical operations with more than one definition require
  * special handling
  */
+#define fGEN_TCG_A5_ACS(SHORTCODE) \
+do { \
+gen_helper_vacsh_pred(PeV, cpu_env, RxxV, RssV, RttV); \
+gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
+} while (0)
 
 /*
  * Approximate reciprocal
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index cb7508f..3824ae0 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -26,6 +26,8 @@ DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
 DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
 DEF_HELPER_2(sfinvsqrta, i64, env, f32)
+DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
+DEF_HELPER_FLAGS_4(vacsh_pred, TCG_CALL_NO_RWG_SE, s32, env, s64, s64, s64)
 
 /* Floating point */
 DEF_HELPER_2(conv_sf2df, f64, env, f32)
diff --git a/target/hexagon/imported/alu.idef b/target/hexagon/imported/alu.idef
index 45cc529..e8cc52c 100644
--- a/target/hexagon/imported/alu.idef
+++ b/target/hexagon/imported/alu.idef
@@ -1240,6 +1240,25 @@ MINMAX(uw,WORD,UWORD,2)
 #undef VMINORMAX3
 
 
+Q6INSN(A5_ACS,"Rxx32,Pe4=vacsh(Rss32,Rtt32)",ATTRIBS(),
+"Add Compare and Select elements of two vectors, record the maximums and the decisions ",
+{
+fHIDE(int i;)
+fHIDE(int xv;)
+fHIDE(int sv;)
+fHIDE(int tv;)
+for (i = 0; i < 4; i++) {
+xv = (int) fGETHALF(i,RxxV);
+sv = (int) fGETHALF(i,RssV);
+tv = (int) fGETHALF(i,RttV);
+xv = xv + tv;   //assumes 17bit datapath
+sv = sv - tv;   //assumes 17bit datapath
+fSETBIT(i*2,  PeV,  (xv > sv));
+fSETBIT(i*2+1,PeV,  (xv > sv));
+fSETHALF(i,   RxxV, fSATH(fMAX(xv,sv)));
+}
+})
+
 /**/
 /* Vector Min/Max */
 /**/
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index 18fe45d..87e0426 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1017,6 +1017,7 @@ MPY_ENC(M7_dcmpyiwc_acc, 
"1010","x","1","0","1","0","10")
 
 
 
+MPY_ENC(A5_ACS,  "1010","x","0","1","0","1","ee")
 /*
 */
 
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index a25fb98..f9fb655 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -347,6 +347,39 @@ uint64_t HELPER(sfinvsqrta)(CPUHexagonState *env, float32 RsV)
 return ((uint64_t)RdV << 32) | PeV;
 }
 
+int64_t HELPER(vacsh_val)(CPUHexagonState *env,
+   int64_t RxxV, int64_t RssV, int64_t RttV)
+{
+for (int i = 0; i < 4; i++) {
+int xv = sextract64(RxxV, i * 16, 16);
+int sv = sextract64(RssV, i * 16, 16);
+int tv = sextract64(RttV, i * 16, 16);
+int max;
+xv = xv + tv;
+sv = sv - tv;
+max = xv > sv ? xv : sv;
+/* Note that fSATH can set the OVF bit in usr */
+RxxV = deposit64(RxxV, i * 16, 16, fSATH(max));
+}
+return RxxV;
+}
+
+int32_t HELPER(vacsh_pred)(CPUHexagonState *env,
+   int64_t RxxV, int64_t RssV, int64_t RttV)
+{
+int32_t PeV = 0;
+for (int i = 0; i < 4; i++) {
+int xv = sextract64(RxxV, i * 16, 16);
+int sv = sextract64(RssV, i * 16, 16);
+int tv = sextract64(RttV, i * 16, 16);
+xv = xv + tv;
+sv = sv - tv;
+PeV = deposit32(PeV, i * 2, 1, (xv > sv));
+PeV = deposit32(PeV, i * 2 + 1, 1, (xv > sv));
+}
+return PeV;
+}
+
 /*
  * mem_noshuf
  * Section 5.5 of the Hexagon V67 Programmer's Reference Manual
diff --git a/tests/tcg/hexagon/multi_result.c b/tests/tcg/hexagon/multi_result.c
index 67aa462..c21148f 100644
--- a/tests/tcg/hexagon/multi_result.c
+++ b/tests/tcg/hexagon/multi_result.c
@@ -45,8 +45,41 @@ static int sfinvsqrta(int Rs, int *pred_result)
   return result;
 }
 
+static long long vacsh(long long Rxx, long long Rss, long long Rtt,
+   int *pred_result, int *ovf_result)
+{
+  long long result = Rxx;
+  

[PATCH v3 21/26] Hexagon (target/hexagon) add A4_addp_c/A4_subp_c

2021-04-07 Thread Taylor Simpson
Rdd32 = add(Rss32, Rtt32, Px4):carry
Add with carry
Rdd32 = sub(Rss32, Rtt32, Px4):carry
Sub with carry

Test cases in tests/tcg/hexagon/multi_result.c
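
For reference, the arithmetic can be modelled in plain C as below. This is
a sketch based on the alu.idef definitions in this patch, where subtract
is expressed as Rss + ~Rtt + carry. The names addp_c_model and
subp_c_model are hypothetical, and the carry is a bool here, whereas the
instruction keeps it in a predicate register as 0x00 or 0xff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of add with carry-in and carry-out. */
static uint64_t addp_c_model(uint64_t a, uint64_t b, bool carry_in,
                             bool *carry_out)
{
    uint64_t partial = a + b;
    uint64_t sum = partial + (carry_in ? 1 : 0);

    assert(carry_out != NULL);
    /* A carry occurred if either addition wrapped around. */
    *carry_out = (partial < a) || (sum < partial);
    return sum;
}

/* Subtract with carry is add with carry on the inverted operand. */
static uint64_t subp_c_model(uint64_t a, uint64_t b, bool carry_in,
                             bool *carry_out)
{
    return addp_c_model(a, ~b, carry_in, carry_out);
}
```

With carry-in set, subp_c_model(a, b) yields a - b, and the carry-out is
set when no borrow was needed.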

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  | 37 
 target/hexagon/genptr.c   | 11 +
 target/hexagon/imported/alu.idef  | 15 +++
 target/hexagon/imported/encode_pp.def |  2 +
 tests/tcg/hexagon/multi_result.c  | 82 +++
 5 files changed, 147 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index aea0c55..6bc578d 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -238,6 +238,43 @@
 } while (0)
 
 /*
+ * Add or subtract with carry.
+ * Predicate register is used as an extra input and output.
+ * r5:4 = add(r1:0, r3:2, p1):carry
+ */
+#define fGEN_TCG_A4_addp_c(SHORTCODE) \
+do { \
+TCGv_i64 carry = tcg_temp_new_i64(); \
+TCGv_i64 zero = tcg_const_i64(0); \
+tcg_gen_extu_i32_i64(carry, PxV); \
+tcg_gen_andi_i64(carry, carry, 1); \
+tcg_gen_add2_i64(RddV, carry, RssV, zero, carry, zero); \
+tcg_gen_add2_i64(RddV, carry, RddV, carry, RttV, zero); \
+tcg_gen_extrl_i64_i32(PxV, carry); \
+gen_8bitsof(PxV, PxV); \
+tcg_temp_free_i64(carry); \
+tcg_temp_free_i64(zero); \
+} while (0)
+
+/* r5:4 = sub(r1:0, r3:2, p1):carry */
+#define fGEN_TCG_A4_subp_c(SHORTCODE) \
+do { \
+TCGv_i64 carry = tcg_temp_new_i64(); \
+TCGv_i64 zero = tcg_const_i64(0); \
+TCGv_i64 not_RttV = tcg_temp_new_i64(); \
+tcg_gen_extu_i32_i64(carry, PxV); \
+tcg_gen_andi_i64(carry, carry, 1); \
+tcg_gen_not_i64(not_RttV, RttV); \
+tcg_gen_add2_i64(RddV, carry, RssV, zero, carry, zero); \
+tcg_gen_add2_i64(RddV, carry, RddV, carry, not_RttV, zero); \
+tcg_gen_extrl_i64_i32(PxV, carry); \
+gen_8bitsof(PxV, PxV); \
+tcg_temp_free_i64(carry); \
+tcg_temp_free_i64(zero); \
+tcg_temp_free_i64(not_RttV); \
+} while (0)
+
+/*
  * Compare each of the 8 unsigned bytes
  * The minimum is placed in each byte of the destination.
  * Each bit of the predicate is set true if the bit from the first operand
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 9dbebc6..333f7d7 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -361,5 +361,16 @@ static inline void gen_store_conditional8(CPUHexagonState *env,
 tcg_gen_movi_tl(hex_llsc_addr, ~0);
 }
 
+static TCGv gen_8bitsof(TCGv result, TCGv value)
+{
+TCGv zero = tcg_const_tl(0);
+TCGv ones = tcg_const_tl(0xff);
+tcg_gen_movcond_tl(TCG_COND_NE, result, value, zero, ones, zero);
+tcg_temp_free(zero);
+tcg_temp_free(ones);
+
+return result;
+}
+
 #include "tcg_funcs_generated.c.inc"
 #include "tcg_func_table_generated.c.inc"
diff --git a/target/hexagon/imported/alu.idef b/target/hexagon/imported/alu.idef
index f0c9bb4..58477ae 100644
--- a/target/hexagon/imported/alu.idef
+++ b/target/hexagon/imported/alu.idef
@@ -153,6 +153,21 @@ Q6INSN(A2_subp,"Rdd32=sub(Rtt32,Rss32)",ATTRIBS(),
 "Sub",
 { RddV=RttV-RssV;})
 
+/* 64-bit with carry */
+
+Q6INSN(A4_addp_c,"Rdd32=add(Rss32,Rtt32,Px4):carry",ATTRIBS(),"Add with Carry",
+{
+  RddV = RssV + RttV + fLSBOLD(PxV);
+  PxV = f8BITSOF(fCARRY_FROM_ADD(RssV,RttV,fLSBOLD(PxV)));
+})
+
+Q6INSN(A4_subp_c,"Rdd32=sub(Rss32,Rtt32,Px4):carry",ATTRIBS(),"Sub with Carry",
+{
+  RddV = RssV + ~RttV + fLSBOLD(PxV);
+  PxV = f8BITSOF(fCARRY_FROM_ADD(RssV,~RttV,fLSBOLD(PxV)));
+})
+
+
 /* NEG and ABS */
 
 Q6INSN(A2_negsat,"Rd32=neg(Rs32):sat",ATTRIBS(),
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index 4619398..514c240 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1749,6 +1749,8 @@ SH_RRR_ENC(S4_extractp_rp,  
"0001","11-","-","10-","d")
 DEF_FIELDROW_DESC32(ICLASS_S3op" 0010  PP-- ","[#2] 
Rdd=(Rss,Rtt,Pu)")
 SH_RRR_ENC(S2_valignrb, "0010","0--","-","-uu","d")
 SH_RRR_ENC(S2_vsplicerb,"0010","100","-","-uu","d")
+SH_RRR_ENC(A4_addp_c,   "0010","110","-","-xx","d")
+SH_RRR_ENC(A4_subp_c,   "0010","111","-","-xx","d")
 
 
 DEF_FIELDROW_DESC32(ICLASS_S3op" 0011  PP-- ","[#3] 
Rdd=(Rss,Rt)")
diff --git a/tests/tcg/hexagon/multi_result.c b/tests/tcg/hexagon/multi_result.c
index 95d99a0..52997b3 100644
--- a/tests/tcg/hexagon/multi_result.c
+++ b/tests/tcg/hexagon/multi_result.c
@@ -85,6 +85,38 @@ static long long vminub(long long Rtt, long long Rss,
   return result;
 }
 
+static long long add_carry(long long Rss, long long Rtt,
+   int pred_in, int *pred_result)
+{
+  long long result;
+  int predval = pred_in;
+
+  asm volatile("p0 = %1\n\t"
+   "%0 = 

[PATCH v3 25/26] Hexagon (target/hexagon) load into shifted register instructions

2021-04-07 Thread Taylor Simpson
The following instructions are added
L2_loadalignb_io  Ryy32 = memb_fifo(Rs32+#s11:1)
L2_loadalignh_io  Ryy32 = memh_fifo(Rs32+#s11:1)
L4_loadalignb_ur  Ryy32 = memb_fifo(Rt32<<#u2+#U6)
L4_loadalignh_ur  Ryy32 = memh_fifo(Rt32<<#u2+#U6)
L4_loadalignb_ap  Ryy32 = memb_fifo(Re32=#U6)
L4_loadalignh_ap  Ryy32 = memh_fifo(Re32=#U6)
L2_loadalignb_pr  Ryy32 = memb_fifo(Rx32++Mu2)
L2_loadalignh_pr  Ryy32 = memh_fifo(Rx32++Mu2)
L2_loadalignb_pbr Ryy32 = memb_fifo(Rx32++Mu2:brev)
L2_loadalignh_pbr Ryy32 = memh_fifo(Rx32++Mu2:brev)
L2_loadalignb_pi  Ryy32 = memb_fifo(Rx32++#s4:1)
L2_loadalignh_pi  Ryy32 = memh_fifo(Rx32++#s4:1)
L2_loadalignb_pci Ryy32 = memb_fifo(Rx32++#s4:1:circ(Mu2))
L2_loadalignh_pci Ryy32 = memh_fifo(Rx32++#s4:1:circ(Mu2))
L2_loadalignb_pcr Ryy32 = memb_fifo(Rx32++I:circ(Mu2))
L2_loadalignh_pcr Ryy32 = memh_fifo(Rx32++I:circ(Mu2))

Test cases in tests/tcg/hexagon/load_align.c
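
The effect on the destination pair can be modelled in a few lines of C.
This sketch follows the comment in the gen_tcg.h hunk of this patch (shift
the destination right, then place the loaded value in the top bits). The
function names are hypothetical, and the memory access and addressing
modes are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of memh_fifo: shift right 16, insert at bits 63:48. */
static uint64_t loadalignh_model(uint64_t ryy, uint16_t loaded)
{
    return (ryy >> 16) | ((uint64_t)loaded << 48);
}

/* Hypothetical model of memb_fifo: shift right 8, insert at bits 63:56. */
static uint64_t loadalignb_model(uint64_t ryy, uint8_t loaded)
{
    return (ryy >> 8) | ((uint64_t)loaded << 56);
}

/* Worked values showing the FIFO-like behaviour. */
void loadalign_example(void)
{
    assert(loadalignh_model(0x1111222233334444ULL, 0xabcd)
           == 0xabcd111122223333ULL);
    assert(loadalignb_model(0x1122334455667788ULL, 0xee)
           == 0xee11223344556677ULL);
}
```

Repeated loads thus push older data toward the low end of the register
pair, one element at a time.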

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  |  66 ++
 target/hexagon/imported/encode_pp.def |   3 +
 target/hexagon/imported/ldst.idef |  19 ++
 tests/tcg/hexagon/Makefile.target |   1 +
 tests/tcg/hexagon/load_align.c| 415 ++
 5 files changed, 504 insertions(+)
 create mode 100644 tests/tcg/hexagon/load_align.c

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 1120aae..18fcdbc 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -261,6 +261,72 @@
 fGEN_TCG_loadbXw4(GET_EA_pi, true)
 
 /*
+ * These instructions load a half word, shift the destination right by 16 bits
+ * and place the loaded value in the high half word of the destination pair.
+ * The GET_EA macro determines the addressing mode.
+ */
+#define fGEN_TCG_loadalignh(GET_EA) \
+do { \
+TCGv tmp = tcg_temp_new(); \
+TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+GET_EA;  \
+fLOAD(1, 2, u, EA, tmp);  \
+tcg_gen_extu_i32_i64(tmp_i64, tmp); \
+tcg_gen_shri_i64(RyyV, RyyV, 16); \
+tcg_gen_deposit_i64(RyyV, RyyV, tmp_i64, 48, 16); \
+tcg_temp_free(tmp); \
+tcg_temp_free_i64(tmp_i64); \
+} while (0)
+
+#define fGEN_TCG_L4_loadalignh_ur(SHORTCODE) \
+fGEN_TCG_loadalignh(fEA_IRs(UiV, RtV, uiV))
+#define fGEN_TCG_L2_loadalignh_io(SHORTCODE) \
+fGEN_TCG_loadalignh(fEA_RI(RsV, siV))
+#define fGEN_TCG_L2_loadalignh_pci(SHORTCODE) \
+fGEN_TCG_loadalignh(GET_EA_pci)
+#define fGEN_TCG_L2_loadalignh_pcr(SHORTCODE) \
+fGEN_TCG_loadalignh(GET_EA_pcr(1))
+#define fGEN_TCG_L4_loadalignh_ap(SHORTCODE) \
+fGEN_TCG_loadalignh(GET_EA_ap)
+#define fGEN_TCG_L2_loadalignh_pr(SHORTCODE) \
+fGEN_TCG_loadalignh(GET_EA_pr)
+#define fGEN_TCG_L2_loadalignh_pbr(SHORTCODE) \
+fGEN_TCG_loadalignh(GET_EA_pbr)
+#define fGEN_TCG_L2_loadalignh_pi(SHORTCODE) \
+fGEN_TCG_loadalignh(GET_EA_pi)
+
+/* Same as above, but loads a byte instead of half word */
+#define fGEN_TCG_loadalignb(GET_EA) \
+do { \
+TCGv tmp = tcg_temp_new(); \
+TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+GET_EA;  \
+fLOAD(1, 1, u, EA, tmp);  \
+tcg_gen_extu_i32_i64(tmp_i64, tmp); \
+tcg_gen_shri_i64(RyyV, RyyV, 8); \
+tcg_gen_deposit_i64(RyyV, RyyV, tmp_i64, 56, 8); \
+tcg_temp_free(tmp); \
+tcg_temp_free_i64(tmp_i64); \
+} while (0)
+
+#define fGEN_TCG_L2_loadalignb_io(SHORTCODE) \
+fGEN_TCG_loadalignb(fEA_RI(RsV, siV))
+#define fGEN_TCG_L4_loadalignb_ur(SHORTCODE) \
+fGEN_TCG_loadalignb(fEA_IRs(UiV, RtV, uiV))
+#define fGEN_TCG_L2_loadalignb_pci(SHORTCODE) \
+fGEN_TCG_loadalignb(GET_EA_pci)
+#define fGEN_TCG_L2_loadalignb_pcr(SHORTCODE) \
+fGEN_TCG_loadalignb(GET_EA_pcr(0))
+#define fGEN_TCG_L4_loadalignb_ap(SHORTCODE) \
+fGEN_TCG_loadalignb(GET_EA_ap)
+#define fGEN_TCG_L2_loadalignb_pr(SHORTCODE) \
+fGEN_TCG_loadalignb(GET_EA_pr)
+#define fGEN_TCG_L2_loadalignb_pbr(SHORTCODE) \
+fGEN_TCG_loadalignb(GET_EA_pbr)
+#define fGEN_TCG_L2_loadalignb_pi(SHORTCODE) \
+fGEN_TCG_loadalignb(GET_EA_pi)
+
+/*
  * Predicated loads
  * Here is a primer to understand the tag names
  *
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index e3582eb..dc4eba4 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -348,6 +348,9 @@ STD_LD_ENC(bzw2,"0 011")
 STD_LD_ENC(bsw4,"0 111")
 STD_LD_ENC(bsw2,"0 001")
 
+STD_LDX_ENC(alignh,"0 010")
+STD_LDX_ENC(alignb,"0 100")
+
 STD_LD_ENC(rb,  "1 000")
 STD_LD_ENC(rub, "1 001")
 STD_LD_ENC(rh,  "1 010")
diff --git a/target/hexagon/imported/ldst.idef b/target/hexagon/imported/ldst.idef
index 95c0470..359d3b7 100644
--- a/target/hexagon/imported/ldst.idef
+++ 

[PATCH v3 24/26] Hexagon (target/hexagon) load and unpack bytes instructions

2021-04-07 Thread Taylor Simpson
The following instructions are added
L2_loadbzw2_io  Rd32 = memubh(Rs32+#s11:1)
L2_loadbzw4_io  Rdd32 = memubh(Rs32+#s11:1)
L2_loadbsw2_io  Rd32 = membh(Rs32+#s11:1)
L2_loadbsw4_io  Rdd32 = membh(Rs32+#s11:1)

L4_loadbzw2_ur  Rd32 = memubh(Rt32<<#u2+#U6)
L4_loadbzw4_ur  Rdd32 = memubh(Rt32<<#u2+#U6)
L4_loadbsw2_ur  Rd32 = membh(Rt32<<#u2+#U6)
L4_loadbsw4_ur  Rdd32 = membh(Rt32<<#u2+#U6)

L4_loadbzw2_ap  Rd32 = memubh(Re32=#U6)
L4_loadbzw4_ap  Rdd32 = memubh(Re32=#U6)
L4_loadbsw2_ap  Rd32 = membh(Re32=#U6)
L4_loadbsw4_ap  Rdd32 = membh(Re32=#U6)

L2_loadbzw2_pr  Rd32 = memubh(Rx32++Mu2)
L2_loadbzw4_pr  Rdd32 = memubh(Rx32++Mu2)
L2_loadbsw2_pr  Rd32 = membh(Rx32++Mu2)
L2_loadbsw4_pr  Rdd32 = membh(Rx32++Mu2)

L2_loadbzw2_pbr Rd32 = memubh(Rx32++Mu2:brev)
L2_loadbzw4_pbr Rdd32 = memubh(Rx32++Mu2:brev)
L2_loadbsw2_pbr Rd32 = membh(Rx32++Mu2:brev)
L2_loadbsw4_pbr Rdd32 = membh(Rx32++Mu2:brev)

L2_loadbzw2_pi  Rd32 = memubh(Rx32++#s4:1)
L2_loadbzw4_pi  Rdd32 = memubh(Rx32++#s4:1)
L2_loadbsw2_pi  Rd32 = membh(Rx32++#s4:1)
L2_loadbsw4_pi  Rdd32 = membh(Rx32++#s4:1)

L2_loadbzw2_pci Rd32 = memubh(Rx32++#s4:1:circ(Mu2))
L2_loadbzw4_pci Rdd32 = memubh(Rx32++#s4:1:circ(Mu2))
L2_loadbsw2_pci Rd32 = membh(Rx32++#s4:1:circ(Mu2))
L2_loadbsw4_pci Rdd32 = membh(Rx32++#s4:1:circ(Mu2))

L2_loadbzw2_pcr Rd32 = memubh(Rx32++I:circ(Mu2))
L2_loadbzw4_pcr Rdd32 = memubh(Rx32++I:circ(Mu2))
L2_loadbsw2_pcr Rd32 = membh(Rx32++I:circ(Mu2))
L2_loadbsw4_pcr Rdd32 = membh(Rx32++I:circ(Mu2))

Test cases in tests/tcg/hexagon/load_unpack.c
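
A plain C model of the unpack step may help when reading the macros in
gen_tcg.h. It is a sketch only: the function names are hypothetical, the
memory access is left out, and byte 0 is taken to be the least significant
byte of the loaded value. The "bz" forms zero-extend each byte to a half
word and the "bs" forms sign-extend it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extend one byte to 16 bits, either signed or unsigned. */
static uint16_t extend_byte(uint8_t byte, bool sign)
{
    return sign ? (uint16_t)(int16_t)(int8_t)byte : byte;
}

/* Hypothetical model of loadbzw2/loadbsw2: 2 bytes into a 32-bit register. */
static uint32_t unpack_bXw2(uint16_t loaded, bool sign)
{
    uint32_t result = 0;

    for (int i = 0; i < 2; i++) {
        uint8_t byte = (uint8_t)(loaded >> (i * 8));
        result |= (uint32_t)extend_byte(byte, sign) << (i * 16);
    }
    return result;
}

/* Hypothetical model of loadbzw4/loadbsw4: 4 bytes into a register pair. */
static uint64_t unpack_bXw4(uint32_t loaded, bool sign)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        uint8_t byte = (uint8_t)(loaded >> (i * 8));
        result |= (uint64_t)extend_byte(byte, sign) << (i * 16);
    }
    return result;
}

/* Worked values: 0x80 becomes 0x0080 unsigned and 0xff80 signed. */
void load_unpack_example(void)
{
    assert(unpack_bXw2(0x7f80, false) == 0x007f0080u);
    assert(unpack_bXw2(0x7f80, true) == 0x007fff80u);
}
```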

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  | 108 
 target/hexagon/genptr.c   |  13 +
 target/hexagon/imported/encode_pp.def |   6 +
 target/hexagon/imported/ldst.idef |  43 +++
 target/hexagon/macros.h   |  16 ++
 tests/tcg/hexagon/Makefile.target |   1 +
 tests/tcg/hexagon/load_unpack.c   | 474 ++
 7 files changed, 661 insertions(+)
 create mode 100644 tests/tcg/hexagon/load_unpack.c

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 8f0ec01..1120aae 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -153,6 +153,114 @@
 #define fGEN_TCG_L2_loadrd_pi(SHORTCODE)   SHORTCODE
 
 /*
+ * These instructions load 2 bytes and place them in
+ * two halves of the destination register.
+ * The GET_EA macro determines the addressing mode.
+ * The SIGN argument determines whether to zero-extend or
+ * sign-extend.
+ */
+#define fGEN_TCG_loadbXw2(GET_EA, SIGN) \
+do { \
+TCGv tmp = tcg_temp_new(); \
+TCGv byte = tcg_temp_new(); \
+GET_EA; \
+fLOAD(1, 2, u, EA, tmp); \
+tcg_gen_movi_tl(RdV, 0); \
+for (int i = 0; i < 2; i++) { \
+gen_set_half(i, RdV, gen_get_byte(byte, i, tmp, (SIGN))); \
+} \
+tcg_temp_free(tmp); \
+tcg_temp_free(byte); \
+} while (0)
+
+#define fGEN_TCG_L2_loadbzw2_io(SHORTCODE) \
+fGEN_TCG_loadbXw2(fEA_RI(RsV, siV), false)
+#define fGEN_TCG_L4_loadbzw2_ur(SHORTCODE) \
+fGEN_TCG_loadbXw2(fEA_IRs(UiV, RtV, uiV), false)
+#define fGEN_TCG_L2_loadbsw2_io(SHORTCODE) \
+fGEN_TCG_loadbXw2(fEA_RI(RsV, siV), true)
+#define fGEN_TCG_L4_loadbsw2_ur(SHORTCODE) \
+fGEN_TCG_loadbXw2(fEA_IRs(UiV, RtV, uiV), true)
+#define fGEN_TCG_L4_loadbzw2_ap(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_ap, false)
+#define fGEN_TCG_L2_loadbzw2_pr(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pr, false)
+#define fGEN_TCG_L2_loadbzw2_pbr(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pbr, false)
+#define fGEN_TCG_L2_loadbzw2_pi(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pi, false)
+#define fGEN_TCG_L4_loadbsw2_ap(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_ap, true)
+#define fGEN_TCG_L2_loadbsw2_pr(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pr, true)
+#define fGEN_TCG_L2_loadbsw2_pbr(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pbr, true)
+#define fGEN_TCG_L2_loadbsw2_pi(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pi, true)
+#define fGEN_TCG_L2_loadbzw2_pci(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pci, false)
+#define fGEN_TCG_L2_loadbsw2_pci(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pci, true)
+#define fGEN_TCG_L2_loadbzw2_pcr(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pcr(1), false)
+#define fGEN_TCG_L2_loadbsw2_pcr(SHORTCODE) \
+fGEN_TCG_loadbXw2(GET_EA_pcr(1), true)
+
+/*
+ * These instructions load 4 bytes and place them in
+ * four halves of the destination register pair.
+ * The GET_EA macro determines the addressing mode.
+ * The SIGN argument determines whether 

[PATCH v3 17/26] Hexagon (target/hexagon) add F2_sfrecipa instruction

2021-04-07 Thread Taylor Simpson
Rd32,Pe4 = sfrecipa(Rs32, Rt32)
Reciprocal approx

Test cases in tests/tcg/hexagon/multi_result.c
FP exception tests added to tests/tcg/hexagon/fpstuff.c
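
The gen_tcg.h hunk in this patch notes that the helper packs the two
32-bit results into one 64-bit value, which the generated code then
splits. A minimal sketch of that convention follows, assuming the register
result sits in the high half and the predicate in the low half, as the
extrh/extrl calls in the hunk suggest. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the register result (high half) and the predicate (low half). */
static uint64_t pack_results(uint32_t rd, uint32_t pe)
{
    return ((uint64_t)rd << 32) | pe;
}

/* Recover the register result from the high 32 bits. */
static uint32_t unpack_rd(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}

/* Recover the predicate from the low 32 bits. */
static uint32_t unpack_pe(uint64_t packed)
{
    return (uint32_t)packed;
}

/* Round trip: both values survive packing unchanged. */
void pack_example(void)
{
    uint64_t packed = pack_results(0x3f800000u, 0x00u);

    assert(packed == 0x3f80000000000000ULL);
    assert(unpack_rd(packed) == 0x3f800000u);
    assert(unpack_pe(packed) == 0x00u);
}
```

Returning a single 64-bit value lets one helper call produce both outputs.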

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/arch.c | 26 +--
 target/hexagon/arch.h |  2 +
 target/hexagon/gen_tcg.h  | 21 +
 target/hexagon/helper.h   |  1 +
 target/hexagon/imported/encode_pp.def |  1 +
 target/hexagon/imported/float.idef| 16 +++
 target/hexagon/op_helper.c| 37 
 tests/tcg/hexagon/Makefile.target |  1 +
 tests/tcg/hexagon/fpstuff.c   | 82 +++
 tests/tcg/hexagon/multi_result.c  | 68 +
 10 files changed, 252 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/hexagon/multi_result.c

diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index 40b6e3d..46edf45 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -181,12 +181,13 @@ int arch_sf_recip_common(float32 *Rs, float32 *Rt, float32 *Rd, int *adjust,
 /* or put Inf in num fixup? */
 uint8_t RsV_sign = float32_is_neg(RsV);
 uint8_t RtV_sign = float32_is_neg(RtV);
+/* Check that RsV is NOT infinite before we overwrite it */
+if (!float32_is_infinity(RsV)) {
+float_raise(float_flag_divbyzero, fp_status);
+}
 RsV = infinite_float32(RsV_sign ^ RtV_sign);
 RtV = float32_one;
 RdV = float32_one;
-if (float32_is_infinity(RsV)) {
-float_raise(float_flag_divbyzero, fp_status);
-}
 } else if (float32_is_infinity(RtV)) {
 RsV = make_float32(0x80000000 & (RsV ^ RtV));
 RtV = float32_one;
@@ -279,3 +280,22 @@ int arch_sf_invsqrt_common(float32 *Rs, float32 *Rd, int *adjust,
 *adjust = PeV;
 return ret;
 }
+
+const uint8_t recip_lookup_table[128] = {
+0x0fe, 0x0fa, 0x0f6, 0x0f2, 0x0ef, 0x0eb, 0x0e7, 0x0e4,
+0x0e0, 0x0dd, 0x0d9, 0x0d6, 0x0d2, 0x0cf, 0x0cc, 0x0c9,
+0x0c6, 0x0c2, 0x0bf, 0x0bc, 0x0b9, 0x0b6, 0x0b3, 0x0b1,
+0x0ae, 0x0ab, 0x0a8, 0x0a5, 0x0a3, 0x0a0, 0x09d, 0x09b,
+0x098, 0x096, 0x093, 0x091, 0x08e, 0x08c, 0x08a, 0x087,
+0x085, 0x083, 0x080, 0x07e, 0x07c, 0x07a, 0x078, 0x075,
+0x073, 0x071, 0x06f, 0x06d, 0x06b, 0x069, 0x067, 0x065,
+0x063, 0x061, 0x05f, 0x05e, 0x05c, 0x05a, 0x058, 0x056,
+0x054, 0x053, 0x051, 0x04f, 0x04e, 0x04c, 0x04a, 0x049,
+0x047, 0x045, 0x044, 0x042, 0x040, 0x03f, 0x03d, 0x03c,
+0x03a, 0x039, 0x037, 0x036, 0x034, 0x033, 0x032, 0x030,
+0x02f, 0x02d, 0x02c, 0x02b, 0x029, 0x028, 0x027, 0x025,
+0x024, 0x023, 0x021, 0x020, 0x01f, 0x01e, 0x01c, 0x01b,
+0x01a, 0x019, 0x017, 0x016, 0x015, 0x014, 0x013, 0x012,
+0x011, 0x00f, 0x00e, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
+0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x000,
+};
diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
index 6e0b0d9..b6634e9 100644
--- a/target/hexagon/arch.h
+++ b/target/hexagon/arch.h
@@ -30,4 +30,6 @@ int arch_sf_recip_common(float32 *Rs, float32 *Rt, float32 *Rd,
 int arch_sf_invsqrt_common(float32 *Rs, float32 *Rd, int *adjust,
   float_status *fp_status);
 
+extern const uint8_t recip_lookup_table[128];
+
 #endif
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index a30048e..428a670 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -195,6 +195,27 @@
 #define fGEN_TCG_S4_stored_locked(SHORTCODE) \
 do { SHORTCODE; READ_PREG(PdV, PdN); } while (0)
 
+/*
+ * Mathematical operations with more than one definition require
+ * special handling
+ */
+
+/*
+ * Approximate reciprocal
+ * r3,p1 = sfrecipa(r0, r1)
+ *
+ * The helper packs the 2 32-bit results into a 64-bit value,
+ * so unpack them into the proper results.
+ */
+#define fGEN_TCG_F2_sfrecipa(SHORTCODE) \
+do { \
+TCGv_i64 tmp = tcg_temp_new_i64(); \
+gen_helper_sfrecipa(tmp, cpu_env, RsV, RtV);  \
+tcg_gen_extrh_i64_i32(RdV, tmp); \
+tcg_gen_extrl_i64_i32(PeV, tmp); \
+tcg_temp_free_i64(tmp); \
+} while (0)
+
 /* Floating point */
 #define fGEN_TCG_F2_conv_sf2df(SHORTCODE) \
 gen_helper_conv_sf2df(RddV, cpu_env, RsV)
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index efe6069..b377293 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -24,6 +24,7 @@ DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
+DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
 
 /* Floating point */
 DEF_HELPER_2(conv_sf2df, f64, env, f32)
diff --git a/target/hexagon/imported/encode_pp.def 
b/target/hexagon/imported/encode_pp.def
index 

[PATCH v3 14/26] Hexagon (target/hexagon) cleanup reg_field_info definition

2021-04-07 Thread Taylor Simpson
Include size in declaration
Remove {0, 0} entry
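
The pattern can be shown in isolation with a small X-macro example. This
is a sketch under assumptions: the field list is made up, a list macro
stands in for the reg_fields_def.h.inc include, the struct member names
are illustrative, and NUM_REG_FIELDS is assumed to be the last enumerator.
Sizing the array by the enum makes the trailing sentinel entry
unnecessary.

```c
#include <assert.h>

/* Made-up field list standing in for the generated include file. */
#define REG_FIELD_LIST(X) \
    X(FIELD_A, 0, 1) \
    X(FIELD_B, 1, 3) \
    X(FIELD_C, 4, 8)

typedef struct {
    int offset;
    int width;
} RegField;

/* One enumerator per field, plus a final count. */
enum {
#define AS_ENUM(TAG, START, WIDTH) TAG,
    REG_FIELD_LIST(AS_ENUM)
#undef AS_ENUM
    NUM_REG_FIELDS
};

/* The array size comes from the enum, so no {0, 0} terminator is needed. */
static const RegField reg_field_info[NUM_REG_FIELDS] = {
#define AS_ENTRY(TAG, START, WIDTH) { START, WIDTH },
    REG_FIELD_LIST(AS_ENTRY)
#undef AS_ENTRY
};

static_assert(sizeof(reg_field_info) / sizeof(reg_field_info[0])
              == NUM_REG_FIELDS, "table and enum must agree");
```

With the size in the extern declaration as well, users of the table can
take its size and the compiler can flag too many initializers.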

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/reg_fields.c | 3 +--
 target/hexagon/reg_fields.h | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/hexagon/reg_fields.c b/target/hexagon/reg_fields.c
index bdcab79..6713203 100644
--- a/target/hexagon/reg_fields.c
+++ b/target/hexagon/reg_fields.c
@@ -18,10 +18,9 @@
 #include "qemu/osdep.h"
 #include "reg_fields.h"
 
-const RegField reg_field_info[] = {
+const RegField reg_field_info[NUM_REG_FIELDS] = {
 #define DEF_REG_FIELD(TAG, START, WIDTH)\
   { START, WIDTH },
 #include "reg_fields_def.h.inc"
-  { 0, 0 }
 #undef DEF_REG_FIELD
 };
diff --git a/target/hexagon/reg_fields.h b/target/hexagon/reg_fields.h
index d3c86c9..9e2ad5d 100644
--- a/target/hexagon/reg_fields.h
+++ b/target/hexagon/reg_fields.h
@@ -23,8 +23,6 @@ typedef struct {
 int width;
 } RegField;
 
-extern const RegField reg_field_info[];
-
 enum {
 #define DEF_REG_FIELD(TAG, START, WIDTH) \
 TAG,
@@ -33,4 +31,6 @@ enum {
 #undef DEF_REG_FIELD
 };
 
+extern const RegField reg_field_info[NUM_REG_FIELDS];
+
 #endif
-- 
2.7.4




[PATCH v3 16/26] Hexagon (target/hexagon) compile all debug code

2021-04-07 Thread Taylor Simpson
Change #if HEX_DEBUG to if (HEX_DEBUG) so that the debug code doesn't
bit rot.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/genptr.c| 72 ++--
 target/hexagon/helper.h|  2 --
 target/hexagon/internal.h  | 11 +++
 target/hexagon/op_helper.c | 14 +++--
 target/hexagon/translate.c | 74 ++
 target/hexagon/translate.h |  2 --
 6 files changed, 81 insertions(+), 94 deletions(-)
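
For readers unfamiliar with the idiom, here is a minimal standalone sketch of the difference. EXAMPLE_DEBUG, reg_written and log_reg_write() are illustrative names, not QEMU code; the only assumption carried over from the patch is that the debug macro is always defined as 0 or 1.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: a 0/1 macro, mirroring how HEX_DEBUG is used in the patch. */
#define EXAMPLE_DEBUG 0

static int reg_written[4];

static void log_reg_write(int rnum, int *new_value, int val)
{
    assert(rnum >= 0 && rnum < 4);
    new_value[rnum] = val;
    /*
     * Unlike "#if EXAMPLE_DEBUG", this body is always parsed and
     * type-checked, so it cannot silently stop compiling. An optimizing
     * compiler can still drop it when the macro is 0.
     */
    if (EXAMPLE_DEBUG) {
        reg_written[rnum] = 1;
        printf("reg %d written\n", rnum);
    }
}
```

The trade-off is that everything referenced inside the block (here reg_written) must be declared in both configurations, which is presumably why the patch also touches the headers.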

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index b87e264..24d5758 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -42,17 +42,17 @@ static inline void gen_log_predicated_reg_write(int rnum, TCGv val, int slot)
 tcg_gen_andi_tl(slot_mask, hex_slot_cancelled, 1 << slot);
 tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum], slot_mask, zero,
val, hex_new_value[rnum]);
-#if HEX_DEBUG
-/*
- * Do this so HELPER(debug_commit_end) will know
- *
- * Note that slot_mask indicates the value is not written
- * (i.e., slot was cancelled), so we create a true/false value before
- * or'ing with hex_reg_written[rnum].
- */
-tcg_gen_setcond_tl(TCG_COND_EQ, slot_mask, slot_mask, zero);
-tcg_gen_or_tl(hex_reg_written[rnum], hex_reg_written[rnum], slot_mask);
-#endif
+if (HEX_DEBUG) {
+/*
+ * Do this so HELPER(debug_commit_end) will know
+ *
+ * Note that slot_mask indicates the value is not written
+ * (i.e., slot was cancelled), so we create a true/false value before
+ * or'ing with hex_reg_written[rnum].
+ */
+tcg_gen_setcond_tl(TCG_COND_EQ, slot_mask, slot_mask, zero);
+tcg_gen_or_tl(hex_reg_written[rnum], hex_reg_written[rnum], slot_mask);
+}
 
 tcg_temp_free(zero);
 tcg_temp_free(slot_mask);
@@ -61,10 +61,10 @@ static inline void gen_log_predicated_reg_write(int rnum, TCGv val, int slot)
 static inline void gen_log_reg_write(int rnum, TCGv val)
 {
 tcg_gen_mov_tl(hex_new_value[rnum], val);
-#if HEX_DEBUG
-/* Do this so HELPER(debug_commit_end) will know */
-tcg_gen_movi_tl(hex_reg_written[rnum], 1);
-#endif
+if (HEX_DEBUG) {
+/* Do this so HELPER(debug_commit_end) will know */
+tcg_gen_movi_tl(hex_reg_written[rnum], 1);
+}
 }
 
 static void gen_log_predicated_reg_write_pair(int rnum, TCGv_i64 val, int slot)
@@ -84,19 +84,19 @@ static void gen_log_predicated_reg_write_pair(int rnum, TCGv_i64 val, int slot)
 tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum + 1],
slot_mask, zero,
val32, hex_new_value[rnum + 1]);
-#if HEX_DEBUG
-/*
- * Do this so HELPER(debug_commit_end) will know
- *
- * Note that slot_mask indicates the value is not written
- * (i.e., slot was cancelled), so we create a true/false value before
- * or'ing with hex_reg_written[rnum].
- */
-tcg_gen_setcond_tl(TCG_COND_EQ, slot_mask, slot_mask, zero);
-tcg_gen_or_tl(hex_reg_written[rnum], hex_reg_written[rnum], slot_mask);
-tcg_gen_or_tl(hex_reg_written[rnum + 1], hex_reg_written[rnum + 1],
-  slot_mask);
-#endif
+if (HEX_DEBUG) {
+/*
+ * Do this so HELPER(debug_commit_end) will know
+ *
+ * Note that slot_mask indicates the value is not written
+ * (i.e., slot was cancelled), so we create a true/false value before
+ * or'ing with hex_reg_written[rnum].
+ */
+tcg_gen_setcond_tl(TCG_COND_EQ, slot_mask, slot_mask, zero);
+tcg_gen_or_tl(hex_reg_written[rnum], hex_reg_written[rnum], slot_mask);
+tcg_gen_or_tl(hex_reg_written[rnum + 1], hex_reg_written[rnum + 1],
+  slot_mask);
+}
 
 tcg_temp_free(val32);
 tcg_temp_free(zero);
@@ -107,17 +107,17 @@ static void gen_log_reg_write_pair(int rnum, TCGv_i64 val)
 {
 /* Low word */
 tcg_gen_extrl_i64_i32(hex_new_value[rnum], val);
-#if HEX_DEBUG
-/* Do this so HELPER(debug_commit_end) will know */
-tcg_gen_movi_tl(hex_reg_written[rnum], 1);
-#endif
+if (HEX_DEBUG) {
+/* Do this so HELPER(debug_commit_end) will know */
+tcg_gen_movi_tl(hex_reg_written[rnum], 1);
+}
 
 /* High word */
 tcg_gen_extrh_i64_i32(hex_new_value[rnum + 1], val);
-#if HEX_DEBUG
-/* Do this so HELPER(debug_commit_end) will know */
-tcg_gen_movi_tl(hex_reg_written[rnum + 1], 1);
-#endif
+if (HEX_DEBUG) {
+/* Do this so HELPER(debug_commit_end) will know */
+tcg_gen_movi_tl(hex_reg_written[rnum + 1], 1);
+}
 }
 
 static inline void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 715c246..efe6069 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -19,11 +19,9 @@
 #include "helper_protos_generated.h.inc"
 
 

[PATCH v3 12/26] Hexagon (target/hexagon) use softfloat for float-to-int conversions

2021-04-07 Thread Taylor Simpson
Use the proper return type for helpers that convert to unsigned
Remove target/hexagon/conv_emu.[ch]

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/conv_emu.c   | 177 
 target/hexagon/conv_emu.h   |  31 
 target/hexagon/fma_emu.c|   1 -
 target/hexagon/helper.h |  16 ++--
 target/hexagon/meson.build  |   1 -
 target/hexagon/op_helper.c  | 169 --
 tests/tcg/hexagon/fpstuff.c | 145 
 7 files changed, 281 insertions(+), 259 deletions(-)
 delete mode 100644 target/hexagon/conv_emu.c
 delete mode 100644 target/hexagon/conv_emu.h

diff --git a/target/hexagon/conv_emu.c b/target/hexagon/conv_emu.c
deleted file mode 100644
index 3985b10..000
--- a/target/hexagon/conv_emu.c
+++ /dev/null
@@ -1,177 +0,0 @@
-/*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License as published by
- *  the Free Software Foundation; either version 2 of the License, or
- *  (at your option) any later version.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; if not, see .
- */
-
-#include "qemu/osdep.h"
-#include "qemu/host-utils.h"
-#include "fpu/softfloat.h"
-#include "macros.h"
-#include "conv_emu.h"
-
-#define LL_MAX_POS 0x7fffULL
-#define MAX_POS 0x7fffU
-
-static uint64_t conv_f64_to_8u_n(float64 in, int will_negate,
- float_status *fp_status)
-{
-uint8_t sign = float64_is_neg(in);
-if (float64_is_infinity(in)) {
-float_raise(float_flag_invalid, fp_status);
-if (float64_is_neg(in)) {
-return 0ULL;
-} else {
-return ~0ULL;
-}
-}
-if (float64_is_any_nan(in)) {
-float_raise(float_flag_invalid, fp_status);
-return ~0ULL;
-}
-if (float64_is_zero(in)) {
-return 0;
-}
-if (sign) {
-float_raise(float_flag_invalid, fp_status);
-return 0;
-}
-if (float64_lt(in, float64_half, fp_status)) {
-/* Near zero, captures large fracshifts, denorms, etc */
-float_raise(float_flag_inexact, fp_status);
-switch (get_float_rounding_mode(fp_status)) {
-case float_round_down:
-if (will_negate) {
-return 1;
-} else {
-return 0;
-}
-case float_round_up:
-if (!will_negate) {
-return 1;
-} else {
-return 0;
-}
-default:
-return 0;/* nearest or towards zero */
-}
-}
-return float64_to_uint64(in, fp_status);
-}
-
-static void clr_float_exception_flags(uint8_t flag, float_status *fp_status)
-{
-uint8_t flags = fp_status->float_exception_flags;
-flags &= ~flag;
-set_float_exception_flags(flags, fp_status);
-}
-
-static uint32_t conv_df_to_4u_n(float64 fp64, int will_negate,
-float_status *fp_status)
-{
-uint64_t tmp;
-tmp = conv_f64_to_8u_n(fp64, will_negate, fp_status);
-if (tmp > 0xULL) {
-clr_float_exception_flags(float_flag_inexact, fp_status);
-float_raise(float_flag_invalid, fp_status);
-return ~0U;
-}
-return (uint32_t)tmp;
-}
-
-uint64_t conv_df_to_8u(float64 in, float_status *fp_status)
-{
-return conv_f64_to_8u_n(in, 0, fp_status);
-}
-
-uint32_t conv_df_to_4u(float64 in, float_status *fp_status)
-{
-return conv_df_to_4u_n(in, 0, fp_status);
-}
-
-int64_t conv_df_to_8s(float64 in, float_status *fp_status)
-{
-uint8_t sign = float64_is_neg(in);
-uint64_t tmp;
-if (float64_is_any_nan(in)) {
-float_raise(float_flag_invalid, fp_status);
-return -1;
-}
-if (sign) {
-float64 minus_fp64 = float64_abs(in);
-tmp = conv_f64_to_8u_n(minus_fp64, 1, fp_status);
-} else {
-tmp = conv_f64_to_8u_n(in, 0, fp_status);
-}
-if (tmp > (LL_MAX_POS + sign)) {
-clr_float_exception_flags(float_flag_inexact, fp_status);
-float_raise(float_flag_invalid, fp_status);
-tmp = (LL_MAX_POS + sign);
-}
-if (sign) {
-return -tmp;
-} else {
-return tmp;
-}
-}
-
-int32_t conv_df_to_4s(float64 in, float_status *fp_status)
-{
-uint8_t sign = float64_is_neg(in);
-uint64_t tmp;
-if (float64_is_any_nan(in)) {
-float_raise(float_flag_invalid, 

[PATCH v3 11/26] Hexagon (target/hexagon) replace float32_mul_pow2 with float32_scalbn

2021-04-07 Thread Taylor Simpson
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/arch.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)
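
For context, the removed float32_mul_pow2() built the constant 2^p from raw exponent bits and then multiplied, whereas float32_scalbn scales by a power of two directly. The sketch below reproduces the old approach on host floats only to show what is being replaced. It does not use QEMU softfloat; pow2_from_bits() and mul_pow2() are illustrative names, and SF_BIAS = 127 is the usual IEEE 754 single-precision bias, assumed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SF_BIAS     127
#define SF_MANTBITS 23

/* Build the float 2^p directly from its IEEE 754 single-precision bits. */
static float pow2_from_bits(int p)
{
    /* Only normal exponents can be encoded this way. */
    assert(p >= -126 && p <= 127);
    uint32_t bits = (uint32_t)(SF_BIAS + p) << SF_MANTBITS;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Host-float analogue of the removed helper: multiply by 2^p. */
static float mul_pow2(float a, int p)
{
    return a * pow2_from_bits(p);
}
```

For example, mul_pow2(3.0f, 4) yields 48.0f. A scalbn-style call expresses the same scaling without hand-assembling the exponent field.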

diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index bb51f19..40b6e3d 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -143,12 +143,6 @@ void arch_fpop_end(CPUHexagonState *env)
 }
 }
 
-static float32 float32_mul_pow2(float32 a, uint32_t p, float_status *fp_status)
-{
-float32 b = make_float32((SF_BIAS + p) << SF_MANTBITS);
-return float32_mul(a, b, fp_status);
-}
-
 int arch_sf_recip_common(float32 *Rs, float32 *Rt, float32 *Rd, int *adjust,
  float_status *fp_status)
 {
@@ -217,22 +211,22 @@ int arch_sf_recip_common(float32 *Rs, float32 *Rt, float32 *Rd, int *adjust,
 if ((n_exp - d_exp + SF_BIAS) <= SF_MANTBITS) {
 /* Near quotient underflow / inexact Q */
 PeV = 0x80;
-RtV = float32_mul_pow2(RtV, -64, fp_status);
-RsV = float32_mul_pow2(RsV, 64, fp_status);
+RtV = float32_scalbn(RtV, -64, fp_status);
+RsV = float32_scalbn(RsV, 64, fp_status);
 } else if ((n_exp - d_exp + SF_BIAS) > (SF_MAXEXP - 24)) {
 /* Near quotient overflow */
 PeV = 0x40;
-RtV = float32_mul_pow2(RtV, 32, fp_status);
-RsV = float32_mul_pow2(RsV, -32, fp_status);
+RtV = float32_scalbn(RtV, 32, fp_status);
+RsV = float32_scalbn(RsV, -32, fp_status);
 } else if (n_exp <= SF_MANTBITS + 2) {
-RtV = float32_mul_pow2(RtV, 64, fp_status);
-RsV = float32_mul_pow2(RsV, 64, fp_status);
+RtV = float32_scalbn(RtV, 64, fp_status);
+RsV = float32_scalbn(RsV, 64, fp_status);
 } else if (d_exp <= 1) {
-RtV = float32_mul_pow2(RtV, 32, fp_status);
-RsV = float32_mul_pow2(RsV, 32, fp_status);
+RtV = float32_scalbn(RtV, 32, fp_status);
+RsV = float32_scalbn(RsV, 32, fp_status);
 } else if (d_exp > 252) {
-RtV = float32_mul_pow2(RtV, -32, fp_status);
-RsV = float32_mul_pow2(RsV, -32, fp_status);
+RtV = float32_scalbn(RtV, -32, fp_status);
+RsV = float32_scalbn(RsV, -32, fp_status);
 }
 RdV = 0;
 ret = 1;
@@ -274,7 +268,7 @@ int arch_sf_invsqrt_common(float32 *Rs, float32 *Rd, int *adjust,
 /* Basic checks passed */
 r_exp = float32_getexp(RsV);
 if (r_exp <= 24) {
-RsV = float32_mul_pow2(RsV, 64, fp_status);
+RsV = float32_scalbn(RsV, 64, fp_status);
 PeV = 0xe0;
 }
 RdV = 0;
-- 
2.7.4




[PATCH v3 02/26] Hexagon (target/hexagon) cleanup gen_log_predicated_reg_write_pair

2021-04-07 Thread Taylor Simpson
Similar to previous cleanup of gen_log_predicated_reg_write

Signed-off-by: Taylor Simpson 
---
 target/hexagon/genptr.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)
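
The setcond/or sequence in the new comment can be modelled in plain C as below. This is only a sketch of the logic; update_reg_written() and its parameters are illustrative names, and the values stand in for the TCG globals hex_slot_cancelled and hex_reg_written[rnum].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain-C model of the debug bookkeeping.
 * slot_cancelled holds one bit per slot, set when that slot was cancelled.
 */
static uint32_t update_reg_written(uint32_t reg_written,
                                   uint32_t slot_cancelled, int slot)
{
    assert(slot >= 0 && slot < 32);
    /* Nonzero means "cancelled", i.e. the value is NOT written. */
    uint32_t slot_mask = slot_cancelled & (1u << slot);
    /* setcond(EQ, slot_mask, 0): 1 when the write happens, else 0. */
    uint32_t written = (slot_mask == 0);
    /* OR keeps an earlier "written" flag even if this write is cancelled. */
    return reg_written | written;
}
```

Compared with the two movcond operations it replaces, this form computes the true/false value once and reuses it for both halves of the register pair, which is why the "one" constant can be dropped.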

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 87f5d92..07d970f 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -69,36 +69,35 @@ static inline void gen_log_reg_write(int rnum, TCGv val)
 static void gen_log_predicated_reg_write_pair(int rnum, TCGv_i64 val, int slot)
 {
 TCGv val32 = tcg_temp_new();
-TCGv one = tcg_const_tl(1);
 TCGv zero = tcg_const_tl(0);
 TCGv slot_mask = tcg_temp_new();
 
 tcg_gen_andi_tl(slot_mask, hex_slot_cancelled, 1 << slot);
 /* Low word */
 tcg_gen_extrl_i64_i32(val32, val);
-tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum], slot_mask, zero,
-   val32, hex_new_value[rnum]);
-#if HEX_DEBUG
-/* Do this so HELPER(debug_commit_end) will know */
-tcg_gen_movcond_tl(TCG_COND_EQ, hex_reg_written[rnum],
+tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum],
slot_mask, zero,
-   one, hex_reg_written[rnum]);
-#endif
-
+   val32, hex_new_value[rnum]);
 /* High word */
 tcg_gen_extrh_i64_i32(val32, val);
 tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum + 1],
slot_mask, zero,
val32, hex_new_value[rnum + 1]);
 #if HEX_DEBUG
-/* Do this so HELPER(debug_commit_end) will know */
-tcg_gen_movcond_tl(TCG_COND_EQ, hex_reg_written[rnum + 1],
-   slot_mask, zero,
-   one, hex_reg_written[rnum + 1]);
+/*
+ * Do this so HELPER(debug_commit_end) will know
+ *
+ * Note that slot_mask indicates the value is not written
+ * (i.e., slot was cancelled), so we create a true/false value before
+ * or'ing with hex_reg_written[rnum].
+ */
+tcg_gen_setcond_tl(TCG_COND_EQ, slot_mask, slot_mask, zero);
+tcg_gen_or_tl(hex_reg_written[rnum], hex_reg_written[rnum], slot_mask);
+tcg_gen_or_tl(hex_reg_written[rnum + 1], hex_reg_written[rnum + 1],
+  slot_mask);
 #endif
 
 tcg_temp_free(val32);
-tcg_temp_free(one);
 tcg_temp_free(zero);
 tcg_temp_free(slot_mask);
 }
-- 
2.7.4




[PATCH v3 18/26] Hexagon (target/hexagon) add F2_sfinvsqrta

2021-04-07 Thread Taylor Simpson
Rd32,Pe4 = sfinvsqrta(Rs32)
Reciprocal square root approximation

The helper packs the 2 32-bit results into a 64-bit value,
and the fGEN_TCG override unpacks them into the proper results.

Test cases in tests/tcg/hexagon/multi_result.c
FP exception tests added to tests/tcg/hexagon/fpstuff.c

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/arch.c | 21 -
 target/hexagon/arch.h |  2 ++
 target/hexagon/gen_tcg.h  | 16 
 target/hexagon/helper.h   |  1 +
 target/hexagon/imported/encode_pp.def |  1 +
 target/hexagon/imported/float.idef| 16 
 target/hexagon/op_helper.c| 21 +
 tests/tcg/hexagon/fpstuff.c   | 15 +++
 tests/tcg/hexagon/multi_result.c  | 29 +
 9 files changed, 121 insertions(+), 1 deletion(-)
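
A standalone sketch of the pack/unpack convention described above. The unpacking side follows the fGEN_TCG override in this patch (high half to Rd, low half to Pe); the packing side is inferred from that, since the helper body is not quoted here. pack_results() and unpack_results() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack: Rd in the upper 32 bits, Pe in the lower 32 bits (inferred). */
static uint64_t pack_results(uint32_t rd, uint32_t pe)
{
    return ((uint64_t)rd << 32) | pe;
}

/* Unpack, mirroring tcg_gen_extrh_i64_i32 / tcg_gen_extrl_i64_i32. */
static void unpack_results(uint64_t packed, uint32_t *rd, uint32_t *pe)
{
    assert(rd && pe);
    *rd = (uint32_t)(packed >> 32);   /* high half -> Rd */
    *pe = (uint32_t)packed;           /* low half  -> Pe */
}
```

Returning one 64-bit value lets a helper with a single return slot deliver both the register result and the predicate result.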

diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index 46edf45..dee852e 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -247,7 +247,7 @@ int arch_sf_invsqrt_common(float32 *Rs, float32 *Rd, int *adjust,
 int r_exp;
 int ret = 0;
 RsV = *Rs;
-if (float32_is_infinity(RsV)) {
+if (float32_is_any_nan(RsV)) {
 if (extract32(RsV, 22, 1) == 0) {
 float_raise(float_flag_invalid, fp_status);
 }
@@ -299,3 +299,22 @@ const uint8_t recip_lookup_table[128] = {
 0x011, 0x00f, 0x00e, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
 0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x000,
 };
+
+const uint8_t invsqrt_lookup_table[128] = {
+0x069, 0x066, 0x063, 0x061, 0x05e, 0x05b, 0x059, 0x057,
+0x054, 0x052, 0x050, 0x04d, 0x04b, 0x049, 0x047, 0x045,
+0x043, 0x041, 0x03f, 0x03d, 0x03b, 0x039, 0x037, 0x036,
+0x034, 0x032, 0x030, 0x02f, 0x02d, 0x02c, 0x02a, 0x028,
+0x027, 0x025, 0x024, 0x022, 0x021, 0x01f, 0x01e, 0x01d,
+0x01b, 0x01a, 0x019, 0x017, 0x016, 0x015, 0x014, 0x012,
+0x011, 0x010, 0x00f, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
+0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x001,
+0x0fe, 0x0fa, 0x0f6, 0x0f3, 0x0ef, 0x0eb, 0x0e8, 0x0e4,
+0x0e1, 0x0de, 0x0db, 0x0d7, 0x0d4, 0x0d1, 0x0ce, 0x0cb,
+0x0c9, 0x0c6, 0x0c3, 0x0c0, 0x0be, 0x0bb, 0x0b8, 0x0b6,
+0x0b3, 0x0b1, 0x0af, 0x0ac, 0x0aa, 0x0a8, 0x0a5, 0x0a3,
+0x0a1, 0x09f, 0x09d, 0x09b, 0x099, 0x097, 0x095, 0x093,
+0x091, 0x08f, 0x08d, 0x08b, 0x089, 0x087, 0x086, 0x084,
+0x082, 0x080, 0x07f, 0x07d, 0x07b, 0x07a, 0x078, 0x077,
+0x075, 0x074, 0x072, 0x071, 0x06f, 0x06e, 0x06c, 0x06b,
+};
diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
index b6634e9..3e0c334 100644
--- a/target/hexagon/arch.h
+++ b/target/hexagon/arch.h
@@ -32,4 +32,6 @@ int arch_sf_invsqrt_common(float32 *Rs, float32 *Rd, int *adjust,
 
 extern const uint8_t recip_lookup_table[128];
 
+extern const uint8_t invsqrt_lookup_table[128];
+
 #endif
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 428a670..d78e7b8 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -216,6 +216,22 @@
 tcg_temp_free_i64(tmp); \
 } while (0)
 
+/*
+ * Approximation of the reciprocal square root
+ * r1,p0 = sfinvsqrta(r0)
+ *
+ * The helper packs the 2 32-bit results into a 64-bit value,
+ * so unpack them into the proper results.
+ */
+#define fGEN_TCG_F2_sfinvsqrta(SHORTCODE) \
+do { \
+TCGv_i64 tmp = tcg_temp_new_i64(); \
+gen_helper_sfinvsqrta(tmp, cpu_env, RsV); \
+tcg_gen_extrh_i64_i32(RdV, tmp); \
+tcg_gen_extrl_i64_i32(PeV, tmp); \
+tcg_temp_free_i64(tmp); \
+} while (0)
+
 /* Floating point */
 #define fGEN_TCG_F2_conv_sf2df(SHORTCODE) \
 gen_helper_conv_sf2df(RddV, cpu_env, RsV)
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index b377293..cb7508f 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -25,6 +25,7 @@ DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
 DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
+DEF_HELPER_2(sfinvsqrta, i64, env, f32)
 
 /* Floating point */
 DEF_HELPER_2(conv_sf2df, f64, env, f32)
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index b01b4d7..18fe45d 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1642,6 +1642,7 @@ SH2_RR_ENC(F2_conv_sf2w,  "1011","100","-","000","d")
 SH2_RR_ENC(F2_conv_sf2uw_chop,"1011","011","-","001","d")
 SH2_RR_ENC(F2_conv_sf2w_chop, "1011","100","-","001","d")
 SH2_RR_ENC(F2_sffixupr,   "1011","101","-","000","d")
+SH2_RR_ENC(F2_sfinvsqrta, "1011","111","-","0ee","d")
 
 
 DEF_FIELDROW_DESC32(ICLASS_S2op"  1100  PP-- ","[#12] Rd=(Rs,#u6)")
diff --git 

[PATCH v3 26/26] Hexagon (target/hexagon) CABAC decode bin

2021-04-07 Thread Taylor Simpson
The following instruction is added
S2_cabacdecbin    Rdd32=decbin(Rss32,Rtt32)

Test cases added to tests/tcg/hexagon/misc.c

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/arch.c | 91 +++
 target/hexagon/arch.h |  4 ++
 target/hexagon/imported/encode_pp.def |  1 +
 target/hexagon/imported/macros.def| 15 ++
 target/hexagon/imported/shift.idef| 47 ++
 target/hexagon/macros.h   |  7 +++
 tests/tcg/hexagon/misc.c  | 28 +++
 7 files changed, 193 insertions(+)

diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index dee852e..68a55b3 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -27,6 +27,97 @@
 #define SF_MANTBITS 23
 #define float32_nan make_float32(0x)
 
+/*
+ * These three tables are used by the cabacdecbin instruction
+ */
+const uint8_t rLPS_table_64x4[64][4] = {
+{128, 176, 208, 240},
+{128, 167, 197, 227},
+{128, 158, 187, 216},
+{123, 150, 178, 205},
+{116, 142, 169, 195},
+{111, 135, 160, 185},
+{105, 128, 152, 175},
+{100, 122, 144, 166},
+{95, 116, 137, 158},
+{90, 110, 130, 150},
+{85, 104, 123, 142},
+{81, 99, 117, 135},
+{77, 94, 111, 128},
+{73, 89, 105, 122},
+{69, 85, 100, 116},
+{66, 80, 95, 110},
+{62, 76, 90, 104},
+{59, 72, 86, 99},
+{56, 69, 81, 94},
+{53, 65, 77, 89},
+{51, 62, 73, 85},
+{48, 59, 69, 80},
+{46, 56, 66, 76},
+{43, 53, 63, 72},
+{41, 50, 59, 69},
+{39, 48, 56, 65},
+{37, 45, 54, 62},
+{35, 43, 51, 59},
+{33, 41, 48, 56},
+{32, 39, 46, 53},
+{30, 37, 43, 50},
+{29, 35, 41, 48},
+{27, 33, 39, 45},
+{26, 31, 37, 43},
+{24, 30, 35, 41},
+{23, 28, 33, 39},
+{22, 27, 32, 37},
+{21, 26, 30, 35},
+{20, 24, 29, 33},
+{19, 23, 27, 31},
+{18, 22, 26, 30},
+{17, 21, 25, 28},
+{16, 20, 23, 27},
+{15, 19, 22, 25},
+{14, 18, 21, 24},
+{14, 17, 20, 23},
+{13, 16, 19, 22},
+{12, 15, 18, 21},
+{12, 14, 17, 20},
+{11, 14, 16, 19},
+{11, 13, 15, 18},
+{10, 12, 15, 17},
+{10, 12, 14, 16},
+{9, 11, 13, 15},
+{9, 11, 12, 14},
+{8, 10, 12, 14},
+{8, 9, 11, 13},
+{7, 9, 11, 12},
+{7, 9, 10, 12},
+{7, 8, 10, 11},
+{6, 8, 9, 11},
+{6, 7, 9, 10},
+{6, 7, 8, 9},
+{2, 2, 2, 2}
+};
+
+const uint8_t AC_next_state_MPS_64[64] = {
+1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
+41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
+51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
+61, 62, 62, 63
+};
+
+
+const uint8_t AC_next_state_LPS_64[64] = {
+0, 0, 1, 2, 2, 4, 4, 5, 6, 7,
+8, 9, 9, 11, 11, 12, 13, 13, 15, 15,
+16, 16, 18, 18, 19, 19, 21, 21, 22, 22,
+23, 24, 24, 25, 26, 26, 27, 27, 28, 29,
+29, 30, 30, 30, 31, 32, 32, 33, 33, 33,
+34, 34, 35, 35, 35, 36, 36, 36, 37, 37,
+37, 38, 38, 63
+};
+
 #define BITS_MASK_8 0xULL
 #define PAIR_MASK_8 0xULL
 #define NYBL_MASK_8 0x0f0f0f0f0f0f0f0fULL
diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
index 3e0c334..7091806 100644
--- a/target/hexagon/arch.h
+++ b/target/hexagon/arch.h
@@ -20,6 +20,10 @@
 
 #include "qemu/int128.h"
 
+extern const uint8_t rLPS_table_64x4[64][4];
+extern const uint8_t AC_next_state_MPS_64[64];
+extern const uint8_t AC_next_state_LPS_64[64];
+
 uint64_t interleave(uint32_t odd, uint32_t even);
 uint64_t deinterleave(uint64_t src);
 int32_t conv_round(int32_t a, int n);
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index dc4eba4..35ae3d2 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1767,6 +1767,7 @@ SH_RRR_ENC(S4_vxsubaddh, "0001","01-","-","110","d")
 SH_RRR_ENC(S4_vxaddsubhr,   "0001","11-","-","00-","d")
 SH_RRR_ENC(S4_vxsubaddhr,   "0001","11-","-","01-","d")
 SH_RRR_ENC(S4_extractp_rp,  "0001","11-","-","10-","d")
+SH_RRR_ENC(S2_cabacdecbin,  "0001","11-","-","11-","d") /* implicit P0 write */
 
 
 DEF_FIELDROW_DESC32(ICLASS_S3op" 0010  PP-- ","[#2] Rdd=(Rss,Rtt,Pu)")
diff --git a/target/hexagon/imported/macros.def b/target/hexagon/imported/macros.def
index 56c99b1..32ed3bf 100755
--- a/target/hexagon/imported/macros.def
+++ b/target/hexagon/imported/macros.def
@@ -92,6 +92,21 @@ DEF_MACRO(
 /* attribs */
 )
 
+
+DEF_MACRO(
+fINSERT_RANGE,
+{
+int offset=LOWBIT;
+int width=HIBIT-LOWBIT+1;
+/* clear bits where new bits go */
+INREG &= ~(((fCONSTLL(1)<>29)&3];
+rLPS  = rLPS << 23;   /* left aligned */
+
+/* calculate rMPS */
+rMPS= (range&0xff80) - 

[PATCH v3 20/26] Hexagon (target/hexagon) add A6_vminub_RdP

2021-04-07 Thread Taylor Simpson
Rdd32,Pe4 = vminub(Rtt32, Rss32)
Vector min of bytes

Test cases in tests/tcg/hexagon/multi_result.c

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  | 27 +++
 target/hexagon/genptr.c   | 22 ++
 target/hexagon/imported/alu.idef  | 10 ++
 target/hexagon/imported/encode_pp.def |  1 +
 tests/tcg/hexagon/multi_result.c  | 34 ++
 5 files changed, 94 insertions(+)
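
As a reading aid, here is a plain-C reference model of the semantics given in the alu.idef entry of this patch: for each of the 8 byte lanes, the predicate bit records whether the Rtt byte is greater than the Rss byte, and the destination byte is the unsigned minimum. vminub_model() is an illustrative name; it is not the TCG implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of Rdd,Pe = vminub(Rtt, Rss) on host integers. */
static uint64_t vminub_model(uint64_t rtt, uint64_t rss, uint8_t *pred)
{
    uint64_t rdd = 0;
    uint8_t pe = 0;

    assert(pred);
    for (int i = 0; i < 8; i++) {
        uint8_t t = (uint8_t)(rtt >> (i * 8));
        uint8_t s = (uint8_t)(rss >> (i * 8));
        /* Decision bit i: set when the Rtt byte is the larger one. */
        if (t > s) {
            pe |= (uint8_t)(1u << i);
        }
        /* Lane i of the result is the unsigned minimum of the two bytes. */
        rdd |= (uint64_t)(t < s ? t : s) << (i * 8);
    }
    *pred = pe;
    return rdd;
}
```

For Rtt = 0x0102030405060708 and Rss = 0x0807060504030201 the model returns 0x0102030404030201 with a predicate of 0x0f, because only the four low lanes have the larger byte in Rtt.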

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 93310c5..aea0c55 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -237,6 +237,33 @@
 tcg_temp_free_i64(tmp); \
 } while (0)
 
+/*
+ * Compare each of the 8 unsigned bytes
+ * The minimum is placed in each byte of the destination.
+ * Each bit of the predicate is set true if the byte from the first operand
+ * is greater than the byte from the second operand.
+ * r5:4,p1 = vminub(r1:0, r3:2)
+ */
+#define fGEN_TCG_A6_vminub_RdP(SHORTCODE) \
+do { \
+TCGv left = tcg_temp_new(); \
+TCGv right = tcg_temp_new(); \
+TCGv tmp = tcg_temp_new(); \
+tcg_gen_movi_tl(PeV, 0); \
+tcg_gen_movi_i64(RddV, 0); \
+for (int i = 0; i < 8; i++) { \
+gen_get_byte_i64(left, i, RttV, false); \
+gen_get_byte_i64(right, i, RssV, false); \
+tcg_gen_setcond_tl(TCG_COND_GT, tmp, left, right); \
+tcg_gen_deposit_tl(PeV, PeV, tmp, i, 1); \
+tcg_gen_umin_tl(tmp, left, right); \
+gen_set_byte_i64(i, RddV, tmp); \
+} \
+tcg_temp_free(left); \
+tcg_temp_free(right); \
+tcg_temp_free(tmp); \
+} while (0)
+
 /* Floating point */
 #define fGEN_TCG_F2_conv_sf2df(SHORTCODE) \
 gen_helper_conv_sf2df(RddV, cpu_env, RsV)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 24d5758..9dbebc6 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -266,6 +266,28 @@ static inline void gen_write_ctrl_reg_pair(DisasContext *ctx, int reg_num,
 }
 }
 
+static TCGv gen_get_byte_i64(TCGv result, int N, TCGv_i64 src, bool sign)
+{
+TCGv_i64 res64 = tcg_temp_new_i64();
+if (sign) {
+tcg_gen_sextract_i64(res64, src, N * 8, 8);
+} else {
+tcg_gen_extract_i64(res64, src, N * 8, 8);
+}
+tcg_gen_extrl_i64_i32(result, res64);
+tcg_temp_free_i64(res64);
+
+return result;
+}
+
+static void gen_set_byte_i64(int N, TCGv_i64 result, TCGv src)
+{
+TCGv_i64 src64 = tcg_temp_new_i64();
+tcg_gen_extu_i32_i64(src64, src);
+tcg_gen_deposit_i64(result, result, src64, N * 8, 8);
+tcg_temp_free_i64(src64);
+}
+
 static inline void gen_load_locked4u(TCGv dest, TCGv vaddr, int mem_index)
 {
 tcg_gen_qemu_ld32u(dest, vaddr, mem_index);
diff --git a/target/hexagon/imported/alu.idef b/target/hexagon/imported/alu.idef
index e8cc52c..f0c9bb4 100644
--- a/target/hexagon/imported/alu.idef
+++ b/target/hexagon/imported/alu.idef
@@ -1259,6 +1259,16 @@ Q6INSN(A5_ACS,"Rxx32,Pe4=vacsh(Rss32,Rtt32)",ATTRIBS(),
 }
 })
 
+Q6INSN(A6_vminub_RdP,"Rdd32,Pe4=vminub(Rtt32,Rss32)",ATTRIBS(),
+"Vector minimum of bytes, records minimum and decision vector",
+{
+fHIDE(int i;)
+for (i = 0; i < 8; i++) {
+fSETBIT(i, PeV, (fGETUBYTE(i,RttV) > fGETUBYTE(i,RssV)));
+fSETBYTE(i,RddV,fMIN(fGETUBYTE(i,RttV),fGETUBYTE(i,RssV)));
+}
+})
+
 /**/
 /* Vector Min/Max */
 /**/
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index 87e0426..4619398 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1018,6 +1018,7 @@ MPY_ENC(M7_dcmpyiwc_acc, "1010","x","1","0","1","0","10")
 
 
 MPY_ENC(A5_ACS,  "1010","x","0","1","0","1","ee")
+MPY_ENC(A6_vminub_RdP,   "1010","d","0","1","1","1","ee")
 /*
 */
 
diff --git a/tests/tcg/hexagon/multi_result.c b/tests/tcg/hexagon/multi_result.c
index c21148f..95d99a0 100644
--- a/tests/tcg/hexagon/multi_result.c
+++ b/tests/tcg/hexagon/multi_result.c
@@ -70,6 +70,21 @@ static long long vacsh(long long Rxx, long long Rss, long long Rtt,
   return result;
 }
 
+static long long vminub(long long Rtt, long long Rss,
+int *pred_result)
+{
+  long long result;
+  int predval;
+
+  asm volatile("%0,p0 = vminub(%2, %3)\n\t"
+   "%1 = p0\n\t"
+   : "=r"(result), "=r"(predval)
+   : "r"(Rtt), "r"(Rss)
+   : "p0");
+  *pred_result = predval;
+  return result;
+}
+
 int err;
 
 static void check_ll(long long val, long long expect)
@@ -155,11 +170,30 @@ static void test_vacsh()
 check(ovf_result, 0);
 }
 
+static void test_vminub()
+{
+long long res64;

[PATCH v3 04/26] Hexagon (target/hexagon) use env_archcpu and env_cpu

2021-04-07 Thread Taylor Simpson
Remove hexagon_env_get_cpu and replace with env_archcpu
Replace CPU(hexagon_env_get_cpu(env)) with env_cpu(env)

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 linux-user/hexagon/cpu_loop.c | 2 +-
 target/hexagon/cpu.c  | 4 ++--
 target/hexagon/cpu.h  | 5 -
 target/hexagon/op_helper.c| 2 +-
 target/hexagon/translate.c| 2 +-
 5 files changed, 5 insertions(+), 10 deletions(-)
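
The removed hexagon_env_get_cpu() was a container_of wrapper, and the generic accessors serve the same purpose. The sketch below shows the underlying pointer arithmetic on hypothetical types; ExampleEnv, ExampleCPU and example_env_to_cpu() are illustrative names and say nothing about how env_archcpu is implemented internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types standing in for the CPU object and its env member. */
typedef struct ExampleEnv {
    unsigned int gpr[4];
} ExampleEnv;

typedef struct ExampleCPU {
    int parent_state;   /* stands in for the generic CPU state */
    ExampleEnv env;     /* architecture state embedded in the CPU object */
} ExampleCPU;

/* Recover the enclosing object from a pointer to its embedded member. */
static ExampleCPU *example_env_to_cpu(ExampleEnv *env)
{
    assert(env);
    return (ExampleCPU *)((char *)env - offsetof(ExampleCPU, env));
}
```

Because the conversion depends only on the member's offset, one shared accessor can replace a per-target copy like the one deleted from cpu.h.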

diff --git a/linux-user/hexagon/cpu_loop.c b/linux-user/hexagon/cpu_loop.c
index 9a68ca0..bc34f5d 100644
--- a/linux-user/hexagon/cpu_loop.c
+++ b/linux-user/hexagon/cpu_loop.c
@@ -25,7 +25,7 @@
 
 void cpu_loop(CPUHexagonState *env)
 {
-CPUState *cs = CPU(hexagon_env_get_cpu(env));
+CPUState *cs = env_cpu(env);
 int trapnr, signum, sigcode;
 target_ulong sigaddr;
 target_ulong syscallnum;
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index c2fe357..f044506 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -71,7 +71,7 @@ const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS] = {
  */
 static target_ulong adjust_stack_ptrs(CPUHexagonState *env, target_ulong addr)
 {
-HexagonCPU *cpu = hexagon_env_get_cpu(env);
+HexagonCPU *cpu = env_archcpu(env);
 target_ulong stack_adjust = cpu->lldb_stack_adjust;
 target_ulong stack_start = env->stack_start;
 target_ulong stack_size = 0x1;
@@ -115,7 +115,7 @@ static void print_reg(FILE *f, CPUHexagonState *env, int regnum)
 
 static void hexagon_dump(CPUHexagonState *env, FILE *f)
 {
-HexagonCPU *cpu = hexagon_env_get_cpu(env);
+HexagonCPU *cpu = env_archcpu(env);
 
 if (cpu->lldb_compat) {
 /*
diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index e04eac5..2855dd3 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -127,11 +127,6 @@ typedef struct HexagonCPU {
 target_ulong lldb_stack_adjust;
 } HexagonCPU;
 
-static inline HexagonCPU *hexagon_env_get_cpu(CPUHexagonState *env)
-{
-return container_of(env, HexagonCPU, env);
-}
-
 #include "cpu_bits.h"
 
 #define cpu_signal_handler cpu_hexagon_signal_handler
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 5d35dfc..7ac8554 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -35,7 +35,7 @@ static void QEMU_NORETURN do_raise_exception_err(CPUHexagonState *env,
  uint32_t exception,
  uintptr_t pc)
 {
-CPUState *cs = CPU(hexagon_env_get_cpu(env));
+CPUState *cs = env_cpu(env);
 qemu_log_mask(CPU_LOG_INT, "%s: %d\n", __func__, exception);
 cs->exception_index = exception;
 cpu_loop_exit_restore(cs, pc);
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index f975d7a..e235fdb 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -585,7 +585,7 @@ static void hexagon_tr_translate_packet(DisasContextBase *dcbase, CPUState *cpu)
  * The CPU log is used to compare against LLDB single stepping,
  * so end the TLB after every packet.
  */
-HexagonCPU *hex_cpu = hexagon_env_get_cpu(env);
+HexagonCPU *hex_cpu = env_archcpu(env);
 if (hex_cpu->lldb_compat && qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
 ctx->base.is_jmp = DISAS_TOO_MANY;
 }
-- 
2.7.4




[PATCH v3 23/26] Hexagon (target/hexagon) bit reverse (brev) addressing

2021-04-07 Thread Taylor Simpson
The following instructions are added
L2_loadrub_pbr  Rd32 = memub(Rx32++Mu2:brev)
L2_loadrb_pbr   Rd32 = memb(Rx32++Mu2:brev)
L2_loadruh_pbr  Rd32 = memuh(Rx32++Mu2:brev)
L2_loadrh_pbr   Rd32 = memh(Rx32++Mu2:brev)
L2_loadri_pbr   Rd32 = memw(Rx32++Mu2:brev)
L2_loadrd_pbr   Rdd32 = memd(Rx32++Mu2:brev)
S2_storerb_pbr  memb(Rx32++Mu2:brev) = Rt32
S2_storerh_pbr  memh(Rx32++Mu2:brev) = Rt32
S2_storerf_pbr  memh(Rx32++Mu2:brev) = Rt.H32
S2_storeri_pbr  memw(Rx32++Mu2:brev) = Rt32
S2_storerd_pbr  memd(Rx32++Mu2:brev) = Rtt32
S2_storerinew_pbr   memw(Rx32++Mu2:brev) = Nt8.new
S2_storerbnew_pbr   memb(Rx32++Mu2:brev) = Nt8.new
S2_storerhnew_pbr   memh(Rx32++Mu2:brev) = Nt8.new

Test cases in tests/tcg/hexagon/brev.c

Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h  |  28 +
 target/hexagon/helper.h   |   1 +
 target/hexagon/imported/encode_pp.def |   4 +
 target/hexagon/imported/ldst.idef |   2 +
 target/hexagon/imported/macros.def|   6 ++
 target/hexagon/macros.h   |   1 +
 target/hexagon/op_helper.c|   8 ++
 tests/tcg/hexagon/Makefile.target |   1 +
 tests/tcg/hexagon/brev.c  | 190 ++
 9 files changed, 241 insertions(+)
 create mode 100644 tests/tcg/hexagon/brev.c
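
A plain-C sketch of the addressing mode, for orientation. The ordering (effective address from the old Rx, then Rx += Mu) follows GET_EA_pbr in this patch. That only the low 16 address bits are reversed is an assumption here, since the fbrev helper body is not quoted above. reverse16(), brev_addr() and get_ea_pbr() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 16-bit value. */
static uint16_t reverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | ((v >> i) & 1));
    }
    return r;
}

/* Assumed behaviour: the upper half is kept, the lower half is reversed. */
static uint32_t brev_addr(uint32_t addr)
{
    return (addr & 0xffff0000u) | reverse16((uint16_t)addr);
}

/* Model of GET_EA_pbr: EA comes from the old Rx, then Rx += Mu. */
static uint32_t get_ea_pbr(uint32_t *rx, uint32_t mu)
{
    assert(rx);
    uint32_t ea = brev_addr(*rx);
    *rx += mu;
    return ea;
}
```

Under that assumption, Rx = 0x00010002 with Mu = 4 gives EA = 0x00014000 and leaves Rx = 0x00010006. Stepping Rx linearly therefore visits memory in bit-reversed order, the access pattern typically wanted for FFT-style buffers.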

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 25c228c..8f0ec01 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -37,6 +37,7 @@
  * _sp   stack pointer relativer0 = memw(r29+#12)
  * _ap   absolute set  r0 = memw(r1=##variable)
  * _pr   post increment register   r0 = memw(r1++m1)
+ * _pbr  post increment bit reverser0 = memw(r1++m1:brev)
  * _pi   post increment immediate  r0 = memb(r1++#1)
  * _pci  post increment circular immediate r0 = memw(r1++#4:circ(m0))
  * _pcr  post increment circular register  r0 = memw(r1++I:circ(m0))
@@ -53,6 +54,11 @@
 fEA_REG(RxV); \
 fPM_M(RxV, MuV); \
 } while (0)
+#define GET_EA_pbr \
+do { \
+gen_helper_fbrev(EA, RxV); \
+tcg_gen_add_tl(RxV, RxV, MuV); \
+} while (0)
 #define GET_EA_pi \
 do { \
 fEA_REG(RxV); \
@@ -128,16 +134,22 @@
   fGEN_TCG_LOAD_pcr(3, fLOAD(1, 8, u, EA, RddV))
 
 #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_L2_loadrub_pbr(SHORTCODE) SHORTCODE
 #define fGEN_TCG_L2_loadrub_pi(SHORTCODE)  SHORTCODE
 #define fGEN_TCG_L2_loadrb_pr(SHORTCODE)   SHORTCODE
+#define fGEN_TCG_L2_loadrb_pbr(SHORTCODE)  SHORTCODE
 #define fGEN_TCG_L2_loadrb_pi(SHORTCODE)   SHORTCODE
 #define fGEN_TCG_L2_loadruh_pr(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_L2_loadruh_pbr(SHORTCODE) SHORTCODE
 #define fGEN_TCG_L2_loadruh_pi(SHORTCODE)  SHORTCODE
 #define fGEN_TCG_L2_loadrh_pr(SHORTCODE)   SHORTCODE
+#define fGEN_TCG_L2_loadrh_pbr(SHORTCODE)  SHORTCODE
 #define fGEN_TCG_L2_loadrh_pi(SHORTCODE)   SHORTCODE
 #define fGEN_TCG_L2_loadri_pr(SHORTCODE)   SHORTCODE
+#define fGEN_TCG_L2_loadri_pbr(SHORTCODE)  SHORTCODE
 #define fGEN_TCG_L2_loadri_pi(SHORTCODE)   SHORTCODE
 #define fGEN_TCG_L2_loadrd_pr(SHORTCODE)   SHORTCODE
+#define fGEN_TCG_L2_loadrd_pbr(SHORTCODE)  SHORTCODE
 #define fGEN_TCG_L2_loadrd_pi(SHORTCODE)   SHORTCODE
 
 /*
@@ -265,41 +277,57 @@
 tcg_temp_free(BYTE); \
 } while (0)
 
+#define fGEN_TCG_S2_storerb_pbr(SHORTCODE) \
+fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerb_pci(SHORTCODE) \
 fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerb_pcr(SHORTCODE) \
 fGEN_TCG_STORE_pcr(0, fSTORE(1, 1, EA, fGETBYTE(0, RtV)))
 
+#define fGEN_TCG_S2_storerh_pbr(SHORTCODE) \
+fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerh_pci(SHORTCODE) \
 fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerh_pcr(SHORTCODE) \
 fGEN_TCG_STORE_pcr(1, fSTORE(1, 2, EA, fGETHALF(0, RtV)))
 
+#define fGEN_TCG_S2_storerf_pbr(SHORTCODE) \
+fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerf_pci(SHORTCODE) \
 fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerf_pcr(SHORTCODE) \
 fGEN_TCG_STORE_pcr(1, fSTORE(1, 2, EA, fGETHALF(1, RtV)))
 
+#define fGEN_TCG_S2_storeri_pbr(SHORTCODE) \
+fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storeri_pci(SHORTCODE) \
 fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storeri_pcr(SHORTCODE) \
 fGEN_TCG_STORE_pcr(2, fSTORE(1, 4, EA, RtV))
 
+#define fGEN_TCG_S2_storerd_pbr(SHORTCODE) \
+fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerd_pci(SHORTCODE) \
 fGEN_TCG_STORE(SHORTCODE)
 #define fGEN_TCG_S2_storerd_pcr(SHORTCODE) \
 fGEN_TCG_STORE_pcr(3, fSTORE(1, 8, EA, RttV))
 
+#define 

[PATCH v3 10/26] Hexagon (target/hexagon) use softfloat default NaN and tininess

2021-04-07 Thread Taylor Simpson
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 fpu/softfloat-specialize.c.inc |  3 +++
 target/hexagon/cpu.c   |  5 +
 target/hexagon/op_helper.c | 47 --
 3 files changed, 8 insertions(+), 47 deletions(-)

diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index c2f87ad..9ea318f 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -145,6 +145,9 @@ static FloatParts parts_default_nan(float_status *status)
 #elif defined(TARGET_HPPA)
 /* snan_bit_is_one, set msb-1.  */
 frac = 1ULL << (DECOMPOSED_BINARY_POINT - 2);
+#elif defined(TARGET_HEXAGON)
+sign = 1;
+frac = ~0ULL;
 #else
 /* This case is true for Alpha, ARM, MIPS, OpenRISC, PPC, RISC-V,
  * S390, SH4, TriCore, and Xtensa.  I cannot find documentation
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index f044506..ff44fd6 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -23,6 +23,7 @@
 #include "exec/exec-all.h"
 #include "qapi/error.h"
 #include "hw/qdev-properties.h"
+#include "fpu/softfloat-helpers.h"
 
 static void hexagon_v67_cpu_init(Object *obj)
 {
@@ -205,8 +206,12 @@ static void hexagon_cpu_reset(DeviceState *dev)
 CPUState *cs = CPU(dev);
 HexagonCPU *cpu = HEXAGON_CPU(cs);
 HexagonCPUClass *mcc = HEXAGON_CPU_GET_CLASS(cpu);
+CPUHexagonState *env = &cpu->env;
 
 mcc->parent_reset(dev);
+
+set_default_nan_mode(1, &env->fp_status);
+set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
 }
 
 static void hexagon_cpu_disas_set_info(CPUState *s, disassemble_info *info)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 1d91fa2..478421d 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -297,26 +297,6 @@ int32_t HELPER(fcircadd)(int32_t RxV, int32_t offset, 
int32_t M, int32_t CS)
 }
 
 /*
- * Hexagon FP operations return ~0 instead of NaN
- * The hex_check_sfnan/hex_check_dfnan functions perform this check
- */
-static float32 hex_check_sfnan(float32 x)
-{
-if (float32_is_any_nan(x)) {
-return make_float32(0xU);
-}
-return x;
-}
-
-static float64 hex_check_dfnan(float64 x)
-{
-if (float64_is_any_nan(x)) {
-return make_float64(0xULL);
-}
-return x;
-}
-
-/*
  * mem_noshuf
  * Section 5.5 of the Hexagon V67 Programmer's Reference Manual
  *
@@ -373,7 +353,6 @@ float64 HELPER(conv_sf2df)(CPUHexagonState *env, float32 
RsV)
 float64 out_f64;
 arch_fpop_start(env);
 out_f64 = float32_to_float64(RsV, &env->fp_status);
-out_f64 = hex_check_dfnan(out_f64);
 arch_fpop_end(env);
 return out_f64;
 }
@@ -383,7 +362,6 @@ float32 HELPER(conv_df2sf)(CPUHexagonState *env, float64 
RssV)
 float32 out_f32;
 arch_fpop_start(env);
 out_f32 = float64_to_float32(RssV, &env->fp_status);
-out_f32 = hex_check_sfnan(out_f32);
 arch_fpop_end(env);
 return out_f32;
 }
@@ -393,7 +371,6 @@ float32 HELPER(conv_uw2sf)(CPUHexagonState *env, int32_t 
RsV)
 float32 RdV;
 arch_fpop_start(env);
 RdV = uint32_to_float32(RsV, &env->fp_status);
-RdV = hex_check_sfnan(RdV);
 arch_fpop_end(env);
 return RdV;
 }
@@ -403,7 +380,6 @@ float64 HELPER(conv_uw2df)(CPUHexagonState *env, int32_t 
RsV)
 float64 RddV;
 arch_fpop_start(env);
 RddV = uint32_to_float64(RsV, &env->fp_status);
-RddV = hex_check_dfnan(RddV);
 arch_fpop_end(env);
 return RddV;
 }
@@ -413,7 +389,6 @@ float32 HELPER(conv_w2sf)(CPUHexagonState *env, int32_t RsV)
 float32 RdV;
 arch_fpop_start(env);
 RdV = int32_to_float32(RsV, &env->fp_status);
-RdV = hex_check_sfnan(RdV);
 arch_fpop_end(env);
 return RdV;
 }
@@ -423,7 +398,6 @@ float64 HELPER(conv_w2df)(CPUHexagonState *env, int32_t RsV)
 float64 RddV;
 arch_fpop_start(env);
 RddV = int32_to_float64(RsV, &env->fp_status);
-RddV = hex_check_dfnan(RddV);
 arch_fpop_end(env);
 return RddV;
 }
@@ -433,7 +407,6 @@ float32 HELPER(conv_ud2sf)(CPUHexagonState *env, int64_t 
RssV)
 float32 RdV;
 arch_fpop_start(env);
 RdV = uint64_to_float32(RssV, &env->fp_status);
-RdV = hex_check_sfnan(RdV);
 arch_fpop_end(env);
 return RdV;
 }
@@ -443,7 +416,6 @@ float64 HELPER(conv_ud2df)(CPUHexagonState *env, int64_t 
RssV)
 float64 RddV;
 arch_fpop_start(env);
 RddV = uint64_to_float64(RssV, &env->fp_status);
-RddV = hex_check_dfnan(RddV);
 arch_fpop_end(env);
 return RddV;
 }
@@ -453,7 +425,6 @@ float32 HELPER(conv_d2sf)(CPUHexagonState *env, int64_t 
RssV)
 float32 RdV;
 arch_fpop_start(env);
 RdV = int64_to_float32(RssV, &env->fp_status);
-RdV = hex_check_sfnan(RdV);
 arch_fpop_end(env);
 return RdV;
 }
@@ -463,7 +434,6 @@ float64 HELPER(conv_d2df)(CPUHexagonState *env, int64_t 
RssV)
 float64 RddV;
 arch_fpop_start(env);
 RddV = int64_to_float64(RssV, 

[PATCH v3 15/26] Hexagon (target/hexagon) move QEMU_GENERATE to only be on during macros.h

2021-04-07 Thread Taylor Simpson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/genptr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 6b74344..b87e264 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -15,7 +15,6 @@
  *  along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#define QEMU_GENERATE
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "cpu.h"
@@ -24,7 +23,9 @@
 #include "insn.h"
 #include "opcodes.h"
 #include "translate.h"
+#define QEMU_GENERATE   /* Used internally by macros.h */
 #include "macros.h"
+#undef QEMU_GENERATE
 #include "gen_tcg.h"
 
 static inline TCGv gen_read_preg(TCGv pred, uint8_t num)
-- 
2.7.4




[PATCH v3 03/26] Hexagon (target/hexagon) remove unnecessary inline directives

2021-04-07 Thread Taylor Simpson
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/cpu.c   |  9 -
 target/hexagon/decode.c|  6 +++---
 target/hexagon/fma_emu.c   | 39 ---
 target/hexagon/op_helper.c | 37 ++---
 target/hexagon/translate.c |  2 +-
 5 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index b0b3040..c2fe357 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -69,10 +69,9 @@ const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS] = 
{
  * stacks at different locations.  This is used to compensate so the diff is
  * cleaner.
  */
-static inline target_ulong adjust_stack_ptrs(CPUHexagonState *env,
- target_ulong addr)
+static target_ulong adjust_stack_ptrs(CPUHexagonState *env, target_ulong addr)
 {
-HexagonCPU *cpu = container_of(env, HexagonCPU, env);
+HexagonCPU *cpu = hexagon_env_get_cpu(env);
 target_ulong stack_adjust = cpu->lldb_stack_adjust;
 target_ulong stack_start = env->stack_start;
 target_ulong stack_size = 0x1;
@@ -88,7 +87,7 @@ static inline target_ulong adjust_stack_ptrs(CPUHexagonState 
*env,
 }
 
 /* HEX_REG_P3_0 (aka C4) is an alias for the predicate registers */
-static inline target_ulong read_p3_0(CPUHexagonState *env)
+static target_ulong read_p3_0(CPUHexagonState *env)
 {
 int32_t control_reg = 0;
 int i;
@@ -116,7 +115,7 @@ static void print_reg(FILE *f, CPUHexagonState *env, int 
regnum)
 
 static void hexagon_dump(CPUHexagonState *env, FILE *f)
 {
-HexagonCPU *cpu = container_of(env, HexagonCPU, env);
+HexagonCPU *cpu = hexagon_env_get_cpu(env);
 
 if (cpu->lldb_compat) {
 /*
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 1c9c074..65d97ce 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -354,7 +354,7 @@ static void decode_split_cmpjump(Packet *pkt)
 }
 }
 
-static inline int decode_opcode_can_jump(int opcode)
+static int decode_opcode_can_jump(int opcode)
 {
 if ((GET_ATTRIB(opcode, A_JUMP)) ||
 (GET_ATTRIB(opcode, A_CALL)) ||
@@ -370,7 +370,7 @@ static inline int decode_opcode_can_jump(int opcode)
 return 0;
 }
 
-static inline int decode_opcode_ends_loop(int opcode)
+static int decode_opcode_ends_loop(int opcode)
 {
 return GET_ATTRIB(opcode, A_HWLOOP0_END) ||
GET_ATTRIB(opcode, A_HWLOOP1_END);
@@ -764,7 +764,7 @@ static void decode_add_endloop_insn(Insn *insn, int loopnum)
 }
 }
 
-static inline int decode_parsebits_is_loopend(uint32_t encoding32)
+static int decode_parsebits_is_loopend(uint32_t encoding32)
 {
 uint32_t bits = parse_bits(encoding32);
 return bits == 0x2;
diff --git a/target/hexagon/fma_emu.c b/target/hexagon/fma_emu.c
index 842d903..f324b83 100644
--- a/target/hexagon/fma_emu.c
+++ b/target/hexagon/fma_emu.c
@@ -64,7 +64,7 @@ typedef union {
 };
 } Float;
 
-static inline uint64_t float64_getmant(float64 f64)
+static uint64_t float64_getmant(float64 f64)
 {
 Double a = { .i = f64 };
 if (float64_is_normal(f64)) {
@@ -91,7 +91,7 @@ int32_t float64_getexp(float64 f64)
 return -1;
 }
 
-static inline uint64_t float32_getmant(float32 f32)
+static uint64_t float32_getmant(float32 f32)
 {
 Float a = { .i = f32 };
 if (float32_is_normal(f32)) {
@@ -118,17 +118,17 @@ int32_t float32_getexp(float32 f32)
 return -1;
 }
 
-static inline uint32_t int128_getw0(Int128 x)
+static uint32_t int128_getw0(Int128 x)
 {
 return int128_getlo(x);
 }
 
-static inline uint32_t int128_getw1(Int128 x)
+static uint32_t int128_getw1(Int128 x)
 {
 return int128_getlo(x) >> 32;
 }
 
-static inline Int128 int128_mul_6464(uint64_t ai, uint64_t bi)
+static Int128 int128_mul_6464(uint64_t ai, uint64_t bi)
 {
 Int128 a, b;
 uint64_t pp0, pp1a, pp1b, pp1s, pp2;
@@ -152,7 +152,7 @@ static inline Int128 int128_mul_6464(uint64_t ai, uint64_t 
bi)
 return int128_make128(ret_low, pp2 + (pp1s >> 32));
 }
 
-static inline Int128 int128_sub_borrow(Int128 a, Int128 b, int borrow)
+static Int128 int128_sub_borrow(Int128 a, Int128 b, int borrow)
 {
 Int128 ret = int128_sub(a, b);
 if (borrow != 0) {
@@ -170,7 +170,7 @@ typedef struct {
 uint8_t sticky;
 } Accum;
 
-static inline void accum_init(Accum *p)
+static void accum_init(Accum *p)
 {
 p->mant = int128_zero();
 p->exp = 0;
@@ -180,7 +180,7 @@ static inline void accum_init(Accum *p)
 p->sticky = 0;
 }
 
-static inline Accum accum_norm_left(Accum a)
+static Accum accum_norm_left(Accum a)
 {
 a.exp--;
 a.mant = int128_lshift(a.mant, 1);
@@ -190,6 +190,7 @@ static inline Accum accum_norm_left(Accum a)
 return a;
 }
 
+/* This function is marked inline for performance reasons */
 static inline Accum accum_norm_right(Accum a, int amt)
 {
 if (amt > 130) {
@@ -226,7 +227,7 @@ static inline 

[PATCH v3 01/26] Hexagon (target/hexagon) TCG generation cleanup

2021-04-07 Thread Taylor Simpson
Simplify TCG generation of hex_reg_written
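
For illustration only, the change can be modelled in plain C (this is not TCG
code). The function names are hypothetical, and the sketch assumes that
hex_reg_written[rnum] only ever holds 0 or 1. The old form selects between 1
and the previous value; the new form turns slot_mask into a true/false value
and ORs it in.

```c
#include <assert.h>
#include <stdint.h>

/* Old form, modelled on movcond: if slot_mask == 0 take 1, else keep written. */
static uint32_t written_movcond(uint32_t written, uint32_t slot_mask)
{
    return (slot_mask == 0) ? 1u : written;
}

/* New form, modelled on setcond followed by or. */
static uint32_t written_setcond_or(uint32_t written, uint32_t slot_mask)
{
    /* 1 when the slot was not cancelled, 0 otherwise */
    uint32_t not_cancelled = (slot_mask == 0);
    return written | not_cancelled;
}

/* Check that both forms agree when the flag is 0 or 1 (assumption). */
static void check_written_forms(void)
{
    for (uint32_t written = 0; written <= 1; written++) {
        assert(written_movcond(written, 0) ==
               written_setcond_or(written, 0));
        assert(written_movcond(written, 1u << 3) ==
               written_setcond_or(written, 1u << 3));
    }
}
```

Under the 0/1 assumption both functions return the same result for every
input, so the constant "one" is no longer needed.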

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/genptr.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 7481f4c..87f5d92 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -35,7 +35,6 @@ static inline TCGv gen_read_preg(TCGv pred, uint8_t num)
 
 static inline void gen_log_predicated_reg_write(int rnum, TCGv val, int slot)
 {
-TCGv one = tcg_const_tl(1);
 TCGv zero = tcg_const_tl(0);
 TCGv slot_mask = tcg_temp_new();
 
@@ -43,12 +42,17 @@ static inline void gen_log_predicated_reg_write(int rnum, 
TCGv val, int slot)
 tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum], slot_mask, zero,
val, hex_new_value[rnum]);
 #if HEX_DEBUG
-/* Do this so HELPER(debug_commit_end) will know */
-tcg_gen_movcond_tl(TCG_COND_EQ, hex_reg_written[rnum], slot_mask, zero,
-   one, hex_reg_written[rnum]);
+/*
+ * Do this so HELPER(debug_commit_end) will know
+ *
+ * Note that slot_mask indicates the value is not written
+ * (i.e., slot was cancelled), so we create a true/false value before
+ * or'ing with hex_reg_written[rnum].
+ */
+tcg_gen_setcond_tl(TCG_COND_EQ, slot_mask, slot_mask, zero);
+tcg_gen_or_tl(hex_reg_written[rnum], hex_reg_written[rnum], slot_mask);
 #endif
 
-tcg_temp_free(one);
 tcg_temp_free(zero);
 tcg_temp_free(slot_mask);
 }
-- 
2.7.4




[PATCH v3 13/26] Hexagon (target/hexagon) cleanup ternary operators in semantics

2021-04-07 Thread Taylor Simpson
Change  (cond ? (res = x) : (res = y)) to res = (cond ? x : y)

This makes the semantics easier for idef-parser to deal with

The following instructions are impacted
C2_any8
C2_all8
C2_mux
C2_muxii
C2_muxir
C2_muxri
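
The rewrite does not change behaviour. A small standalone C sketch, using
C2_all8 as the example, shows the two forms side by side; all8_old, all8_new
and check_all8_forms are hypothetical names, and uint8_t stands in for the
predicate value.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: each arm of the conditional performs its own assignment. */
static uint8_t all8_old(uint8_t ps)
{
    uint8_t pd;
    (ps == 0xff) ? (pd = 0xff) : (pd = 0x00);
    return pd;
}

/* New form: a single assignment whose right-hand side is the conditional. */
static uint8_t all8_new(uint8_t ps)
{
    uint8_t pd = (ps == 0xff ? 0xff : 0x00);
    return pd;
}

/* Exhaustively confirm that both forms agree for every 8-bit value. */
static void check_all8_forms(void)
{
    for (unsigned v = 0; v <= 0xff; v++) {
        assert(all8_old((uint8_t)v) == all8_new((uint8_t)v));
    }
}
```

The new form has exactly one assignment target per statement, which is the
shape a semantics parser can map directly to a conditional move.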

Signed-off-by: Taylor Simpson 
---
 target/hexagon/imported/compare.idef | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/hexagon/imported/compare.idef 
b/target/hexagon/imported/compare.idef
index 3551467..abd016f 100644
--- a/target/hexagon/imported/compare.idef
+++ b/target/hexagon/imported/compare.idef
@@ -198,11 +198,11 @@ 
Q6INSN(C4_or_orn,"Pd4=or(Ps4,or(Pt4,!Pu4))",ATTRIBS(A_CRSLOT23),
 
 Q6INSN(C2_any8,"Pd4=any8(Ps4)",ATTRIBS(A_CRSLOT23),
 "Logical ANY of low 8 predicate bits",
-{ PsV ? (PdV=0xff) : (PdV=0x00); })
+{ PdV = (PsV ? 0xff : 0x00); })
 
 Q6INSN(C2_all8,"Pd4=all8(Ps4)",ATTRIBS(A_CRSLOT23),
 "Logical ALL of low 8 predicate bits",
-{ (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); })
+{ PdV = (PsV == 0xff ? 0xff : 0x00); })
 
 Q6INSN(C2_vitpack,"Rd32=vitpack(Ps4,Pt4)",ATTRIBS(),
 "Pack the odd and even bits of two predicate registers",
@@ -212,7 +212,7 @@ Q6INSN(C2_vitpack,"Rd32=vitpack(Ps4,Pt4)",ATTRIBS(),
 
 Q6INSN(C2_mux,"Rd32=mux(Pu4,Rs32,Rt32)",ATTRIBS(),
 "Scalar MUX",
-{ (fLSBOLD(PuV)) ? (RdV=RsV):(RdV=RtV); })
+{ RdV = (fLSBOLD(PuV) ? RsV : RtV); })
 
 
 Q6INSN(C2_cmovenewit,"if (Pu4.new) Rd32=#s12",ATTRIBS(A_ARCHV2),
@@ -269,18 +269,18 @@ Q6INSN(C2_ccombinewf,"if (!Pu4) 
Rdd32=combine(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
 
 Q6INSN(C2_muxii,"Rd32=mux(Pu4,#s8,#S8)",ATTRIBS(A_ARCHV2),
 "Scalar MUX immediates",
-{ fIMMEXT(siV); (fLSBOLD(PuV)) ? (RdV=siV):(RdV=SiV); })
+{ fIMMEXT(siV); RdV = (fLSBOLD(PuV) ? siV : SiV); })
 
 
 
 Q6INSN(C2_muxir,"Rd32=mux(Pu4,Rs32,#s8)",ATTRIBS(A_ARCHV2),
 "Scalar MUX register immediate",
-{ fIMMEXT(siV); (fLSBOLD(PuV)) ? (RdV=RsV):(RdV=siV); })
+{ fIMMEXT(siV); RdV = (fLSBOLD(PuV) ? RsV : siV); })
 
 
 Q6INSN(C2_muxri,"Rd32=mux(Pu4,#s8,Rs32)",ATTRIBS(A_ARCHV2),
 "Scalar MUX register immediate",
-{ fIMMEXT(siV); (fLSBOLD(PuV)) ? (RdV=siV):(RdV=RsV); })
+{ fIMMEXT(siV); RdV = (fLSBOLD(PuV) ? siV : RsV); })
 
 
 
-- 
2.7.4




[PATCH v3 09/26] Hexagon (target/hexagon) change type of softfloat_roundingmodes

2021-04-07 Thread Taylor Simpson
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/arch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index 699e2cf..bb51f19 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -95,7 +95,7 @@ int32_t conv_round(int32_t a, int n)
 
 /* Floating Point Stuff */
 
-static const int softfloat_roundingmodes[] = {
+static const FloatRoundMode softfloat_roundingmodes[] = {
 float_round_nearest_even,
 float_round_to_zero,
 float_round_down,
-- 
2.7.4




[PATCH v3 06/26] Hexagon (target/hexagon) decide if pred has been written at TCG gen time

2021-04-07 Thread Taylor Simpson
Multiple writes to the same preg are and'ed together.  Rather than
generating a runtime check, we can determine at TCG generation time
if the predicate has previously been written in the packet.

Test added to tests/tcg/hexagon/misc.c
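
The idea can be sketched in plain C as a model that executes the writes
directly instead of emitting TCG ops. PacketModel, start_packet and
log_pred_write are hypothetical names for this sketch; in the patch the
bitmap lives in DisasContext and is consulted while generating code, so the
branch below costs nothing at run time.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREGS 4 /* p0..p3 */

/* Hypothetical stand-in for the per-packet state kept by the translator. */
typedef struct {
    uint8_t pregs_written;             /* bit n set once pn was written */
    uint8_t new_pred_value[NUM_PREGS]; /* pending value for each predicate */
} PacketModel;

/* Corresponds to clearing the bitmap at the start of each packet. */
static void start_packet(PacketModel *m)
{
    m->pregs_written = 0;
}

/* First write in a packet is a plain assignment; later writes are and'ed. */
static void log_pred_write(PacketModel *m, int pnum, uint8_t val)
{
    assert(pnum >= 0 && pnum < NUM_PREGS);
    if (!(m->pregs_written & (1u << pnum))) {
        m->new_pred_value[pnum] = val;
    } else {
        m->new_pred_value[pnum] &= val;
    }
    m->pregs_written |= (uint8_t)(1u << pnum);
}
```

Writing 0xff and then 0x0f to p0 within one packet leaves 0x0f pending, while
the first write to p0 in the next packet replaces the value outright.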

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg_funcs.py |  2 +-
 target/hexagon/genptr.c | 22 +++---
 target/hexagon/translate.c  |  9 +++--
 target/hexagon/translate.h  |  2 ++
 tests/tcg/hexagon/misc.c| 19 +++
 5 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index db9f663..7ceb25b 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -316,7 +316,7 @@ def genptr_dst_write(f, tag, regtype, regid):
 print("Bad register parse: ", regtype, regid)
 elif (regtype == "P"):
 if (regid in {"d", "e", "x"}):
-f.write("gen_log_pred_write(%s%sN, %s%sV);\n" % \
+f.write("gen_log_pred_write(ctx, %s%sN, %s%sV);\n" % \
 (regtype, regid, regtype, regid))
 f.write("ctx_log_pred_write(ctx, %s%sN);\n" % \
 (regtype, regid))
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 07d970f..6b74344 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -119,20 +119,28 @@ static void gen_log_reg_write_pair(int rnum, TCGv_i64 val)
 #endif
 }
 
-static inline void gen_log_pred_write(int pnum, TCGv val)
+static inline void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 {
 TCGv zero = tcg_const_tl(0);
 TCGv base_val = tcg_temp_new();
 TCGv and_val = tcg_temp_new();
 TCGv pred_written = tcg_temp_new();
 
-/* Multiple writes to the same preg are and'ed together */
 tcg_gen_andi_tl(base_val, val, 0xff);
-tcg_gen_and_tl(and_val, base_val, hex_new_pred_value[pnum]);
-tcg_gen_andi_tl(pred_written, hex_pred_written, 1 << pnum);
-tcg_gen_movcond_tl(TCG_COND_NE, hex_new_pred_value[pnum],
-   pred_written, zero,
-   and_val, base_val);
+
+/*
+ * Section 6.1.3 of the Hexagon V67 Programmer's Reference Manual
+ *
+ * Multiple writes to the same preg are and'ed together
+ * If this is the first predicate write in the packet, do a
+ * straight assignment.  Otherwise, do an and.
+ */
+if (!test_bit(pnum, ctx->pregs_written)) {
+tcg_gen_mov_tl(hex_new_pred_value[pnum], base_val);
+} else {
+tcg_gen_and_tl(hex_new_pred_value[pnum],
+   hex_new_pred_value[pnum], base_val);
+}
 tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
 
 tcg_temp_free(zero);
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 9f2a531..49ec8b7 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -172,6 +172,7 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
 ctx->reg_log_idx = 0;
 bitmap_zero(ctx->regs_written, TOTAL_PER_THREAD_REGS);
 ctx->preg_log_idx = 0;
+bitmap_zero(ctx->pregs_written, NUM_PREGS);
 for (i = 0; i < STORES_MAX; i++) {
 ctx->store_width[i] = 0;
 }
@@ -226,7 +227,7 @@ static void mark_implicit_pred_write(DisasContext *ctx, 
Insn *insn,
 }
 }
 
-static void mark_implicit_writes(DisasContext *ctx, Insn *insn)
+static void mark_implicit_reg_writes(DisasContext *ctx, Insn *insn)
 {
 mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_FP,  HEX_REG_FP);
 mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_SP,  HEX_REG_SP);
@@ -235,7 +236,10 @@ static void mark_implicit_writes(DisasContext *ctx, Insn 
*insn)
 mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_SA0, HEX_REG_SA0);
 mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_LC1, HEX_REG_LC1);
 mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_SA1, HEX_REG_SA1);
+}
 
+static void mark_implicit_pred_writes(DisasContext *ctx, Insn *insn)
+{
 mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P0, 0);
 mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P1, 1);
 mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P2, 2);
@@ -246,8 +250,9 @@ static void gen_insn(CPUHexagonState *env, DisasContext 
*ctx,
  Insn *insn, Packet *pkt)
 {
 if (insn->generate) {
-mark_implicit_writes(ctx, insn);
+mark_implicit_reg_writes(ctx, insn);
 insn->generate(env, ctx, insn, pkt);
+mark_implicit_pred_writes(ctx, insn);
 } else {
 gen_exception_end_tb(ctx, HEX_EXCP_INVALID_OPCODE);
 }
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 12506c8..0ecfbd7 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -34,6 +34,7 @@ typedef struct DisasContext {
 DECLARE_BITMAP(regs_written, 

[PATCH v3 08/26] Hexagon (target/hexagon) remove unused carry_from_add64 function

2021-04-07 Thread Taylor Simpson
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/arch.c   | 13 -
 target/hexagon/arch.h   |  1 -
 target/hexagon/macros.h |  2 --
 3 files changed, 16 deletions(-)

diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index 09de124..699e2cf 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -76,19 +76,6 @@ uint64_t deinterleave(uint64_t src)
 return myeven | (myodd << 32);
 }
 
-uint32_t carry_from_add64(uint64_t a, uint64_t b, uint32_t c)
-{
-uint64_t tmpa, tmpb, tmpc;
-tmpa = fGETUWORD(0, a);
-tmpb = fGETUWORD(0, b);
-tmpc = tmpa + tmpb + c;
-tmpa = fGETUWORD(1, a);
-tmpb = fGETUWORD(1, b);
-tmpc = tmpa + tmpb + fGETUWORD(1, tmpc);
-tmpc = fGETUWORD(1, tmpc);
-return tmpc;
-}
-
 int32_t conv_round(int32_t a, int n)
 {
 int64_t val;
diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
index 1f7f036..6e0b0d9 100644
--- a/target/hexagon/arch.h
+++ b/target/hexagon/arch.h
@@ -22,7 +22,6 @@
 
 uint64_t interleave(uint32_t odd, uint32_t even);
 uint64_t deinterleave(uint64_t src);
-uint32_t carry_from_add64(uint64_t a, uint64_t b, uint32_t c);
 int32_t conv_round(int32_t a, int n);
 void arch_fpop_start(CPUHexagonState *env);
 void arch_fpop_end(CPUHexagonState *env);
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index cfcb817..8cb211d 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -341,8 +341,6 @@ static inline void gen_logical_not(TCGv dest, TCGv src)
 #define fWRITE_LC0(VAL) WRITE_RREG(HEX_REG_LC0, VAL)
 #define fWRITE_LC1(VAL) WRITE_RREG(HEX_REG_LC1, VAL)
 
-#define fCARRY_FROM_ADD(A, B, C) carry_from_add64(A, B, C)
-
 #define fSET_OVERFLOW() SET_USR_FIELD(USR_OVF, 1)
 #define fSET_LPCFG(VAL) SET_USR_FIELD(USR_LPCFG, (VAL))
 #define fGET_LPCFG (GET_USR_FIELD(USR_LPCFG))
-- 
2.7.4




[PATCH v3 00/26] Hexagon (target/hexagon) update

2021-04-07 Thread Taylor Simpson
This patch series is a significant update for the Hexagon target.
The first 16 patches address feedback from Richard Henderson
and Philippe Mathieu-Daudé.
The next 10 patches add the remaining instructions for the Hexagon
scalar core.

The patches are logically independent but are organized as a series to
avoid potential conflicts if they are merged out of order.

Note that the new test cases require an updated toolchain/container.


*** Changes in v3 ***
Cleanup ternary operators in semantics to make them easier for idef-parser
Cleanup gen_log_predicated_reg_write_pair similar to gen_log_predicated_write
Cleanup reg_field_info definition (remove {0, 0} entry and include array size)
Move QEMU_GENERATE to only be on during macros.h
Compile all debug code so it doesn't bit rot
Fix circular addressing to handle negative increment

*** Changes in v2 ***
Address feedback from Richard Henderson 
Break utility function (arch.c) changes into 2 separate patches
Change bit-reverse addressing from TCG generation to helper
Change loadalign[bh] to use shift+deposit
Remove fGET_TCG_tmp
Remove unneeded ireg and tmp variables
Remove unused one variable from gen_log_predicated_reg_write
Rename gen_exception to gen_exception_raw
Remove unreachable tcg_gen_exit_tb
Remove redundant PC assignment
Remove TARGET_HEXAGON code from parts_silence_nan
Change roundrom to uint8_t in arch_recip_lookup and arch_invsqrt_lookup
Rewrite fGEN_TCG_addp_c/fGEN_TCG_subp_c using tcg_gen_add2_i64
Remove gen_carry_from_add64()
Break "instructions with multiple definitions" into multiple patches
Fix fINSERT_RANGE macro

Expand macros inside GET_EA_pci, GET_EA_pcr
Change fGEN_TCG_PCR to fGEN_TCG_LOAD_pcr to be consistent with other macros
Cleanup load and unpack implementation
Cleanup load into shifted register implementation
Cleanup brev.c test case
Change sfinvsqrta/sfrecipa to use a single helper
Cleanup vacsh helpers



Taylor Simpson (26):
  Hexagon (target/hexagon) TCG generation cleanup
  Hexagon (target/hexagon) cleanup gen_log_predicated_reg_write_pair
  Hexagon (target/hexagon) remove unnecessary inline directives
  Hexagon (target/hexagon) use env_archcpu and env_cpu
  Hexagon (target/hexagon) properly generate TB end for DISAS_NORETURN
  Hexagon (target/hexagon) decide if pred has been written at TCG gen
time
  Hexagon (target/hexagon) change variables from int to bool when
appropriate
  Hexagon (target/hexagon) remove unused carry_from_add64 function
  Hexagon (target/hexagon) change type of softfloat_roundingmodes
  Hexagon (target/hexagon) use softfloat default NaN and tininess
  Hexagon (target/hexagon) replace float32_mul_pow2 with float32_scalbn
  Hexagon (target/hexagon) use softfloat for float-to-int conversions
  Hexagon (target/hexagon) cleanup ternary operators in semantics
  Hexagon (target/hexagon) cleanup reg_field_info definition
  Hexagon (target/hexagon) move QEMU_GENERATE to only be on during
macros.h
  Hexagon (target/hexagon) compile all debug code
  Hexagon (target/hexagon) add F2_sfrecipa instruction
  Hexagon (target/hexagon) add F2_sfinvsqrta
  Hexagon (target/hexagon) add A5_ACS (vacsh)
  Hexagon (target/hexagon) add A6_vminub_RdP
  Hexagon (target/hexagon) add A4_addp_c/A4_subp_c
  Hexagon (target/hexagon) circular addressing
  Hexagon (target/hexagon) bit reverse (brev) addressing
  Hexagon (target/hexagon) load and unpack bytes instructions
  Hexagon (target/hexagon) load into shifted register instructions
  Hexagon (target/hexagon) CABAC decode bin

 fpu/softfloat-specialize.c.inc|   3 +
 linux-user/hexagon/cpu_loop.c |   2 +-
 target/hexagon/arch.c | 181 ++---
 target/hexagon/arch.h |   9 +-
 target/hexagon/conv_emu.c | 177 -
 target/hexagon/conv_emu.h |  31 ---
 target/hexagon/cpu.c  |  14 +-
 target/hexagon/cpu.h  |   5 -
 target/hexagon/cpu_bits.h |   2 +-
 target/hexagon/decode.c   |  80 +++---
 target/hexagon/fma_emu.c  |  40 +--
 target/hexagon/gen_tcg.h  | 420 -
 target/hexagon/gen_tcg_funcs.py   |   2 +-
 target/hexagon/genptr.c   | 244 ++---
 target/hexagon/helper.h   |  23 +-
 target/hexagon/imported/alu.idef  |  44 +++
 target/hexagon/imported/compare.idef  |  12 +-
 target/hexagon/imported/encode_pp.def |  30 +++
 target/hexagon/imported/float.idef|  32 +++
 target/hexagon/imported/ldst.idef |  68 +
 target/hexagon/imported/macros.def|  47 
 target/hexagon/imported/shift.idef|  47 
 target/hexagon/insn.h |  21 +-
 target/hexagon/internal.h |  11 +-
 target/hexagon/macros.h   | 122 -
 target/hexagon/meson.build|   1 -
 target/hexagon/op_helper.c| 392 

[PATCH v3 05/26] Hexagon (target/hexagon) properly generate TB end for DISAS_NORETURN

2021-04-07 Thread Taylor Simpson
When exiting a TB, generate all the code before returning from
hexagon_tr_translate_packet so that nothing needs to be done in
hexagon_tr_tb_stop.

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/translate.c | 62 --
 target/hexagon/translate.h |  3 ---
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index e235fdb..9f2a531 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -54,16 +54,40 @@ static const char * const hexagon_prednames[] = {
   "p0", "p1", "p2", "p3"
 };
 
-void gen_exception(int excp)
+static void gen_exception_raw(int excp)
 {
 TCGv_i32 helper_tmp = tcg_const_i32(excp);
 gen_helper_raise_exception(cpu_env, helper_tmp);
 tcg_temp_free_i32(helper_tmp);
 }
 
-void gen_exception_debug(void)
+static void gen_exec_counters(DisasContext *ctx)
+{
+tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_PKT_CNT],
+hex_gpr[HEX_REG_QEMU_PKT_CNT], ctx->num_packets);
+tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_INSN_CNT],
+hex_gpr[HEX_REG_QEMU_INSN_CNT], ctx->num_insns);
+}
+
+static void gen_end_tb(DisasContext *ctx)
 {
-gen_exception(EXCP_DEBUG);
+gen_exec_counters(ctx);
+tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], hex_next_PC);
+if (ctx->base.singlestep_enabled) {
+gen_exception_raw(EXCP_DEBUG);
+} else {
+tcg_gen_exit_tb(NULL, 0);
+}
+ctx->base.is_jmp = DISAS_NORETURN;
+}
+
+static void gen_exception_end_tb(DisasContext *ctx, int excp)
+{
+gen_exec_counters(ctx);
+tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], hex_next_PC);
+gen_exception_raw(excp);
+ctx->base.is_jmp = DISAS_NORETURN;
+
 }
 
 #if HEX_DEBUG
@@ -225,8 +249,7 @@ static void gen_insn(CPUHexagonState *env, DisasContext 
*ctx,
 mark_implicit_writes(ctx, insn);
 insn->generate(env, ctx, insn, pkt);
 } else {
-gen_exception(HEX_EXCP_INVALID_OPCODE);
-ctx->base.is_jmp = DISAS_NORETURN;
+gen_exception_end_tb(ctx, HEX_EXCP_INVALID_OPCODE);
 }
 }
 
@@ -447,14 +470,6 @@ static void update_exec_counters(DisasContext *ctx, Packet 
*pkt)
 ctx->num_insns += num_real_insns;
 }
 
-static void gen_exec_counters(DisasContext *ctx)
-{
-tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_PKT_CNT],
-hex_gpr[HEX_REG_QEMU_PKT_CNT], ctx->num_packets);
-tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_INSN_CNT],
-hex_gpr[HEX_REG_QEMU_INSN_CNT], ctx->num_insns);
-}
-
 static void gen_commit_packet(DisasContext *ctx, Packet *pkt)
 {
 gen_reg_writes(ctx);
@@ -478,7 +493,7 @@ static void gen_commit_packet(DisasContext *ctx, Packet 
*pkt)
 #endif
 
 if (pkt->pkt_has_cof) {
-ctx->base.is_jmp = DISAS_NORETURN;
+gen_end_tb(ctx);
 }
 }
 
@@ -491,8 +506,7 @@ static void decode_and_translate_packet(CPUHexagonState 
*env, DisasContext *ctx)
 
 nwords = read_packet_words(env, ctx, words);
 if (!nwords) {
-gen_exception(HEX_EXCP_INVALID_PACKET);
-ctx->base.is_jmp = DISAS_NORETURN;
+gen_exception_end_tb(ctx, HEX_EXCP_INVALID_PACKET);
 return;
 }
 
@@ -505,8 +519,7 @@ static void decode_and_translate_packet(CPUHexagonState 
*env, DisasContext *ctx)
 gen_commit_packet(ctx, );
 ctx->base.pc_next += pkt.encod_pkt_size_in_bytes;
 } else {
-gen_exception(HEX_EXCP_INVALID_PACKET);
-ctx->base.is_jmp = DISAS_NORETURN;
+gen_exception_end_tb(ctx, HEX_EXCP_INVALID_PACKET);
 }
 }
 
@@ -536,9 +549,7 @@ static bool hexagon_tr_breakpoint_check(DisasContextBase 
*dcbase, CPUState *cpu,
 {
 DisasContext *ctx = container_of(dcbase, DisasContext, base);
 
-tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
-ctx->base.is_jmp = DISAS_NORETURN;
-gen_exception_debug();
+gen_exception_end_tb(ctx, EXCP_DEBUG);
 /*
  * The address covered by the breakpoint must be included in
  * [tb->pc, tb->pc + tb->size) in order to for it to be
@@ -601,19 +612,12 @@ static void hexagon_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cpu)
 gen_exec_counters(ctx);
 tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
 if (ctx->base.singlestep_enabled) {
-gen_exception_debug();
+gen_exception_raw(EXCP_DEBUG);
 } else {
 tcg_gen_exit_tb(NULL, 0);
 }
 break;
 case DISAS_NORETURN:
-gen_exec_counters(ctx);
-tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], hex_next_PC);
-if (ctx->base.singlestep_enabled) {
-gen_exception_debug();
-} else {
-tcg_gen_exit_tb(NULL, 0);
-}
 break;
 default:
 g_assert_not_reached();
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 938f7fb..12506c8 100644
--- a/target/hexagon/translate.h
+++ 

[PATCH v3 07/26] Hexagon (target/hexagon) change variables from int to bool when appropriate

2021-04-07 Thread Taylor Simpson
Suggested-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 target/hexagon/cpu_bits.h  |  2 +-
 target/hexagon/decode.c| 80 +++---
 target/hexagon/insn.h  | 21 ++--
 target/hexagon/op_helper.c |  8 ++---
 target/hexagon/translate.c |  6 ++--
 target/hexagon/translate.h |  2 +-
 6 files changed, 60 insertions(+), 59 deletions(-)

diff --git a/target/hexagon/cpu_bits.h b/target/hexagon/cpu_bits.h
index 96af834..96fef71 100644
--- a/target/hexagon/cpu_bits.h
+++ b/target/hexagon/cpu_bits.h
@@ -47,7 +47,7 @@ static inline uint32_t iclass_bits(uint32_t encoding)
 return iclass;
 }
 
-static inline int is_packet_end(uint32_t endocing)
+static inline bool is_packet_end(uint32_t endocing)
 {
 uint32_t bits = parse_bits(endocing);
 return ((bits == 0x3) || (bits == 0x0));
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 65d97ce..dffe1d1 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -340,8 +340,8 @@ static void decode_split_cmpjump(Packet *pkt)
 if (GET_ATTRIB(pkt->insn[i].opcode, A_NEWCMPJUMP)) {
 last = pkt->num_insns;
 pkt->insn[last] = pkt->insn[i];/* copy the instruction */
-pkt->insn[last].part1 = 1;/* last instruction does the CMP */
-pkt->insn[i].part1 = 0;/* existing instruction does the JUMP */
+pkt->insn[last].part1 = true;  /* last insn does the CMP */
+pkt->insn[i].part1 = false;/* existing insn does the JUMP 
*/
 pkt->num_insns++;
 }
 }
@@ -354,7 +354,7 @@ static void decode_split_cmpjump(Packet *pkt)
 }
 }
 
-static int decode_opcode_can_jump(int opcode)
+static bool decode_opcode_can_jump(int opcode)
 {
 if ((GET_ATTRIB(opcode, A_JUMP)) ||
 (GET_ATTRIB(opcode, A_CALL)) ||
@@ -362,15 +362,15 @@ static int decode_opcode_can_jump(int opcode)
 (opcode == J2_pause)) {
 /* Exception to A_JUMP attribute */
 if (opcode == J4_hintjumpr) {
-return 0;
+return false;
 }
-return 1;
+return true;
 }
 
-return 0;
+return false;
 }
 
-static int decode_opcode_ends_loop(int opcode)
+static bool decode_opcode_ends_loop(int opcode)
 {
 return GET_ATTRIB(opcode, A_HWLOOP0_END) ||
GET_ATTRIB(opcode, A_HWLOOP1_END);
@@ -383,9 +383,9 @@ static void decode_set_insn_attr_fields(Packet *pkt)
 int numinsns = pkt->num_insns;
 uint16_t opcode;
 
-pkt->pkt_has_cof = 0;
-pkt->pkt_has_endloop = 0;
-pkt->pkt_has_dczeroa = 0;
+pkt->pkt_has_cof = false;
+pkt->pkt_has_endloop = false;
+pkt->pkt_has_dczeroa = false;
 
 for (i = 0; i < numinsns; i++) {
 opcode = pkt->insn[i].opcode;
@@ -394,14 +394,14 @@ static void decode_set_insn_attr_fields(Packet *pkt)
 }
 
 if (GET_ATTRIB(opcode, A_DCZEROA)) {
-pkt->pkt_has_dczeroa = 1;
+pkt->pkt_has_dczeroa = true;
 }
 
 if (GET_ATTRIB(opcode, A_STORE)) {
 if (pkt->insn[i].slot == 0) {
-pkt->pkt_has_store_s0 = 1;
+pkt->pkt_has_store_s0 = true;
 } else {
-pkt->pkt_has_store_s1 = 1;
+pkt->pkt_has_store_s1 = true;
 }
 }
 
@@ -422,9 +422,9 @@ static void decode_set_insn_attr_fields(Packet *pkt)
  */
 static void decode_shuffle_for_execution(Packet *packet)
 {
-int changed = 0;
+bool changed = false;
 int i;
-int flag;/* flag means we've seen a non-memory instruction */
+bool flag;/* flag means we've seen a non-memory instruction */
 int n_mems;
 int last_insn = packet->num_insns - 1;
 
@@ -437,7 +437,7 @@ static void decode_shuffle_for_execution(Packet *packet)
 }
 
 do {
-changed = 0;
+changed = false;
 /*
  * Stores go last, must not reorder.
  * Cannot shuffle stores past loads, either.
@@ -445,13 +445,13 @@ static void decode_shuffle_for_execution(Packet *packet)
  * then a store, shuffle the store to the front.  Don't shuffle
  * stores wrt each other or a load.
  */
-for (flag = n_mems = 0, i = last_insn; i >= 0; i--) {
+for (flag = false, n_mems = 0, i = last_insn; i >= 0; i--) {
 int opcode = packet->insn[i].opcode;
 
 if (flag && GET_ATTRIB(opcode, A_STORE)) {
 decode_send_insn_to(packet, i, last_insn - n_mems);
 n_mems++;
-changed = 1;
+changed = true;
 } else if (GET_ATTRIB(opcode, A_STORE)) {
 n_mems++;
 } else if (GET_ATTRIB(opcode, A_LOAD)) {
@@ -466,7 +466,7 @@ static void decode_shuffle_for_execution(Packet *packet)
  * a .new value
  */
 } else {
-

[PATCH v6 4/4] target/arm: set ID_AA64ISAR0.TLB to 2 for max AARCH64 CPU type

2021-04-07 Thread Rebecca Cran
Indicate support for FEAT_TLBIOS and FEAT_TLBIRANGE by setting
ID_AA64ISAR0.TLB to 2 for the max AARCH64 CPU type.

Signed-off-by: Rebecca Cran 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index f0a9e968c9c1..f42803ecaf1d 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -651,6 +651,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64ISAR0, DP, 1);
 t = FIELD_DP64(t, ID_AA64ISAR0, FHM, 1);
 t = FIELD_DP64(t, ID_AA64ISAR0, TS, 2); /* v8.5-CondM */
+t = FIELD_DP64(t, ID_AA64ISAR0, TLB, 2); /* FEAT_TLBIRANGE */
 t = FIELD_DP64(t, ID_AA64ISAR0, RNDR, 1);
 cpu->isar.id_aa64isar0 = t;
 
-- 
2.26.2




[PATCH v6 3/4] target/arm: Add support for FEAT_TLBIOS

2021-04-07 Thread Rebecca Cran
ARMv8.4 adds the mandatory FEAT_TLBIOS. It provides TLBI
maintenance instructions that extend to the Outer Shareable domain.

Signed-off-by: Rebecca Cran 
---
 target/arm/cpu.h|  5 ++
 target/arm/helper.c | 75 
 2 files changed, 80 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 32b78a4ef587..272fde83ca4e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4043,6 +4043,11 @@ static inline bool isar_feature_aa64_tlbirange(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
 }
 
+static inline bool isar_feature_aa64_tlbios(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) != 0;
+}
+
 static inline bool isar_feature_aa64_sb(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, SB) != 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index ce913deff490..5b10f179b761 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7211,6 +7211,78 @@ static const ARMCPRegInfo tlbirange_reginfo[] = {
 REGINFO_SENTINEL
 };
 
+static const ARMCPRegInfo tlbios_reginfo[] = {
+{ .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0,
+  .access = PL1_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_vmalle1is_write },
+{ .name = "TLBI_ASIDE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 2,
+  .access = PL1_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_vmalle1is_write },
+{ .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
+  .access = PL1_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae1is_write },
+{ .name = "TLBI_RVAAE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 3,
+  .access = PL1_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae1is_write },
+   { .name = "TLBI_RVALE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 5,
+  .access = PL1_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae1is_write },
+{ .name = "TLBI_RVAALE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 7,
+  .access = PL1_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae1is_write },
+{ .name = "TLBI_ALLE2OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 0,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle2is_write },
+   { .name = "TLBI_ALLE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 4,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle1is_write },
+{ .name = "TLBI_VMALLS12E1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 6,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle1is_write },
+{ .name = "TLBI_IPAS2E1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 0,
+  .access = PL2_W, .type = ARM_CP_NOP },
+{ .name = "TLBI_RIPAS2E1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 3,
+  .access = PL2_W, .type = ARM_CP_NOP },
+{ .name = "TLBI_IPAS2LE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 4,
+  .access = PL2_W, .type = ARM_CP_NOP },
+{ .name = "TLBI_RIPAS2LE1OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 7,
+  .access = PL2_W, .type = ARM_CP_NOP },
+   { .name = "TLBI_RVAE2OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 5, .opc2 = 1,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae2is_write },
+   { .name = "TLBI_RVALE2OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 5, .opc2 = 5,
+  .access = PL2_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae2is_write },
+{ .name = "TLBI_ALLE3OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 0,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_alle3is_write },
+   { .name = "TLBI_RVAE3OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 5, .opc2 = 1,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae3is_write },
+   { .name = "TLBI_RVALE3OS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 5, .opc2 = 5,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_rvae3is_write },
+REGINFO_SENTINEL
+};
+
 static uint64_t rndr_readfn(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 Error *err = NULL;
@@ -8583,6 +8655,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if 

[PATCH v6 0/4] target/arm: Add support for FEAT_TLBIOS and FEAT_TLBIRANGE

2021-04-07 Thread Rebecca Cran
ARMv8.4 adds the mandatory FEAT_TLBIOS and FEAT_TLBIRANGE. 
They provide TLBI maintenance instructions that extend to the Outer
Shareable domain and that apply to a range of input addresses.

Changes from v5 to v6:

Fixed wrapping of functions in exec-all.h to avoid exceeding the
80 character limit. checkpatch.pl now passes.

Rebecca Cran (4):
  accel/tcg: Add TLB invalidation support for ranges of addresses
  target/arm: Add support for FEAT_TLBIRANGE
  target/arm: Add support for FEAT_TLBIOS
  target/arm: set ID_AA64ISAR0.TLB to 2 for max AARCH64 CPU type

 accel/tcg/cputlb.c  | 130 ++-
 include/exec/exec-all.h |  46 +++
 target/arm/cpu.h|  10 +
 target/arm/cpu64.c  |   1 +
 target/arm/helper.c | 369 
 5 files changed, 553 insertions(+), 3 deletions(-)

-- 
2.26.2




[PATCH v6 1/4] accel/tcg: Add TLB invalidation support for ranges of addresses

2021-04-07 Thread Rebecca Cran
Add functions to support the FEAT_TLBIRANGE ARMv8.4 feature that adds
TLB invalidation instructions to invalidate ranges of addresses.

Signed-off-by: Rebecca Cran 
---
 accel/tcg/cputlb.c  | 130 +++-
 include/exec/exec-all.h |  46 +++
 2 files changed, 173 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8a7b779270a4..dc44967dcf8e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -709,7 +709,7 @@ void tlb_flush_page_all_cpus_synced(CPUState *src, 
target_ulong addr)
 tlb_flush_page_by_mmuidx_all_cpus_synced(src, addr, ALL_MMUIDX_BITS);
 }
 
-static void tlb_flush_page_bits_locked(CPUArchState *env, int midx,
+static bool tlb_flush_page_bits_locked(CPUArchState *env, int midx,
target_ulong page, unsigned bits)
 {
 CPUTLBDesc *d = _tlb(env)->d[midx];
@@ -729,7 +729,7 @@ static void tlb_flush_page_bits_locked(CPUArchState *env, 
int midx,
   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
   midx, page, mask);
 tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
-return;
+return true;
 }
 
 /* Check if we need to flush due to large pages.  */
@@ -738,13 +738,14 @@ static void tlb_flush_page_bits_locked(CPUArchState *env, 
int midx,
   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
   midx, d->large_page_addr, d->large_page_mask);
 tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
-return;
+return true;
 }
 
 if (tlb_flush_entry_mask_locked(tlb_entry(env, midx, page), page, mask)) {
 tlb_n_used_entries_dec(env, midx);
 }
 tlb_flush_vtlb_page_mask_locked(env, midx, page, mask);
+return false;
 }
 
 typedef struct {
@@ -943,6 +944,129 @@ void 
tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
 }
 }
 
+typedef struct {
+target_ulong addr;
+target_ulong length;
+uint16_t idxmap;
+uint16_t bits;
+}  TLBFlushPageRangeBitsByMMUIdxData;
+
+static void
+tlb_flush_page_range_bits_by_mmuidx_async_0(CPUState *cpu,
+target_ulong addr,
+target_ulong length,
+uint16_t idxmap,
+unsigned bits)
+{
+CPUArchState *env = cpu->env_ptr;
+int mmu_idx;
+target_ulong l;
+target_ulong page = addr;
+bool full_flush;
+
+assert_cpu_is_self(cpu);
+
+tlb_debug("page addr:" TARGET_FMT_lx "/%u len: " TARGET_FMT_lx
+  " mmu_map:0x%x\n",
+  addr, bits, length, idxmap);
+
+qemu_spin_lock(_tlb(env)->c.lock);
+for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+if ((idxmap >> mmu_idx) & 1) {
+for (l = 0; l < length; l += TARGET_PAGE_SIZE) {
+page = addr + l;
+full_flush = tlb_flush_page_bits_locked(env, mmu_idx,
+page, bits);
+if (full_flush) {
+break;
+}
+}
+}
+}
+qemu_spin_unlock(_tlb(env)->c.lock);
+
+for (l = 0; l < length; l += TARGET_PAGE_SIZE) {
+tb_flush_jmp_cache(cpu, page);
+}
+}
+
+static void
+tlb_flush_page_range_bits_by_mmuidx_async_1(CPUState *cpu,
+run_on_cpu_data data)
+{
+TLBFlushPageRangeBitsByMMUIdxData *d = data.host_ptr;
+
+tlb_flush_page_range_bits_by_mmuidx_async_0(cpu, d->addr, d->length,
+d->idxmap, d->bits);
+
+g_free(d);
+}
+
+void tlb_flush_page_range_bits_by_mmuidx(CPUState *cpu,
+ target_ulong addr,
+ target_ulong length,
+ uint16_t idxmap,
+ unsigned bits)
+{
+TLBFlushPageRangeBitsByMMUIdxData d;
+TLBFlushPageRangeBitsByMMUIdxData *p;
+
+/* This should already be page aligned */
+addr &= TARGET_PAGE_MASK;
+
+d.addr = addr & TARGET_PAGE_MASK;
+d.idxmap = idxmap;
+d.bits = bits;
+d.length = length;
+
+if (qemu_cpu_is_self(cpu)) {
+tlb_flush_page_range_bits_by_mmuidx_async_0(cpu, addr, length,
+idxmap, bits);
+} else {
+p = g_new(TLBFlushPageRangeBitsByMMUIdxData, 1);
+
+/* Allocate a structure, freed by the worker.  */
+*p = d;
+async_run_on_cpu(cpu, tlb_flush_page_range_bits_by_mmuidx_async_1,
+ RUN_ON_CPU_HOST_PTR(p));
+}
+}
+
+void tlb_flush_page_range_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+ target_ulong addr,
+ target_ulong length,
+   

[PATCH v6 2/4] target/arm: Add support for FEAT_TLBIRANGE

2021-04-07 Thread Rebecca Cran
ARMv8.4 adds the mandatory FEAT_TLBIRANGE. It provides TLBI
maintenance instructions that apply to a range of input addresses.

Signed-off-by: Rebecca Cran 
---
 target/arm/cpu.h|   5 +
 target/arm/helper.c | 294 
 2 files changed, 299 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 193a49ec7fac..32b78a4ef587 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4038,6 +4038,11 @@ static inline bool isar_feature_aa64_pauth_arch(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) != 0;
 }
 
+static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
+}
+
 static inline bool isar_feature_aa64_sb(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, SB) != 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index d9220be7c5a0..ce913deff490 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4759,6 +4759,217 @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, 
const ARMCPRegInfo *ri,
   ARMMMUIdxBit_SE3, bits);
 }
 
+#ifdef TARGET_AARCH64
+static uint64_t tlbi_aa64_range_get_length(CPUARMState *env,
+   uint64_t value)
+{
+unsigned int page_shift;
+unsigned int page_size_granule;
+uint64_t num;
+uint64_t scale;
+uint64_t exponent;
+uint64_t length;
+
+num = extract64(value, 39, 4);
+scale = extract64(value, 44, 2);
+page_size_granule = extract64(value, 46, 2);
+
+page_shift = page_size_granule * 2 + 10;
+
+if (page_size_granule == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
+  page_size_granule);
+return 0;
+}
+
+exponent = (5 * scale) + 1;
+length = (num + 1) << (exponent + page_shift);
+
+return length;
+}
+
+static void tlbi_aa64_rvae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+/*
+ * Invalidate by VA range, EL1&0.
+ * Currently handles all of RVAE1, RVAAE1, RVAALE1 and RVALE1,
+ * since we don't support flush-for-specific-ASID-only or
+ * flush-last-level-only.
+ */
+ARMMMUIdx mask;
+int bits;
+uint64_t pageaddr;
+uint64_t length;
+
+CPUState *cs = env_cpu(env);
+mask = vae1_tlbmask(env);
+if (regime_has_2_ranges(ctz32(mask))) {
+pageaddr = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
+} else {
+pageaddr = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+}
+length = tlbi_aa64_range_get_length(env, value);
+bits = tlbbits_for_regime(env, mask, pageaddr);
+
+if (tlb_force_broadcast(env)) {
+tlb_flush_page_range_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
+length, mask,
+bits);
+} else {
+tlb_flush_page_range_bits_by_mmuidx(cs, pageaddr, length, mask,
+bits);
+}
+}
+
+static void tlbi_aa64_rvae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+/*
+ * Invalidate by VA range, Inner/Outer Shareable EL1&0.
+ * Currently handles all of RVAE1IS, RVAE1OS, RVAAE1IS, RVAAE1OS,
+ * RVAALE1IS, RVAALE1OS, RVALE1IS and RVALE1OS, since we don't support
+ * flush-for-specific-ASID-only, flush-last-level-only or inner/outer
+ * shareable specific flushes.
+ */
+ARMMMUIdx mask;
+int bits;
+uint64_t pageaddr;
+uint64_t length;
+
+CPUState *cs = env_cpu(env);
+mask = vae1_tlbmask(env);
+if (regime_has_2_ranges(ctz32(mask))) {
+pageaddr = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
+} else {
+pageaddr = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+}
+length = tlbi_aa64_range_get_length(env, value);
+bits = tlbbits_for_regime(env, mask, pageaddr);
+
+tlb_flush_page_range_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
+length, mask,
+bits);
+}
+
+static void tlbi_aa64_rvae2_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+/*
+ * Invalidate by VA range, EL2.
+ * Currently handles all of RVAE2, RVAAE2, RVAALE2 and RVALE2,
+ * since we don't support flush-for-specific-ASID-only or
+ * flush-last-level-only.
+ */
+ARMMMUIdx mask;
+bool secure;
+int bits;
+uint64_t pageaddr;
+uint64_t length;
+
+CPUState *cs = env_cpu(env);
+secure = arm_is_secure_below_el3(env);
+pageaddr = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+length = tlbi_aa64_range_get_length(env, value);
+mask = secure ? ARMMMUIdxBit_SE2 : 

Re: [PATCH 07/27] arc: TCG instruction definitions

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

+/*
+ * ADD
+ *Variables: @b, @c, @a
+ *Functions: getCCFlag, getFFlag, setZFlag, setNFlag, setCFlag, CarryADD,
+ *   setVFlag, OverflowADD
+ * --- code ---
+ * {
+ *   cc_flag = getCCFlag ();
+ *   lb = @b;
+ *   lc = @c;
+ *   if((cc_flag == true))
+ * {
+ *   lb = @b;
+ *   lc = @c;
+ *   @a = (@b + @c);
+ *   if((getFFlag () == true))
+ * {
+ *   setZFlag (@a);
+ *   setNFlag (@a);
+ *   setCFlag (CarryADD (@a, lb, lc));
+ *   setVFlag (OverflowADD (@a, lb, lc));
+ * };
+ * };
+ * }
+ */
+
+int
+arc_gen_ADD(DisasCtxt *ctx, TCGv b, TCGv c, TCGv a)
+{
+int ret = DISAS_NEXT;
+TCGv temp_3 = tcg_temp_local_new();
+TCGv cc_flag = tcg_temp_local_new();
+TCGv lb = tcg_temp_local_new();
+TCGv lc = tcg_temp_local_new();
+TCGv temp_1 = tcg_temp_local_new();
+TCGv temp_2 = tcg_temp_local_new();
+TCGv temp_5 = tcg_temp_local_new();
+TCGv temp_4 = tcg_temp_local_new();
+TCGv temp_7 = tcg_temp_local_new();
+TCGv temp_6 = tcg_temp_local_new();
+getCCFlag(temp_3);
+tcg_gen_mov_tl(cc_flag, temp_3);
+tcg_gen_mov_tl(lb, b);
+tcg_gen_mov_tl(lc, c);
+TCGLabel *done_1 = gen_new_label();
+tcg_gen_setcond_tl(TCG_COND_EQ, temp_1, cc_flag, arc_true);
+tcg_gen_xori_tl(temp_2, temp_1, 1);
+tcg_gen_andi_tl(temp_2, temp_2, 1);
+tcg_gen_brcond_tl(TCG_COND_EQ, temp_2, arc_true, done_1);
+tcg_gen_mov_tl(lb, b);
+tcg_gen_mov_tl(lc, c);
+tcg_gen_add_tl(a, b, c);
+if ((getFFlag () == true)) {
+setZFlag(a);
+setNFlag(a);
+CarryADD(temp_5, a, lb, lc);
+tcg_gen_mov_tl(temp_4, temp_5);
+setCFlag(temp_4);
+OverflowADD(temp_7, a, lb, lc);
+tcg_gen_mov_tl(temp_6, temp_7);
+setVFlag(temp_6);
+}
+gen_set_label(done_1);
+tcg_temp_free(temp_3);
+tcg_temp_free(cc_flag);
+tcg_temp_free(lb);
+tcg_temp_free(lc);
+tcg_temp_free(temp_1);
+tcg_temp_free(temp_2);
+tcg_temp_free(temp_5);
+tcg_temp_free(temp_4);
+tcg_temp_free(temp_7);
+tcg_temp_free(temp_6);
+
+return ret;
+}


I must say I'm not really impressed by the results here.

Your input is clearly intended to be fed to an optimizing compiler, which TCG 
is not.




+/*
+ * DIV
+ *Variables: @src2, @src1, @dest
+ *Functions: getCCFlag, divSigned, getFFlag, setZFlag, setNFlag, setVFlag
+ * --- code ---
+ * {
+ *   cc_flag = getCCFlag ();
+ *   if((cc_flag == true))
+ * {
+ *   if(((@src2 != 0) && ((@src1 != 2147483648) || (@src2 != 4294967295
+ * {
+ *   @dest = divSigned (@src1, @src2);
+ *   if((getFFlag () == true))
+ * {
+ *   setZFlag (@dest);
+ *   setNFlag (@dest);
+ *   setVFlag (0);
+ * };
+ * }
+ *   else
+ * {
+ * };
+ * };
+ * }
+ */
+
+int
+arc_gen_DIV(DisasCtxt *ctx, TCGv src2, TCGv src1, TCGv dest)
+{
+int ret = DISAS_NEXT;
+TCGv temp_9 = tcg_temp_local_new();
+TCGv cc_flag = tcg_temp_local_new();
+TCGv temp_1 = tcg_temp_local_new();
+TCGv temp_2 = tcg_temp_local_new();
+TCGv temp_3 = tcg_temp_local_new();
+TCGv temp_4 = tcg_temp_local_new();
+TCGv temp_5 = tcg_temp_local_new();
+TCGv temp_6 = tcg_temp_local_new();
+TCGv temp_7 = tcg_temp_local_new();
+TCGv temp_8 = tcg_temp_local_new();
+TCGv temp_10 = tcg_temp_local_new();
+TCGv temp_11 = tcg_temp_local_new();
+getCCFlag(temp_9);
+tcg_gen_mov_tl(cc_flag, temp_9);
+TCGLabel *done_1 = gen_new_label();
+tcg_gen_setcond_tl(TCG_COND_EQ, temp_1, cc_flag, arc_true);
+tcg_gen_xori_tl(temp_2, temp_1, 1);
+tcg_gen_andi_tl(temp_2, temp_2, 1);
+tcg_gen_brcond_tl(TCG_COND_EQ, temp_2, arc_true, done_1);
+TCGLabel *else_2 = gen_new_label();
+TCGLabel *done_2 = gen_new_label();
+tcg_gen_setcondi_tl(TCG_COND_NE, temp_3, src2, 0);
+tcg_gen_setcondi_tl(TCG_COND_NE, temp_4, src1, 2147483648);
+tcg_gen_setcondi_tl(TCG_COND_NE, temp_5, src2, 4294967295);
+tcg_gen_or_tl(temp_6, temp_4, temp_5);
+tcg_gen_and_tl(temp_7, temp_3, temp_6);
+tcg_gen_xori_tl(temp_8, temp_7, 1);
+tcg_gen_andi_tl(temp_8, temp_8, 1);
+tcg_gen_brcond_tl(TCG_COND_EQ, temp_8, arc_true, else_2);
+divSigned(temp_10, src1, src2);
+tcg_gen_mov_tl(dest, temp_10);
+if ((getFFlag () == true)) {
+setZFlag(dest);
+setNFlag(dest);
+tcg_gen_movi_tl(temp_11, 0);
+setVFlag(temp_11);
+}
+tcg_gen_br(done_2);
+gen_set_label(else_2);
+gen_set_label(done_2);
+gen_set_label(done_1);


Nor is your compiler, for that matter, creating branches for empty elses.  The 
two together produce cringe-worthy results.


I can't help but feel that the same amount of effort would have produced a 
legible, maintainable conversion directly to TCG, and 

Re: [PATCH 22/27] arcv3: TCG instruction definitions

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

From: Cupertino Miranda

---
  target/arc/semfunc-helper.c |13 +
  target/arc/semfunc-helper.h |31 +
  target/arc/semfunc-v3.c | 14653 ++
  target/arc/semfunc-v3.h |55 +
  4 files changed, 14752 insertions(+)
  create mode 100644 target/arc/semfunc-v3.c
  create mode 100644 target/arc/semfunc-v3.h


And there's no good way to share code between v2 and v3?

r~



Re: [PATCH 21/27] arcv3: TCG instruction generator changes

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

+if(ctx->insn.limm & 0x8000)
+  ctx->insn.limm += 0x;


(1) bad braces, but
(2) use an unconditional cast to int32_t.

QEMU forces the compiler to use standard two's complement arithmetic. We don't 
have to go out of our way to work around the ISO-C lunacy of "undefined values" 
that for no good reason still allows sign-magnitude and ones' complement arithmetic.
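
For (2), a minimal sketch of what the unconditional cast looks like, assuming
the intent of the quoted hunk is to sign-extend a 32-bit long immediate into a
wider field. The function name is hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 32 bits of a long immediate. */
static inline int64_t sign_extend_limm32(uint64_t limm)
{
    /*
     * Truncate to 32 bits, reinterpret as signed, then widen.
     * No explicit test of the top bit is needed.
     */
    return (int32_t)limm;
}
```

The cast does the whole job: a value with bit 31 set widens to a negative
64-bit value, and anything else is unchanged.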



+if (ctx->insn.cc) {
+TCGv cc = tcg_temp_local_new();
+arc_gen_verifyCCFlag(ctx, cc);
+tcg_gen_brcondi_tl(TCG_COND_NE, cc, 1, done);
+tcg_temp_free(cc);
+}
+


Lots of non-uses of gen_cc_prologue/epilogue.


r~



Re: [PATCH 20/27] arcv3: TCG, decoder glue code and helper changes

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

+uint64_t helper_carry_add_flag32(uint64_t dest, uint64_t b, uint64_t c) {
+return carry_add_flag(dest, b, c, 32);
+}
+
+target_ulong helper_overflow_add_flag32(target_ulong dest, target_ulong b, 
target_ulong c) {
+return overflow_add_flag(dest, b, c, 32);
+}
+
+target_ulong helper_overflow_sub_flag32(target_ulong dest, target_ulong b, 
target_ulong c) {
+dest = dest & 0x;
+b = b & 0x;
+c = c & 0x;
+return overflow_sub_flag(dest, b, c, 32);
+}


You shouldn't need to replicate these functions.  Use the correct types and 
masking in the first place.
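
As a rough illustration of the point about types, assuming the flags are meant
to be the usual carry-out and signed-overflow of a 32-bit add or subtract. The
names below are hypothetical and do not come from the patch; with uint32_t
operands no explicit masking is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry out of bit 31 for dest = b + c (mod 2^32). */
static inline bool carry_add32(uint32_t b, uint32_t c)
{
    uint32_t dest = b + c;
    return dest < b;
}

/* Signed overflow on add: both operands differ in sign from the result. */
static inline bool overflow_add32(uint32_t b, uint32_t c)
{
    uint32_t dest = b + c;
    return ((b ^ dest) & (c ^ dest)) >> 31;
}

/*
 * Signed overflow on dest = b - c: operands have different signs and
 * the result's sign differs from b's.
 */
static inline bool overflow_sub32(uint32_t b, uint32_t c)
{
    uint32_t dest = b - c;
    return ((b ^ c) & (b ^ dest)) >> 31;
}
```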




+uint64_t helper_rotate_left32(uint64_t orig, uint64_t n)
+{
+uint64_t t;
+uint64_t dest = (orig << n) & ((0x << n) & 0x);
+
+t = (orig >> (32 - n)) & ((1 << n) - 1);
+dest |= t;
+
+return dest;
+}
+
+uint64_t helper_rotate_right32(uint64_t orig, uint64_t n)
+{
+uint64_t t;
+uint64_t dest = (orig >> n) & (0x >> n);
+
+t = ((orig & ((1 << n) - 1)) << (32 - n));
+dest |= t;
+
+return dest;
+}


rol32 and ror32.
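
That is, the existing rol32() and ror32() from include/qemu/bitops.h, applied
to a uint32_t. A standalone sketch of the same rotates follows, written so that
a shift of 0 is well defined; the _sketch names are mine, so as not to imply
these are the QEMU definitions verbatim.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by shift bits (shift taken modulo 32). */
static inline uint32_t rol32_sketch(uint32_t word, unsigned int shift)
{
    shift &= 31;
    /* The "& 31" on the right shift avoids an undefined shift by 32. */
    return (word << shift) | (word >> ((32 - shift) & 31));
}

/* Rotate a 32-bit word right by shift bits (shift taken modulo 32). */
static inline uint32_t ror32_sketch(uint32_t word, unsigned int shift)
{
    shift &= 31;
    return (word >> shift) | (word << ((32 - shift) & 31));
}
```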


+uint64_t helper_asr_32(uint64_t b, uint64_t c)
+{
+  uint64_t t;
+  c = c & 31;
+  t = b;
+  for(int i = 0; i < c; i++) {
+t >>= 1;
+if((b & 0x8000) != 0)
+  t |= 0x8000;
+  }
+  //t |= ((1 << (c+1)) - 1) << (32 - c);
+
+  return t;


Really?  I can't imagine what led you to write this.
Who writes a simple shift operation with a loop?

Perhaps no helper at all and

  tcg_gen_sra_tl(ret, b, c);
  tcg_gen_ext32s_tl(ret, ret);
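
For reference, the scalar equivalent of what the loop computes is a single
signed shift. A sketch, assuming the 32-bit result is wanted sign-extended to
64 bits; the name asr32_sketch is hypothetical. It relies on >> of a negative
signed value being an arithmetic shift, which ISO C leaves
implementation-defined but which GCC and Clang provide.

```c
#include <stdint.h>

/* Arithmetic shift right of the low 32 bits of b by (c mod 32). */
static inline uint64_t asr32_sketch(uint64_t b, uint64_t c)
{
    int32_t v = (int32_t)b;    /* keep the low 32 bits, as signed */

    /* Shift as signed, then sign-extend the result to 64 bits. */
    return (uint64_t)(int64_t)(v >> (c & 31));
}
```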



+target_ulong helper_ffs32(CPUARCState *env, uint64_t src)
+{
+int i;
+if (src == 0) {
+  return 31;
+}
+for (i = 0; i <= 31; i++) {
+  if (((src >> i) & 1) != 0) {
+break;
+  }
+}
+return i;
+}


tcg_gen_ori_tl(ret, src, MAKE_64BIT_MASK(32, 32));
tcg_gen_ctzi_tl(ret, ret, 31);

Though I really wonder if you've got that function correct, as it's not the 
*normal* definition of ffs...
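
For comparison, here is the quoted helper's behaviour on a 32-bit input,
written with a count-trailing-zeros builtin instead of a loop. This is a sketch
assuming GCC or Clang (__builtin_ctz) and assuming only the low 32 bits of src
matter; the name is hypothetical. It keeps the helper's 0-based bit index and
its result of 31 for a zero input, which is where it departs from the usual
ffs() (1-based, 0 for a zero input).

```c
#include <stdint.h>

/* Index of the lowest set bit; 31 when no bit is set (as in the helper). */
static inline uint32_t ffs32_sketch(uint32_t src)
{
    if (src == 0) {
        return 31;    /* matches the quoted helper, not POSIX ffs() */
    }
    return (uint32_t)__builtin_ctz(src);
}
```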




+target_ulong helper_norml(CPUARCState *env, uint64_t src1)
+{
+int i;
+int64_t tmp = (int64_t) src1;
+if (tmp == 0 || tmp == -1) {
+  return 0;
+}
+for (i = 0; i <= 63; i++) {
+  if ((tmp >> i) == 0) {
+  break;
+  }
+  if ((tmp >> i) == -1) {
+  break;
+  }
+}
+return i;
+}


This is some cognate of count-leading-repetitions-of-sign-bit, 
tcg_gen_clrsb_tl.  A decent computation should be like


  tcg_gen_clrsb_i64(ret, src);
  tcg_gen_subfi_i64(ret, 63, ret);
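
A host-side sketch of why those two ops match the loop, assuming GCC or Clang
for __builtin_clrsbll and an arithmetic >> on signed values. Both function
names are hypothetical; norml_loop is a transcription of the quoted helper.

```c
#include <stdint.h>

/* Transcription of the quoted loop-based helper. */
static inline int norml_loop(int64_t tmp)
{
    int i;

    if (tmp == 0 || tmp == -1) {
        return 0;
    }
    for (i = 0; i <= 63; i++) {
        if ((tmp >> i) == 0 || (tmp >> i) == -1) {
            break;
        }
    }
    return i;
}

/* Same result as 63 minus the count of redundant leading sign bits. */
static inline int norml_clrsb(int64_t tmp)
{
    return 63 - __builtin_clrsbll(tmp);
}
```

The subtraction from 63 corresponds to tcg_gen_subfi_i64(ret, 63, ret), which
computes 63 - ret.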



diff --git a/target/arc/semfunc-v2_mapping.def 
b/target/arc/semfunc-v2_mapping.def
new file mode 100644
index 00..ab8d9ff123
--- /dev/null
+++ b/target/arc/semfunc-v2_mapping.def


You could have named this properly to start.


r~



Re: [PATCH 0/2] hw/i2c: Adds pca954x i2c mux switch device

2021-04-07 Thread Patrick Venture
On Tue, Apr 6, 2021 at 4:39 PM Corey Minyard  wrote:
>
> On Tue, Apr 06, 2021 at 03:21:18PM -0700, Patrick Venture wrote:
> > On Tue, Apr 6, 2021 at 11:36 AM Corey Minyard  wrote:
> > >
> > > On Tue, Apr 06, 2021 at 08:55:14AM -0700, Patrick Venture wrote:
> > > > On Tue, Apr 6, 2021 at 8:41 AM Patrick Venture  
> > > > wrote:
> > > > >
> > > > > On Mon, Apr 5, 2021 at 12:58 PM Corey Minyard  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Apr 03, 2021 at 03:28:08PM -0700, Patrick Venture wrote:
> > > > > > > The i2c mux device pca954x implements two devices:
> > > > > > >  - the pca9546 and pca9548.
> > > > > > >
> > > > > > > Patrick Venture (2):
> > > > > > >   hw/i2c/core: add reachable state boolean
> > > > > > >   hw/i2c: add pca954x i2c-mux switch
> > > > > >
> > > > > > Looking this over, the code looks good, but I have a few general
> > > > > > questions:
> > > > > >
> > > > > > * Can you register the same slave address on different channels?  
> > > > > > That's
> > > > > >   something you could do with real hardware and might be required at
> > > > > >   some time.  It looks like to me that you can't with this patch 
> > > > > > set,
> > > > > >   but maybe I'm missing something.
> > > > >
> > > > > If I understand the hardware's implementation properly you can have
> > > > > collisions, and this allows for collisions.  I'm not sure what you
> > > > > mean by having both accessible.  For instance, on hardware you can
> > > > > have a switch with N channels, and on two of the channels there is an
> > > > > eeprom at 50.  But you're unable to talk to both eeproms at the same
> > > > > time, because the addresses collide -- so how would the hardware know
> > > > > which you're talking to?  My understanding of the behavior in this
> > > > > collision case is that it just talks to the first one that responds
> > > > > and can lead to unexpected things.
> > > > >
> > > > > There is a board, the quanta-q71l where we had to set the
> > > > > idle-disconnect because there were two muxes on the same bus, with
> > > > > conflicting addresses, and so we had to use idle disconnect explicitly
> > > > > to make the software happy talking to the hardware -- not ideal as
> > > > > having two devices behind different channels, but ultimately it's the
> > > > > same idea because the devices are conflicting.
> > > > >
> > > > > >
> > > > > > * Can you add devices to the secondary I2C busses on the mux using 
> > > > > > the
> > > > > >   standard QEMU device model, or is the function call required?
> > > > >
> > > > > I added the function call because I didn't see a clean way to bridge
> > > > > the issue; also, the quasi-arbitrary bus numbering used by the
> > > > > kernel isn't how the hardware truly behaves, and my goal was to
> > > > > implement closer to the hardware.  I thought about adding an I2cBus to
> > > > > the device and then you'd be able to access it, but wasn't sure of a
> > > > > nice clean way to plumb that through -- I considered adding/removing
> > > > > devices from the parent i2c bus instead of the boolean reachable, but
> > > > > that seemed way less clean - although do-able.
> > > > >
> > > > > >
> > > > > > > I ask because I did a pca9540 and pca9541 device, but I've never
> > > > > > > submitted it because I didn't think it would ever be needed.  It
> > > > > > > takes a different tack on the problem; it creates the secondary
> > > > > > > busses as standard QEMU I2C busses and bridges them.  You can see
> > > > > > > it at
> > > > > > >
> > > > > > >   github.com:cminyard/qemu.git master-i2c-rebase
> > > > > >
> > > > >
> > > > > I'll have to take a look at your approach, but the idea that it
> > > > > wouldn't be needed sounds bizarre to me as nearly all BMC-based qemu
> > > > > boards leverage i2c muxes to handle their PCIe slot i2c routing.
> > > > >
> > > > > > > If your design can do the things I ask, then it's better.  If not,
> > > > > > > then I'm not sure.
> > > >
> > > > Corey,
> > > >
> > > > looking at your design, I should be able to do something similar with
> > > > a small tweak.
> > > >
> > > > I think my design follows the hardware where there can be conflicts,
> > > > etc, but what I didn't know how to do was add the faux I2cBuses in a
> > > > useful way -- but if I add the I2cBuses to the device, and then on
> > > > add/remove it registers the device on the parent bus -- I can still
> > > > use the reachable boolean to control whether it's present.  The faux
> > > > I2cBuses would be a simplification for adding/removing i2c devices --
> > > > and would act as the device list in my object.  So then setting the
> > > > channels would change to walking the devices held by the bus that
> > > > corresponds with the bit -- but _still_ using the reachable boolean.
> > > >
> > > > If you'd like, I can update my patchset to use an i2cbus for the
> > > > purpose above, then it would satisfy the requirement of leveraging the
> > > > normal device process and no longer require 

[Bug 1915063] Re: Windows 10 wil not install using qemu-system-x86_64

2021-04-07 Thread Babu Moger
I remember seeing something similar before. This was supposed to be
fixed by the following Linux kernel commit.

commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
Author: Maxim Levitsky 
Date:   Wed Jul 8 14:57:31 2020 +0300

kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host

# git describe --contains 841c2be09fe4f495fe5224952a419bd8c7e5b455
v5.9-rc1~121^2~67

Problem seems to happen with EPYC-Rome model which exposes the feature
STIBP but not IBRS.

Did you guys try "-cpu host"? It might work.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1915063

Title:
  Windows 10 wil not install using qemu-system-x86_64

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  Steps to reproduce
  install virt-manager and ovmf if not already there
  copy windows and virtio iso files to /var/lib/libvirt/images

  Use virt-manager from local machine to create your VMs with the disk, CPUs
  and memory required
  Select customize configuration then select OVMF(UEFI) instead of seabios
  set first CDROM to the windows installation iso (enable in boot options)
  add a second CDROM and load with the virtio iso
  change spice display to VNC

  Always get a security error from windows and it fails to launch the
  installer (works on RHEL and Fedora)
  I tried updating the qemu version from Focals 4.2 to Groovy 5.0 which was of
  no help
  --- 
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu27.14
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  DistributionChannelDescriptor:
   # This is the distribution channel descriptor for the OEM CDs
   # For more information see 
http://wiki.ubuntu.com/DistributionChannelDescriptor
   
canonical-oem-sutton-focal-amd64-20201030-422+pc-sutton-bachman-focal-amd64+X00
  DistroRelease: Ubuntu 20.04
  InstallationDate: Installed on 2021-01-20 (19 days ago)
  InstallationMedia: Ubuntu 20.04 "Focal" - Build amd64 LIVE Binary 
20201030-14:39
  MachineType: LENOVO 30E102Z
  NonfreeKernelModules: nvidia_modeset nvidia
  Package: linux (not installed)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.6.0-1042-oem 
root=UUID=389cd165-fc52-4814-b837-a1090b9c2387 ro locale=en_US quiet splash 
vt.handoff=7
  ProcVersionSignature: Ubuntu 5.6.0-1042.46-oem 5.6.19
  RelatedPackageVersions:
   linux-restricted-modules-5.6.0-1042-oem N/A
   linux-backports-modules-5.6.0-1042-oem  N/A
   linux-firmware  1.187.8
  RfKill:
   
  Tags:  focal
  Uname: Linux 5.6.0-1042-oem x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip docker kvm libvirt lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 07/29/2020
  dmi.bios.vendor: LENOVO
  dmi.bios.version: S07KT08A
  dmi.board.name: 1046
  dmi.board.vendor: LENOVO
  dmi.board.version: Not Defined
  dmi.chassis.type: 3
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.modalias: 
dmi:bvnLENOVO:bvrS07KT08A:bd07/29/2020:svnLENOVO:pn30E102Z:pvrThinkStationP620:rvnLENOVO:rn1046:rvrNotDefined:cvnLENOVO:ct3:cvrNone:
  dmi.product.family: INVALID
  dmi.product.name: 30E102Z
  dmi.product.sku: LENOVO_MT_30E1_BU_Think_FM_ThinkStation P620
  dmi.product.version: ThinkStation P620
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1915063/+subscriptions



Re: [PATCH 18/27] arcv3: Decoder code

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

From: Cupertino Miranda 

---
  disas/arc.c|   51 +-
  target/arc/decoder-v3.c| 1547 
  target/arc/decoder-v3.h|  322 
  target/arc/flags-v3.def|  103 +++
  target/arc/operands-v3.def |  133 
  5 files changed, 2147 insertions(+), 9 deletions(-)
  create mode 100644 target/arc/decoder-v3.c
  create mode 100644 target/arc/decoder-v3.h
  create mode 100644 target/arc/flags-v3.def
  create mode 100644 target/arc/operands-v3.def


Do we really need a complete copy of the v2 decoder included with the v3 
decoder?


r~



Re: [PATCH v2 1/1] decodetree: Add support for 64-bit instructions

2021-04-07 Thread Philippe Mathieu-Daudé
On 4/8/21 12:18 AM, Luis Fernando Fujita Pires wrote:
> Allow '64' to be specified for the instruction width command line params
> and use the appropriate insn/field data types, mask, extract and deposit
> functions in that case.
> 
> This will be used to implement the new 64-bit Power ISA 3.1 instructions.
> 
> Signed-off-by: Luis Pires 
> ---
>  docs/devel/decodetree.rst |  5 +++--
>  scripts/decodetree.py | 25 -
>  2 files changed, 23 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v2 1/1] decodetree: Add support for 64-bit instructions

2021-04-07 Thread Richard Henderson

On 4/7/21 3:18 PM, Luis Fernando Fujita Pires wrote:

Allow '64' to be specified for the instruction width command line params
and use the appropriate insn/field data types, mask, extract and deposit
functions in that case.

This will be used to implement the new 64-bit Power ISA 3.1 instructions.

Signed-off-by: Luis Pires
---
  docs/devel/decodetree.rst |  5 +++--
  scripts/decodetree.py | 25 -
  2 files changed, 23 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 
Queued for 6.1.


r~



Re: [PATCH-for-6.0?] hw/arm/imx25_pdk: Fix error message for invalid RAM size

2021-04-07 Thread Richard Henderson

On 4/7/21 3:56 PM, Philippe Mathieu-Daudé wrote:

The i.MX25 PDK board has 2 banks for SDRAM, each can
address up to 256 MiB. So the total RAM usable for this
board is 512M. When we ask for more we get a misleading
error message:

   $ qemu-system-arm -M imx25-pdk -m 513M
   qemu-system-arm: Invalid RAM size, should be 128 MiB

Update the error message to better match the reality:

   $ qemu-system-arm -M imx25-pdk -m 513M
   qemu-system-arm: RAM size more than 512 MiB is not supported

Fixes: bf350daae02 ("arm/imx25_pdk: drop RAM size fixup")
Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/imx25_pdk.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH-for-6.0?] hw/rx/rx-gdbsim: Do not accept invalid memory size

2021-04-07 Thread Richard Henderson

On 4/7/21 3:30 PM, Philippe Mathieu-Daudé wrote:

We check that the amount of RAM is sufficient and warn when
it is not, but we then neglect to bail out. Fix that by
adding the missing exit() call.

Fixes: bda19d7bb56 ("hw/rx: Add RX GDB simulator")
Signed-off-by: Philippe Mathieu-Daudé
---
  hw/rx/rx-gdbsim.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Richard Henderson 

r~



[PATCH-for-6.0?] hw/arm/imx25_pdk: Fix error message for invalid RAM size

2021-04-07 Thread Philippe Mathieu-Daudé
The i.MX25 PDK board has 2 banks for SDRAM, each can
address up to 256 MiB. So the total RAM usable for this
board is 512M. When we ask for more we get a misleading
error message:

  $ qemu-system-arm -M imx25-pdk -m 513M
  qemu-system-arm: Invalid RAM size, should be 128 MiB

Update the error message to better match the reality:

  $ qemu-system-arm -M imx25-pdk -m 513M
  qemu-system-arm: RAM size more than 512 MiB is not supported

Fixes: bf350daae02 ("arm/imx25_pdk: drop RAM size fixup")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/imx25_pdk.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
index 1c201d0d8ed..51fde71b1bd 100644
--- a/hw/arm/imx25_pdk.c
+++ b/hw/arm/imx25_pdk.c
@@ -67,7 +67,6 @@ static struct arm_boot_info imx25_pdk_binfo;
 
 static void imx25_pdk_init(MachineState *machine)
 {
-MachineClass *mc = MACHINE_GET_CLASS(machine);
 IMX25PDK *s = g_new0(IMX25PDK, 1);
 unsigned int ram_size;
 unsigned int alias_offset;
@@ -79,8 +78,8 @@ static void imx25_pdk_init(MachineState *machine)
 
 /* We need to initialize our memory */
 if (machine->ram_size > (FSL_IMX25_SDRAM0_SIZE + FSL_IMX25_SDRAM1_SIZE)) {
-char *sz = size_to_str(mc->default_ram_size);
-error_report("Invalid RAM size, should be %s", sz);
+char *sz = size_to_str(FSL_IMX25_SDRAM0_SIZE + FSL_IMX25_SDRAM1_SIZE);
+error_report("RAM size more than %s is not supported", sz);
 g_free(sz);
 exit(EXIT_FAILURE);
 }
-- 
2.26.3
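
The check changed by this patch can be sketched in Python as follows. This is only an illustration of the logic described in the commit message, not the QEMU implementation: the bank sizes are taken from the statement that each bank addresses up to 256 MiB, and `size_to_str()` and `check_ram_size()` here are simplified, hypothetical stand-ins.

```python
from typing import Optional

MiB = 1024 * 1024

# Assumption: both SDRAM banks are 256 MiB, as stated in the commit message.
SDRAM0_SIZE = 256 * MiB
SDRAM1_SIZE = 256 * MiB


def size_to_str(size: int) -> str:
    """Format a byte count in MiB (simplified stand-in for QEMU's helper)."""
    return f"{size // MiB} MiB"


def check_ram_size(requested: int) -> Optional[str]:
    """Return an error message if `requested` exceeds both banks, else None."""
    limit = SDRAM0_SIZE + SDRAM1_SIZE
    if requested > limit:
        # Report the real upper bound rather than the default RAM size.
        return f"RAM size more than {size_to_str(limit)} is not supported"
    return None
```

With these assumptions, a request for 513 MiB is rejected with a message naming the 512 MiB limit, while exactly 512 MiB is accepted.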




[PATCH-for-6.0?] hw/rx/rx-gdbsim: Do not accept invalid memory size

2021-04-07 Thread Philippe Mathieu-Daudé
We check that the amount of RAM is sufficient and warn when
it is not, but we then neglect to bail out. Fix that by
adding the missing exit() call.

Fixes: bda19d7bb56 ("hw/rx: Add RX GDB simulator")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/rx/rx-gdbsim.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
index b1d7c2488ff..4e4ececae4b 100644
--- a/hw/rx/rx-gdbsim.c
+++ b/hw/rx/rx-gdbsim.c
@@ -93,6 +93,7 @@ static void rx_gdbsim_init(MachineState *machine)
 char *sz = size_to_str(mc->default_ram_size);
 error_report("Invalid RAM size, should be more than %s", sz);
 g_free(sz);
+exit(1);
 }
 
 /* Allocate memory space */
-- 
2.26.3




[PATCH v2 0/1] Add 64-bit instruction support to decodetree

2021-04-07 Thread Luis Fernando Fujita Pires
This adds support for 64-bit instructions to decodetree.py.

This will be needed later, when decodetree is used to implement the new 64-bit
Power ISA 3.1 instructions.

While doing this change, I thought it would also be nice to be able to specify 
different sizes for each field in arg structs and also infer whether to use 
signed/unsigned data types based on the field definition. But those would be 
different changes, anyway, and I limited myself to using int64_t for the data 
type of fields in arg structs when insnwidth == 64.

v2:
- Added information about the field data types used in arg structs


Luis Pires (1):
  decodetree: Add support for 64-bit instructions

 docs/devel/decodetree.rst |  5 +++--
 scripts/decodetree.py | 25 -
 2 files changed, 23 insertions(+), 7 deletions(-)

-- 
2.25.1



[PATCH v2 1/1] decodetree: Add support for 64-bit instructions

2021-04-07 Thread Luis Fernando Fujita Pires
Allow '64' to be specified for the instruction width command line params
and use the appropriate insn/field data types, mask, extract and deposit
functions in that case.

This will be used to implement the new 64-bit Power ISA 3.1 instructions.

Signed-off-by: Luis Pires 
---
 docs/devel/decodetree.rst |  5 +++--
 scripts/decodetree.py | 25 -
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/docs/devel/decodetree.rst b/docs/devel/decodetree.rst
index 74f66bf46e..d776dae14f 100644
--- a/docs/devel/decodetree.rst
+++ b/docs/devel/decodetree.rst
@@ -40,8 +40,9 @@ and returns an integral value extracted from there.
 
 A field with no ``unnamed_fields`` and no ``!function`` is in error.
 
-FIXME: the fields of the structure into which this result will be stored
-is restricted to ``int``.  Which means that we cannot expand 64-bit items.
+The fields of the structure into which this result will be stored are
+defined as ``int`` when the instruction size is set to 16 or 32 bits
+and as ``int64_t`` when the instruction size is set to 64 bits.
 
 Field examples:
 
diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 4637b633e7..3450a2a08d 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -42,6 +42,10 @@
 output_fd = None
 insntype = 'uint32_t'
 decode_function = 'decode'
+field_data_type = 'int'
+extract_function = 'extract32'
+sextract_function = 'sextract32'
+deposit_function = 'deposit32'
 
 # An identifier for C.
 re_C_ident = '[a-zA-Z][a-zA-Z0-9_]*'
@@ -185,9 +189,9 @@ def __str__(self):
 
 def str_extract(self):
 if self.sign:
-extr = 'sextract32'
+extr = sextract_function
 else:
-extr = 'extract32'
+extr = extract_function
 return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
 
 def __eq__(self, other):
@@ -215,8 +219,8 @@ def str_extract(self):
 if pos == 0:
 ret = f.str_extract()
 else:
-ret = 'deposit32({0}, {1}, {2}, {3})' \
-  .format(ret, pos, 32 - pos, f.str_extract())
+ret = '{4}({0}, {1}, {2}, {3})' \
+  .format(ret, pos, insnwidth - pos, f.str_extract(), deposit_function)
 pos += f.len
 return ret
 
@@ -311,7 +315,7 @@ def output_def(self):
 if not self.extern:
 output('typedef struct {\n')
 for n in self.fields:
-output('int ', n, ';\n')
+output('', field_data_type, ' ', n, ';\n')
 output('} ', self.struct_name(), ';\n\n')
 # end Arguments
 
@@ -1264,6 +1268,10 @@ def main():
 global insntype
 global insnmask
 global decode_function
+global extract_function
+global sextract_function
+global deposit_function
+global field_data_type
 global variablewidth
 global anyextern
 
@@ -1293,6 +1301,13 @@ def main():
 if insnwidth == 16:
 insntype = 'uint16_t'
 insnmask = 0xffff
+elif insnwidth == 64:
+insntype = 'uint64_t'
+insnmask = 0xffffffffffffffff
+field_data_type = 'int64_t'
+extract_function = 'extract64'
+sextract_function = 'sextract64'
+deposit_function = 'deposit64'
 elif insnwidth != 32:
 error(0, 'cannot handle insns of width', insnwidth)
 else:
-- 
2.25.1
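
For readers unfamiliar with the helpers that the generated decoder calls when the instruction width is 64, here is a minimal Python model of the bit-field operations. It is a sketch of the semantics as assumed from the helper names (zero-extending extract, sign-extending extract, and deposit on a 64-bit value), not a copy of QEMU's C code.

```python
MASK64 = (1 << 64) - 1


def extract64(value: int, start: int, length: int) -> int:
    """Return `length` bits of `value` starting at bit `start`, zero-extended."""
    assert start >= 0 and 0 < length <= 64 - start
    return (value >> start) & ((1 << length) - 1)


def sextract64(value: int, start: int, length: int) -> int:
    """Like extract64(), but sign-extend from the top bit of the field."""
    field = extract64(value, start, length)
    sign_bit = 1 << (length - 1)
    # XOR-then-subtract turns a set sign bit into a negative value.
    return (field ^ sign_bit) - sign_bit


def deposit64(value: int, start: int, length: int, fieldval: int) -> int:
    """Return `value` with `length` bits at `start` replaced by `fieldval`."""
    assert start >= 0 and 0 < length <= 64 - start
    mask = ((1 << length) - 1) << start
    return ((value & ~mask) | ((fieldval << start) & mask)) & MASK64
```

A multi-part field is assembled by depositing each extracted piece above the ones already collected, which is why the patch replaces the hard-coded `32 - pos` with `insnwidth - pos`.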



[Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Andrey Konovalov
Ah, there's v4 now.

Tested with KASAN tests + a custom test to check unaligned accesses that
span across two granules, everything works.

Thank you!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1921948

Title:
  MTE tags not checked properly for unaligned accesses at EL1

Status in QEMU:
  In Progress

Bug description:
  For kernel memory accesses that span across two memory granules,
  QEMU's MTE implementation only checks the tag of the first granule but
  not of the second one.

  To reproduce this, build the Linux kernel with CONFIG_KASAN_HW_TAGS
  enabled, apply the patch below, and boot the kernel:

  diff --git a/sound/last.c b/sound/last.c
  index f0bb98780e70..04745cb30b74 100644
  --- a/sound/last.c
  +++ b/sound/last.c
  @@ -5,12 +5,18 @@
*/
   
   #include 
  +#include 
   #include 
   
   static int __init alsa_sound_last_init(void)
   {
  struct snd_card *card;
  int idx, ok = 0;
  +
  +   char *ptr = kmalloc(128, GFP_KERNEL);
  +   pr_err("KASAN report should follow:\n");
  +   *(volatile unsigned long *)(ptr + 124);
  +   kfree(ptr);
  
  printk(KERN_INFO "ALSA device list:\n");
  for (idx = 0; idx < SNDRV_CARDS; idx++) {

  KASAN tags the 128 allocated bytes with the same tag as the returned
  pointer. The memory granule that follows the 128 allocated bytes has a
  different tag (with 1/15 probability).

  Expected result: a tag fault is detected and a KASAN report is printed when
  accessing bytes [124, 132).
  Observed result: no tag fault is detected and no KASAN report is printed.

  Here are the flags that I use to run QEMU if they matter:

  qemu-system-aarch64 -s -machine virt,mte=on -cpu max -m 2G -smp 2 -net
  user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22 -net nic
  -nographic -kernel ./Image -append "console=ttyAMA0 root=/dev/vda
  earlyprintk=serial" -drive file=./fs.img,format=raw,if=virtio -no-
  shutdown -no-reboot

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1921948/+subscriptions
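
The arithmetic behind the reproducer can be sketched as below. This assumes the 16-byte MTE tag granule and an 8-byte `unsigned long` (as on arm64); the function name is hypothetical and only illustrates which granules an access touches.

```python
GRANULE = 16  # MTE tag granule size in bytes


def granules_touched(offset: int, size: int) -> list:
    """Return the indices of the 16-byte granules covered by an access."""
    first = offset // GRANULE
    last = (offset + size - 1) // GRANULE
    return list(range(first, last + 1))


# A 128-byte allocation covers granules 0..7. An 8-byte load at offset 124
# covers bytes 124 through 131, so it also touches granule 8, which lies
# outside the allocation and may carry a different tag.
ALLOCATION_GRANULES = set(granules_touched(0, 128))
ACCESS_GRANULES = granules_touched(124, 8)
OUT_OF_BOUNDS = [g for g in ACCESS_GRANULES if g not in ALLOCATION_GRANULES]
```

Checking only the first entry of `ACCESS_GRANULES` would miss the granule listed in `OUT_OF_BOUNDS`, which is the behaviour the bug describes.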



Live migration support for virtio-fs

2021-04-07 Thread Ge, Xiongzi
Thanks Dr Gilbert, Vivek, Stefan, Greg!
I put together the discussion into this thread and CC qemu-devel@nongnu.org. 

Problem:
Currently, virtio-fs does not support live migration.  Even when the virtiofs
directory is not mounted on the guest, the VM cannot be live migrated. Are there
any suggestions for, or interest in, making this work, so that we can still use
live migration for other purposes on that VM?

If we already have a shared file system like Ceph, does this make it different? 

Thanks,
Xiongzi

-
Stefan's reply:
The virtio-fs device holds a bunch of state, like the ino_map, dirp_map,
and the fd_map in the virtiofsd C implementation. That is the FUSE
session state that needs to be migrated in order to resume seamlessly
(without unmounting and mounting again).

If the backend is a distributed file system then it may have APIs that
make migration easier. If it's possible to re-attach to open files from
another host then that is perfect. But on the flipside if there are no
APIs for doing that then it might be impossible to reliably live
migrate because some state cannot be transferred between hosts.

Max Reitz is laying the foundation for live migration by working on
open_by_handle_at(2) support. This will probably be needed in order to
transfer open files from one host to another.

It should be possible to enable live migration when the filesystem is
not mounted. That might be a good first step to enabling live migration.


From Greg Kurz:
Live migration of virtio-fs is still at the early discussion stage AFAICT.
It might take time before we have something working. But in the meantime,
it seems abusive to block migration if we have a guarantee that the device
isn't servicing requests. FWIW virtio-9p only blocks migration when the
shared directory is mounted on the guest.


On 3/23/21, 11:01 AM, "Dr. David Alan Gilbert"  wrote:

* Vivek Goyal (vgo...@redhat.com) wrote:
> On Tue, Mar 23, 2021 at 10:29:09AM +, Dr. David Alan Gilbert wrote:
> > * Ge, Xiongzi (xiongzi...@netapp.com) wrote:
> > > Hello Vivek, Dr. Gilbert, and the virtio-fs team,
> >
> > Hi Xiongzi,
> >
> > > It seems that virtio-fs does not support live migration. Once a vm
> > > is configured with virtio-fs, live migration cannot be performed even
> > > it is not mounted on the guest.
> >
> > Right.
> >
> > > Is there any progress for this?
> >
> > Max Reitz is looking at some parts of it; in particular storing file
> > handles that can later be reopened.
> > But there are a bunch of other parts we've not looked at yet either
> > (like the dirty page marking around all syscalls).
> >
> > > If the
> > > shared directory is from a distributed file system like Ceph, would it
> > > be easier than the general case to be implemented?
> >
> > Maybe; there are some tricky semantics problems; for example, lets
> > imagine that you open the file   'a/b/c'  on the source, and sometime
> > after you open it, 'b' gets renamed to 'd';  when you resume on the
> > destination you need to make sure you know how to get to that file.
> > Depending on the filesystem semantics you might need to make that work
> > even if 'a/b/c' had been deleted but you still had it open.
>
> Hi Dave,
>
> I am assuming that with file handles, renaming of file probably is not
> a problem. open_by_handle_at() will still be able to find it.

Right; although again with something like CEPH you might not even need
that if you had a virtiofs daemon that spoke direct to CEPH, you might
be dealing with a CEPH filehandle.

> I think the real problem (as you pointed out later) is an unlinked file which is
> still in use by the virtiofsd.

Dave

> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
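
The rename and unlink semantics discussed in this thread can be observed on a single host with a short Python sketch. It assumes a POSIX filesystem; the directory layout mirrors the 'a/b/c' example above, and the function name is hypothetical. It does not use file handles or open_by_handle_at(), it only shows why a path recorded at open time is not enough to find the file again later.

```python
import os
import tempfile


def read_after_rename_and_unlink():
    """Open a/b/c, rename b to d, then unlink the file, reading via the fd."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        path = os.path.join(root, "a", "b", "c")
        with open(path, "w") as f:
            f.write("payload")

        fd = os.open(path, os.O_RDONLY)
        try:
            # Rename the parent directory while the file is open.
            os.rename(os.path.join(root, "a", "b"), os.path.join(root, "a", "d"))
            old_path_valid = os.path.exists(path)
            data_after_rename = os.read(fd, 64)

            # Delete the file; the open descriptor still refers to its data.
            os.unlink(os.path.join(root, "a", "d", "c"))
            os.lseek(fd, 0, os.SEEK_SET)
            data_after_unlink = os.read(fd, 64)
        finally:
            os.close(fd)
        return old_path_valid, data_after_rename, data_after_unlink
```

On the source host the descriptor keeps working in both cases; the open question in the thread is how a destination host could reach the same file, in particular once no name refers to it any more.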




Re: [Bug 1915063] Re: Windows 10 wil not install using qemu-system-x86_64

2021-04-07 Thread Igor Mammedov
On Wed, 07 Apr 2021 13:00:23 -
David Ober <1915...@bugs.launchpad.net> wrote:

> I have not done any of what you are asking so not exactly sure how to
> change those values, been looking and reading but not finding what I
> want so thought it might be better to just ask how to do what you are
> asking.

see https://libvirt.org/formatdomain.html#cpu-model-and-topology
for the way to describe topology in domain xml.
Pick a real AMD CPU for the cpu model you are having the problem with,
and use its config to define topology.

> I did try CPU type EPYC and that did get past the error I am
> seeing on install
So it works with EPYC but not with EPYC-Rome, so topology is probably
not the issue.

CCing Babu,
who added EPYC-Rome cpu model, maybe he can help





Re: [Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Alex Bennée


Alex Bennée  writes:

> Andrey Konovalov <1921...@bugs.launchpad.net> writes:
>
>> Is this with QEMU master without the patches mentioned in this bug?
>
> This is with Richard's latest series.
>
>>
>> Which kernel version do you use?
>
> v5.11
>
>> Could you share your kernel config?
>
> We are just testing with Richard's config and eliminating compiler
> shenanigans now.

OK with v5.12-rc5 and Richard's config I get a clean pass.


-- 
Alex Bennée



Re: [PATCH v4 12/12] exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1

2021-04-07 Thread Nathan Chancellor
On Tue, Apr 06, 2021 at 10:40:31AM -0700, Richard Henderson wrote:
> Unfortunately, the elements of PAGE_* were not in numerical
> order and so PAGE_ANON was added to an "unused" bit.
> As an arbitrary choice, move PAGE_TARGET_{1,2} together.
> 
> Cc: Laurent Vivier 
> Fixes: 26bab757d41b ("linux-user: Introduce PAGE_ANON")
> Buglink: https://bugs.launchpad.net/bugs/1922617
> Signed-off-by: Richard Henderson 

Tested-by: Nathan Chancellor 

> ---
>  include/exec/cpu-all.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index d76b0b9e02..32cfb634c6 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -268,8 +268,8 @@ extern intptr_t qemu_host_page_mask;
>  #define PAGE_RESERVED  0x0100
>  #endif
>  /* Target-specific bits that will be used via page_get_flags().  */
> -#define PAGE_TARGET_1  0x0080
> -#define PAGE_TARGET_2  0x0200
> +#define PAGE_TARGET_1  0x0200
> +#define PAGE_TARGET_2  0x0400
>  
>  #if defined(CONFIG_USER_ONLY)
>  void page_dump(FILE *f);
> -- 
> 2.25.1
> 
> 
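
The kind of collision fixed by this patch can be checked mechanically. The sketch below uses the values visible in the quoted diff; the value of PAGE_ANON (0x0080) is an assumption inferred from the reported overlap with the old PAGE_TARGET_1, and the helper name is hypothetical.

```python
from itertools import combinations


def overlapping_flags(flags: dict) -> list:
    """Return sorted pairs of flag names whose bit masks share any bit."""
    pairs = combinations(sorted(flags.items()), 2)
    return sorted((a, b) for (a, va), (b, vb) in pairs if va & vb)


# Values before the patch (PAGE_ANON is assumed, see the note above).
BEFORE = {
    "PAGE_ANON": 0x0080,
    "PAGE_RESERVED": 0x0100,
    "PAGE_TARGET_1": 0x0080,
    "PAGE_TARGET_2": 0x0200,
}

# Values after the patch: the two target-specific bits are moved together.
AFTER = dict(BEFORE, PAGE_TARGET_1=0x0200, PAGE_TARGET_2=0x0400)
```

Running `overlapping_flags()` on the two dictionaries shows one colliding pair before the patch and none afterwards.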



Re: [Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Alex Bennée


Andrey Konovalov <1921...@bugs.launchpad.net> writes:

> Is this with QEMU master without the patches mentioned in this bug?

This is with Richard's latest series.

>
> Which kernel version do you use?

v5.11

> Could you share your kernel config?

We are just testing with Richard's config and eliminating compiler
shenanigans now.


-- 
Alex Bennée



Re: [PATCH v4 for-6.0 02/12] esp: rework write_response() to avoid using the FIFO for DMA transactions

2021-04-07 Thread Philippe Mathieu-Daudé
On 4/7/21 9:57 PM, Mark Cave-Ayland wrote:
> The code for write_response() has always used the FIFO to store the data for
> the status/message in phases, even for DMA transactions. Switch to using a
> separate buffer that can be used directly for DMA transactions and restrict
> the FIFO use to the non-DMA case.
> 
> Signed-off-by: Mark Cave-Ayland 
> Tested-by: Alexander Bulekov 
> ---
>  hw/scsi/esp.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PULL v2 11/19] pci: acpi: ensure that acpi-index is unique

2021-04-07 Thread Igor Mammedov
On Wed, 7 Apr 2021 09:29:45 +0100
Daniel P. Berrangé  wrote:

> On Tue, Apr 06, 2021 at 08:15:46PM +0200, Igor Mammedov wrote:
> > On Tue, 6 Apr 2021 16:07:25 +0100
> > Daniel P. Berrangé  wrote:
> >   
> > > On Tue, Apr 06, 2021 at 03:54:24PM +0100, Daniel P. Berrangé wrote:  
> > > > On Mon, Mar 22, 2021 at 07:00:18PM -0400, Michael S. Tsirkin wrote:
> > > > > From: Igor Mammedov 
> > > > > 
> > > > > it helps to avoid device naming conflicts when guest OS is
> > > > > configured to use acpi-index for naming.
> > > > > > Spec also says so:
> > > > > 
> > > > > PCI Firmware Specification Revision 3.2
> > > > > 4.6.7.  _DSM for Naming a PCI or PCI Express Device Under Operating 
> > > > > Systems
> > > > > "
> > > > > Instance number must be unique under \_SB scope. This instance number 
> > > > > does not have to
> > > > > be sequential in a given system configuration.
> > > > > "
> > > > > 
> > > > > Signed-off-by: Igor Mammedov 
> > > > > Message-Id: <20210315180102.3008391-4-imamm...@redhat.com>
> > > > > Reviewed-by: Michael S. Tsirkin 
> > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > ---
> > > > >  hw/acpi/pcihp.c | 46 ++
> > > > >  1 file changed, 46 insertions(+)
> > > > > 
> > > > > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > > > > index ceab287bd3..f4cb3c979d 100644
> > > > > --- a/hw/acpi/pcihp.c
> > > > > +++ b/hw/acpi/pcihp.c
> > > > > @@ -52,6 +52,21 @@ typedef struct AcpiPciHpFind {
> > > > >  PCIBus *bus;
> > > > >  } AcpiPciHpFind;
> > > > >  
> > > > > +static gint g_cmp_uint32(gconstpointer a, gconstpointer b, gpointer 
> > > > > user_data)
> > > > > +{
> > > > > +return a - b;
> > > > > +}
> > > > > +
> > > > > +static GSequence *pci_acpi_index_list(void)
> > > > > +{
> > > > > +static GSequence *used_acpi_index_list;
> > > > > +
> > > > > +if (!used_acpi_index_list) {
> > > > > +used_acpi_index_list = g_sequence_new(NULL);
> > > > > +}
> > > > > +return used_acpi_index_list;
> > > > > +}
> > > > > +
> > > > >  static int acpi_pcihp_get_bsel(PCIBus *bus)
> > > > >  {
> > > > >  Error *local_err = NULL;
> > > > > @@ -277,6 +292,23 @@ void 
> > > > > acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > > > > ONBOARD_INDEX_MAX);
> > > > >  return;
> > > > >  }
> > > > > +
> > > > > +/*
> > > > > + * make sure that acpi-index is unique across all present PCI 
> > > > > devices
> > > > > + */
> > > > > +if (pdev->acpi_index) {
> > > > > +GSequence *used_indexes = pci_acpi_index_list();
> > > > > +
> > > > > +if (g_sequence_lookup(used_indexes, 
> > > > > GINT_TO_POINTER(pdev->acpi_index),
> > > > > +  g_cmp_uint32, NULL)) {
> > > > > +error_setg(errp, "a PCI device with acpi-index = %" 
> > > > > PRIu32
> > > > > +   " already exist", pdev->acpi_index);
> > > > > +return;
> > > > > +}
> > > > > +g_sequence_insert_sorted(used_indexes,
> > > > > + GINT_TO_POINTER(pdev->acpi_index),
> > > > > + g_cmp_uint32, NULL);
> > > > > +}
> > > > 
> > > > This doesn't appear to ensure uniqueness when using PCIe topologies:
> > > > 
> > > > $ ./build/x86_64-softmmu/qemu-system-x86_64 \
> > > >  -device virtio-net,acpi-index=100 \
> > > >  -device virtio-net,acpi-index=100
> > > > qemu-system-x86_64: -device virtio-net,acpi-index=100: a PCI device 
> > > > with acpi-index = 100 already exist
> > > > 
> > > > $ ./build/x86_64-softmmu/qemu-system-x86_64 \
> > > >  -M q35 \
> > > >  -device virtio-net,acpi-index=100
> > > >  -device virtio-net,acpi-index=100
> > > > happily running
> > > 
> > > In fact the entire concept doesn't appear to work with Q35 at all as
> > > implemented.
> > > 
> > > The 'acpi_index' file in the guest OS never gets created and the NICs
> > > are still called 'eth0', 'eth1'
> > > 
> > > > Only with i440fx can I get the "enoNNN" based naming to work with
> > > acpi-index set from QEMU  
> > 
> > It is not supported on Q35 yet as it depends on ACPI PCI hotplug 
> > infrastructure.
> > Once Julia is done with porting it to Q35, acpi-index will be pulled along 
> > with it.  
> 
> Will the PCI hotplug support work in the same way?
> 
> Looking at this doc I see two options:
> 
>   
> https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
> 
>  1. Names incorporating Firmware/BIOS provided index numbers for on-board 
> devices (example: eno1)
>  2. Names incorporating Firmware/BIOS provided PCI Express hotplug slot index 
> numbers (example: ens1) 
> 
> Is the stuff Julia is implementing for Q35 going to end up
> triggering scenario (1) still, or will it trigger scenario two
> which mentions "hotplug slot index" as a distinct concept from
> the ACPI index we're setting for i440fx ?

it will trigger (1) 
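
The uniqueness check in the quoted patch keeps the used acpi-index values in a sorted sequence and refuses a value that is already present. A minimal Python sketch of that idea follows; the class and method names are hypothetical, and treating 0 as "not set" is an assumption based on the `if (pdev->acpi_index)` guard in the diff.

```python
import bisect


class AcpiIndexRegistry:
    """Track acpi-index values in sorted order and reject duplicates."""

    def __init__(self):
        self._used = []  # kept sorted at all times

    def claim(self, index: int) -> None:
        if index == 0:
            # Assumption: 0 means the property was not set, so nothing to check.
            return
        pos = bisect.bisect_left(self._used, index)
        if pos < len(self._used) and self._used[pos] == index:
            raise ValueError(
                f"a PCI device with acpi-index = {index} already exists")
        self._used.insert(pos, index)

    def used(self) -> list:
        """Return a copy of the claimed indexes in ascending order."""
        return list(self._used)
```

The check is only effective if every plug path goes through it, which matches the observation above that the Q35 case, not yet wired to ACPI PCI hotplug, accepts the duplicate.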

Re: [PATCH v4 for-6.0 10/12] esp: don't reset async_len directly in esp_select() if cancelling request

2021-04-07 Thread Philippe Mathieu-Daudé
On 4/7/21 9:57 PM, Mark Cave-Ayland wrote:
> Instead let the SCSI layer invoke the .cancel callback itself to cancel and
> reset the request state.
> 
> Signed-off-by: Mark Cave-Ayland 
> Tested-by: Alexander Bulekov 
> ---
>  hw/scsi/esp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé 



[Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Richard Henderson
Re comments #8 and #10, I don't replicate that.
I get full pass on KASAN_UNIT_TEST with
and without virtualization enabled.

Re comment #9, if there are bugs suspected in qemu, they
need to be reported, or we'll never hear about them.




Re: [PATCH v4 for-6.0 11/12] esp: ensure that do_cmd is set to zero before submitting an ESP select command

2021-04-07 Thread Philippe Mathieu-Daudé
On 4/7/21 9:58 PM, Mark Cave-Ayland wrote:
> When a CDB has been received and is about to be submitted to the SCSI layer
> via one of the ESP select commands, ensure that do_cmd is set to zero before
> executing the command.
> 
> Otherwise a guest executing 2 valid CDBs in quick sequence can invoke the SCSI
> .transfer_data callback again before do_cmd is set to zero by the callback
> function triggering an assert at the start of esp_transfer_data().
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/esp.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PULL v2 11/19] pci: acpi: ensure that acpi-index is unique

2021-04-07 Thread Igor Mammedov
On Wed, 7 Apr 2021 09:29:22 -0400
"Michael S. Tsirkin"  wrote:

> On Tue, Apr 06, 2021 at 08:15:46PM +0200, Igor Mammedov wrote:
> > On Tue, 6 Apr 2021 16:07:25 +0100
> > Daniel P. Berrangé  wrote:
> >   
> > > On Tue, Apr 06, 2021 at 03:54:24PM +0100, Daniel P. Berrangé wrote:  
> > > > On Mon, Mar 22, 2021 at 07:00:18PM -0400, Michael S. Tsirkin wrote:
> > > > > From: Igor Mammedov 
> > > > > 
> > > > > it helps to avoid device naming conflicts when guest OS is
> > > > > configured to use acpi-index for naming.
> > > > > > Spec also says so:
> > > > > 
> > > > > PCI Firmware Specification Revision 3.2
> > > > > 4.6.7.  _DSM for Naming a PCI or PCI Express Device Under Operating 
> > > > > Systems
> > > > > "
> > > > > Instance number must be unique under \_SB scope. This instance number 
> > > > > does not have to
> > > > > be sequential in a given system configuration.
> > > > > "
> > > > > 
> > > > > Signed-off-by: Igor Mammedov 
> > > > > Message-Id: <20210315180102.3008391-4-imamm...@redhat.com>
> > > > > Reviewed-by: Michael S. Tsirkin 
> > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > ---
> > > > >  hw/acpi/pcihp.c | 46 ++
> > > > >  1 file changed, 46 insertions(+)
> > > > > 
> > > > > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > > > > index ceab287bd3..f4cb3c979d 100644
> > > > > --- a/hw/acpi/pcihp.c
> > > > > +++ b/hw/acpi/pcihp.c
> > > > > @@ -52,6 +52,21 @@ typedef struct AcpiPciHpFind {
> > > > >  PCIBus *bus;
> > > > >  } AcpiPciHpFind;
> > > > >  
> > > > > +static gint g_cmp_uint32(gconstpointer a, gconstpointer b, gpointer 
> > > > > user_data)
> > > > > +{
> > > > > +return a - b;
> > > > > +}
> > > > > +
> > > > > +static GSequence *pci_acpi_index_list(void)
> > > > > +{
> > > > > +static GSequence *used_acpi_index_list;
> > > > > +
> > > > > +if (!used_acpi_index_list) {
> > > > > +used_acpi_index_list = g_sequence_new(NULL);
> > > > > +}
> > > > > +return used_acpi_index_list;
> > > > > +}
> > > > > +
> > > > >  static int acpi_pcihp_get_bsel(PCIBus *bus)
> > > > >  {
> > > > >  Error *local_err = NULL;
> > > > > @@ -277,6 +292,23 @@ void 
> > > > > acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > > > > ONBOARD_INDEX_MAX);
> > > > >  return;
> > > > >  }
> > > > > +
> > > > > +/*
> > > > > + * make sure that acpi-index is unique across all present PCI 
> > > > > devices
> > > > > + */
> > > > > +if (pdev->acpi_index) {
> > > > > +GSequence *used_indexes = pci_acpi_index_list();
> > > > > +
> > > > > +if (g_sequence_lookup(used_indexes, 
> > > > > GINT_TO_POINTER(pdev->acpi_index),
> > > > > +  g_cmp_uint32, NULL)) {
> > > > > +error_setg(errp, "a PCI device with acpi-index = %" 
> > > > > PRIu32
> > > > > +   " already exist", pdev->acpi_index);
> > > > > +return;
> > > > > +}
> > > > > +g_sequence_insert_sorted(used_indexes,
> > > > > + GINT_TO_POINTER(pdev->acpi_index),
> > > > > + g_cmp_uint32, NULL);
> > > > > +}
> > > > 
> > > > This doesn't appear to ensure uniqueness when using PCIe topologies:
> > > > 
> > > > $ ./build/x86_64-softmmu/qemu-system-x86_64 \
> > > >  -device virtio-net,acpi-index=100 \
> > > >  -device virtio-net,acpi-index=100
> > > > qemu-system-x86_64: -device virtio-net,acpi-index=100: a PCI device 
> > > > with acpi-index = 100 already exist
> > > > 
> > > > $ ./build/x86_64-softmmu/qemu-system-x86_64 \
> > > >  -M q35 \
> > > >  -device virtio-net,acpi-index=100
> > > >  -device virtio-net,acpi-index=100
> > > > happily running
> > > 
> > > In fact the entire concept doesn't appear to work with Q35 at all as
> > > implemented.
> > > 
> > > The 'acpi_index' file in the guest OS never gets created and the NICs
> > > are still called 'eth0', 'eth1'
> > > 
> > > > Only with i440fx can I get the "enoNNN" based naming to work with
> > > acpi-index set from QEMU  
> > 
> > It is not supported on Q35 yet as it depends on ACPI PCI hotplug 
> > infrastructure.
> > Once Julia is done with porting it to Q35, acpi-index will be pulled along 
> > with it.  
> 
> 
> Right. But for now, should we make it fail instead of being ignored silently?
> If we don't, how will management find out it's not really supported?
> And if we make it fail how will management then find out when it's finally
> supported?

I had an idea to add capability flag to MachineInfo in QMP schema
and then do ugly check from PCIDevice.realize()
1)
 if (acpi_index != 0 && !current_machine->has_pci_acpi_index)
  error out

However Daniel said that he didn't think that MachineInfo was
the right place for it.
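
For reference, the check from idea 1) could look roughly like the sketch below. This is a standalone illustration, not QEMU code: `struct machine_caps`, its `has_pci_acpi_index` field and `check_acpi_index()` are hypothetical stand-ins, and the sketch assumes the intent is to reject a non-zero acpi-index on machines that lack support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-machine capability flag. */
struct machine_caps {
    bool has_pci_acpi_index;
};

/*
 * Returns 0 if the configuration is acceptable, -1 if an acpi-index was
 * requested on a machine that does not support it. An acpi_index of 0
 * means "not set", as in the pre-plug check in the quoted patch.
 */
int check_acpi_index(const struct machine_caps *m, uint32_t acpi_index)
{
    assert(m != NULL);
    if (acpi_index != 0 && !m->has_pci_acpi_index) {
        return -1;
    }
    return 0;
}
```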

Problem is that we can't check acpi-index unsupported configuration
at PCIDevice.realize() time since we don't know about availability

Re: [PATCH 03/24] aspeed/i2c: Fix DMA address mask

2021-04-07 Thread Philippe Mathieu-Daudé
Hi Cédric,

On 4/7/21 7:16 PM, Cédric Le Goater wrote:
> The RAM memory region is now used for DMAs accesses instead of the
> memory address space region. Mask off the top bits of the DMA address
> to reflect this change.
> 
> Cc: Philippe Mathieu-Daudé 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/i2c/aspeed_i2c.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c
> index 518a3f5c6f9d..e7133528899f 100644
> --- a/hw/i2c/aspeed_i2c.c
> +++ b/hw/i2c/aspeed_i2c.c
> @@ -601,7 +601,7 @@ static void aspeed_i2c_bus_write(void *opaque, hwaddr 
> offset,
>  break;
>  }
>  
> -bus->dma_addr = value & 0xfffc;
> +bus->dma_addr = value & 0x3ffc;

This field is migrated (aspeed_i2c_bus_vmstate).

Does the first patch "aspeed/smc: Use the RAM memory region for DMAs"
break the migration?



[Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Andrey Konovalov
Is this with QEMU master without the patches mentioned in this bug?

Which kernel version do you use?

Could you share your kernel config?




[Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Alex Bennée
It gets further without but still spams a lot of failure messages:

The buggy address belongs to the object at ff80036a2200
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 11 bytes to the right of
 128-byte region [ff80036a2200, ff80036a2280)
The buggy address belongs to the page:
page:46e01872 refcount:1 mapcount:0 mapping: index:0x0 
pfn:0x436a2
flags: 0x3fc00200(slab)
raw: 3fc00200 dead0100 dead0122 f98001c01e00
raw:  80100010 0001 f380036a2401
page dumped because: kasan: bad access detected
pages's memcg:f380036a2401

Memory state around the buggy address:
 ff80036a2000: f6 f6 f6 f6 f6 f6 f6 f6 fe fe fe fe fe fe fe fe
 ff80036a2100: fa fa fa fa fe fe fe fe fe fe fe fe fe fe fe fe
>ff80036a2200: f9 f9 f9 f9 f9 f9 f9 f9 fe fe fe fe fe fe fe fe
   ^
 ff80036a2300: fc fc fc fc fe fe fe fe fe fe fe fe fe fe fe fe
 ff80036a2400: f3 f3 f3 f3 f3 f3 f3 f3 fe fe fe fe fe fe fe fe
==
Disabling lock debugging due to kernel taint
# kmalloc_oob_right: EXPECTATION FAILED at lib/test_kasan.c:86
Expected fail_data.report_expected == fail_data.report_found, but
fail_data.report_expected == 1
fail_data.report_found == 0
not ok 1 - kmalloc_oob_right
# kmalloc_oob_left: EXPECTATION FAILED at lib/test_kasan.c:98
Expected fail_data.report_expected == fail_data.report_found, but
fail_data.report_expected == 1
fail_data.report_found == 0
not ok 2 - kmalloc_oob_left
# kmalloc_node_oob_right: EXPECTATION FAILED at lib/test_kasan.c:110
Expected fail_data.report_expected == fail_data.report_found, but
fail_data.report_expected == 1
fail_data.report_found == 0
not ok 3 - kmalloc_node_oob_right
# kmalloc_pagealloc_oob_right: EXPECTATION FAILED at lib/test_kasan.c:130
Expected fail_data.report_expected == fail_data.report_found, but
fail_data.report_expected == 1
fail_data.report_found == 0
not ok 4 - kmalloc_pagealloc_oob_right
# kmalloc_pagealloc_uaf: EXPECTATION FAILED at lib/test_kasan.c:148
Expected fail_data.report_expected == fail_data.report_found, but
fail_data.report_expected == 1
fail_data.report_found == 0
not ok 5 - kmalloc_pagealloc_uaf




Re: [PATCH 16/27] tests/acceptance: ARC: Add linux boot testing.

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

From: Cupertino Miranda 

Just an acceptance test with ARC Linux booting.

Signed-off-by: Cupertino Miranda 
---
  tests/acceptance/boot_linux_console.py | 55 ++
  1 file changed, 55 insertions(+)

diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index 1ca32ecf25..b5a781b6b4 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -138,6 +138,26 @@ def test_mips_malta(self):
  console_pattern = 'Kernel command line: %s' % kernel_command_line
  self.wait_for_console_pattern(console_pattern)
  
+def test_mips_malta(self):

+"""
+:avocado: tags=arch:arc
+"""
+deb_url = ('http://snapshot.debian.org/archive/debian/'
+   '20130217T032700Z/pool/main/l/linux-2.6/'
+   'linux-image-2.6.32-5-4kc-malta_2.6.32-48_mips.deb')
+deb_hash = 'a8cfc28ad8f45f54811fc6cf74fc43ffcfe0ba04'
+deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+kernel_path = self.extract_from_deb(deb_path,
+'/boot/vmlinux-archs')
+
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
+


Careful with your rebasing.  This is obviously not what you wanted.


r~



Re: [PATCH 15/27] tests/tcg: ARC: Add TCG instruction definition tests

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

From: Claudiu Zissulescu 

The added tests verify basic instruction execution as well
as more advanced features such as zero-overhead loops, the interrupt
system, the memory management unit and the memory protection unit.

Signed-off-by: Claudiu Zissulescu 
Signed-off-by: Cupertino Miranda 
---
  tests/Makefile.include|   1 +
  tests/tcg/arc/Makefile| 114 
  tests/tcg/arc/Makefile.softmmu-target |  43 ++
  tests/tcg/arc/Makefile.target | 101 


You should *only* need Makefile.softmmu-target.

The bare Makefile is (or should be) unused.
The Makefile.target is for arc-linux-user,
which you do not build.


r~



Re: [PATCH 14/27] arc: Add support for ARCv2

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

diff --git a/configure b/configure
index 535e6a9269..80d993fac7 100755
--- a/configure
+++ b/configure
@@ -680,6 +680,8 @@ elif check_define __arm__ ; then
cpu="arm"
  elif check_define __aarch64__ ; then
cpu="aarch64"
+elif check_define __arc__ ; then
+  cpu="arc"
  else
cpu=$(uname -m)
  fi


This is host related, not target.


diff --git a/disas.c b/disas.c
index a61f95b580..a10fa41330 100644
--- a/disas.c
+++ b/disas.c
@@ -208,6 +208,8 @@ static void initialize_debug_host(CPUDebug *s)
  s->info.cap_insn_split = 6;
  #elif defined(__hppa__)
  s->info.print_insn = print_insn_hppa;
+#elif defined(__arc__)
+s->info.print_insn = print_insn_arc;
  #endif


As is this.

Until arc can self-host qemu, you don't need them.


--- a/hw/arc/Makefile.objs


Leftover from before meson.build.


r~



Re: [PATCH 13/24] hw/misc/aspeed_xdma: Add AST2600 support

2021-04-07 Thread Eddie James
On Wed, 2021-04-07 at 19:16 +0200, Cédric Le Goater wrote:
> When we introduced support for the AST2600 SoC, the XDMA controller
> was forgotten. It went unnoticed because it's not used under
> emulation.
> But the register layout being different, the reset procedure is bogus
> and this breaks kexec.
> 
> Add an AspeedXDMAClass to take into account the register differences.

Thanks Cedric!

Reviewed-by: Eddie James 
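
As a side note for readers, the general idea of keeping the register offsets in per-SoC class data can be sketched in plain C as below. This is only an illustration and not the QEMU implementation: `struct xdma_layout`, `struct xdma` and `xdma_write()` are made-up names. Because the offsets are runtime data rather than compile-time constants, the dispatch compares them with `if` instead of using `switch` case labels.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* Per-variant register offsets (illustrative names only). */
struct xdma_layout {
    uint8_t cmdq_wrp;   /* command queue write pointer offset */
    uint8_t cmdq_rdp;   /* command queue read pointer offset */
};

struct xdma {
    const struct xdma_layout *layout;
    uint32_t regs[NUM_REGS];
};

/* Store a register write; a write pointer update also moves the read pointer. */
void xdma_write(struct xdma *x, uint32_t addr, uint32_t val)
{
    assert(addr % 4 == 0);
    if (addr / 4 >= NUM_REGS) {
        return;
    }
    x->regs[addr / 4] = val;
    if (addr == x->layout->cmdq_wrp) {
        x->regs[x->layout->cmdq_rdp / 4] = val;
    }
}
```

With a layout of `{ 0x1c, 0x20 }`, matching the AST2600 WRP/RDP offsets in the patch, a write to 0x1c is mirrored to 0x20 and writes to other offsets are not.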

> 
> Cc: Eddie James 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/misc/aspeed_xdma.h |  17 -
>  hw/arm/aspeed_ast2600.c   |   3 +-
>  hw/arm/aspeed_soc.c   |   3 +-
>  hw/misc/aspeed_xdma.c | 124 +++-
> --
>  4 files changed, 121 insertions(+), 26 deletions(-)
> 
> diff --git a/include/hw/misc/aspeed_xdma.h
> b/include/hw/misc/aspeed_xdma.h
> index a2dea96984f3..b1478fd1c681 100644
> --- a/include/hw/misc/aspeed_xdma.h
> +++ b/include/hw/misc/aspeed_xdma.h
> @@ -13,7 +13,10 @@
>  #include "qom/object.h"
>  
>  #define TYPE_ASPEED_XDMA "aspeed.xdma"
> -OBJECT_DECLARE_SIMPLE_TYPE(AspeedXDMAState, ASPEED_XDMA)
> +#define TYPE_ASPEED_2400_XDMA TYPE_ASPEED_XDMA "-ast2400"
> +#define TYPE_ASPEED_2500_XDMA TYPE_ASPEED_XDMA "-ast2500"
> +#define TYPE_ASPEED_2600_XDMA TYPE_ASPEED_XDMA "-ast2600"
> +OBJECT_DECLARE_TYPE(AspeedXDMAState, AspeedXDMAClass, ASPEED_XDMA)
>  
>  #define ASPEED_XDMA_NUM_REGS (ASPEED_XDMA_REG_SIZE /
> sizeof(uint32_t))
>  #define ASPEED_XDMA_REG_SIZE 0x7C
> @@ -28,4 +31,16 @@ struct AspeedXDMAState {
>  uint32_t regs[ASPEED_XDMA_NUM_REGS];
>  };
>  
> +struct AspeedXDMAClass {
> +SysBusDeviceClass parent_class;
> +
> +uint8_t cmdq_endp;
> +uint8_t cmdq_wrp;
> +uint8_t cmdq_rdp;
> +uint8_t intr_ctrl;
> +uint32_t intr_ctrl_mask;
> +uint8_t intr_status;
> +uint32_t intr_complete;
> +};
> +
>  #endif /* ASPEED_XDMA_H */
> diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
> index e0fbb020c770..c60824bfeecb 100644
> --- a/hw/arm/aspeed_ast2600.c
> +++ b/hw/arm/aspeed_ast2600.c
> @@ -187,7 +187,8 @@ static void aspeed_soc_ast2600_init(Object *obj)
>  object_initialize_child(obj, "mii[*]", &s->mii[i],
> TYPE_ASPEED_MII);
>  }
>  
> -object_initialize_child(obj, "xdma", &s->xdma,
> TYPE_ASPEED_XDMA);
> +snprintf(typename, sizeof(typename), TYPE_ASPEED_XDMA "-%s",
> socname);
> +object_initialize_child(obj, "xdma", &s->xdma, typename);
>  
>  snprintf(typename, sizeof(typename), "aspeed.gpio-%s", socname);
>  object_initialize_child(obj, "gpio", &s->gpio, typename);
> diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c
> index 8ed29113f79f..4a95d27d9d63 100644
> --- a/hw/arm/aspeed_soc.c
> +++ b/hw/arm/aspeed_soc.c
> @@ -199,7 +199,8 @@ static void aspeed_soc_init(Object *obj)
>  TYPE_FTGMAC100);
>  }
>  
> -object_initialize_child(obj, "xdma", &s->xdma,
> TYPE_ASPEED_XDMA);
> +snprintf(typename, sizeof(typename), TYPE_ASPEED_XDMA "-%s",
> socname);
> +object_initialize_child(obj, "xdma", &s->xdma, typename);
>  
>  snprintf(typename, sizeof(typename), "aspeed.gpio-%s", socname);
>  object_initialize_child(obj, "gpio", &s->gpio, typename);
> diff --git a/hw/misc/aspeed_xdma.c b/hw/misc/aspeed_xdma.c
> index 533d237e3ce2..1c21577c98c9 100644
> --- a/hw/misc/aspeed_xdma.c
> +++ b/hw/misc/aspeed_xdma.c
> @@ -30,6 +30,19 @@
>  #define  XDMA_IRQ_ENG_STAT_US_COMP BIT(4)
>  #define  XDMA_IRQ_ENG_STAT_DS_COMP BIT(5)
>  #define  XDMA_IRQ_ENG_STAT_RESET   0xF800
> +
> +#define XDMA_AST2600_BMC_CMDQ_ADDR   0x14
> +#define XDMA_AST2600_BMC_CMDQ_ENDP   0x18
> +#define XDMA_AST2600_BMC_CMDQ_WRP0x1c
> +#define XDMA_AST2600_BMC_CMDQ_RDP0x20
> +#define XDMA_AST2600_IRQ_CTRL0x38
> +#define  XDMA_AST2600_IRQ_CTRL_US_COMPBIT(16)
> +#define  XDMA_AST2600_IRQ_CTRL_DS_COMPBIT(17)
> +#define  XDMA_AST2600_IRQ_CTRL_W_MASK 0x017003FF
> +#define XDMA_AST2600_IRQ_STATUS  0x3c
> +#define  XDMA_AST2600_IRQ_STATUS_US_COMP  BIT(16)
> +#define  XDMA_AST2600_IRQ_STATUS_DS_COMP  BIT(17)
> +
>  #define XDMA_MEM_SIZE  0x1000
>  
>  #define TO_REG(addr) ((addr) / sizeof(uint32_t))
> @@ -52,56 +65,48 @@ static void aspeed_xdma_write(void *opaque,
> hwaddr addr, uint64_t val,
>  unsigned int idx;
>  uint32_t val32 = (uint32_t)val;
>  AspeedXDMAState *xdma = opaque;
> +AspeedXDMAClass *axc = ASPEED_XDMA_GET_CLASS(xdma);
>  
>  if (addr >= ASPEED_XDMA_REG_SIZE) {
>  return;
>  }
>  
> -switch (addr) {
> -case XDMA_BMC_CMDQ_ENDP:
> +if (addr == axc->cmdq_endp) {
>  xdma->regs[TO_REG(addr)] = val32 & XDMA_BMC_CMDQ_W_MASK;
> -break;
> -case XDMA_BMC_CMDQ_WRP:
> +} else if (addr == axc->cmdq_wrp) {
>  idx = TO_REG(addr);
>  xdma->regs[idx] = val32 & XDMA_BMC_CMDQ_W_MASK;
> -xdma->regs[TO_REG(XDMA_BMC_CMDQ_RDP)] = xdma->regs[idx];
> +xdma->regs[TO_REG(axc->cmdq_rdp)] = xdma->regs[idx];
>  
>  

[Bug 1921948] Re: MTE tags not checked properly for unaligned accesses at EL1

2021-04-07 Thread Andrey Konovalov
This warning is caused by the "virtualization=on" QEMU option. As far as
I understand, this is another QEMU bug; see [1] and [2].

[1] https://lore.kernel.org/lkml/CAAeHK+wDz8aSLyjq1b=q3+hg9ajxxwyr6+gn_ftttmn5osm...@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20210311123315.GF37303@C02TD0UTHF1T.local/T/




[PATCH v4 for-6.0 12/12] tests/qtest: add tests for am53c974 device

2021-04-07 Thread Mark Cave-Ayland
Use the autogenerated fuzzer test cases as the basis for a set of am53c974
regression tests.

Signed-off-by: Mark Cave-Ayland 
Tested-by: Alexander Bulekov 
---
 MAINTAINERS |   1 +
 tests/qtest/am53c974-test.c | 216 
 tests/qtest/meson.build |   1 +
 3 files changed, 218 insertions(+)
 create mode 100644 tests/qtest/am53c974-test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 58f342108e..fa258b7a92 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1772,6 +1772,7 @@ F: include/hw/scsi/*
 F: hw/scsi/*
 F: tests/qtest/virtio-scsi-test.c
 F: tests/qtest/fuzz-virtio-scsi-test.c
+F: tests/qtest/am53c974-test.c
 T: git https://github.com/bonzini/qemu.git scsi-next
 
 SSI
diff --git a/tests/qtest/am53c974-test.c b/tests/qtest/am53c974-test.c
new file mode 100644
index 00..9b06f2cf45
--- /dev/null
+++ b/tests/qtest/am53c974-test.c
@@ -0,0 +1,216 @@
+/*
+ * QTest testcase for am53c974
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "libqos/libqtest.h"
+
+
+static void test_cmdfifo_underflow_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi "
+"-device scsi-hd,drive=disk0 -drive "
+"id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xcf8, 0x8000100e);
+qtest_outl(s, 0xcfc, 0x8a00);
+qtest_outl(s, 0x8a09, 0x4200);
+qtest_outl(s, 0x8a0d, 0x00);
+qtest_outl(s, 0x8a0b, 0x1000);
+qtest_quit(s);
+}
+
+/* Reported as crash_1548bd10e7 */
+static void test_cmdfifo_underflow2_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi -device scsi-hd,drive=disk0 "
+"-drive id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outw(s, 0xc00c, 0x41);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x43);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x00);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc006, 0x00);
+qtest_outl(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x0800);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outl(s, 0xc006, 0x00);
+qtest_outl(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x0800);
+qtest_outw(s, 0xc00b, 0x00);
+qtest_outw(s, 0xc00b, 0x4100);
+qtest_outw(s, 0xc00a, 0x00);
+qtest_outl(s, 0xc00a, 0x10);
+qtest_outl(s, 0xc00a, 0x00);
+qtest_outw(s, 0xc00c, 0x43);
+qtest_outl(s, 0xc00a, 0x10);
+qtest_outl(s, 0xc00a, 0x10);
+qtest_quit(s);
+}
+
+static void test_cmdfifo_overflow_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi "
+"-device scsi-hd,drive=disk0 -drive "
+"id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xcf8, 0x8000100e);
+qtest_outl(s, 0xcfc, 0x0e00);
+qtest_outl(s, 0xe40, 0x03);
+qtest_outl(s, 0xe0b, 0x4100);
+qtest_outl(s, 0xe0b, 0x9000);
+qtest_quit(s);
+}
+
+/* Reported as crash_530ff2e211 */
+static void test_cmdfifo_overflow2_ok(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi -device scsi-hd,drive=disk0 "
+"-drive id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outl(s, 0xc00b, 0x4100);
+qtest_outw(s, 0xc00b, 0xc200);
+qtest_outl(s, 0xc03f, 0x0300);
+qtest_quit(s);
+}
+
+/* Reported as crash_0900379669 */
+static void test_fifo_pop_buf(void)
+{
+QTestState *s = qtest_init(
+"-device am53c974,id=scsi -device scsi-hd,drive=disk0 "
+"-drive id=disk0,if=none,file=null-co://,format=raw -nodefaults");
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xc000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x01);
+qtest_outb(s, 0xc000, 0x4);
+qtest_outb(s, 0xc008, 0xa0);
+qtest_outl(s, 0xc03f, 0x0300);
+qtest_outl(s, 0xc00b, 0xc300);
+qtest_outw(s, 0xc00b, 0x9000);
+qtest_outl(s, 0xc00b, 

[PATCH v4 for-6.0 11/12] esp: ensure that do_cmd is set to zero before submitting an ESP select command

2021-04-07 Thread Mark Cave-Ayland
When a CDB has been received and is about to be submitted to the SCSI layer
via one of the ESP select commands, ensure that do_cmd is set to zero before
executing the command.

Otherwise a guest executing 2 valid CDBs in quick sequence can invoke the SCSI
.transfer_data callback again before do_cmd is set to zero by the callback
function triggering an assert at the start of esp_transfer_data().

Signed-off-by: Mark Cave-Ayland 
---
 hw/scsi/esp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 3b9037e4f4..326643aa39 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -357,6 +357,7 @@ static void handle_satn(ESPState *s)
 cmdlen = get_cmd(s, ESP_CMDFIFO_SZ);
 if (cmdlen > 0) {
 s->cmdfifo_cdb_offset = 1;
+s->do_cmd = 0;
 do_cmd(s);
 } else if (cmdlen == 0) {
 s->do_cmd = 1;
@@ -390,6 +391,7 @@ static void handle_s_without_atn(ESPState *s)
 cmdlen = get_cmd(s, ESP_CMDFIFO_SZ);
 if (cmdlen > 0) {
 s->cmdfifo_cdb_offset = 0;
+s->do_cmd = 0;
 do_busid_cmd(s, 0);
 } else if (cmdlen == 0) {
 s->do_cmd = 1;
-- 
2.20.1
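
The ordering issue described in the commit message can be illustrated with a reduced, standalone C sketch. It is not the ESP code: `struct dev`, `transfer_data()`, `submit()` and `handle_select()` are simplified stand-ins, and the sketch assumes the callback may run synchronously from the submit path.

```c
#include <assert.h>

struct dev {
    int do_cmd;          /* non-zero while a CDB is still being accumulated */
    int transfers_seen;  /* counts callback invocations, for illustration */
};

/* Stand-in for the transfer callback: it requires that do_cmd is clear. */
void transfer_data(struct dev *d)
{
    assert(!d->do_cmd);
    d->transfers_seen++;
}

/* Stand-in for submitting a command; the callback may fire right away. */
void submit(struct dev *d)
{
    transfer_data(d);
}

void handle_select(struct dev *d)
{
    d->do_cmd = 0;   /* clear the flag first, so the callback observes 0 */
    submit(d);
}
```

If the flag were cleared only inside the callback, a second submission arriving while `do_cmd` is still set would trip the assertion, which is the situation the patch avoids.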




[PATCH v4 for-6.0 10/12] esp: don't reset async_len directly in esp_select() if cancelling request

2021-04-07 Thread Mark Cave-Ayland
Instead let the SCSI layer invoke the .cancel callback itself to cancel and
reset the request state.

Signed-off-by: Mark Cave-Ayland 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 782c6ee357..3b9037e4f4 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -95,6 +95,7 @@ void esp_request_cancelled(SCSIRequest *req)
 scsi_req_unref(s->current_req);
 s->current_req = NULL;
 s->current_dev = NULL;
+s->async_len = 0;
 }
 }
 
@@ -206,7 +207,6 @@ static int esp_select(ESPState *s)
 if (s->current_req) {
 /* Started a new command before the old one finished.  Cancel it.  */
 scsi_req_cancel(s->current_req);
-s->async_len = 0;
 }
 
 s->current_dev = scsi_device_find(&s->bus, 0, target, 0);
-- 
2.20.1




[PATCH v4 for-6.0 08/12] esp: don't overflow cmdfifo in get_cmd()

2021-04-07 Thread Mark Cave-Ayland
If the guest tries to read a CDB using DMA and cmdfifo is not empty then it is
possible to overflow cmdfifo.

Since this can only occur by issuing deliberately incorrect instruction
sequences, ensure that the maximum length of the CDB transferred to cmdfifo is
limited to the available free space within cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 7f49522e1d..53cc569e8a 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -243,6 +243,7 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 }
 if (s->dma_memory_read) {
 s->dma_memory_read(s->dma_opaque, buf, dmalen);
+dmalen = MIN(fifo8_num_free(&s->cmdfifo), dmalen);
 fifo8_push_all(&s->cmdfifo, buf, dmalen);
 } else {
 if (esp_select(s) < 0) {
@@ -262,6 +263,7 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 if (n >= 3) {
 buf[0] = buf[2] >> 5;
 }
+n = MIN(fifo8_num_free(&s->cmdfifo), n);
 fifo8_push_all(&s->cmdfifo, buf, n);
 }
 trace_esp_get_cmd(dmalen, target);
-- 
2.20.1
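
The clamping approach from this patch, shown in isolation as a standalone C sketch. It does not use the QEMU fifo8 API: `struct cmdbuf`, `CMDBUF_SZ` and `cmdbuf_push_clamped()` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMDBUF_SZ 32

struct cmdbuf {
    uint8_t data[CMDBUF_SZ];
    size_t used;
};

static size_t min_sz(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Append at most as many bytes as fit; returns the number actually stored. */
size_t cmdbuf_push_clamped(struct cmdbuf *b, const uint8_t *src, size_t len)
{
    assert(b->used <= CMDBUF_SZ);
    len = min_sz(len, CMDBUF_SZ - b->used);  /* limit to the free space */
    memcpy(b->data + b->used, src, len);
    b->used += len;
    return len;
}
```

The excess bytes are silently dropped, which seems acceptable here because, as the commit message notes, only deliberately incorrect command sequences can reach this case.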




[PATCH v4 for-6.0 03/12] esp: consolidate esp_cmdfifo_push() into esp_fifo_push()

2021-04-07 Thread Mark Cave-Ayland
Each FIFO currently has its own push functions with the only difference being
the capacity check. The original reason for this was that the fifo8
implementation doesn't have a formal API for retrieving the FIFO capacity,
however there are multiple examples within QEMU where the capacity field is
accessed directly.

Change esp_fifo_push() to access the FIFO capacity directly and then consolidate
esp_cmdfifo_push() into esp_fifo_push().

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 27 ---
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 26fe1dcb9d..16aaf8be93 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -98,16 +98,15 @@ void esp_request_cancelled(SCSIRequest *req)
 }
 }
 
-static void esp_fifo_push(ESPState *s, uint8_t val)
+static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
 {
-if (fifo8_num_used(&s->fifo) == ESP_FIFO_SZ) {
+if (fifo8_num_used(fifo) == fifo->capacity) {
 trace_esp_error_fifo_overrun();
 return;
 }
 
-fifo8_push(&s->fifo, val);
+fifo8_push(fifo, val);
 }
-
 static uint8_t esp_fifo_pop(ESPState *s)
 {
 if (fifo8_is_empty(&s->fifo)) {
@@ -117,16 +116,6 @@ static uint8_t esp_fifo_pop(ESPState *s)
 return fifo8_pop(&s->fifo);
 }
 
-static void esp_cmdfifo_push(ESPState *s, uint8_t val)
-{
-if (fifo8_num_used(&s->cmdfifo) == ESP_CMDFIFO_SZ) {
-trace_esp_error_fifo_overrun();
-return;
-}
-
-fifo8_push(&s->cmdfifo, val);
-}
-
 static uint8_t esp_cmdfifo_pop(ESPState *s)
 {
 if (fifo8_is_empty(&s->cmdfifo)) {
@@ -187,9 +176,9 @@ static void esp_pdma_write(ESPState *s, uint8_t val)
 }
 
 if (s->do_cmd) {
-esp_cmdfifo_push(s, val);
+esp_fifo_push(&s->cmdfifo, val);
 } else {
-esp_fifo_push(s, val);
+esp_fifo_push(&s->fifo, val);
 }
 
 dmalen--;
@@ -645,7 +634,7 @@ static void esp_do_dma(ESPState *s)
  */
 if (len < esp_get_tc(s) && esp_get_tc(s) <= ESP_FIFO_SZ) {
 while (fifo8_num_used(&s->fifo) < ESP_FIFO_SZ) {
-esp_fifo_push(s, 0);
+esp_fifo_push(&s->fifo, 0);
 len++;
 }
 }
@@ -947,9 +936,9 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t val)
 break;
 case ESP_FIFO:
 if (s->do_cmd) {
-esp_cmdfifo_push(s, val);
+esp_fifo_push(&s->cmdfifo, val);
 } else {
-esp_fifo_push(s, val);
+esp_fifo_push(&s->fifo, val);
 }
 
 /* Non-DMA transfers raise an interrupt after every byte */
-- 
2.20.1




Re: [PATCH 0/3] tests/acceptance: Handle tests with "cpu" tag

2021-04-07 Thread Eduardo Habkost
On Tue, Mar 23, 2021 at 05:01:09PM -0400, John Snow wrote:
> On 3/17/21 3:16 PM, Wainer dos Santos Moschetta wrote:
> > Added John and Eduardo,
> > 
> > On 3/9/21 3:52 PM, Cleber Rosa wrote:
> > > On Wed, Feb 24, 2021 at 06:26:51PM -0300, Wainer dos Santos
> > > Moschetta wrote:
> > > > Currently the acceptance tests tagged with "machine" have the "-M TYPE"
> > > > automatically added to the list of arguments of the QEMUMachine object.
> > > > In other words, that option is passed to the launched QEMU. On this
> > > > series it is implemented the same feature but instead for tests marked
> > > > with "cpu".
> > > > 
> > > Good!
> > > 
> > > > There is a caveat, however, in case the test needs additional
> > > > arguments to
> > > > the CPU type they cannot be passed via tag, because the tags
> > > > parser split
> > > > values by comma. For example, in
> > > > tests/acceptance/x86_cpu_model_versions.py,
> > > > there are cases where:
> > > > 
> > > >    * -cpu is set to
> > > > "Cascadelake-Server,x-force-features=on,check=off,enforce=off"
> > > >    * if it was tagged like
> > > > "cpu:Cascadelake-Server,x-force-features=on,check=off,enforce=off"
> > > >  then the parser would break it into 4 tags
> > > > ("cpu:Cascadelake-Server",
> > > >  "x-force-features=on", "check=off", "enforce=off")
> > > >    * resulting on "-cpu Cascadelake-Server" and the remaining
> > > > arguments are ignored.
> > > > 
> > > > For the example above, one should tag it (or not at all) as
> > > > "cpu:Cascadelake-Server"
> > > > AND self.vm.add_args('-cpu',
> > > > "Cascadelake-Server,x-force-features=on,check=off,enforce=off"),
> > > > and that results on something like:
> > > > 
> > > >    "qemu-system-x86_64 (...) -cpu Cascadelake-Server -cpu
> > > > Cascadelake-Server,x-force-features=on,check=off,enforce=off".
> > > > 
> > > There are clearly two problems here:
> > > 
> > > 1) the tag is meant to be succinct, so that it can be used by users
> > >     selecting which tests to run.  At the same time, it's a waste
> > >     to throw away the other information or keep it duplicate or
> > >     incosistent.
> > > 
> > > 2) QEMUMachine doesn't keep track of command line arguments
> > >     (add_args() makes it pretty clear what's doing).  But, on this type
> > >     of use case, a "set_args()" is desirable, in which case it would
> > >     overwrite the existing arguments for a given command line option.
> > 
> > I like the idea of a "set_args()" to QEMUMachine as you describe above
> > but it needs further discussion because I can see at least one corner
> > case; for example, one can set the machine type as either -machine or
> > -M, then what key it should be searched-and-replaced (if any) on the
> > list of args?
> > 
> > Unlike your suggestion, I thought on implement the method to deal with a
> > single argument at time, as:
> > 
> >      def set_arg(self, arg: Union[str, list], value: str) -> None:
> >      """
> >      Set the value of an argument from the list of extra arguments
> > to be
> >      given to the QEMU binary. If the argument does not exist then
> > it is
> >      added to the list.
> > 
> >      If the ``arg`` parameter is a list then it will search and
> > replace all
> >      occurencies (if any). Otherwise a new argument is added and it is
> >      used the first value of the ``arg`` list.
> >      """
> >      pass
> > 
> > Does it sound good to you?
> > 
> > Thanks!
> > 
> > Wainer
> > 
> 
> A little hokey, but I suppose that's true of our CLI interface in general.
> 
> I'd prefer not get into the business of building a "config" inside the
> python module if we can help it right now, but if "setting" individual args
> is something you truly need to do, I won't stand in the way.
> 
> Do what's least-gross.

I don't have any specific suggestions on what the API should look
like, but I'm having trouble understanding the documentation
above.

I don't know what "it will search and replace all occurrences"
means.  Occurrences of what?

I don't understand what "it is used the first value of the `arg`
list" means, either.  I understand you are going to use the first
value of the list, but you don't say what you are going to do
with it.

-- 
Eduardo




[PATCH v4 for-6.0 02/12] esp: rework write_response() to avoid using the FIFO for DMA transactions

2021-04-07 Thread Mark Cave-Ayland
The code for write_response() has always used the FIFO to store the data for
the status/message in phases, even for DMA transactions. Switch to using a
separate buffer that can be used directly for DMA transactions and restrict
the FIFO use to the non-DMA case.

Signed-off-by: Mark Cave-Ayland 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index bafea0d4e6..26fe1dcb9d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -445,18 +445,16 @@ static void write_response_pdma_cb(ESPState *s)
 
 static void write_response(ESPState *s)
 {
-uint32_t n;
+uint8_t buf[2];
 
 trace_esp_write_response(s->status);
 
-fifo8_reset(&s->fifo);
-esp_fifo_push(s, s->status);
-esp_fifo_push(s, 0);
+buf[0] = s->status;
+buf[1] = 0;
 
 if (s->dma) {
 if (s->dma_memory_write) {
-s->dma_memory_write(s->dma_opaque,
-(uint8_t *)fifo8_pop_buf(&s->fifo, 2, &n), 2);
+s->dma_memory_write(s->dma_opaque, buf, 2);
 s->rregs[ESP_RSTAT] = STAT_TC | STAT_ST;
 s->rregs[ESP_RINTR] |= INTR_BS | INTR_FC;
 s->rregs[ESP_RSEQ] = SEQ_CD;
@@ -466,7 +464,8 @@ static void write_response(ESPState *s)
 return;
 }
 } else {
-s->ti_size = 2;
+fifo8_reset(&s->fifo);
+fifo8_push_all(&s->fifo, buf, 2);
 s->rregs[ESP_RFLAGS] = 2;
 }
 esp_raise_irq(s);
-- 
2.20.1




[PATCH v4 for-6.0 09/12] esp: don't overflow cmdfifo if TC is larger than the cmdfifo size

2021-04-07 Thread Mark Cave-Ayland
If a guest transfers the message out/command phase data using DMA with a TC
that is larger than the cmdfifo size then the cmdfifo overflows triggering
an assert. Limit the size of the transfer to the free space available in
cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1919036
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 53cc569e8a..782c6ee357 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -578,6 +578,7 @@ static void esp_do_dma(ESPState *s)
 cmdlen = fifo8_num_used(&s->cmdfifo);
 trace_esp_do_dma(cmdlen, len);
 if (s->dma_memory_read) {
+len = MIN(len, fifo8_num_free(&s->cmdfifo));
 s->dma_memory_read(s->dma_opaque, buf, len);
 fifo8_push_all(&s->cmdfifo, buf, len);
 } else {
-- 
2.20.1




[PATCH v4 for-6.0 07/12] esp: don't underflow cmdfifo in do_cmd()

2021-04-07 Thread Mark Cave-Ayland
If the guest tries to execute a CDB when cmdfifo is not empty before the start
of the message out phase then clearing the message out phase data will cause
cmdfifo to underflow due to cmdfifo_cdb_offset being larger than the amount of
data within.

Since this can only occur by issuing deliberately incorrect instruction
sequences, ensure that the maximum length of esp_fifo_pop_buf() is limited to
the size of the data within cmdfifo.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 4decbbfc29..7f49522e1d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -319,13 +319,15 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
 
 static void do_cmd(ESPState *s)
 {
-uint8_t busid = fifo8_pop(&s->cmdfifo);
+uint8_t busid = esp_fifo_pop(&s->cmdfifo);
+int len;
 
 s->cmdfifo_cdb_offset--;
 
 /* Ignore extended messages for now */
 if (s->cmdfifo_cdb_offset) {
-esp_fifo_pop_buf(&s->cmdfifo, NULL, s->cmdfifo_cdb_offset);
+len = MIN(s->cmdfifo_cdb_offset, fifo8_num_used(&s->cmdfifo));
+esp_fifo_pop_buf(&s->cmdfifo, NULL, len);
 s->cmdfifo_cdb_offset = 0;
 }
 
-- 
2.20.1




[PATCH v4 for-6.0 01/12] esp: always check current_req is not NULL before use in DMA callbacks

2021-04-07 Thread Mark Cave-Ayland
After issuing a SCSI command the SCSI layer can call the SCSIBusInfo .cancel
callback which resets both current_req and current_dev to NULL. If any data
is left in the transfer buffer (async_len != 0) then the next TI (Transfer
Information) command will attempt to reference the NULL pointer causing a
segfault.

Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 507ab363bc..bafea0d4e6 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -496,6 +496,10 @@ static void do_dma_pdma_cb(ESPState *s)
 return;
 }
 
+if (!s->current_req) {
+return;
+}
+
 if (to_device) {
 /* Copy FIFO data to device */
 len = MIN(s->async_len, ESP_FIFO_SZ);
@@ -527,11 +531,9 @@ static void do_dma_pdma_cb(ESPState *s)
 return;
 } else {
 if (s->async_len == 0) {
-if (s->current_req) {
-/* Defer until the scsi layer has completed */
-scsi_req_continue(s->current_req);
-s->data_in_ready = false;
-}
+/* Defer until the scsi layer has completed */
+scsi_req_continue(s->current_req);
+s->data_in_ready = false;
 return;
 }
 
@@ -604,6 +606,9 @@ static void esp_do_dma(ESPState *s)
 }
 return;
 }
+if (!s->current_req) {
+return;
+}
 if (s->async_len == 0) {
 /* Defer until data is available.  */
 return;
@@ -713,6 +718,10 @@ static void esp_do_nodma(ESPState *s)
 return;
 }
 
+if (!s->current_req) {
+return;
+}
+
 if (s->async_len == 0) {
 /* Defer until data is available.  */
 return;
-- 
2.20.1




[PATCH v4 for-6.0 06/12] esp: ensure cmdfifo is not empty and current_dev is non-NULL

2021-04-07 Thread Mark Cave-Ayland
When about to execute a SCSI command, ensure that cmdfifo is not empty and
current_dev is non-NULL. This can happen if the guest tries to execute a TI
(Transfer Information) command without issuing one of the select commands
first.

Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 1aa2caf57d..4decbbfc29 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -284,6 +284,9 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
 trace_esp_do_busid_cmd(busid);
 lun = busid & 7;
 cmdlen = fifo8_num_used(&s->cmdfifo);
+if (!cmdlen || !s->current_dev) {
+return;
+}
 esp_fifo_pop_buf(&s->cmdfifo, buf, cmdlen);
 
 current_lun = scsi_device_find(&s->bus, 0, s->current_dev->id, lun);
-- 
2.20.1




[PATCH v4 for-6.0 05/12] esp: introduce esp_fifo_pop_buf() and use it instead of fifo8_pop_buf()

2021-04-07 Thread Mark Cave-Ayland
The const pointer returned by fifo8_pop_buf() lies directly within the array used
to model the FIFO. Building with address sanitizers enabled shows that if the
caller expects a minimum number of bytes present then if the FIFO is nearly full,
the caller may unexpectedly access past the end of the array.

Introduce esp_fifo_pop_buf() which takes a destination buffer and performs a
memcpy() in it to guarantee that the caller cannot overwrite the FIFO array and
update all callers to use it. Similarly add underflow protection similar to
esp_fifo_push() and esp_fifo_pop() so that instead of triggering an assert()
the operation becomes a no-op.

Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 40 
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index ff8fa73de9..1aa2caf57d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -117,6 +117,23 @@ static uint8_t esp_fifo_pop(Fifo8 *fifo)
 return fifo8_pop(fifo);
 }
 
+static uint32_t esp_fifo_pop_buf(Fifo8 *fifo, uint8_t *dest, int maxlen)
+{
+const uint8_t *buf;
+uint32_t n;
+
+if (maxlen == 0) {
+return 0;
+}
+
+buf = fifo8_pop_buf(fifo, maxlen, &n);
+if (dest) {
+memcpy(dest, buf, n);
+}
+
+return n;
+}
+
 static uint32_t esp_get_tc(ESPState *s)
 {
 uint32_t dmalen;
@@ -241,11 +258,11 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 if (dmalen == 0) {
 return 0;
 }
-memcpy(buf, fifo8_pop_buf(&s->fifo, dmalen, &n), dmalen);
-if (dmalen >= 3) {
+n = esp_fifo_pop_buf(&s->fifo, buf, dmalen);
+if (n >= 3) {
 buf[0] = buf[2] >> 5;
 }
-fifo8_push_all(&s->cmdfifo, buf, dmalen);
+fifo8_push_all(&s->cmdfifo, buf, n);
 }
 trace_esp_get_cmd(dmalen, target);
 
@@ -258,16 +275,16 @@ static uint32_t get_cmd(ESPState *s, uint32_t maxlen)
 
 static void do_busid_cmd(ESPState *s, uint8_t busid)
 {
-uint32_t n, cmdlen;
+uint32_t cmdlen;
 int32_t datalen;
 int lun;
 SCSIDevice *current_lun;
-uint8_t *buf;
+uint8_t buf[ESP_CMDFIFO_SZ];
 
 trace_esp_do_busid_cmd(busid);
 lun = busid & 7;
 cmdlen = fifo8_num_used(&s->cmdfifo);
-buf = (uint8_t *)fifo8_pop_buf(&s->cmdfifo, cmdlen, &n);
+esp_fifo_pop_buf(&s->cmdfifo, buf, cmdlen);
 
 current_lun = scsi_device_find(&s->bus, 0, s->current_dev->id, lun);
 s->current_req = scsi_req_new(current_lun, 0, lun, buf, s);
@@ -300,13 +317,12 @@ static void do_busid_cmd(ESPState *s, uint8_t busid)
 static void do_cmd(ESPState *s)
 {
 uint8_t busid = fifo8_pop(&s->cmdfifo);
-uint32_t n;
 
 s->cmdfifo_cdb_offset--;
 
 /* Ignore extended messages for now */
 if (s->cmdfifo_cdb_offset) {
-fifo8_pop_buf(&s->cmdfifo, s->cmdfifo_cdb_offset, &n);
+esp_fifo_pop_buf(&s->cmdfifo, NULL, s->cmdfifo_cdb_offset);
 s->cmdfifo_cdb_offset = 0;
 }
 
@@ -484,7 +500,7 @@ static void do_dma_pdma_cb(ESPState *s)
 /* Copy FIFO data to device */
 len = MIN(s->async_len, ESP_FIFO_SZ);
 len = MIN(len, fifo8_num_used(&s->fifo));
-memcpy(s->async_buf, fifo8_pop_buf(&s->fifo, len, &n), len);
+n = esp_fifo_pop_buf(&s->fifo, s->async_buf, len);
 s->async_buf += n;
 s->async_len -= n;
 s->ti_size += n;
@@ -492,7 +508,7 @@ static void do_dma_pdma_cb(ESPState *s)
 if (n < len) {
 /* Unaligned accesses can cause FIFO wraparound */
 len = len - n;
-memcpy(s->async_buf, fifo8_pop_buf(&s->fifo, len, &n), len);
+n = esp_fifo_pop_buf(&s->fifo, s->async_buf, len);
 s->async_buf += n;
 s->async_len -= n;
 s->ti_size += n;
@@ -668,7 +684,7 @@ static void esp_do_dma(ESPState *s)
 static void esp_do_nodma(ESPState *s)
 {
 int to_device = ((s->rregs[ESP_RSTAT] & 7) == STAT_DO);
-uint32_t cmdlen, n;
+uint32_t cmdlen;
 int len;
 
 if (s->do_cmd) {
@@ -709,7 +725,7 @@ static void esp_do_nodma(ESPState *s)
 
 if (to_device) {
 len = MIN(fifo8_num_used(&s->fifo), ESP_FIFO_SZ);
-memcpy(s->async_buf, fifo8_pop_buf(&s->fifo, len, &n), len);
+esp_fifo_pop_buf(&s->fifo, s->async_buf, len);
 s->async_buf += len;
 s->async_len -= len;
 s->ti_size += len;
-- 
2.20.1




[PATCH v4 for-6.0 04/12] esp: consolidate esp_cmdfifo_pop() into esp_fifo_pop()

2021-04-07 Thread Mark Cave-Ayland
Each FIFO currently has its own pop functions with the only difference being
the capacity check. The original reason for this was that the fifo8
implementation doesn't have a formal API for retrieving the FIFO capacity,
however there are multiple examples within QEMU where the capacity field is
accessed directly.

Change esp_fifo_pop() to access the FIFO capacity directly and then consolidate
esp_cmdfifo_pop() into esp_fifo_pop().

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Alexander Bulekov 
---
 hw/scsi/esp.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 16aaf8be93..ff8fa73de9 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -107,22 +107,14 @@ static void esp_fifo_push(Fifo8 *fifo, uint8_t val)
 
 fifo8_push(fifo, val);
 }
-static uint8_t esp_fifo_pop(ESPState *s)
-{
-if (fifo8_is_empty(&s->fifo)) {
-return 0;
-}
-
-return fifo8_pop(&s->fifo);
-}
 
-static uint8_t esp_cmdfifo_pop(ESPState *s)
+static uint8_t esp_fifo_pop(Fifo8 *fifo)
 {
-if (fifo8_is_empty(&s->cmdfifo)) {
+if (fifo8_is_empty(fifo)) {
 return 0;
 }
 
-return fifo8_pop(&s->cmdfifo);
+return fifo8_pop(fifo);
 }
 
 static uint32_t esp_get_tc(ESPState *s)
@@ -159,9 +151,9 @@ static uint8_t esp_pdma_read(ESPState *s)
 uint8_t val;
 
 if (s->do_cmd) {
-val = esp_cmdfifo_pop(s);
+val = esp_fifo_pop(&s->cmdfifo);
 } else {
-val = esp_fifo_pop(s);
+val = esp_fifo_pop(&s->fifo);
 }
 
 return val;
@@ -887,7 +879,7 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
 qemu_log_mask(LOG_UNIMP, "esp: PIO data read not implemented\n");
 s->rregs[ESP_FIFO] = 0;
 } else {
-s->rregs[ESP_FIFO] = esp_fifo_pop(s);
+s->rregs[ESP_FIFO] = esp_fifo_pop(&s->fifo);
 }
 val = s->rregs[ESP_FIFO];
 break;
-- 
2.20.1




[PATCH v4 for-6.0 00/12] esp: fix asserts/segfaults discovered by fuzzer

2021-04-07 Thread Mark Cave-Ayland
Recently there have been a number of issues raised on Launchpad as a result of
fuzzing the am53c974 (ESP) device. I spent some time over the past couple of
days checking to see if anything had improved since my last patchset: from
what I can tell the issues are still present, but the cmdfifo related failures
now assert rather than corrupting memory.

This patchset applied to master passes my local tests using the qtest fuzz test
cases added by Alexander for the following Launchpad bugs:

  https://bugs.launchpad.net/qemu/+bug/1919035
  https://bugs.launchpad.net/qemu/+bug/1919036
  https://bugs.launchpad.net/qemu/+bug/1910723
  https://bugs.launchpad.net/qemu/+bug/1909247
  
I'm posting this now just before soft freeze since I see that some of the issues
have recently been allocated CVEs and so it could be argued that even though
they have existed for some time, it is worth fixing them for 6.0.

Signed-off-by: Mark Cave-Ayland 

v4:
- Rebase onto master
- Add R-B tags from Phil
- Fix accidental line space removal in patch 3 discovered by Phil
- Change spelling of sanitiser -> sanitizer in patch 5 as suggested by Phil
- Fix up cmdfifo length checks in patch 8
- Add T-B tags from Alex
- Add patch 11 to handle additional assert discovered by Alex during fuzzing

v3:
- Rebase onto master
- Rearrange patch ordering (move patch 5 to the front) to help reduce cross-talk
  between the regression tests
- Introduce patch 2 to remove unnecessary FIFO usage
- Introduce patches 3-4 to consolidate esp_fifo_pop()/esp_fifo_push() wrapper
  functions to avoid having to introduce 2 variants of esp_fifo_pop_buf()
- Introduce esp_fifo_pop_buf() in patch 5 to prevent callers from overflowing
  the array used to model the FIFO
- Introduce patch 10 to clarify cancellation logic should all occur in the .cancel
  SCSI callback rather than at the site of the caller
- Add extra qtests in patch 11 to cover additional test cases provided on LP

v2:
- Add Alexander's R-B tag for patch 2 and Phil's R-B for patch 3
- Add patch 4 for additional testcase provided in Alexander's patch 1 comment
- Move current_req NULL checks forward in DMA functions (fixes ASAN bug reported
  at https://bugs.launchpad.net/qemu/+bug/1909247/comments/6) in patch 3
- Add qtest for am53c974 containing a basic set of regression tests using the
  automatic test cases generated by the fuzzer as requested by Paolo


Mark Cave-Ayland (12):
  esp: always check current_req is not NULL before use in DMA callbacks
  esp: rework write_response() to avoid using the FIFO for DMA
transactions
  esp: consolidate esp_cmdfifo_push() into esp_fifo_push()
  esp: consolidate esp_cmdfifo_pop() into esp_fifo_pop()
  esp: introduce esp_fifo_pop_buf() and use it instead of
fifo8_pop_buf()
  esp: ensure cmdfifo is not empty and current_dev is non-NULL
  esp: don't underflow cmdfifo in do_cmd()
  esp: don't overflow cmdfifo in get_cmd()
  esp: don't overflow cmdfifo if TC is larger than the cmdfifo size
  esp: don't reset async_len directly in esp_select() if cancelling
request
  esp: ensure that do_cmd is set to zero before submitting an ESP select
command
  tests/qtest: add tests for am53c974 device

 MAINTAINERS |   1 +
 hw/scsi/esp.c   | 119 +++-
 tests/qtest/am53c974-test.c | 216 
 tests/qtest/meson.build |   1 +
 4 files changed, 285 insertions(+), 52 deletions(-)
 create mode 100644 tests/qtest/am53c974-test.c

-- 
2.20.1




Re: [PATCH v4 03/12] target/arm: Fix mte_checkN

2021-04-07 Thread Richard Henderson

On 4/7/21 11:39 AM, Alex Bennée wrote:


Richard Henderson  writes:


We were incorrectly assuming that only the first byte of an MTE access
is checked against the tags.  But per the ARM, unaligned accesses are
pre-decomposed into single-byte accesses.  So by the time we reach the
actual MTE check in the ARM pseudocode, all accesses are aligned.

Therefore, the first failure is always either the first byte of the
access, or the first byte of the granule.

In addition, some of the arithmetic is off for last-first -> count.
This does not become directly visible until a later patch that passes
single bytes into this function, so ptr == ptr_last.

Buglink: https://bugs.launchpad.net/bugs/1921948


Minor note: you can Cc: Bug 1921948 <1921...@bugs.launchpad.net> to
automatically copy patches to the appropriate bugs which is useful if
you don't have the Cc for the reporter.

Anyway I'm trying to get the KASAN unit tests running as a way of
testing this (and maybe expanding with a version of Andrey's test). I
suspect this may be a PEBKAC issue but I built an MTE enabled kernel
with:

   CONFIG_HAVE_ARCH_KASAN=y
   CONFIG_HAVE_ARCH_KASAN_SW_TAGS=y
   CONFIG_HAVE_ARCH_KASAN_HW_TAGS=y
   CONFIG_CC_HAS_KASAN_GENERIC=y
   CONFIG_KASAN=y
   # CONFIG_KASAN_GENERIC is not set
   CONFIG_KASAN_HW_TAGS=y
   CONFIG_KASAN_STACK=1
   CONFIG_KASAN_KUNIT_TEST=m
   CONFIG_TEST_KASAN_MODULE=m


I built it all in:

CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_SW_TAGS=y
CONFIG_HAVE_ARCH_KASAN_HW_TAGS=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
# CONFIG_KASAN_GENERIC is not set
CONFIG_KASAN_HW_TAGS=y
CONFIG_KASAN_KUNIT_TEST=y

Then I just boot the raw kernel (no filesystem or anything):

./qemu-system-aarch64 -M virt,mte=on -cpu max -nographic \
  -kernel ~/linux/bld-aa/arch/arm64/boot/Image

There's a ton of output, but at the end I see

[   11.901185] ok 48 - match_all_mem_tag
[   11.901422] ok 1 - kasan

just before the "VFS: Cannot open root device" panic.
Which has done all we wanted, so, yay.


r~



Re: [PATCH 08/24] tests/qtest: Add test for Aspeed HACE

2021-04-07 Thread Klaus Heinrich Kiwi




On 4/7/2021 2:16 PM, Cédric Le Goater wrote:

From: Joel Stanley 

This adds a test for the Aspeed Hash and Crypto (HACE) engine. It tests
the currently implemented behavior of the hash functionality.

The tests are similar, but are cut/pasted instead of broken out into a
common function so the assert machinery produces useful output when a
test fails.

Signed-off-by: Joel Stanley 
Reviewed-by: Cédric Le Goater 
Acked-by: Thomas Huth 
[ clg: - qtest_quit() fix ]
Signed-off-by: Cédric Le Goater 
Message-Id: <20210324070955.125941-4-j...@jms.id.au>
Signed-off-by: Cédric Le Goater 


Reviewed-by: Klaus Heinrich Kiwi 


---
  tests/qtest/aspeed_hace-test.c | 321 +
  MAINTAINERS|   1 +
  tests/qtest/meson.build|   3 +
  3 files changed, 325 insertions(+)
  create mode 100644 tests/qtest/aspeed_hace-test.c

diff --git a/tests/qtest/aspeed_hace-test.c b/tests/qtest/aspeed_hace-test.c
new file mode 100644
index ..675774e96eb9
--- /dev/null
+++ b/tests/qtest/aspeed_hace-test.c
@@ -0,0 +1,321 @@
+/*
+ * QTest testcase for the ASPEED Hash and Crypto Engine
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright 2021 IBM Corp.
+ */
+
+#include "qemu/osdep.h"
+
+#include "libqos/libqtest.h"
+#include "qemu-common.h"
+#include "qemu/bitops.h"
+
+#define HACE_CMD                 0x10
+#define  HACE_SHA_BE_EN          BIT(3)
+#define  HACE_MD5_LE_EN          BIT(2)
+#define  HACE_ALGO_MD5           0
+#define  HACE_ALGO_SHA1          BIT(5)
+#define  HACE_ALGO_SHA224        BIT(6)
+#define  HACE_ALGO_SHA256        (BIT(4) | BIT(6))
+#define  HACE_ALGO_SHA512        (BIT(5) | BIT(6))
+#define  HACE_ALGO_SHA384        (BIT(5) | BIT(6) | BIT(10))
+#define  HACE_SG_EN              BIT(18)
+
+#define HACE_STS                 0x1c
+#define  HACE_RSA_ISR            BIT(13)
+#define  HACE_CRYPTO_ISR         BIT(12)
+#define  HACE_HASH_ISR           BIT(9)
+#define  HACE_RSA_BUSY           BIT(2)
+#define  HACE_CRYPTO_BUSY        BIT(1)
+#define  HACE_HASH_BUSY          BIT(0)
+#define HACE_HASH_SRC            0x20
+#define HACE_HASH_DIGEST         0x24
+#define HACE_HASH_KEY_BUFF       0x28
+#define HACE_HASH_DATA_LEN       0x2c
+#define HACE_HASH_CMD            0x30
+
+/*
+ * Test vector is the ascii "abc"
+ *
+ * Expected results were generated using command line utilities:
+ *
+ *  echo -n -e 'abc' | dd of=/tmp/test
+ *  for hash in sha512sum sha256sum md5sum; do $hash /tmp/test; done
+ *
+ */
+static const uint8_t test_vector[] = {0x61, 0x62, 0x63};
+
+static const uint8_t test_result_sha512[] = {
+0xdd, 0xaf, 0x35, 0xa1, 0x93, 0x61, 0x7a, 0xba, 0xcc, 0x41, 0x73, 0x49,
+0xae, 0x20, 0x41, 0x31, 0x12, 0xe6, 0xfa, 0x4e, 0x89, 0xa9, 0x7e, 0xa2,
+0x0a, 0x9e, 0xee, 0xe6, 0x4b, 0x55, 0xd3, 0x9a, 0x21, 0x92, 0x99, 0x2a,
+0x27, 0x4f, 0xc1, 0xa8, 0x36, 0xba, 0x3c, 0x23, 0xa3, 0xfe, 0xeb, 0xbd,
+0x45, 0x4d, 0x44, 0x23, 0x64, 0x3c, 0xe8, 0x0e, 0x2a, 0x9a, 0xc9, 0x4f,
+0xa5, 0x4c, 0xa4, 0x9f};
+
+static const uint8_t test_result_sha256[] = {
+0xba, 0x78, 0x16, 0xbf, 0x8f, 0x01, 0xcf, 0xea, 0x41, 0x41, 0x40, 0xde,
+0x5d, 0xae, 0x22, 0x23, 0xb0, 0x03, 0x61, 0xa3, 0x96, 0x17, 0x7a, 0x9c,
+0xb4, 0x10, 0xff, 0x61, 0xf2, 0x00, 0x15, 0xad};
+
+static const uint8_t test_result_md5[] = {
+0x90, 0x01, 0x50, 0x98, 0x3c, 0xd2, 0x4f, 0xb0, 0xd6, 0x96, 0x3f, 0x7d,
+0x28, 0xe1, 0x7f, 0x72};
+
+
+static void write_regs(QTestState *s, uint32_t base, uint32_t src,
+   uint32_t length, uint32_t out, uint32_t method)
+{
+qtest_writel(s, base + HACE_HASH_SRC, src);
+qtest_writel(s, base + HACE_HASH_DIGEST, out);
+qtest_writel(s, base + HACE_HASH_DATA_LEN, length);
+qtest_writel(s, base + HACE_HASH_CMD, HACE_SHA_BE_EN | method);
+}
+
+static void test_md5(const char *machine, const uint32_t base,
+ const uint32_t src_addr)
+
+{
+QTestState *s = qtest_init(machine);
+
+uint32_t digest_addr = src_addr + 0x0100;
+uint8_t digest[16] = {0};
+
+/* Check engine is idle, no busy or irq bits set */
+g_assert_cmphex(qtest_readl(s, base + HACE_STS), ==, 0);
+
+/* Write test vector into memory */
+qtest_memwrite(s, src_addr, test_vector, sizeof(test_vector));
+
+write_regs(s, base, src_addr, sizeof(test_vector), digest_addr, HACE_ALGO_MD5);
+
+/* Check hash IRQ status is asserted */
+g_assert_cmphex(qtest_readl(s, base + HACE_STS), ==, 0x0200);
+
+/* Clear IRQ status and check status is deasserted */
+qtest_writel(s, base + HACE_STS, 0x0200);
+g_assert_cmphex(qtest_readl(s, base + HACE_STS), ==, 0);
+
+/* Read computed digest from memory */
+qtest_memread(s, digest_addr, digest, sizeof(digest));
+
+/* Check result of computation */
+g_assert_cmpmem(digest, sizeof(digest),
+test_result_md5, sizeof(digest));
+
+qtest_quit(s);
+}
+
+static void test_sha256(const char 

Re: [PATCH 07/27] arc: TCG instruction definitions

2021-04-07 Thread Richard Henderson

On 4/5/21 7:31 AM, cupertinomira...@gmail.com wrote:

+void arc_gen_verifyCCFlag(const DisasCtxt *ctx, TCGv ret)


Why "verify"?  I don't see anything that verifies here...

I'll note that this can be done better, if you expose the actual comparison 
rather than a simple boolean.  This could remove 2-3 insns from gen_cc_prologue().


See e.g. disas_jcc and DisasCompare from target/s390x.



+{ MO_UL, MO_UB, MO_UW }, /* non sign-extended */


"non sign-extended" => "zero-extended".


+void arc_gen_no_further_loads_pending(const DisasCtxt *ctx, TCGv ret)
+{
+/* TODO: To complete on SMP support. */
+tcg_gen_movi_tl(ret, 1);
+}
+
+void arc_gen_set_debug(const DisasCtxt *ctx, bool value)
+{
+/* TODO: Could not find a reason to set this. */
+}


What's the point of these within the semantics?  It seems like some sort of 
in-chip debugging thing that tcg should ignore?



+void
+arc_gen_execute_delayslot(DisasCtxt *ctx, TCGv bta, TCGv take_branch)
+{
+assert(ctx->insn.limm_p == 0 && !ctx->in_delay_slot);
+
+ctx->in_delay_slot = true;
+uint32_t cpc = ctx->cpc;
+uint32_t pcl = ctx->pcl;
+insn_t insn = ctx->insn;
+
+ctx->cpc = ctx->npc;
+ctx->pcl = ctx->cpc & ((target_ulong) 0xfffc);
+
+++ctx->ds;
+
+TCGLabel *do_not_set_bta_and_de = gen_new_label();
+tcg_gen_brcondi_tl(TCG_COND_NE, take_branch, 1, do_not_set_bta_and_de);
+/*
+ * In case an exception should be raised during the execution
+ * of delay slot, bta value is used to set erbta.
+ */
+tcg_gen_mov_tl(cpu_bta, bta);
+/* We are in a delay slot */
+tcg_gen_mov_tl(cpu_DEf, take_branch);
+gen_set_label(do_not_set_bta_and_de);
+
+tcg_gen_movi_tl(cpu_is_delay_slot_instruction, 1);
+
+/* Set the pc to the next pc */
+tcg_gen_movi_tl(cpu_pc, ctx->npc);
+/* Necessary for the likely call to restore_state_to_opc() */
+tcg_gen_insn_start(ctx->npc);


This is unlikely to work reliably.
I suspect it does not work at all with icount.


+ctx->env->enabled_interrupts = false;


Illegal, as mentioned before.


+/*
+ * In case we might be in a situation where the delayslot is in a
+ * different MMU page. Make a fake exception to interrupt
+ * delayslot execution in the context of the branch.
+ * The delayslot will then be re-executed in isolation after the
+ * branch code has set bta and DEf status flag.
+ */
+if ((cpc & PAGE_MASK) < 0x8000 &&
+(cpc & PAGE_MASK) != (ctx->cpc & PAGE_MASK)) {
+ctx->in_delay_slot = false;
+TCGv dpc = tcg_const_local_tl(ctx->npc);
+tcg_gen_mov_tl(cpu_pc, dpc);
+gen_helper_fake_exception(cpu_env, dpc);


I think you should *always* execute the delay slot separately.  That's the only 
way the instruction count will be done right.


I'm pretty sure I asked you before to have a look at some of the other targets 
that implement delay slots for ideas on how to do this correctly.




+void arc_gen_get_bit(TCGv ret, TCGv a, TCGv pos)
+{
+tcg_gen_rotr_tl(ret, a, pos);
+tcg_gen_andi_tl(ret, ret, 1);
+}


Should be a plain shift, not a rotate, surely.


+void arc_gen_extract_bits(TCGv ret, TCGv a, TCGv start, TCGv end)
+{
+TCGv tmp1 = tcg_temp_new();
+
+tcg_gen_shr_tl(ret, a, end);
+
+tcg_gen_sub_tl(tmp1, start, end);
+tcg_gen_addi_tl(tmp1, tmp1, 1);
+tcg_gen_shlfi_tl(tmp1, 1, tmp1);
+tcg_gen_subi_tl(tmp1, tmp1, 1);
+
+tcg_gen_and_tl(ret, ret, tmp1);


Doesn't work for start == 31, end = 0,
due to shift count of 32.

You could rewrite this to

  t = 31 - start;
  ret = a << t;
  t = 31 - end;
  ret = ret >> t;

Amusingly, there's exactly one instance of extractBits that doesn't use 
constant arguments, and that's in ROR.  And there, the extract *would* use 
constant arguments if the extract was from @dest instead of from lsrc.  At 
which point you could just use tcg_gen_extract_tl.
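
For illustration, here is a minimal sketch of the two-shift idea in plain C rather than TCG ops, assuming 32-bit values and start >= end. The helper name extract_bits is hypothetical and not part of the patch. In this sketch the right shift is (31 - start) + end, so that it also undoes the preceding left shift and the field lands at bit 0; no shift count ever reaches 32.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return bits [start:end]
 * of 'a' (inclusive, start >= end, both in 0..31), right-aligned.
 */
static uint32_t extract_bits(uint32_t a, unsigned start, unsigned end)
{
    unsigned left = 31 - start;   /* discard the bits above 'start' */
    unsigned right = left + end;  /* then drop the bits below 'end' */

    /* Both counts are at most 31, so neither shift is undefined. */
    return (a << left) >> right;
}
```

With constant start and end, the same result should be obtainable from tcg_gen_extract_tl(ret, a, end, start - end + 1), which takes an offset and a length rather than two bit positions.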




+TCGv arc_gen_next_reg(const DisasCtxt *ctx, TCGv reg)
+{
+int i;
+for (i = 0; i < 64; i += 2) {
+if (reg == cpu_r[i]) {
+return cpu_r[i + 1];
+}
+}
+/* Check if REG is an odd register. */
+for (i = 1; i < 64; i += 2) {
+/* If so, that is unsanctioned. */
+if (reg == cpu_r[i]) {
+arc_gen_excp(ctx, EXCP_INST_ERROR, 0, 0);
+return NULL;
+}
+}


This is really ugly.  Surely you can do something better.

Perhaps not resolving regno to TCGv quite so early, so that it's easy to simply 
add one and index.
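
As a rough sketch of that suggestion, assuming the translator still has the register number at hand and that there are 64 core registers (the bound used by the loops above), the pair lookup reduces to a parity check. The helper name next_regno is hypothetical and not part of the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: given the number of the
 * first register of a 64-bit pair, compute the number of its partner.
 * Returns false for an odd (unsanctioned) or out-of-range register, in
 * which case the caller would raise the instruction error.
 */
static bool next_regno(unsigned regno, unsigned *next)
{
    if (regno >= 64 || (regno & 1)) {
        return false;
    }
    *next = regno + 1;
    return true;
}
```

The caller would then index cpu_r[] with the returned number, and only on success, instead of searching the array for a matching TCGv.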



+void arc_gen_verifyCCFlag(const DisasCtxt *ctx, TCGv ret);
+#define getCCFlag(R)    arc_gen_verifyCCFlag(ctx, R)


I wonder if it would be clearer if the ruby translator simply added the context 
parameter itself, rather than have 99 macros to do the same.



+#define getNFlag(R) cpu_Nf
+#define setNFlag(ELEM)  tcg_gen_shri_tl(cpu_Nf, ELEM, (TARGET_LONG_BITS - 1))


I'll note that setting of flags happens much more often than checking of flags. 
