date:20230422

[PATCH v3 43/47] e1000e: Notify only new interrupts

2023-04-22 Thread Akihiko Odaki

In MSI-X mode, if there are interrupts already notified but not cleared
and a new interrupt arrives, e1000e incorrectly notifies the already
notified ones again along with the new one.

To fix this issue, replace e1000e_update_interrupt_state() with
two new functions: e1000e_raise_interrupts() and
e1000e_lower_interrupts(). These functions not only raise or lower
interrupts, but also perform the register writes that update the
interrupt state. Before performing a register write, each function
determines the interrupts already raised, and compares them with the
interrupts raised after the register write to determine the interrupts
to notify.
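
The before/after comparison described above can be sketched as follows.
This is a minimal illustration under a simplified register model, not the
code of the patch: ToyIntrState and toy_raise_causes() are hypothetical
names, and the real functions also select the register to write and
handle MSI-X specifics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the interrupt registers. */
typedef struct {
    uint32_t ims; /* enabled causes (mask) */
    uint32_t icr; /* pending causes */
} ToyIntrState;

/*
 * Set `causes` in ICR and return only the causes that became raised
 * (enabled and pending) because of this write.
 */
static uint32_t toy_raise_causes(ToyIntrState *s, uint32_t causes)
{
    uint32_t old_raised = s->ims & s->icr; /* raised before the write */
    uint32_t new_raised;

    s->icr |= causes;                      /* the register write */
    new_raised = s->ims & s->icr;          /* raised after the write */

    /* Setting bits can never lower an already raised cause. */
    assert((old_raised & ~new_raised) == 0);

    return new_raised & ~old_raised;       /* notify only these */
}
```

With this shape, a cause that was already raised before the write is not
part of the return value, so it would not be notified a second time.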

The introduction of these functions made obsolete the tracepoints that
assume the caller of e1000e_update_interrupt_state() performs register
writes. These tracepoints are now removed, and alternative ones are
added to the new functions.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.h |   2 -
 hw/net/e1000e_core.c | 153 +++
 hw/net/trace-events  |   2 +
 3 files changed, 69 insertions(+), 88 deletions(-)

diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index 213a70530d..66b025cc43 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -111,8 +111,6 @@ struct E1000Core {
 PCIDevice *owner;
 void (*owner_start_recv)(PCIDevice *d);
 
-uint32_t msi_causes_pending;
-
 int64_t timadj;
 };
 
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 347162a9d0..78373d7db7 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -165,14 +165,14 @@ e1000e_intrmgr_on_throttling_timer(void *opaque)
 
 timer->running = false;
 
-if (msi_enabled(timer->core->owner)) {
-trace_e1000e_irq_msi_notify_postponed();
-/* Clear msi_causes_pending to fire MSI eventually */
-timer->core->msi_causes_pending = 0;
-e1000e_set_interrupt_cause(timer->core, 0);
-} else {
-trace_e1000e_irq_legacy_notify_postponed();
-e1000e_set_interrupt_cause(timer->core, 0);
+if (timer->core->mac[IMS] & timer->core->mac[ICR]) {
+if (msi_enabled(timer->core->owner)) {
+trace_e1000e_irq_msi_notify_postponed();
+msi_notify(timer->core->owner, 0);
+} else {
+trace_e1000e_irq_legacy_notify_postponed();
+e1000e_raise_legacy_irq(timer->core);
+}
 }
 }
 
@@ -366,10 +366,6 @@ static void
 e1000e_intrmgr_fire_all_timers(E1000ECore *core)
 {
 int i;
-uint32_t val = e1000e_intmgr_collect_delayed_causes(core);
-
-trace_e1000e_irq_adding_delayed_causes(val, core->mac[ICR]);
-core->mac[ICR] |= val;
 
 if (core->itr.running) {
 timer_del(core->itr.timer);
@@ -1974,13 +1970,6 @@ void(*e1000e_phyreg_writeops[E1000E_PHY_PAGES][E1000E_PHY_PAGE_SIZE])
 }
 };
 
-static inline void
-e1000e_clear_ims_bits(E1000ECore *core, uint32_t bits)
-{
-trace_e1000e_irq_clear_ims(bits, core->mac[IMS], core->mac[IMS] & ~bits);
-core->mac[IMS] &= ~bits;
-}
-
 static inline bool
 e1000e_postpone_interrupt(E1000IntrDelayTimer *timer)
 {
@@ -2038,7 +2027,6 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t cause, uint32_t int_cfg)
 effective_eiac = core->mac[EIAC] & cause;
 
 core->mac[ICR] &= ~effective_eiac;
-core->msi_causes_pending &= ~effective_eiac;
 
 if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
 core->mac[IMS] &= ~effective_eiac;
@@ -2130,33 +2118,17 @@ e1000e_fix_icr_asserted(E1000ECore *core)
 trace_e1000e_irq_fix_icr_asserted(core->mac[ICR]);
 }
 
-static void
-e1000e_send_msi(E1000ECore *core, bool msix)
+static void e1000e_raise_interrupts(E1000ECore *core,
+size_t index, uint32_t causes)
 {
-uint32_t causes = core->mac[ICR] & core->mac[IMS] & ~E1000_ICR_ASSERTED;
-
-core->msi_causes_pending &= causes;
-causes ^= core->msi_causes_pending;
-if (causes == 0) {
-return;
-}
-core->msi_causes_pending |= causes;
+bool is_msix = msix_enabled(core->owner);
+uint32_t old_causes = core->mac[IMS] & core->mac[ICR];
+uint32_t raised_causes;
 
-if (msix) {
-e1000e_msix_notify(core, causes);
-} else {
-if (!e1000e_itr_should_postpone(core)) {
-trace_e1000e_irq_msi_notify(causes);
-msi_notify(core->owner, 0);
-}
-}
-}
+trace_e1000e_irq_set(index << 2,
+ core->mac[index], core->mac[index] | causes);
 
-static void
-e1000e_update_interrupt_state(E1000ECore *core)
-{
-bool interrupts_pending;
-bool is_msix = msix_enabled(core->owner);
+core->mac[index] |= causes;
 
 /* Set ICR[OTHER] for MSI-X */
 if (is_msix) {
@@ -2178,40 +2150,58 @@ e1000e_update_interrupt_state(E1000ECore *core)
  */
 core->mac[ICS] = core->mac[ICR];
 
-interrupts_pending = (core->mac[IMS] & core->mac[ICR]) ? true : false;
-if (!interrupts_pending) {
-core->msi_causes_pending =

[PATCH v3 20/47] igb: Remove goto

2023-04-22 Thread Akihiko Odaki

The goto is a bit confusing as it changes the control flow only if the
L4 protocol is not recognized. It also differs from e1000e, which adds
noise when comparing e1000e and igb.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 167e1f949d..2de04fabfe 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1297,7 +1297,7 @@ igb_build_rx_metadata(IGBCore *core,
 break;
 
 default:
-goto func_exit;
+break;
 }
 } else {
 trace_e1000e_rx_metadata_l4_cso_disabled();
-- 
2.40.0

[PATCH v3 09/47] igb: Always copy ethernet header

2023-04-22 Thread Akihiko Odaki

igb_receive_internal() used to check the iov length to determine
whether to copy the iovs to a contiguous buffer, but the check is flawed
in two ways:
- It does not ensure that iovcnt > 0.
- It does not take the virtio-net header into consideration.

The size of this copy is just 22 octets, which can be even less than
the code size required for the checks. This (wrong) optimization is
probably not worth it, so just remove it. Removing it also allows igb to
assume aligned accesses for the ethernet header.
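
As an illustration of the approach, the sketch below always gathers the
L2 header from the iovec array into a properly typed (and therefore
aligned) buffer, regardless of how the iovs are split. It is a
standalone approximation, not the QEMU code: the toy_* names and struct
layouts are hypothetical stand-ins, and toy_iov_to_buf() only mimics the
role iov_to_buf() plays in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-ins for the Ethernet and VLAN headers. */
struct toy_eth_header { uint8_t dst[6]; uint8_t src[6]; uint16_t proto; };
struct toy_vlan_header { uint16_t tci; uint16_t proto; };

typedef struct {
    struct toy_eth_header eth;
    struct toy_vlan_header vlan;
} ToyL2Header;

/* Gather up to `len` bytes starting at byte offset `ofs` of the iovecs. */
static size_t toy_iov_to_buf(const struct iovec *iov, int iovcnt,
                             size_t ofs, void *buf, size_t len)
{
    size_t done = 0;

    for (int i = 0; i < iovcnt && done < len; i++) {
        if (ofs >= iov[i].iov_len) {
            ofs -= iov[i].iov_len;  /* still skipping the leading offset */
            continue;
        }
        size_t n = iov[i].iov_len - ofs;
        if (n > len - done) {
            n = len - done;
        }
        memcpy((uint8_t *)buf + done,
               (const uint8_t *)iov[i].iov_base + ofs, n);
        done += n;
        ofs = 0;
    }
    return done;
}

/* Unconditionally copy the L2 header; works for iovcnt == 0 as well. */
static size_t toy_copy_l2_header(const struct iovec *iov, int iovcnt,
                                 size_t ofs, ToyL2Header *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    return toy_iov_to_buf(iov, iovcnt, ofs, out, sizeof(*out));
}
```

Because the destination is a ToyL2Header object rather than a byte
pointer into the first iov, field accesses on it do not depend on the
alignment or length of any individual iov.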

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 21a8d9ada4..1d7f913e5a 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -67,6 +67,11 @@ typedef struct IGBTxPktVmdqCallbackContext {
 NetClientState *nc;
 } IGBTxPktVmdqCallbackContext;
 
+typedef struct L2Header {
+struct eth_header eth;
+struct vlan_header vlan;
+} L2Header;
+
 static ssize_t
 igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx);
@@ -961,15 +966,16 @@ igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t size)
 return size > (lpe ? max_ethernet_lpe_size : max_ethernet_vlan_size);
 }
 
-static uint16_t igb_receive_assign(IGBCore *core, const struct eth_header *ehdr,
+static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
size_t size, E1000E_RSSInfo *rss_info,
bool *external_tx)
 {
 static const int ta_shift[] = { 4, 3, 2, 0 };
+const struct eth_header *ehdr = &l2_header->eth;
 uint32_t f, ra[2], *macp, rctl = core->mac[RCTL];
 uint16_t queues = 0;
 uint16_t oversized = 0;
-uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(ehdr)->h_tci) & VLAN_VID_MASK;
+uint16_t vid = be16_to_cpu(l2_header->vlan.h_tci) & VLAN_VID_MASK;
 bool accepted = false;
 int i;
 
@@ -1590,14 +1596,13 @@ static ssize_t
 igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx)
 {
-static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4);
-
 uint16_t queues = 0;
 uint32_t n = 0;
-uint8_t min_buf[ETH_ZLEN];
+union {
+L2Header l2_header;
+uint8_t octets[ETH_ZLEN];
+} min_buf;
 struct iovec min_iov;
-struct eth_header *ehdr;
-uint8_t *filter_buf;
 size_t size, orig_size;
 size_t iov_ofs = 0;
 E1000E_RxRing rxr;
@@ -1623,24 +1628,21 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 net_rx_pkt_unset_vhdr(core->rx_pkt);
 }
 
-filter_buf = iov->iov_base + iov_ofs;
 orig_size = iov_size(iov, iovcnt);
 size = orig_size - iov_ofs;
 
 /* Pad to minimum Ethernet frame length */
 if (size < sizeof(min_buf)) {
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, size);
-memset(&min_buf[size], 0, sizeof(min_buf) - size);
+iov_to_buf(iov, iovcnt, iov_ofs, &min_buf, size);
+memset(&min_buf.octets[size], 0, sizeof(min_buf) - size);
 e1000x_inc_reg_if_not_full(core->mac, RUC);
-min_iov.iov_base = filter_buf = min_buf;
+min_iov.iov_base = &min_buf;
 min_iov.iov_len = size = sizeof(min_buf);
 iovcnt = 1;
 iov = &min_iov;
 iov_ofs = 0;
-} else if (iov->iov_len < maximum_ethernet_hdr_len) {
-/* This is very unlikely, but may happen. */
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, maximum_ethernet_hdr_len);
-filter_buf = min_buf;
+} else {
+iov_to_buf(iov, iovcnt, iov_ofs, &min_buf, sizeof(min_buf.l2_header));
 }
 
 /* Discard oversized packets if !LPE and !SBP. */
@@ -1648,11 +1650,12 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 return orig_size;
 }
 
-ehdr = PKT_GET_ETH_HDR(filter_buf);
-net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr));
+net_rx_pkt_set_packet_type(core->rx_pkt,
+   get_eth_packet_type(&min_buf.l2_header.eth));
 net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs);
 
-queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx);
+queues = igb_receive_assign(core, &min_buf.l2_header, size,
+&rss_info, external_tx);
 if (!queues) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
-- 
2.40.0

[PATCH v3 33/47] tests/qtest/libqos/igb: Set GPIE.Multiple_MSIX

2023-04-22 Thread Akihiko Odaki

GPIE.Multiple_MSIX is not set by default, and needs to be set to get
interrupts from multiple MSI-X vectors.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 tests/qtest/libqos/igb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/libqos/igb.c b/tests/qtest/libqos/igb.c
index 12fb531bf0..a603468beb 100644
--- a/tests/qtest/libqos/igb.c
+++ b/tests/qtest/libqos/igb.c
@@ -114,6 +114,7 @@ static void igb_pci_start_hw(QOSGraphObject *obj)
 e1000e_macreg_write(>e1000e, E1000_RCTL, E1000_RCTL_EN);
 
 /* Enable all interrupts */
+e1000e_macreg_write(>e1000e, E1000_GPIE,  E1000_GPIE_MSIX_MODE);
 e1000e_macreg_write(>e1000e, E1000_IMS,  0x);
 e1000e_macreg_write(>e1000e, E1000_EIMS, 0x);
 
-- 
2.40.0

[PATCH v3 21/47] igb: Read DCMD.VLE of the first Tx descriptor

2023-04-22 Thread Akihiko Odaki

Section 7.2.2.3 Advanced Transmit Data Descriptor says:
> For frames that spans multiple descriptors, all fields apart from
> DCMD.EOP, DCMD.RS, DCMD.DEXT, DTALEN, Address and DTYP are valid only
> in the first descriptors and are ignored in the subsequent ones.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 2de04fabfe..4ac7e7af44 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -613,7 +613,7 @@ igb_process_tx_desc(IGBCore *core,
 idx = (tx->first_olinfo_status >> 4) & 1;
 igb_tx_insert_vlan(core, queue_index, tx,
 tx->ctx[idx].vlan_macip_lens >> 16,
-!!(cmd_type_len & E1000_TXD_CMD_VLE));
+!!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE));
 
 if (igb_tx_pkt_send(core, tx, queue_index)) {
 igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index);
-- 
2.40.0

[PATCH v3 39/47] igb: Filter with the second VLAN tag for extended VLAN

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index a51c435084..667ff47701 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -69,7 +69,7 @@ typedef struct IGBTxPktVmdqCallbackContext {
 
 typedef struct L2Header {
 struct eth_header eth;
-struct vlan_header vlan;
+struct vlan_header vlan[2];
 } L2Header;
 
 static ssize_t
@@ -1001,7 +1001,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
 uint32_t f, ra[2], *macp, rctl = core->mac[RCTL];
 uint16_t queues = 0;
 uint16_t oversized = 0;
-uint16_t vid = be16_to_cpu(l2_header->vlan.h_tci) & VLAN_VID_MASK;
+size_t vlan_num = 0;
 int i;
 
 memset(rss_info, 0, sizeof(E1000E_RSSInfo));
@@ -1010,8 +1010,19 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
 *external_tx = true;
 }
 
-if (e1000x_is_vlan_packet(ehdr, core->mac[VET] & 0x) &&
-!e1000x_rx_vlan_filter(core->mac, PKT_GET_VLAN_HDR(ehdr))) {
+if (core->mac[CTRL_EXT] & BIT(26)) {
+if (be16_to_cpu(ehdr->h_proto) == core->mac[VET] >> 16 &&
+be16_to_cpu(l2_header->vlan[0].h_proto) == (core->mac[VET] & 0x)) {
+vlan_num = 2;
+}
+} else {
+if (be16_to_cpu(ehdr->h_proto) == (core->mac[VET] & 0x)) {
+vlan_num = 1;
+}
+}
+
+if (vlan_num &&
+!e1000x_rx_vlan_filter(core->mac, l2_header->vlan + vlan_num - 1)) {
 return queues;
 }
 
@@ -1065,7 +1076,9 @@ static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
 if (e1000x_vlan_rx_filter_enabled(core->mac)) {
 uint16_t mask = 0;
 
-if (e1000x_is_vlan_packet(ehdr, core->mac[VET] & 0x)) {
+if (vlan_num) {
+uint16_t vid = be16_to_cpu(l2_header->vlan[vlan_num - 1].h_tci) & VLAN_VID_MASK;
+
 for (i = 0; i < E1000_VLVF_ARRAY_SIZE; i++) {
 if ((core->mac[VLVF0 + i] & E1000_VLVF_VLANID_MASK) == vid &&
 (core->mac[VLVF0 + i] & E1000_VLVF_VLANID_ENABLE)) {
-- 
2.40.0

[PATCH v3 32/47] hw/net/net_rx_pkt: Enforce alignment for eth_header

2023-04-22 Thread Akihiko Odaki

eth_strip_vlan and eth_strip_vlan_ex refer to ehdr_buf as struct
eth_header. Enforce alignment for the structure.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/net_rx_pkt.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 6125a063d7..1de42b4f51 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -23,7 +23,10 @@
 
 struct NetRxPkt {
 struct virtio_net_hdr virt_hdr;
-uint8_t ehdr_buf[sizeof(struct eth_header) + sizeof(struct vlan_header)];
+struct {
+struct eth_header eth;
+struct vlan_header vlan;
+} ehdr_buf;
 struct iovec *vec;
 uint16_t vec_len_total;
 uint16_t vec_len;
@@ -89,7 +92,7 @@ net_rx_pkt_pull_data(struct NetRxPkt *pkt,
 if (pkt->ehdr_buf_len) {
 net_rx_pkt_iovec_realloc(pkt, iovcnt + 1);
 
-pkt->vec[0].iov_base = pkt->ehdr_buf;
+pkt->vec[0].iov_base = &pkt->ehdr_buf;
 pkt->vec[0].iov_len = pkt->ehdr_buf_len;
 
 pkt->tot_len = pllen + pkt->ehdr_buf_len;
@@ -120,7 +123,7 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
 assert(pkt);
 
 if (strip_vlan) {
-pkt->ehdr_buf_len = eth_strip_vlan(iov, iovcnt, iovoff, pkt->ehdr_buf,
+pkt->ehdr_buf_len = eth_strip_vlan(iov, iovcnt, iovoff, &pkt->ehdr_buf,
, );
 } else {
 pkt->ehdr_buf_len = 0;
@@ -142,7 +145,7 @@ void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt,
 
 if (strip_vlan) {
 pkt->ehdr_buf_len = eth_strip_vlan_ex(iov, iovcnt, iovoff, vet,
-  pkt->ehdr_buf,
+  &pkt->ehdr_buf,
   , );
 } else {
 pkt->ehdr_buf_len = 0;
-- 
2.40.0

[PATCH v3 41/47] igb: Implement Rx PTP2 timestamp

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_common.h |  16 +++---
 hw/net/igb_regs.h   |  23 
 hw/net/igb_core.c   | 129 
 3 files changed, 127 insertions(+), 41 deletions(-)

diff --git a/hw/net/igb_common.h b/hw/net/igb_common.h
index f2a9065791..5c261ba9d3 100644
--- a/hw/net/igb_common.h
+++ b/hw/net/igb_common.h
@@ -51,7 +51,7 @@
defreg_indexeda(x, 0), defreg_indexeda(x, 1), \
defreg_indexeda(x, 2), defreg_indexeda(x, 3)
 
-#define defregv(x) defreg_indexed(x, 0), defreg_indexed(x, 1),   \
+#define defreg8(x) defreg_indexed(x, 0), defreg_indexed(x, 1),   \
defreg_indexed(x, 2), defreg_indexed(x, 3),   \
defreg_indexed(x, 4), defreg_indexed(x, 5),   \
defreg_indexed(x, 6), defreg_indexed(x, 7)
@@ -122,6 +122,8 @@ enum {
 defreg(EICS),defreg(EIMS),defreg(EIMC),   defreg(EIAM),
 defreg(EICR),defreg(IVAR_MISC),   defreg(GPIE),
 
+defreg(TSYNCRXCFG), defreg8(ETQF),
+
 defreg(RXPBS),  defregd(RDBAL),   defregd(RDBAH), defregd(RDLEN),
 defregd(SRRCTL),defregd(RDH), defregd(RDT),
 defregd(RXDCTL),defregd(RXCTL),   defregd(RQDPC), defreg(RA2),
@@ -133,15 +135,15 @@ enum {
 
 defreg(VT_CTL),
 
-defregv(P2VMAILBOX), defregv(V2PMAILBOX), defreg(MBVFICR), defreg(MBVFIMR),
+defreg8(P2VMAILBOX), defreg8(V2PMAILBOX), defreg(MBVFICR), defreg(MBVFIMR),
 defreg(VFLRE),   defreg(VFRE),defreg(VFTE),   defreg(WVBR),
 defreg(QDE), defreg(DTXSWC),  defreg_indexed(VLVF, 0),
-defregv(VMOLR),  defreg(RPLOLR),  defregv(VMBMEM), defregv(VMVIR),
+defreg8(VMOLR),  defreg(RPLOLR),  defreg8(VMBMEM), defreg8(VMVIR),
 
-defregv(PVTCTRL),defregv(PVTEICS),defregv(PVTEIMS),   defregv(PVTEIMC),
-defregv(PVTEIAC),defregv(PVTEIAM),defregv(PVTEICR),   defregv(PVFGPRC),
-defregv(PVFGPTC),defregv(PVFGORC),defregv(PVFGOTC),   defregv(PVFMPRC),
-defregv(PVFGPRLBC),  defregv(PVFGPTLBC),  defregv(PVFGORLBC), defregv(PVFGOTLBC),
+defreg8(PVTCTRL),defreg8(PVTEICS),defreg8(PVTEIMS),   defreg8(PVTEIMC),
+defreg8(PVTEIAC),defreg8(PVTEIAM),defreg8(PVTEICR),   defreg8(PVFGPRC),
+defreg8(PVFGPTC),defreg8(PVFGORC),defreg8(PVFGOTC),   defreg8(PVFMPRC),
+defreg8(PVFGPRLBC),  defreg8(PVFGPTLBC),  defreg8(PVFGORLBC), defreg8(PVFGOTLBC),
 
 defreg(MTA_A),
 
diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 4b4ebd3369..894705599d 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -210,6 +210,15 @@ union e1000_adv_rx_desc {
 #define E1000_DCA_TXCTRL_CPUID_SHIFT 24 /* Tx CPUID now in the last byte */
 #define E1000_DCA_RXCTRL_CPUID_SHIFT 24 /* Rx CPUID now in the last byte */
 
+/* ETQF register bit definitions */
+#define E1000_ETQF_FILTER_ENABLE   BIT(26)
+#define E1000_ETQF_1588 BIT(30)
+#define E1000_ETQF_IMM_INT BIT(29)
+#define E1000_ETQF_QUEUE_ENABLE BIT(31)
+#define E1000_ETQF_QUEUE_SHIFT 16
+#define E1000_ETQF_QUEUE_MASK  0x0007
+#define E1000_ETQF_ETYPE_MASK  0x
+
 #define E1000_DTXSWC_MAC_SPOOF_MASK   0x00FF /* Per VF MAC spoof control */
 #define E1000_DTXSWC_VLAN_SPOOF_MASK  0xFF00 /* Per VF VLAN spoof control */
 #define E1000_DTXSWC_LLE_MASK 0x00FF /* Per VF Local LB enables */
@@ -384,6 +393,20 @@ union e1000_adv_rx_desc {
 #define E1000_FRTIMER   0x01048  /* Free Running Timer - RW */
 #define E1000_FCRTV 0x02460  /* Flow Control Refresh Timer Value - RW */
 
+#define E1000_TSYNCRXCFG 0x05F50 /* Time Sync Rx Configuration - RW */
+
+/* Filtering Registers */
+#define E1000_SAQF(_n) (0x5980 + 4 * (_n))
+#define E1000_DAQF(_n) (0x59A0 + 4 * (_n))
+#define E1000_SPQF(_n) (0x59C0 + 4 * (_n))
+#define E1000_FTQF(_n) (0x59E0 + 4 * (_n))
+#define E1000_SAQF0 E1000_SAQF(0)
+#define E1000_DAQF0 E1000_DAQF(0)
+#define E1000_SPQF0 E1000_SPQF(0)
+#define E1000_FTQF0 E1000_FTQF(0)
+#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */
+#define E1000_ETQF(_n)  (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */
+
 #define E1000_RQDPC(_n) (0x0C030 + ((_n) * 0x40))
 
 #define E1000_RXPBS 0x02404  /* Rx Packet Buffer Size - RW */
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 16c563cf36..627d75d370 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -72,6 +72,24 @@ typedef struct L2Header {
 struct vlan_header vlan[2];
 } L2Header;
 
+typedef struct PTP2 {
+uint8_t message_id_transport_specific;
+uint8_t version_ptp;
+uint16_t message_length;
+uint8_t subdomain_number;
+uint8_t reserved0;
+uint16_t flags;
+uint64_t correction;
+uint8_t reserved1[5];
+uint8_t source_communication_technology;
+uint32_t source_uuid_lo;
+uint16_t source_uuid_hi;
+uint16_t source_port_id;
+uint16_t

[PATCH v3 05/47] igb: Do not require CTRL.VME for tx VLAN tagging

2023-04-22 Thread Akihiko Odaki

While the datasheet of e1000e says it checks CTRL.VME for tx VLAN
tagging, igb's datasheet has no such statement. It also says for
"CTRL.VLE":
> This register only affects the VLAN Strip in Rx it does not have any
> influence in the Tx path in the 82576.
(Appendix A. Changes from the 82575)

There is no "CTRL.VLE", so it is most likely a typo for CTRL.VME.

Fixes: fba7c3b788 ("igb: respect VMVIR and VMOLR for VLAN")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index dbd1192a8e..96a118b6c1 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -402,7 +402,7 @@ igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct igb_tx *tx,
 }
 }
 
-if (insert_vlan && e1000x_vlan_enabled(core->mac)) {
+if (insert_vlan) {
 net_tx_pkt_setup_vlan_header_ex(tx->tx_pkt, vlan,
 core->mac[VET] & 0x);
 }
-- 
2.40.0

[PATCH v3 25/47] igb: Share common VF constants

2023-04-22 Thread Akihiko Odaki

The constants need to be consistent between the PF and VF.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_common.h |  8 
 hw/net/igb.c| 10 +-
 hw/net/igbvf.c  |  7 ---
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/hw/net/igb_common.h b/hw/net/igb_common.h
index 69ac490f75..f2a9065791 100644
--- a/hw/net/igb_common.h
+++ b/hw/net/igb_common.h
@@ -28,6 +28,14 @@
 
 #include "igb_regs.h"
 
+#define TYPE_IGBVF "igbvf"
+
+#define IGBVF_MMIO_BAR_IDX  (0)
+#define IGBVF_MSIX_BAR_IDX  (3)
+
+#define IGBVF_MMIO_SIZE (16 * 1024)
+#define IGBVF_MSIX_SIZE (16 * 1024)
+
 #define defreg(x) x = (E1000_##x >> 2)
 #define defreg_indexed(x, i) x##i = (E1000_##x(i) >> 2)
 #define defreg_indexeda(x, i) x##i##_A = (E1000_##x##_A(i) >> 2)
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 51a7e9133e..1c989d7677 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -433,16 +433,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
 
 pcie_ari_init(pci_dev, 0x150, 1);
 
-pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, "igbvf",
+pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
 IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
 IGB_VF_OFFSET, IGB_VF_STRIDE);
 
-pcie_sriov_pf_init_vf_bar(pci_dev, 0,
+pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
 PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
-16 * KiB);
-pcie_sriov_pf_init_vf_bar(pci_dev, 3,
+IGBVF_MMIO_SIZE);
+pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MSIX_BAR_IDX,
 PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
-16 * KiB);
+IGBVF_MSIX_SIZE);
 
 igb_init_net_peer(s, pci_dev, macaddr);
 
diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
index 70beb7af50..284ea61184 100644
--- a/hw/net/igbvf.c
+++ b/hw/net/igbvf.c
@@ -50,15 +50,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 
-#define TYPE_IGBVF "igbvf"
 OBJECT_DECLARE_SIMPLE_TYPE(IgbVfState, IGBVF)
 
-#define IGBVF_MMIO_BAR_IDX  (0)
-#define IGBVF_MSIX_BAR_IDX  (3)
-
-#define IGBVF_MMIO_SIZE (16 * 1024)
-#define IGBVF_MSIX_SIZE (16 * 1024)
-
 struct IgbVfState {
 PCIDevice parent_obj;
 
-- 
2.40.0

[PATCH v3 45/47] vmxnet3: Do not depend on PC

2023-04-22 Thread Akihiko Odaki

vmxnet3 has no dependency on PC, and VMware Fusion actually makes it
available on Apple Silicon according to:
https://kb.vmware.com/s/article/90364

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/Kconfig b/hw/net/Kconfig
index 18c7851efe..98e00be4f9 100644
--- a/hw/net/Kconfig
+++ b/hw/net/Kconfig
@@ -56,7 +56,7 @@ config RTL8139_PCI
 
 config VMXNET3_PCI
 bool
-default y if PCI_DEVICES && PC_PCI
+default y if PCI_DEVICES
 depends on PCI
 
 config SMC91C111
-- 
2.40.0

[PATCH v3 36/47] igb: Implement Rx SCTP CSO

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_regs.h |  1 +
 include/net/eth.h |  4 ++-
 include/qemu/crc32c.h |  1 +
 hw/net/e1000e_core.c  |  5 
 hw/net/igb_core.c | 15 +-
 hw/net/net_rx_pkt.c   | 64 +++
 net/eth.c |  4 +++
 util/crc32c.c |  8 ++
 8 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index e6ac26dc0e..4b4ebd3369 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -670,6 +670,7 @@ union e1000_adv_rx_desc {
 #define E1000_ADVRXD_PKT_IP6 BIT(6)
 #define E1000_ADVRXD_PKT_TCP BIT(8)
 #define E1000_ADVRXD_PKT_UDP BIT(9)
+#define E1000_ADVRXD_PKT_SCTP BIT(10)
 
 static inline uint8_t igb_ivar_entry_rx(uint8_t i)
 {
diff --git a/include/net/eth.h b/include/net/eth.h
index 048e434685..75e7f1551c 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -224,6 +224,7 @@ struct tcp_hdr {
 #define IP_HEADER_VERSION_6   (6)
 #define IP_PROTO_TCP  (6)
 #define IP_PROTO_UDP  (17)
+#define IP_PROTO_SCTP (132)
 #define IPTOS_ECN_MASK 0x03
 #define IPTOS_ECN(x)  ((x) & IPTOS_ECN_MASK)
 #define IPTOS_ECN_CE  0x03
@@ -379,7 +380,8 @@ typedef struct eth_ip4_hdr_info_st {
 typedef enum EthL4HdrProto {
 ETH_L4_HDR_PROTO_INVALID,
 ETH_L4_HDR_PROTO_TCP,
-ETH_L4_HDR_PROTO_UDP
+ETH_L4_HDR_PROTO_UDP,
+ETH_L4_HDR_PROTO_SCTP
 } EthL4HdrProto;
 
 typedef struct eth_l4_hdr_info_st {
diff --git a/include/qemu/crc32c.h b/include/qemu/crc32c.h
index 5b78884c38..88b4d2b3b3 100644
--- a/include/qemu/crc32c.h
+++ b/include/qemu/crc32c.h
@@ -30,5 +30,6 @@
 
 
 uint32_t crc32c(uint32_t crc, const uint8_t *data, unsigned int length);
+uint32_t iov_crc32c(uint32_t crc, const struct iovec *iov, size_t iov_cnt);
 
 #endif
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 0c0c45a6ce..c06c8b20c8 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1114,6 +1114,11 @@ e1000e_verify_csum_in_sw(E1000ECore *core,
 return;
 }
 
+if (l4hdr_proto != ETH_L4_HDR_PROTO_TCP &&
+l4hdr_proto != ETH_L4_HDR_PROTO_UDP) {
+return;
+}
+
 if (!net_rx_pkt_validate_l4_csum(pkt, &csum_valid)) {
 trace_e1000e_rx_metadata_l4_csum_validation_failed();
 return;
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index a3267c0b7a..01d2788cf6 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1220,7 +1220,7 @@ igb_build_rx_metadata(IGBCore *core,
   uint16_t *vlan_tag)
 {
 struct virtio_net_hdr *vhdr;
-bool hasip4, hasip6;
+bool hasip4, hasip6, csum_valid;
 EthL4HdrProto l4hdr_proto;
 
 *status_flags = E1000_RXD_STAT_DD;
@@ -1280,6 +1280,10 @@ igb_build_rx_metadata(IGBCore *core,
 *pkt_info |= E1000_ADVRXD_PKT_UDP;
 break;
 
+case ETH_L4_HDR_PROTO_SCTP:
+*pkt_info |= E1000_ADVRXD_PKT_SCTP;
+break;
+
 default:
 break;
 }
@@ -1312,6 +1316,15 @@ igb_build_rx_metadata(IGBCore *core,
 
 if (igb_rx_l4_cso_enabled(core)) {
 switch (l4hdr_proto) {
+case ETH_L4_HDR_PROTO_SCTP:
+if (!net_rx_pkt_validate_l4_csum(pkt, &csum_valid)) {
+trace_e1000e_rx_metadata_l4_csum_validation_failed();
+goto func_exit;
+}
+if (!csum_valid) {
+*status_flags |= E1000_RXDEXT_STATERR_TCPE;
+}
+/* fall through */
 case ETH_L4_HDR_PROTO_TCP:
 *status_flags |= E1000_RXD_STAT_TCPCS;
 break;
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 1de42b4f51..3575c8b9f9 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/crc32c.h"
 #include "trace.h"
 #include "net_rx_pkt.h"
 #include "net/checksum.h"
@@ -554,32 +555,73 @@ _net_rx_pkt_calc_l4_csum(struct NetRxPkt *pkt)
 return csum;
 }
 
-bool net_rx_pkt_validate_l4_csum(struct NetRxPkt *pkt, bool *csum_valid)
+static bool
+_net_rx_pkt_validate_sctp_sum(struct NetRxPkt *pkt)
 {
-uint16_t csum;
+size_t csum_off;
+size_t off = pkt->l4hdr_off;
+size_t vec_len = pkt->vec_len;
+struct iovec *vec;
+uint32_t calculated = 0;
+uint32_t original;
+bool valid;
 
-trace_net_rx_pkt_l4_csum_validate_entry();
+for (vec = pkt->vec; vec->iov_len < off; vec++) {
+off -= vec->iov_len;
+vec_len--;
+}
 
-if (pkt->l4hdr_info.proto != ETH_L4_HDR_PROTO_TCP &&
-pkt->l4hdr_info.proto != ETH_L4_HDR_PROTO_UDP) {
-trace_net_rx_pkt_l4_csum_validate_not_xxp();
+csum_off = off + 8;
+
+if (!iov_to_buf(vec, vec_len, csum_off, &original, sizeof(original))) {
 return false;
 }
 
-if (pkt->l4hdr_info.proto == ETH_L4_HDR_PROTO_UDP &&
-pkt->l4hdr_info.hdr.udp.uh_sum == 0) {
-

[PATCH v3 10/47] Fix references to igb Avocado test

2023-04-22 Thread Akihiko Odaki

Fixes: 9f95111474 ("tests/avocado: re-factor igb test to avoid timeouts")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS| 2 +-
 docs/system/devices/igb.rst| 2 +-
 scripts/ci/org.centos/stream/8/x86_64/test-avocado | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef45b5e71e..c31d2279ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2256,7 +2256,7 @@ R: Sriram Yagnaraman 
 S: Maintained
 F: docs/system/devices/igb.rst
 F: hw/net/igb*
-F: tests/avocado/igb.py
+F: tests/avocado/netdev-ethtool.py
 F: tests/qtest/igb-test.c
 F: tests/qtest/libqos/igb.c
 
diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst
index 70edadd574..afe036dad2 100644
--- a/docs/system/devices/igb.rst
+++ b/docs/system/devices/igb.rst
@@ -60,7 +60,7 @@ Avocado test and can be ran with the following command:
 
 .. code:: shell
 
-  make check-avocado AVOCADO_TESTS=tests/avocado/igb.py
+  make check-avocado AVOCADO_TESTS=tests/avocado/netdev-ethtool.py
 
 References
 ==
diff --git a/scripts/ci/org.centos/stream/8/x86_64/test-avocado b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
index d2c0e5fb4c..a1aa601ee3 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/test-avocado
+++ b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
@@ -30,7 +30,7 @@ make get-vm-images
 tests/avocado/cpu_queries.py:QueryCPUModelExpansion.test \
 tests/avocado/empty_cpu_model.py:EmptyCPUModel.test \
 tests/avocado/hotplug_cpu.py:HotPlugCPU.test \
-tests/avocado/igb.py:IGB.test \
+tests/avocado/netdev-ethtool.py:NetDevEthtool.test_igb_nomsi \
 tests/avocado/info_usernet.py:InfoUsernet.test_hostfwd \
 tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu \
 tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt \
-- 
2.40.0

[PATCH v3 16/47] e1000x: Take CRC into consideration for size check

2023-04-22 Thread Akihiko Odaki

Section 13.7.15 Receive Length Error Count says:
>  Packets over 1522 bytes are oversized if LongPacketEnable is 0b
> (RCTL.LPE). If LongPacketEnable (LPE) is 1b, then an incoming packet
> is considered oversized if it exceeds 16384 bytes.

> These lengths are based on bytes in the received packet from
> <Destination Address> through <CRC>, inclusively.

As QEMU processes packets without CRC, the number of bytes for CRC
needs to be subtracted. This change adds some size definitions to eth.h,
which are used to derive the new size thresholds.
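
The arithmetic behind the new thresholds can be sketched as below. This
is an illustrative standalone version: the TOY_* constants mirror the
values named in this patch (14-octet Ethernet header, 4-octet VLAN
header, 1500-octet MTU, 4-octet FCS), toy_is_oversized() is a
hypothetical helper, and the RCTL.SBP exception is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    TOY_ETH_HLEN = 14,    /* Ethernet header */
    TOY_VLAN_HLEN = 4,    /* one VLAN tag */
    TOY_ETH_MTU = 1500,   /* payload */
    TOY_ETH_FCS_LEN = 4   /* CRC, absent from packets QEMU processes */
};

/* `size` is the packet length without CRC; `lpe` models RCTL.LPE. */
static bool toy_is_oversized(size_t size, bool lpe)
{
    /* 1522 on the wire minus the 4-octet CRC: 14 + 4 + 1500 = 1518 */
    size_t max_short = TOY_ETH_HLEN + TOY_VLAN_HLEN + TOY_ETH_MTU;
    /* 16384 on the wire minus the 4-octet CRC: 16380 */
    size_t max_large = 16 * 1024 - TOY_ETH_FCS_LEN;

    assert(max_short < max_large);
    return size > (lpe ? max_large : max_short);
}
```

For example, a 1518-octet packet without CRC corresponds to 1522 octets
on the wire and is therefore still accepted when LPE is 0.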

Signed-off-by: Akihiko Odaki 
---
 include/net/eth.h  |  2 ++
 hw/net/e1000x_common.c | 10 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index e8af5742be..05f56931e7 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -32,6 +32,8 @@
 #define ETH_ALEN 6
 #define ETH_HLEN 14
 #define ETH_ZLEN 60 /* Min. octets in frame without FCS */
+#define ETH_FCS_LEN 4
+#define ETH_MTU 1500
 
 struct eth_header {
 uint8_t  h_dest[ETH_ALEN];   /* destination eth addr */
diff --git a/hw/net/e1000x_common.c b/hw/net/e1000x_common.c
index 6cc23138a8..212873fd77 100644
--- a/hw/net/e1000x_common.c
+++ b/hw/net/e1000x_common.c
@@ -140,16 +140,16 @@ bool e1000x_hw_rx_enabled(uint32_t *mac)
 
 bool e1000x_is_oversized(uint32_t *mac, size_t size)
 {
+size_t header_size = sizeof(struct eth_header) + sizeof(struct vlan_header);
 /* this is the size past which hardware will
drop packets when setting LPE=0 */
-static const int maximum_ethernet_vlan_size = 1522;
+size_t maximum_short_size = header_size + ETH_MTU;
 /* this is the size past which hardware will
drop packets when setting LPE=1 */
-static const int maximum_ethernet_lpe_size = 16 * KiB;
+size_t maximum_large_size = 16 * KiB - ETH_FCS_LEN;
 
-if ((size > maximum_ethernet_lpe_size ||
-(size > maximum_ethernet_vlan_size
-&& !(mac[RCTL] & E1000_RCTL_LPE)))
+if ((size > maximum_large_size ||
+(size > maximum_short_size && !(mac[RCTL] & E1000_RCTL_LPE)))
 && !(mac[RCTL] & E1000_RCTL_SBP)) {
 e1000x_inc_reg_if_not_full(mac, ROC);
 trace_e1000x_rx_oversized(size);
-- 
2.40.0

[PATCH v3 27/47] igb: Clear EICR bits for delayed MSI-X interrupts

2023-04-22 Thread Akihiko Odaki

Section 7.3.4.1 says:
> When auto-clear is enabled for an interrupt cause, the EICR bit is
> set when a cause event mapped to this vector occurs. When the EITR
> Counter reaches zero, the MSI-X message is sent on PCIe. Then the
> EICR bit is cleared and enabled to be set by a new cause event
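
The quoted behavior can be sketched in isolation as follows. This is a minimal model under assumptions: `struct sketch_regs` and `sketch_after_msix_sent()` are hypothetical names, and only the EICR and EIAC registers are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register snapshot; the field names follow the datasheet. */
struct sketch_regs {
    uint32_t eicr;  /* extended interrupt cause */
    uint32_t eiac;  /* extended interrupt auto clear */
};

/* Called once the MSI-X message for the given cause has been sent. */
static void sketch_after_msix_sent(struct sketch_regs *r, unsigned int cause)
{
    assert(r && cause < 32);

    /* Only causes with auto-clear enabled lose their EICR bit. */
    uint32_t effective_eiac = r->eiac & (UINT32_C(1) << cause);
    r->eicr &= ~effective_eiac;
}
```

Tying the clearing to the point where the message is actually sent means a delayed notification clears the bit at the same moment as an immediate one.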

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 3c1ef11afd..ef29e68096 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -97,23 +97,31 @@ igb_lower_legacy_irq(IGBCore *core)
 pci_set_irq(core->owner, 0);
 }
 
-static void igb_msix_notify(IGBCore *core, unsigned int vector)
+static void igb_msix_notify(IGBCore *core, unsigned int cause)
 {
 PCIDevice *dev = core->owner;
 uint16_t vfn;
+uint32_t effective_eiac;
+unsigned int vector;
 
-vfn = 8 - (vector + 2) / IGBVF_MSIX_VEC_NUM;
+vfn = 8 - (cause + 2) / IGBVF_MSIX_VEC_NUM;
 if (vfn < pcie_sriov_num_vfs(core->owner)) {
 dev = pcie_sriov_get_vf_at_index(core->owner, vfn);
 assert(dev);
-vector = (vector + 2) % IGBVF_MSIX_VEC_NUM;
-} else if (vector >= IGB_MSIX_VEC_NUM) {
+vector = (cause + 2) % IGBVF_MSIX_VEC_NUM;
+} else if (cause >= IGB_MSIX_VEC_NUM) {
 qemu_log_mask(LOG_GUEST_ERROR,
   "igb: Tried to use vector unavailable for PF");
 return;
+} else {
+vector = cause;
 }
 
 msix_notify(dev, vector);
+
+trace_e1000e_irq_icr_clear_eiac(core->mac[EICR], core->mac[EIAC]);
+effective_eiac = core->mac[EIAC] & BIT(cause);
+core->mac[EICR] &= ~effective_eiac;
 }
 
 static inline void
@@ -1834,7 +1842,6 @@ igb_eitr_should_postpone(IGBCore *core, int idx)
 static void igb_send_msix(IGBCore *core)
 {
 uint32_t causes = core->mac[EICR] & core->mac[EIMS];
-uint32_t effective_eiac;
 int vector;
 
 for (vector = 0; vector < IGB_INTR_NUM; ++vector) {
@@ -1842,10 +1849,6 @@ static void igb_send_msix(IGBCore *core)
 
 trace_e1000e_irq_msix_notify_vec(vector);
 igb_msix_notify(core, vector);
-
-trace_e1000e_irq_icr_clear_eiac(core->mac[EICR], core->mac[EIAC]);
-effective_eiac = core->mac[EIAC] & BIT(vector);
-core->mac[EICR] &= ~effective_eiac;
 }
 }
 }
-- 
2.40.0

[PATCH v3 35/47] igb: Use UDP for RSS hash

2023-04-22 Thread Akihiko Odaki

e1000e does not support using UDP for RSS hash, but igb does.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h |  3 +++
 hw/net/igb_core.c | 16 
 2 files changed, 19 insertions(+)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index eb995d8b2e..e6ac26dc0e 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -659,6 +659,9 @@ union e1000_adv_rx_desc {
 
 #define E1000_RSS_QUEUE(reta, hash) (E1000_RETA_VAL(reta, hash) & 0x0F)
 
+#define E1000_MRQ_RSS_TYPE_IPV4UDP 7
+#define E1000_MRQ_RSS_TYPE_IPV6UDP 8
+
 #define E1000_STATUS_IOV_MODE 0x0004
 
 #define E1000_STATUS_NUM_VFS_SHIFT 14
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 46babe85a9..a3267c0b7a 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -287,6 +287,11 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
 return E1000_MRQ_RSS_TYPE_IPV4TCP;
 }
 
+if (l4hdr_proto == ETH_L4_HDR_PROTO_UDP &&
+(core->mac[MRQC] & E1000_MRQC_RSS_FIELD_IPV4_UDP)) {
+return E1000_MRQ_RSS_TYPE_IPV4UDP;
+}
+
 if (E1000_MRQC_EN_IPV4(core->mac[MRQC])) {
 return E1000_MRQ_RSS_TYPE_IPV4;
 }
@@ -322,6 +327,11 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
 return E1000_MRQ_RSS_TYPE_IPV6TCPEX;
 }
 
+if (l4hdr_proto == ETH_L4_HDR_PROTO_UDP &&
+(core->mac[MRQC] & E1000_MRQC_RSS_FIELD_IPV6_UDP)) {
+return E1000_MRQ_RSS_TYPE_IPV6UDP;
+}
+
 if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) {
 return E1000_MRQ_RSS_TYPE_IPV6EX;
 }
@@ -360,6 +370,12 @@ igb_rss_calc_hash(IGBCore *core, struct NetRxPkt *pkt, 
E1000E_RSSInfo *info)
 case E1000_MRQ_RSS_TYPE_IPV6EX:
 type = NetPktRssIpV6Ex;
 break;
+case E1000_MRQ_RSS_TYPE_IPV4UDP:
+type = NetPktRssIpV4Udp;
+break;
+case E1000_MRQ_RSS_TYPE_IPV6UDP:
+type = NetPktRssIpV6Udp;
+break;
 default:
 assert(false);
 return 0;
-- 
2.40.0

[PATCH v3 46/47] MAINTAINERS: Add a reviewer for network packet abstractions

2023-04-22 Thread Akihiko Odaki

I have made significant changes to the network packet abstractions, so
add me as a reviewer.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c31d2279ab..8b2ef5943c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2214,6 +2214,7 @@ F: tests/qtest/fuzz-megasas-test.c
 
 Network packet abstractions
 M: Dmitry Fleytman 
+R: Akihiko Odaki 
 S: Maintained
 F: include/net/eth.h
 F: net/eth.c
-- 
2.40.0

[PATCH v3 31/47] net/eth: Always add VLAN tag

2023-04-22 Thread Akihiko Odaki

It is possible to have another VLAN tag even if the packet is already
tagged.
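
The unconditional insertion can be pictured with a byte-oriented sketch. It is an illustration only: `sketch_push_vlan()` and the `SKETCH_` constants are hypothetical, and the buffer is assumed to have room for one more 4-byte tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HLEN   14  /* two MAC addresses + EtherType */
#define SKETCH_VLAN_HLEN  4   /* TPID/TCI pair */
#define SKETCH_MAC_BYTES  12  /* offset of the EtherType field */

/*
 * hdr holds *hdr_len bytes of L2 header and must have room for 4 more.
 * The new tag is placed right after the MAC addresses, so it becomes the
 * outermost tag whether or not the packet was already tagged.
 */
static void sketch_push_vlan(uint8_t *hdr, size_t *hdr_len,
                             uint16_t tci, uint16_t tpid)
{
    assert(hdr && hdr_len && *hdr_len >= SKETCH_ETH_HLEN);

    /* Shift the old EtherType and any existing tags back by one tag. */
    memmove(hdr + SKETCH_MAC_BYTES + SKETCH_VLAN_HLEN,
            hdr + SKETCH_MAC_BYTES, *hdr_len - SKETCH_MAC_BYTES);

    /* Write the new tag in network byte order. */
    hdr[12] = tpid >> 8;
    hdr[13] = tpid & 0xff;
    hdr[14] = tci >> 8;
    hdr[15] = tci & 0xff;
    *hdr_len += SKETCH_VLAN_HLEN;
}
```

Calling the function twice on an untagged 14-byte header yields a 22-byte double-tagged header, with the second tag outermost.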

Signed-off-by: Akihiko Odaki 
---
 include/net/eth.h   |  4 ++--
 hw/net/net_tx_pkt.c | 16 +++-
 net/eth.c   | 22 ++
 3 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index 95ff24d6b8..048e434685 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -353,8 +353,8 @@ eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, 
size_t iovoff,
 uint16_t
 eth_get_l3_proto(const struct iovec *l2hdr_iov, int iovcnt, size_t l2hdr_len);
 
-void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
-uint16_t vlan_ethtype, bool *is_new);
+void eth_setup_vlan_headers(struct eth_header *ehdr, size_t *ehdr_size,
+uint16_t vlan_tag, uint16_t vlan_ethtype);
 
 
 uint8_t eth_get_gso_type(uint16_t l3_proto, uint8_t *l3_hdr, uint8_t l4proto);
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index ce6b102391..af8f77a3f0 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -40,7 +40,10 @@ struct NetTxPkt {
 
 struct iovec *vec;
 
-uint8_t l2_hdr[ETH_MAX_L2_HDR_LEN];
+struct {
+struct eth_header eth;
+struct vlan_header vlan[3];
+} l2_hdr;
 union {
 struct ip_header ip;
 struct ip6_header ip6;
@@ -365,18 +368,13 @@ bool net_tx_pkt_build_vheader(struct NetTxPkt *pkt, bool 
tso_enable,
 void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt,
 uint16_t vlan, uint16_t vlan_ethtype)
 {
-bool is_new;
 assert(pkt);
 
 eth_setup_vlan_headers(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base,
-vlan, vlan_ethtype, &is_new);
+   &pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len,
+   vlan, vlan_ethtype);
 
-/* update l2hdrlen */
-if (is_new) {
-pkt->hdr_len += sizeof(struct vlan_header);
-pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len +=
-sizeof(struct vlan_header);
-}
+pkt->hdr_len += sizeof(struct vlan_header);
 }
 
 bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, void *base, size_t len)
diff --git a/net/eth.c b/net/eth.c
index f7ffbda600..5307978486 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -21,26 +21,16 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 
-void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
-uint16_t vlan_ethtype, bool *is_new)
+void eth_setup_vlan_headers(struct eth_header *ehdr, size_t *ehdr_size,
+uint16_t vlan_tag, uint16_t vlan_ethtype)
 {
 struct vlan_header *vhdr = PKT_GET_VLAN_HDR(ehdr);
 
-switch (be16_to_cpu(ehdr->h_proto)) {
-case ETH_P_VLAN:
-case ETH_P_DVLAN:
-/* vlan hdr exists */
-*is_new = false;
-break;
-
-default:
-/* No VLAN header, put a new one */
-vhdr->h_proto = ehdr->h_proto;
-ehdr->h_proto = cpu_to_be16(vlan_ethtype);
-*is_new = true;
-break;
-}
+memmove(vhdr + 1, vhdr, *ehdr_size - ETH_HLEN);
 vhdr->h_tci = cpu_to_be16(vlan_tag);
+vhdr->h_proto = ehdr->h_proto;
+ehdr->h_proto = cpu_to_be16(vlan_ethtype);
+*ehdr_size += sizeof(*vhdr);
 }
 
 uint8_t
-- 
2.40.0

[PATCH v3 12/47] tests/avocado: Remove test_igb_nomsi_kvm

2023-04-22 Thread Akihiko Odaki

Running the test under KVM is unlikely to find more bugs, so remove
test_igb_nomsi_kvm to save the time it takes to run.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Thomas Huth 
Acked-by: Alex Bennée 
---
 tests/avocado/netdev-ethtool.py | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py
index 8de118e313..6da800f62b 100644
--- a/tests/avocado/netdev-ethtool.py
+++ b/tests/avocado/netdev-ethtool.py
@@ -29,7 +29,7 @@ def get_asset(self, name, sha1):
 # URL into a unique one
 return self.fetch_asset(name=name, locations=(url), asset_hash=sha1)
 
-def common_test_code(self, netdev, extra_args=None, kvm=False):
+def common_test_code(self, netdev, extra_args=None):
 
 # This custom kernel has drivers for all the supported network
 # devices we can emulate in QEMU
@@ -57,9 +57,6 @@ def common_test_code(self, netdev, extra_args=None, 
kvm=False):
  '-drive', drive,
  '-device', netdev)
 
-if kvm:
-self.vm.add_args('-accel', 'kvm')
-
 self.vm.set_console(console_index=0)
 self.vm.launch()
 
@@ -86,13 +83,6 @@ def test_igb_nomsi(self):
 """
 self.common_test_code("igb", "pci=nomsi")
 
-def test_igb_nomsi_kvm(self):
-"""
-:avocado: tags=device:igb
-"""
-self.require_accelerator('kvm')
-self.common_test_code("igb", "pci=nomsi", True)
-
 # It seems the other popular cards we model in QEMU currently fail
 # the pattern test with:
 #
-- 
2.40.0

[PATCH v3 06/47] igb: Clear IMS bits when committing ICR access

2023-04-22 Thread Akihiko Odaki

The datasheet makes contradictory statements regarding ICR accesses, so
it cannot be relied on to determine their behavior. However, e1000e does
clear IMS bits when ICR is read, and Linux also expects ICR accesses to
clear IMS bits, according to:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/igb/igb_main.c?h=v6.2#n8048
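
The intended behavior can be sketched as a small standalone model. The names and bit values below are placeholders chosen for this sketch rather than the definitions in the QEMU headers, and "clearing IMS bits" is assumed to mean masking IMS with the complement of IAM.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values chosen for this sketch. */
#define SKETCH_GPIE_NSICR        UINT32_C(0x00000001)
#define SKETCH_ICR_INT_ASSERTED  UINT32_C(0x80000000)

struct sketch_regs {
    uint32_t gpie, ims, icr, iam;
};

static void sketch_commit_icr(struct sketch_regs *r)
{
    assert(r);

    /*
     * With NSICR clear, IMS is only touched when at least one IMS bit is
     * set and ICR reports a true interrupt.
     */
    if ((r->gpie & SKETCH_GPIE_NSICR) ||
        (r->ims && (r->icr & SKETCH_ICR_INT_ASSERTED))) {
        r->ims &= ~r->iam;
    }

    /* The device would re-evaluate its interrupt state here in both cases. */
}
```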

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 96a118b6c1..eaca5bd2b6 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -2452,16 +2452,16 @@ igb_set_ims(IGBCore *core, int index, uint32_t val)
 static void igb_commit_icr(IGBCore *core)
 {
 /*
- * If GPIE.NSICR = 0, then the copy of IAM to IMS will occur only if at
+ * If GPIE.NSICR = 0, then the clear of IMS will occur only if at
  * least one bit is set in the IMS and there is a true interrupt as
  * reflected in ICR.INTA.
  */
 if ((core->mac[GPIE] & E1000_GPIE_NSICR) ||
 (core->mac[IMS] && (core->mac[ICR] & E1000_ICR_INT_ASSERTED))) {
-igb_set_ims(core, IMS, core->mac[IAM]);
-} else {
-igb_update_interrupt_state(core);
+igb_clear_ims_bits(core, core->mac[IAM]);
 }
+
+igb_update_interrupt_state(core);
 }
 
 static void igb_set_icr(IGBCore *core, int index, uint32_t val)
-- 
2.40.0

[PATCH v3 47/47] docs/system/devices/igb: Note igb is tested for DPDK

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 docs/system/devices/igb.rst | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst
index afe036dad2..60c10bf7c7 100644
--- a/docs/system/devices/igb.rst
+++ b/docs/system/devices/igb.rst
@@ -14,7 +14,8 @@ Limitations
 ===
 
 This igb implementation was tested with Linux Test Project [2]_ and Windows HLK
-[3]_ during the initial development. The command used when testing with LTP is:
+[3]_ during the initial development. Later it was also tested with DPDK Test
+Suite [4]_. The command used when testing with LTP is:
 
 .. code-block:: shell
 
@@ -22,8 +23,8 @@ This igb implementation was tested with Linux Test Project 
[2]_ and Windows HLK
 
 Be aware that this implementation lacks many functionalities available with the
 actual hardware, and you may experience various failures if you try to use it
-with a different operating system other than Linux and Windows or if you try
-functionalities not covered by the tests.
+with a different operating system other than DPDK, Linux, and Windows or if you
+try functionalities not covered by the tests.
 
 Using igb
 =
@@ -32,7 +33,7 @@ Using igb should be nothing different from using another 
network device. See
 :ref:`pcsys_005fnetwork` in general.
 
 However, you may also need to perform additional steps to activate SR-IOV
-feature on your guest. For Linux, refer to [4]_.
+feature on your guest. For Linux, refer to [5]_.
 
 Developing igb
 ==
@@ -68,4 +69,5 @@ References
 .. [1] 
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82576eb-gigabit-ethernet-controller-datasheet.pdf
 .. [2] https://github.com/linux-test-project/ltp
 .. [3] https://learn.microsoft.com/en-us/windows-hardware/test/hlk/
-.. [4] https://docs.kernel.org/PCI/pci-iov-howto.html
+.. [4] https://doc.dpdk.org/dts/gsg/
+.. [5] https://docs.kernel.org/PCI/pci-iov-howto.html
-- 
2.40.0

[PATCH v3 07/47] net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols()

2023-04-22 Thread Akihiko Odaki

igb does not properly ensure the buffer passed to
net_rx_pkt_set_protocols() is contiguous for the entire L2/L3/L4 header.
Allow it to pass scattered data to net_rx_pkt_set_protocols().
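
To illustrate why an offset into a scatter-gather list is enough, here is a minimal sketch of gathering bytes that may span several elements. `sketch_iov_to_buf()` is a hypothetical stand-in written for this example; it is not QEMU's iov helper, and it only assumes the POSIX `struct iovec`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to len bytes starting at byte offset "offset" of the logical
 * buffer described by iov[0..iovcnt-1]. Returns the number of bytes copied.
 */
static size_t sketch_iov_to_buf(const struct iovec *iov, size_t iovcnt,
                                size_t offset, void *buf, size_t len)
{
    size_t done = 0;

    assert(iov || iovcnt == 0);

    for (size_t i = 0; i < iovcnt && done < len; i++) {
        if (offset >= iov[i].iov_len) {
            /* The requested data starts in a later element. */
            offset -= iov[i].iov_len;
            continue;
        }

        size_t n = iov[i].iov_len - offset;
        if (n > len - done) {
            n = len - done;
        }
        memcpy((char *)buf + done, (const char *)iov[i].iov_base + offset, n);
        done += n;
        offset = 0;
    }

    return done;
}
```

A parser built on such a helper can read a header that straddles two elements without requiring the caller to linearize the packet first.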

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/net_rx_pkt.h | 10 ++
 include/net/eth.h   |  6 +++---
 hw/net/igb_core.c   |  2 +-
 hw/net/net_rx_pkt.c | 14 +-
 hw/net/virtio-net.c |  7 +--
 hw/net/vmxnet3.c|  7 ++-
 net/eth.c   | 18 --
 7 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
index d00b484900..a06f5c2675 100644
--- a/hw/net/net_rx_pkt.h
+++ b/hw/net/net_rx_pkt.h
@@ -55,12 +55,14 @@ size_t net_rx_pkt_get_total_len(struct NetRxPkt *pkt);
  * parse and set packet analysis results
  *
  * @pkt:packet
- * @data:   pointer to the data buffer to be parsed
- * @len:data length
+ * @iov:received data scatter-gather list
+ * @iovcnt: number of elements in iov
+ * @iovoff: data start offset in the iov
  *
  */
-void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, const void *data,
-  size_t len);
+void net_rx_pkt_set_protocols(struct NetRxPkt *pkt,
+  const struct iovec *iov, size_t iovcnt,
+  size_t iovoff);
 
 /**
  * fetches packet analysis results
diff --git a/include/net/eth.h b/include/net/eth.h
index c5ae4493b4..9f19c3a695 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -312,10 +312,10 @@ eth_get_l2_hdr_length(const void *p)
 }
 
 static inline uint32_t
-eth_get_l2_hdr_length_iov(const struct iovec *iov, int iovcnt)
+eth_get_l2_hdr_length_iov(const struct iovec *iov, size_t iovcnt, size_t 
iovoff)
 {
 uint8_t p[sizeof(struct eth_header) + sizeof(struct vlan_header)];
-size_t copied = iov_to_buf(iov, iovcnt, 0, p, ARRAY_SIZE(p));
+size_t copied = iov_to_buf(iov, iovcnt, iovoff, p, ARRAY_SIZE(p));
 
 if (copied < ARRAY_SIZE(p)) {
 return copied;
@@ -397,7 +397,7 @@ typedef struct eth_l4_hdr_info_st {
 bool has_tcp_data;
 } eth_l4_hdr_info;
 
-void eth_get_protocols(const struct iovec *iov, int iovcnt,
+void eth_get_protocols(const struct iovec *iov, size_t iovcnt, size_t iovoff,
bool *hasip4, bool *hasip6,
size_t *l3hdr_off,
size_t *l4hdr_off,
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index eaca5bd2b6..21a8d9ada4 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1650,7 +1650,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 
 ehdr = PKT_GET_ETH_HDR(filter_buf);
 net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr));
-net_rx_pkt_set_protocols(core->rx_pkt, filter_buf, size);
+net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs);
 
 queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx);
 if (!queues) {
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 39cdea06de..63be6e05ad 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -103,7 +103,7 @@ net_rx_pkt_pull_data(struct NetRxPkt *pkt,
 iov, iovcnt, ploff, pkt->tot_len);
 }
 
-eth_get_protocols(pkt->vec, pkt->vec_len, &pkt->hasip4, &pkt->hasip6,
+eth_get_protocols(pkt->vec, pkt->vec_len, 0, &pkt->hasip4, &pkt->hasip6,
   &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off,
   &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info);
 
@@ -186,17 +186,13 @@ size_t net_rx_pkt_get_total_len(struct NetRxPkt *pkt)
 return pkt->tot_len;
 }
 
-void net_rx_pkt_set_protocols(struct NetRxPkt *pkt, const void *data,
-  size_t len)
+void net_rx_pkt_set_protocols(struct NetRxPkt *pkt,
+  const struct iovec *iov, size_t iovcnt,
+  size_t iovoff)
 {
-const struct iovec iov = {
-.iov_base = (void *)data,
-.iov_len = len
-};
-
 assert(pkt);
 
-eth_get_protocols(&iov, 1, &pkt->hasip4, &pkt->hasip6,
+eth_get_protocols(iov, iovcnt, iovoff, &pkt->hasip4, &pkt->hasip6,
   &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off,
   &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 53e1c32643..37551fd854 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1835,9 +1835,12 @@ static int virtio_net_process_rss(NetClientState *nc, 
const uint8_t *buf,
 VIRTIO_NET_HASH_REPORT_UDPv6,
 VIRTIO_NET_HASH_REPORT_UDPv6_EX
 };
+struct iovec iov = {
+.iov_base = (void *)buf,
+.iov_len = size
+};
 
-net_rx_pkt_set_protocols(pkt, buf + n->host_hdr_len,
- size - n->host_hdr_len);
+net_rx_pkt_set_protocols(pkt, &iov, 1, n->host_hdr_len);

[PATCH v3 18/47] e1000e: Always log status after building rx metadata

2023-04-22 Thread Akihiko Odaki

Without this change, the status flags may not be traced e.g. if checksum
offloading is disabled.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/e1000e_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 481db41931..d4a9984fe4 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1244,9 +1244,8 @@ e1000e_build_rx_metadata(E1000ECore *core,
 trace_e1000e_rx_metadata_l4_cso_disabled();
 }
 
-trace_e1000e_rx_metadata_status_flags(*status_flags);
-
 func_exit:
+trace_e1000e_rx_metadata_status_flags(*status_flags);
 *status_flags = cpu_to_le32(*status_flags);
 }
 
-- 
2.40.0

[PATCH v3 40/47] igb: Implement igb-specific oversize check

2023-04-22 Thread Akihiko Odaki

igb has a configurable size limit for LPE, and uses different limits
depending on whether the packet is treated as a VLAN packet.
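
The resulting check can be sketched on its own as below. The helper name and `SKETCH_` constants are hypothetical; the sketch assumes that `size` excludes the CRC, that `rlpml` includes it, and that `vlan_num` is the number of tags the device recognized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN     14
#define SKETCH_VLAN_HLEN    4
#define SKETCH_ETH_FCS_LEN  4
#define SKETCH_ETH_MTU      1500

/* An untagged frame may be up to 1514 bytes long without CRC. */
static_assert(SKETCH_ETH_HLEN + SKETCH_ETH_MTU == 1514,
              "untagged limit without CRC");

static bool sketch_igb_is_oversized(size_t size, size_t vlan_num,
                                    bool lpe, uint16_t rlpml)
{
    size_t header_size = SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN * vlan_num;

    /* With LPE the configurable limit applies; otherwise the MTU does. */
    return lpe ? size + SKETCH_ETH_FCS_LEN > rlpml
               : size > header_size + SKETCH_ETH_MTU;
}
```

With LPE clear, each recognized VLAN tag raises the limit by 4 bytes; with LPE set and a limit of 0x2600, a buffer of 9724 bytes is the largest accepted.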

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 667ff47701..16c563cf36 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -980,16 +980,13 @@ igb_rx_l4_cso_enabled(IGBCore *core)
 return !!(core->mac[RXCSUM] & E1000_RXCSUM_TUOFLD);
 }
 
-static bool
-igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t size)
+static bool igb_rx_is_oversized(IGBCore *core, const struct eth_header *ehdr,
+size_t size, size_t vlan_num,
+bool lpe, uint16_t rlpml)
 {
-uint16_t pool = qn % IGB_NUM_VM_POOLS;
-bool lpe = !!(core->mac[VMOLR0 + pool] & E1000_VMOLR_LPE);
-int max_ethernet_lpe_size =
-core->mac[VMOLR0 + pool] & E1000_VMOLR_RLPML_MASK;
-int max_ethernet_vlan_size = 1522;
-
-return size > (lpe ? max_ethernet_lpe_size : max_ethernet_vlan_size);
+size_t vlan_header_size = sizeof(struct vlan_header) * vlan_num;
+size_t header_size = sizeof(struct eth_header) + vlan_header_size;
+return lpe ? size + ETH_FCS_LEN > rlpml : size > header_size + ETH_MTU;
 }
 
 static uint16_t igb_receive_assign(IGBCore *core, const L2Header *l2_header,
@@ -1002,6 +999,8 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 uint16_t queues = 0;
 uint16_t oversized = 0;
 size_t vlan_num = 0;
+bool lpe;
+uint16_t rlpml;
 int i;
 
 memset(rss_info, 0, sizeof(E1000E_RSSInfo));
@@ -1021,6 +1020,14 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 }
 }
 
+lpe = !!(core->mac[RCTL] & E1000_RCTL_LPE);
+rlpml = core->mac[RLPML];
+if (!(core->mac[RCTL] & E1000_RCTL_SBP) &&
+igb_rx_is_oversized(core, ehdr, size, vlan_num, lpe, rlpml)) {
+trace_e1000x_rx_oversized(size);
+return queues;
+}
+
 if (vlan_num &&
 !e1000x_rx_vlan_filter(core->mac, l2_header->vlan + vlan_num - 1)) {
 return queues;
@@ -1106,7 +1113,11 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
L2Header *l2_header,
 queues &= core->mac[VFRE];
 if (queues) {
 for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
-if ((queues & BIT(i)) && igb_rx_is_oversized(core, i, size)) {
+lpe = !!(core->mac[VMOLR0 + i] & E1000_VMOLR_LPE);
+rlpml = core->mac[VMOLR0 + i] & E1000_VMOLR_RLPML_MASK;
+if ((queues & BIT(i)) &&
+igb_rx_is_oversized(core, ehdr, size, vlan_num,
+lpe, rlpml)) {
 oversized |= BIT(i);
 }
 }
@@ -1662,11 +1673,6 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 iov_to_buf(iov, iovcnt, iov_ofs, &min_buf, sizeof(min_buf.l2_header));
 }
 
-/* Discard oversized packets if !LPE and !SBP. */
-if (e1000x_is_oversized(core->mac, size)) {
-return orig_size;
-}
-
 net_rx_pkt_set_packet_type(core->rx_pkt,
get_eth_packet_type(&min_buf.l2_header.eth));
 net_rx_pkt_set_protocols(core->rx_pkt, iov, iovcnt, iov_ofs);
-- 
2.40.0

[PATCH v3 02/47] hw/net/net_tx_pkt: Decouple interface from PCI

2023-04-22 Thread Akihiko Odaki

This allows the network packet abstractions to be used even if PCI is
not used.
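
The decoupling pattern can be sketched as follows: the packet object stores plain pointers, and the caller supplies a callback that knows how to release them. All names here (`SketchTxPkt`, `sketch_add_fragment()`, `sketch_reset()`, `sketch_count_release()`) are hypothetical and only mirror the shape of the interface in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_FRAGS 4

/* Releases one fragment; context is opaque to the packet code. */
typedef void (*SketchFreeFrag)(void *context, void *base, size_t len);

struct SketchFrag {
    void *base;
    size_t len;
};

struct SketchTxPkt {
    struct SketchFrag frags[SKETCH_MAX_FRAGS];
    size_t nfrags;
};

/* The packet only records host pointers; it knows nothing about buses. */
static bool sketch_add_fragment(struct SketchTxPkt *pkt, void *base, size_t len)
{
    assert(pkt);

    if (pkt->nfrags == SKETCH_MAX_FRAGS) {
        return false;
    }
    pkt->frags[pkt->nfrags].base = base;
    pkt->frags[pkt->nfrags].len = len;
    pkt->nfrags++;
    return true;
}

/* The caller decides how fragments are released, e.g. by unmapping DMA. */
static void sketch_reset(struct SketchTxPkt *pkt,
                         SketchFreeFrag callback, void *context)
{
    assert(pkt && callback);

    for (size_t i = 0; i < pkt->nfrags; i++) {
        callback(context, pkt->frags[i].base, pkt->frags[i].len);
    }
    pkt->nfrags = 0;
}

/* Example callback: tallies released bytes instead of unmapping anything. */
static void sketch_count_release(void *context, void *base, size_t len)
{
    (void)base;
    *(size_t *)context += len;
}
```

A PCI device would pass a callback that unmaps DMA memory with the device as context, while a non-PCI user could pass one that frees heap buffers.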

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.h  | 31 ---
 hw/net/e1000e_core.c | 13 -
 hw/net/igb_core.c| 13 ++---
 hw/net/net_tx_pkt.c  | 36 +---
 hw/net/vmxnet3.c | 14 +++---
 5 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index 5eb123ef90..4d7233e975 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -26,17 +26,16 @@
 
 struct NetTxPkt;
 
-typedef void (* NetTxPktCallback)(void *, const struct iovec *, int, const 
struct iovec *, int);
+typedef void (*NetTxPktFreeFrag)(void *, void *, size_t);
+typedef void (*NetTxPktSend)(void *, const struct iovec *, int, const struct 
iovec *, int);
 
 /**
  * Init function for tx packet functionality
  *
  * @pkt:packet pointer
- * @pci_dev:PCI device processing this packet
  * @max_frags:  max tx ip fragments
  */
-void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
-uint32_t max_frags);
+void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags);
 
 /**
  * Clean all tx packet resources.
@@ -95,12 +94,11 @@ net_tx_pkt_setup_vlan_header(struct NetTxPkt *pkt, uint16_t 
vlan)
  * populate data fragment into pkt context.
  *
  * @pkt:packet
- * @pa: physical address of fragment
+ * @base:   pointer to fragment
  * @len:length of fragment
  *
  */
-bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
-size_t len);
+bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, void *base, size_t len);
 
 /**
  * Fix ip header fields and calculate IP header and pseudo header checksums.
@@ -148,10 +146,11 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt);
  * reset tx packet private context (needed to be called between packets)
  *
  * @pkt:packet
- * @dev:PCI device processing the next packet
- *
+ * @callback:   function to free the fragments
+ * @context:pointer to be passed to the callback
  */
-void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev);
+void net_tx_pkt_reset(struct NetTxPkt *pkt,
+  NetTxPktFreeFrag callback, void *context);
 
 /**
  * Unmap a fragment mapped from a PCI device.
@@ -162,6 +161,16 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice 
*dev);
  */
 void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len);
 
+/**
+ * map data fragment from PCI device and populate it into pkt context.
+ *
+ * @pci_dev:PCI device owning fragment
+ * @pa: physical address of fragment
+ * @len:length of fragment
+ */
+bool net_tx_pkt_add_raw_fragment_pci(struct NetTxPkt *pkt, PCIDevice *pci_dev,
+ dma_addr_t pa, size_t len);
+
 /**
  * Send packet to qemu. handles sw offloads if vhdr is not supported.
  *
@@ -182,7 +191,7 @@ bool net_tx_pkt_send(struct NetTxPkt *pkt, NetClientState 
*nc);
  * @ret:operation result
  */
 bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool offload,
-NetTxPktCallback callback, void *context);
+NetTxPktSend callback, void *context);
 
 /**
  * parse raw packet data and analyze offload requirements.
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index cfa3f55e96..15821a75e0 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -746,7 +746,8 @@ e1000e_process_tx_desc(E1000ECore *core,
 addr = le64_to_cpu(dp->buffer_addr);
 
 if (!tx->skip_cp) {
-if (!net_tx_pkt_add_raw_fragment(tx->tx_pkt, addr, split_size)) {
+if (!net_tx_pkt_add_raw_fragment_pci(tx->tx_pkt, core->owner,
+ addr, split_size)) {
 tx->skip_cp = true;
 }
 }
@@ -764,7 +765,7 @@ e1000e_process_tx_desc(E1000ECore *core,
 }
 
 tx->skip_cp = false;
-net_tx_pkt_reset(tx->tx_pkt, core->owner);
+net_tx_pkt_reset(tx->tx_pkt, net_tx_pkt_unmap_frag_pci, core->owner);
 
 tx->sum_needed = 0;
 tx->cptse = 0;
@@ -3421,7 +3422,7 @@ e1000e_core_pci_realize(E1000ECore *core,
 qemu_add_vm_change_state_handler(e1000e_vm_state_change, core);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_init(&core->tx[i].tx_pkt, core->owner, E1000E_MAX_TX_FRAGS);
+net_tx_pkt_init(&core->tx[i].tx_pkt, E1000E_MAX_TX_FRAGS);
 }
 
 net_rx_pkt_init(&core->rx_pkt);
@@ -3446,7 +3447,8 @@ e1000e_core_pci_uninit(E1000ECore *core)
 qemu_del_vm_change_state_handler(core->vmstate);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt, core->owner);
+net_tx_pkt_reset(core->tx[i].tx_pkt,
+ net_tx_pkt_unmap_frag_pci, core->owner);
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
 
@@ -3571,7

[PATCH v3 37/47] igb: Implement Tx SCTP CSO

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.h |  8 
 hw/net/igb_core.c   | 12 +++-
 hw/net/net_tx_pkt.c | 18 ++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index 4d7233e975..0a716e74a5 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -116,6 +116,14 @@ void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt);
  */
 void net_tx_pkt_update_ip_hdr_checksum(struct NetTxPkt *pkt);
 
+/**
+ * Calculate the SCTP checksum.
+ *
+ * @pkt:packet
+ *
+ */
+bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt);
+
 /**
  * get length of all populated data.
  *
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 01d2788cf6..24a90cd35f 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -440,8 +440,9 @@ igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct 
igb_tx *tx,
 static bool
 igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 {
+uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
+
 if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) {
-uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
 uint32_t mss = tx->ctx[idx].mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT;
 if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) {
 return false;
@@ -452,10 +453,11 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 return true;
 }
 
-if (tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) {
-if (!net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0)) {
-return false;
-}
+if ((tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) &&
+!((tx->ctx[idx].type_tucmd_mlhl & E1000_ADVTXD_TUCMD_L4T_SCTP) ?
+  net_tx_pkt_update_sctp_checksum(tx->tx_pkt) :
+  net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0))) {
+return false;
 }
 
 if (tx->first_olinfo_status & E1000_ADVTXD_POTS_IXSM) {
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index af8f77a3f0..2e5f58b3c9 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/crc32c.h"
 #include "net/eth.h"
 #include "net/checksum.h"
 #include "net/tap.h"
@@ -135,6 +136,23 @@ void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt)
 pkt->virt_hdr.csum_offset, &csum, sizeof(csum));
 }
 
+bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
+{
+uint32_t csum = 0;
+struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
+
+if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) {
+return false;
+}
+
+csum = cpu_to_le32(iov_crc32c(0x, pl_start_frag, 
pkt->payload_frags));
+if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) {
+return false;
+}
+
+return true;
+}
+
 static void net_tx_pkt_calculate_hdr_len(struct NetTxPkt *pkt)
 {
 pkt->hdr_len = pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_len +
-- 
2.40.0

[PATCH v3 26/47] igb: Fix igb_mac_reg_init coding style alignment

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/igb_core.c | 96 +++
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 5fb2a38a6f..3c1ef11afd 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -4027,54 +4027,54 @@ static const uint32_t igb_mac_reg_init[] = {
 [VMOLR0 ... VMOLR0 + 7] = 0x2600 | E1000_VMOLR_STRCRC,
 [RPLOLR]= E1000_RPLOLR_STRCRC,
 [RLPML] = 0x2600,
-[TXCTL0]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL1]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL2]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL3]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL4]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL5]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL6]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL7]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL8]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL9]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL10]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL11]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL12]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL13]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL14]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
-[TXCTL15]  = E1000_DCA_TXCTRL_DATA_RRO_EN |
- E1000_DCA_TXCTRL_TX_WB_RO_EN |
- E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL0]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL1]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL2]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL3]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL4]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL5]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL6]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL7]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL8]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL9]= E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL10]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+[TXCTL11]   = E1000_DCA_TXCTRL_DATA_RRO_EN |
+  E1000_DCA_TXCTRL_TX_WB_RO_EN |
+  E1000_DCA_TXCTRL_DESC_RRO_EN,
+

[PATCH v3 42/47] igb: Implement Tx timestamp

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
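The guard added to igb_process_tx_desc() latches a Tx timestamp only when
the descriptor requests one, timestamping is enabled, and no earlier
timestamp is still marked valid. A minimal sketch of that guard follows.
The sketch_* names, the struct, and the bit positions are assumptions made
for illustration; they are not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumptions, not datasheet values). */
#define SKETCH_DESC_TSTAMP   (1u << 19) /* descriptor asks for a timestamp */
#define SKETCH_TSYNC_VALID   (1u << 0)  /* a latched timestamp is pending */
#define SKETCH_TSYNC_ENABLED (1u << 4)  /* Tx timestamping is enabled */

struct sketch_tsync {
    uint32_t ctl;   /* stands in for TSYNCTXCTL */
    uint64_t stamp; /* stands in for the TXSTMPL/TXSTMPH pair */
};

/* Returns true if a timestamp was latched for this descriptor. */
bool sketch_latch_tx_timestamp(struct sketch_tsync *t,
                               uint32_t cmd_type_len, uint64_t now)
{
    assert(t);

    if (!(cmd_type_len & SKETCH_DESC_TSTAMP) ||
        !(t->ctl & SKETCH_TSYNC_ENABLED) ||
        (t->ctl & SKETCH_TSYNC_VALID)) {
        return false;
    }

    /* Mark the timestamp valid first, then record it. */
    t->ctl |= SKETCH_TSYNC_VALID;
    t->stamp = now;
    return true;
}
```

In this sketch a second request leaves the stored timestamp untouched for
as long as the valid bit stays set.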
 hw/net/igb_regs.h | 3 +++
 hw/net/igb_core.c | 7 +++
 2 files changed, 10 insertions(+)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 894705599d..82ff195dfc 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -322,6 +322,9 @@ union e1000_adv_rx_desc {
 /* E1000_EITR_CNT_IGNR is only for 82576 and newer */
 #define E1000_EITR_CNT_IGNR 0x8000 /* Don't reset counters on write */
 
+#define E1000_TSYNCTXCTL_VALID    0x0001 /* tx timestamp valid */
+#define E1000_TSYNCTXCTL_ENABLED  0x0010 /* enable tx timestamping */
+
 /* PCI Express Control */
 #define E1000_GCR_CMPL_TMOUT_MASK   0xF000
 #define E1000_GCR_CMPL_TMOUT_10ms   0x1000
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 627d75d370..1519a90aa6 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -659,6 +659,13 @@ igb_process_tx_desc(IGBCore *core,
 tx->ctx[idx].vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT,
 !!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE));
 
+if ((tx->first_cmd_type_len & E1000_ADVTXD_MAC_TSTAMP) &&
+(core->mac[TSYNCTXCTL] & E1000_TSYNCTXCTL_ENABLED) &&
+!(core->mac[TSYNCTXCTL] & E1000_TSYNCTXCTL_VALID)) {
+core->mac[TSYNCTXCTL] |= E1000_TSYNCTXCTL_VALID;
+e1000x_timestamp(core->mac, core->timadj, TXSTMPL, TXSTMPH);
+}
+
 if (igb_tx_pkt_send(core, tx, queue_index)) {
 igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index);
 }
-- 
2.40.0

[PATCH v3 03/47] e1000x: Fix BPRC and MPRC

2023-04-22 Thread Akihiko Odaki

Before this change, e1000 and the common code updated BPRC and MPRC
depending on the matched filter, but e1000e and igb decided to update
those counters by deriving the packet type independently. This
inconsistency caused a multicast packet to be counted twice.

Updating BPRC and MPRC depending on the matched filter is fundamentally
flawed anyway, as a filter can be used for different types of packets. For
example, it is
possible to filter broadcast packets with MTA.

Always determine what counters to update by inspecting the packets.

Fixes: 3b27430177 ("e1000: Implementing various counters")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
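The approach described above, choosing the counter from the packet itself
rather than from the filter that matched, can be sketched as follows. This
is a self-contained illustration, not the code in e1000x_common.c: the
sketch_* names are hypothetical, and the saturating increment is only
assumed to behave like a helper named "inc_reg_if_not_full" would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum sketch_pkt_type { SKETCH_UCAST, SKETCH_BCAST, SKETCH_MCAST };

struct sketch_counters {
    uint32_t bprc; /* broadcast packets received */
    uint32_t mprc; /* multicast packets received */
};

/* Classify a frame by its destination MAC (the first six octets). */
enum sketch_pkt_type sketch_classify(const uint8_t *frame)
{
    static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (memcmp(frame, bcast, sizeof(bcast)) == 0) {
        return SKETCH_BCAST;
    }
    /* The group bit is the least significant bit of the first octet. */
    return (frame[0] & 1) ? SKETCH_MCAST : SKETCH_UCAST;
}

/* Increment a counter unless it already holds the maximum value. */
void sketch_inc_if_not_full(uint32_t *reg)
{
    if (*reg != UINT32_MAX) {
        (*reg)++;
    }
}

/* Update exactly one counter per packet, based on the packet type only. */
void sketch_update_rx_stats(struct sketch_counters *c,
                            enum sketch_pkt_type type)
{
    assert(c);

    switch (type) {
    case SKETCH_BCAST:
        sketch_inc_if_not_full(&c->bprc);
        break;
    case SKETCH_MCAST:
        sketch_inc_if_not_full(&c->mprc);
        break;
    default:
        break;
    }
}
```

Because the filter code no longer touches the counters in this scheme, a
multicast packet accepted through MTA is counted once.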
 hw/net/e1000x_common.h |  5 +++--
 hw/net/e1000.c |  6 +++---
 hw/net/e1000e_core.c   | 20 +++-
 hw/net/e1000x_common.c | 25 +++--
 hw/net/igb_core.c  | 22 +-
 5 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h
index 911abd8a90..0298e06283 100644
--- a/hw/net/e1000x_common.h
+++ b/hw/net/e1000x_common.h
@@ -91,8 +91,9 @@ e1000x_update_regs_on_link_up(uint32_t *mac, uint16_t *phy)
 }
 
 void e1000x_update_rx_total_stats(uint32_t *mac,
-  size_t data_size,
-  size_t data_fcs_size);
+  eth_pkt_types_e pkt_type,
+  size_t pkt_size,
+  size_t pkt_fcs_size);
 
 void e1000x_core_prepare_eeprom(uint16_t   *eeprom,
 const uint16_t *templ,
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 59bacb5d3b..18eb6d8876 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -826,12 +826,10 @@ receive_filter(E1000State *s, const uint8_t *buf, int size)
 }
 
 if (ismcast && (rctl & E1000_RCTL_MPE)) {  /* promiscuous mcast */
-e1000x_inc_reg_if_not_full(s->mac_reg, MPRC);
 return 1;
 }
 
 if (isbcast && (rctl & E1000_RCTL_BAM)) {  /* broadcast enabled */
-e1000x_inc_reg_if_not_full(s->mac_reg, BPRC);
 return 1;
 }
 
@@ -922,6 +920,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 size_t desc_offset;
 size_t desc_size;
 size_t total_size;
+eth_pkt_types_e pkt_type;
 
 if (!e1000x_hw_rx_enabled(s->mac_reg)) {
 return -1;
@@ -971,6 +970,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 size -= 4;
 }
 
+pkt_type = get_eth_packet_type(PKT_GET_ETH_HDR(filter_buf));
 rdh_start = s->mac_reg[RDH];
 desc_offset = 0;
 total_size = size + e1000x_fcs_len(s->mac_reg);
@@ -1036,7 +1036,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 }
 } while (desc_offset < total_size);
 
-e1000x_update_rx_total_stats(s->mac_reg, size, total_size);
+e1000x_update_rx_total_stats(s->mac_reg, pkt_type, size, total_size);
 
 n = E1000_ICS_RXT0;
 if ((rdt = s->mac_reg[RDT]) < s->mac_reg[RDH])
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 15821a75e0..c2d864a504 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1488,24 +1488,10 @@ e1000e_write_to_rx_buffers(E1000ECore *core,
 }
 
 static void
-e1000e_update_rx_stats(E1000ECore *core,
-   size_t data_size,
-   size_t data_fcs_size)
+e1000e_update_rx_stats(E1000ECore *core, size_t pkt_size, size_t pkt_fcs_size)
 {
-e1000x_update_rx_total_stats(core->mac, data_size, data_fcs_size);
-
-switch (net_rx_pkt_get_packet_type(core->rx_pkt)) {
-case ETH_PKT_BCAST:
-e1000x_inc_reg_if_not_full(core->mac, BPRC);
-break;
-
-case ETH_PKT_MCAST:
-e1000x_inc_reg_if_not_full(core->mac, MPRC);
-break;
-
-default:
-break;
-}
+eth_pkt_types_e pkt_type = net_rx_pkt_get_packet_type(core->rx_pkt);
+e1000x_update_rx_total_stats(core->mac, pkt_type, pkt_size, pkt_fcs_size);
 }
 
 static inline bool
diff --git a/hw/net/e1000x_common.c b/hw/net/e1000x_common.c
index 4c8e7dcf70..7694673bcc 100644
--- a/hw/net/e1000x_common.c
+++ b/hw/net/e1000x_common.c
@@ -80,7 +80,6 @@ bool e1000x_rx_group_filter(uint32_t *mac, const uint8_t *buf)
 f = mta_shift[(rctl >> E1000_RCTL_MO_SHIFT) & 3];
 f = (((buf[5] << 8) | buf[4]) >> f) & 0xfff;
 if (mac[MTA + (f >> 5)] & (1 << (f & 0x1f))) {
-e1000x_inc_reg_if_not_full(mac, MPRC);
 return true;
 }
 
@@ -212,13 +211,14 @@ e1000x_rxbufsize(uint32_t rctl)
 
 void
 e1000x_update_rx_total_stats(uint32_t *mac,
- size_t data_size,
- size_t data_fcs_size)
+ eth_pkt_types_e pkt_type,
+ size_t pkt_size,
+ size_t pkt_fcs_size)
 {
 static const int PRCregs[6] = { PRC64, PRC127, PRC255, PRC511,

[PATCH v3 29/47] igb: Rename a variable in igb_receive_internal()

2023-04-22 Thread Akihiko Odaki

Rename variable "n" to "causes", which properly represents the content
of the variable.

Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index ef29e68096..77e4ee42a5 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1569,7 +1569,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx)
 {
 uint16_t queues = 0;
-uint32_t n = 0;
+uint32_t causes = 0;
 union {
 L2Header l2_header;
 uint8_t octets[ETH_ZLEN];
@@ -1649,19 +1649,19 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 e1000x_fcs_len(core->mac);
 
 if (!igb_has_rxbufs(core, rxr.i, total_size)) {
-n |= E1000_ICS_RXO;
+causes |= E1000_ICS_RXO;
 trace_e1000e_rx_not_written_to_guest(rxr.i->idx);
 continue;
 }
 
-n |= E1000_ICR_RXDW;
+causes |= E1000_ICR_RXDW;
 
 igb_rx_fix_l4_csum(core, core->rx_pkt);
 igb_write_packet_to_guest(core, core->rx_pkt, , _info);
 
 /* Check if receive descriptor minimum threshold hit */
 if (igb_rx_descr_threshold_hit(core, rxr.i)) {
-n |= E1000_ICS_RXDMT0;
+causes |= E1000_ICS_RXDMT0;
 }
 
 core->mac[EICR] |= igb_rx_wb_eic(core, rxr.i->idx);
@@ -1669,8 +1669,8 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 trace_e1000e_rx_written_to_guest(rxr.i->idx);
 }
 
-trace_e1000e_rx_interrupt_set(n);
-igb_set_interrupt_cause(core, n);
+trace_e1000e_rx_interrupt_set(causes);
+igb_set_interrupt_cause(core, causes);
 
 return orig_size;
 }
-- 
2.40.0

[PATCH v3 22/47] e1000e: Reset packet state after emptying Tx queue

2023-04-22 Thread Akihiko Odaki

Keeping Tx packet state after the transmit queue is emptied has some
problems:
- The datasheet says the descriptors can be reused after the transmit
  queue is emptied, but the Tx packet state may keep references to them.
- The Tx packet state cannot be migrated so it can be reset anytime the
  migration happens.

Always reset Tx packet state after the queue is emptied.

Signed-off-by: Akihiko Odaki 
---
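The idea of dropping every reference to guest buffers as soon as the queue
is drained can be sketched as below. This is a simplified model, not the
e1000e code: struct sketch_tx_pkt and both functions are hypothetical, and
real fragments would also need to be unmapped, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_FRAGS 4

struct sketch_tx_pkt {
    const void *frags[SKETCH_MAX_FRAGS]; /* borrowed guest buffer pointers */
    size_t nfrags;
};

/* Forget every fragment so that no stale pointer outlives the queue run. */
void sketch_tx_pkt_reset(struct sketch_tx_pkt *pkt)
{
    for (size_t i = 0; i < pkt->nfrags; i++) {
        pkt->frags[i] = NULL;
    }
    pkt->nfrags = 0;
}

/*
 * Consume pending buffers, then reset the packet state once the loop ends.
 * Returns the number of buffers consumed.
 */
size_t sketch_start_xmit(struct sketch_tx_pkt *pkt,
                         const void **bufs, size_t n)
{
    size_t done = 0;

    assert(pkt);

    while (done < n && pkt->nfrags < SKETCH_MAX_FRAGS) {
        pkt->frags[pkt->nfrags++] = bufs[done++];
    }

    /* The guest may reuse the buffers from here on. */
    sketch_tx_pkt_reset(pkt);
    return done;
}
```

Resetting at the end of the transmit path, rather than only at device reset
or uninit, means the state that cannot be migrated is empty whenever the
queue is idle.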
 hw/net/e1000e_core.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index d4a9984fe4..27124bba07 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -959,6 +959,8 @@ e1000e_start_xmit(E1000ECore *core, const E1000E_TxRing *txr)
 if (!ide || !e1000e_intrmgr_delay_tx_causes(core, )) {
 e1000e_set_interrupt_cause(core, cause);
 }
+
+net_tx_pkt_reset(txr->tx->tx_pkt, net_tx_pkt_unmap_frag_pci, core->owner);
 }
 
 static bool
@@ -3389,8 +3391,6 @@ e1000e_core_pci_uninit(E1000ECore *core)
 qemu_del_vm_change_state_handler(core->vmstate);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt,
- net_tx_pkt_unmap_frag_pci, core->owner);
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
 
@@ -3515,8 +3515,6 @@ static void e1000e_reset(E1000ECore *core, bool sw)
 e1000x_reset_mac_addr(core->owner_nic, core->mac, core->permanent_mac);
 
 for (i = 0; i < ARRAY_SIZE(core->tx); i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt,
- net_tx_pkt_unmap_frag_pci, core->owner);
 memset(>tx[i].props, 0, sizeof(core->tx[i].props));
 core->tx[i].skip_cp = false;
 }
-- 
2.40.0

[PATCH v3 30/47] net/eth: Use void pointers

2023-04-22 Thread Akihiko Odaki

The uses of uint8_t pointers were misleading, as the buffers are never
accessed as arrays of octets, and they even require stricter alignment to
be accessed as struct eth_header.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
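A small sketch of why a void pointer fits this kind of parameter: the
callee converts it to the header type without a cast, and the caller is
free to pass a correctly typed, correctly aligned object. The struct and
function below are hypothetical stand-ins, not the definitions in
include/net/eth.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_eth_header {
    uint8_t  dst[6];
    uint8_t  src[6];
    uint16_t proto; /* kept in network byte order */
};

/*
 * Copy an Ethernet header out of a raw frame into the caller's buffer.
 * Returns the number of bytes copied, or 0 if the frame is too short.
 */
size_t sketch_copy_header(const uint8_t *frame, size_t len,
                          void *new_ehdr_buf)
{
    /* void * converts to an object pointer implicitly in C. */
    struct sketch_eth_header *hdr = new_ehdr_buf;

    assert(frame && hdr);

    if (len < sizeof(*hdr)) {
        return 0;
    }
    memcpy(hdr, frame, sizeof(*hdr));
    return sizeof(*hdr);
}
```

A caller would typically declare a struct sketch_eth_header on the stack
and pass its address, which has the alignment the struct requires; a bare
uint8_t array carries no such guarantee.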
 include/net/eth.h | 4 ++--
 net/eth.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index 05f56931e7..95ff24d6b8 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -342,12 +342,12 @@ eth_get_pkt_tci(const void *p)
 
 size_t
 eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
-   uint8_t *new_ehdr_buf,
+   void *new_ehdr_buf,
uint16_t *payload_offset, uint16_t *tci);
 
 size_t
 eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
-  uint16_t vet, uint8_t *new_ehdr_buf,
+  uint16_t vet, void *new_ehdr_buf,
   uint16_t *payload_offset, uint16_t *tci);
 
 uint16_t
diff --git a/net/eth.c b/net/eth.c
index b6ff89c460..f7ffbda600 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -226,11 +226,11 @@ void eth_get_protocols(const struct iovec *iov, size_t iovcnt, size_t iovoff,
 
 size_t
 eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
-   uint8_t *new_ehdr_buf,
+   void *new_ehdr_buf,
uint16_t *payload_offset, uint16_t *tci)
 {
 struct vlan_header vlan_hdr;
-struct eth_header *new_ehdr = (struct eth_header *) new_ehdr_buf;
+struct eth_header *new_ehdr = new_ehdr_buf;
 
 size_t copied = iov_to_buf(iov, iovcnt, iovoff,
new_ehdr, sizeof(*new_ehdr));
@@ -276,7 +276,7 @@ eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
 
 size_t
 eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
-  uint16_t vet, uint8_t *new_ehdr_buf,
+  uint16_t vet, void *new_ehdr_buf,
   uint16_t *payload_offset, uint16_t *tci)
 {
 struct vlan_header vlan_hdr;
-- 
2.40.0

[PATCH v3 38/47] igb: Strip the second VLAN tag for extended VLAN

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
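The igb_core.c hunk in this patch picks which VLAN tag to strip: none
(-1), the first tag (0), or the second tag (1) when bit 26 of CTRL_EXT is
set. A minimal sketch of that selection follows. The sketch_* names are
hypothetical, and the 16-bit mask for the inner tag type is an assumption
inferred from the ">> 16" used for the outer one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 26 of CTRL_EXT selects extended (double) VLAN in the hunk. */
#define SKETCH_CTRL_EXT_EXT_VLAN (1u << 26)

/* Returns -1 for "do not strip", otherwise the index of the tag to strip. */
int sketch_strip_vlan_index(bool strip, uint32_t ctrl_ext)
{
    if (!strip) {
        return -1;
    }
    return (ctrl_ext & SKETCH_CTRL_EXT_EXT_VLAN) ? 1 : 0;
}

/* Split a VET-style register into inner and outer tag Ethernet types. */
void sketch_split_vet(uint32_t vet_reg, uint16_t *vet, uint16_t *vet_ext)
{
    assert(vet && vet_ext);

    *vet = vet_reg & 0xffff; /* assumption: inner type in the low half */
    *vet_ext = vet_reg >> 16; /* outer type in the high half */
}
```

Encoding "no stripping" as a negative index lets one int parameter replace
the former bool while still naming the tag position.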
 hw/net/net_rx_pkt.h  | 19 
 include/net/eth.h|  4 ++--
 hw/net/e1000e_core.c |  3 ++-
 hw/net/igb_core.c| 14 ++--
 hw/net/net_rx_pkt.c  | 15 +
 net/eth.c| 52 
 6 files changed, 65 insertions(+), 42 deletions(-)

diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
index ce8dbdb284..55ec67a1a7 100644
--- a/hw/net/net_rx_pkt.h
+++ b/hw/net/net_rx_pkt.h
@@ -223,18 +223,19 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
 /**
 * attach scatter-gather data to rx packet
 *
-* @pkt:packet
-* @iov:received data scatter-gather list
-* @iovcnt  number of elements in iov
-* @iovoff  data start offset in the iov
-* @strip_vlan: should the module strip vlan from data
-* @vet:VLAN tag Ethernet type
+* @pkt:  packet
+* @iov:  received data scatter-gather list
+* @iovcnt:   number of elements in iov
+* @iovoff:   data start offset in the iov
+* @strip_vlan_index: index of Q tag if it is to be stripped. negative otherwise.
+* @vet:  VLAN tag Ethernet type
+* @vet_ext:  outer VLAN tag Ethernet type
 *
 */
 void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt,
-   const struct iovec *iov, int iovcnt,
-   size_t iovoff, bool strip_vlan,
-   uint16_t vet);
+const struct iovec *iov, int iovcnt,
+size_t iovoff, int strip_vlan_index,
+uint16_t vet, uint16_t vet_ext);
 
 /**
  * attach data to rx packet
diff --git a/include/net/eth.h b/include/net/eth.h
index 75e7f1551c..3b80b6e07f 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -347,8 +347,8 @@ eth_strip_vlan(const struct iovec *iov, int iovcnt, size_t iovoff,
uint16_t *payload_offset, uint16_t *tci);
 
 size_t
-eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
-  uint16_t vet, void *new_ehdr_buf,
+eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff, int index,
+  uint16_t vet, uint16_t vet_ext, void *new_ehdr_buf,
   uint16_t *payload_offset, uint16_t *tci);
 
 uint16_t
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index c06c8b20c8..347162a9d0 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1711,7 +1711,8 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 }
 
 net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs,
-   e1000x_vlan_enabled(core->mac), core->mac[VET]);
+   e1000x_vlan_enabled(core->mac) ? 0 : -1,
+   core->mac[VET], 0);
 
 e1000e_rss_parse_packet(core, core->rx_pkt, _info);
 e1000e_rx_ring_init(core, , rss_info.queue);
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 24a90cd35f..a51c435084 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1611,6 +1611,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 E1000E_RxRing rxr;
 E1000E_RSSInfo rss_info;
 size_t total_size;
+int strip_vlan_index;
 int i;
 
 trace_e1000e_rx_receive_iov(iovcnt);
@@ -1672,9 +1673,18 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 
 igb_rx_ring_init(core, , i);
 
+if (!igb_rx_strip_vlan(core, rxr.i)) {
+strip_vlan_index = -1;
+} else if (core->mac[CTRL_EXT] & BIT(26)) {
+strip_vlan_index = 1;
+} else {
+strip_vlan_index = 0;
+}
+
 net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs,
-   igb_rx_strip_vlan(core, rxr.i),
-   core->mac[VET] & 0x);
+   strip_vlan_index,
+   core->mac[VET] & 0x,
+   core->mac[VET] >> 16);
 
 total_size = net_rx_pkt_get_total_len(core->rx_pkt) +
 e1000x_fcs_len(core->mac);
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 3575c8b9f9..32e5f3f9cf 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -137,20 +137,17 @@ void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
 
 void net_rx_pkt_attach_iovec_ex(struct NetRxPkt *pkt,
 const struct iovec *iov, int iovcnt,
-size_t iovoff, bool strip_vlan,
-uint16_t vet)
+size_t iovoff, int strip_vlan_index,
+uint16_t vet, uint16_t vet_ext)
 {
 uint16_t tci = 0;
 uint16_t ploff = iovoff;
 assert(pkt);
 
-if (strip_vlan) {
-

[PATCH v3 01/47] hw/net/net_tx_pkt: Decouple implementation from PCI

2023-04-22 Thread Akihiko Odaki

This is intended to be followed by another change for the interface.
It also fixes the leak of memory mapping when the specified memory is
partially mapped.

Fixes: e263cd49c7 ("Packet abstraction for VMWARE network devices")
Signed-off-by: Akihiko Odaki 
---
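The leak fix mentioned above comes down to unmapping a region when the
mapper returns fewer bytes than requested, instead of returning failure
with the mapping still live. The sketch below models that with a toy
mapper; sketch_map(), sketch_unmap() and the structs are hypothetical and
only stand in for the PCI DMA helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy mapper that may grant fewer bytes than requested. */
struct sketch_mapper {
    size_t max_len;   /* largest mapping it will grant; 0 means failure */
    int outstanding;  /* mappings that have not been unmapped yet */
    char backing[64];
};

struct sketch_frag {
    void *base;
    size_t len;
};

void *sketch_map(struct sketch_mapper *m, size_t *len)
{
    if (m->max_len == 0) {
        return NULL;
    }
    if (*len > m->max_len) {
        *len = m->max_len; /* partial mapping */
    }
    m->outstanding++;
    return m->backing;
}

void sketch_unmap(struct sketch_mapper *m, void *base, size_t len)
{
    (void)base;
    (void)len;
    m->outstanding--;
}

/* Map a fragment; on a partial mapping, release it before failing. */
bool sketch_add_fragment(struct sketch_mapper *m, struct sketch_frag *out,
                         size_t len)
{
    size_t mapped_len = len;
    void *base;

    assert(m && out);

    base = sketch_map(m, &mapped_len);
    if (!base) {
        return false;
    }
    if (mapped_len != len) {
        sketch_unmap(m, base, mapped_len);
        return false;
    }

    out->base = base;
    out->len = len;
    return true;
}
```

The outstanding counter makes the property visible: a failed call leaves
the number of live mappings unchanged.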
 hw/net/net_tx_pkt.h |  9 
 hw/net/net_tx_pkt.c | 53 -
 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index e5ce6f20bc..5eb123ef90 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -153,6 +153,15 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt);
  */
 void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *dev);
 
+/**
+ * Unmap a fragment mapped from a PCI device.
+ *
+ * @context:PCI device owning fragment
+ * @base:   pointer to fragment
+ * @len:length of fragment
+ */
+void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len);
+
 /**
  * Send packet to qemu. handles sw offloads if vhdr is not supported.
  *
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 8dc8568ba2..aca12ff035 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -384,10 +384,9 @@ void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt,
 }
 }
 
-bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
-size_t len)
+static bool net_tx_pkt_add_raw_fragment_common(struct NetTxPkt *pkt,
+   void *base, size_t len)
 {
-hwaddr mapped_len = 0;
 struct iovec *ventry;
 assert(pkt);
 
@@ -395,23 +394,12 @@ bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
 return false;
 }
 
-if (!len) {
-return true;
- }
-
 ventry = >raw[pkt->raw_frags];
-mapped_len = len;
+ventry->iov_base = base;
+ventry->iov_len = len;
+pkt->raw_frags++;
 
-ventry->iov_base = pci_dma_map(pkt->pci_dev, pa,
-   _len, DMA_DIRECTION_TO_DEVICE);
-
-if ((ventry->iov_base != NULL) && (len == mapped_len)) {
-ventry->iov_len = mapped_len;
-pkt->raw_frags++;
-return true;
-} else {
-return false;
-}
+return true;
 }
 
 bool net_tx_pkt_has_fragments(struct NetTxPkt *pkt)
@@ -465,8 +453,9 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *pci_dev)
 assert(pkt->raw);
 for (i = 0; i < pkt->raw_frags; i++) {
 assert(pkt->raw[i].iov_base);
-pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base,
-  pkt->raw[i].iov_len, DMA_DIRECTION_TO_DEVICE, 0);
+net_tx_pkt_unmap_frag_pci(pkt->pci_dev,
+  pkt->raw[i].iov_base,
+  pkt->raw[i].iov_len);
 }
 }
 pkt->pci_dev = pci_dev;
@@ -476,6 +465,30 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *pci_dev)
 pkt->l4proto = 0;
 }
 
+void net_tx_pkt_unmap_frag_pci(void *context, void *base, size_t len)
+{
+pci_dma_unmap(context, base, len, DMA_DIRECTION_TO_DEVICE, 0);
+}
+
+bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
+size_t len)
+{
+dma_addr_t mapped_len = len;
+void *base = pci_dma_map(pkt->pci_dev, pa, _len,
+ DMA_DIRECTION_TO_DEVICE);
+if (!base) {
+return false;
+}
+
+if (mapped_len != len ||
+!net_tx_pkt_add_raw_fragment_common(pkt, base, len)) {
+net_tx_pkt_unmap_frag_pci(pkt->pci_dev, base, mapped_len);
+return false;
+}
+
+return true;
+}
+
 static void net_tx_pkt_do_sw_csum(struct NetTxPkt *pkt,
   struct iovec *iov, uint32_t iov_len,
   uint16_t csl)
-- 
2.40.0

[PATCH v3 34/47] igb: Implement MSI-X single vector mode

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
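As far as the two hunks show, the delivery path is now chosen from the
GPIE MSI-X mode bit first, and only then from whether MSI-X or MSI is
enabled on the PCI device. A minimal sketch of that decision order
follows. The enum and function are hypothetical, and the legacy fallback
is an assumption, since that branch is outside the quoted context.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_irq_path {
    SKETCH_MSIX_MULTI,  /* GPIE MSI-X mode: causes routed to many vectors */
    SKETCH_MSIX_SINGLE, /* MSI-X enabled without GPIE MSI-X mode: vector 0 */
    SKETCH_MSI,
    SKETCH_LEGACY       /* assumed fallback, not shown in the hunks */
};

enum sketch_irq_path sketch_pick_path(bool gpie_msix_mode,
                                      bool msix_enabled,
                                      bool msi_enabled)
{
    if (gpie_msix_mode) {
        return SKETCH_MSIX_MULTI;
    }
    if (msix_enabled) {
        return SKETCH_MSIX_SINGLE;
    }
    if (msi_enabled) {
        return SKETCH_MSI;
    }
    return SKETCH_LEGACY;
}
```

The point of the ordering is that a guest can enable MSI-X in PCI config
space while leaving the device in single-vector mode, in which case every
cause is delivered on vector 0.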
 hw/net/igb_core.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 77e4ee42a5..46babe85a9 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1873,7 +1873,7 @@ igb_update_interrupt_state(IGBCore *core)
 
 icr = core->mac[ICR] & core->mac[IMS];
 
-if (msix_enabled(core->owner)) {
+if (core->mac[GPIE] & E1000_GPIE_MSIX_MODE) {
 if (icr) {
 causes = 0;
 if (icr & E1000_ICR_DRSTA) {
@@ -1908,7 +1908,12 @@ igb_update_interrupt_state(IGBCore *core)
 trace_e1000e_irq_pending_interrupts(core->mac[ICR] & core->mac[IMS],
 core->mac[ICR], core->mac[IMS]);
 
-if (msi_enabled(core->owner)) {
+if (msix_enabled(core->owner)) {
+if (icr) {
+trace_e1000e_irq_msix_notify_vec(0);
+msix_notify(core->owner, 0);
+}
+} else if (msi_enabled(core->owner)) {
 if (icr) {
 msi_notify(core->owner, 0);
 }
-- 
2.40.0

[PATCH v3 14/47] net/eth: Rename eth_setup_vlan_headers_ex

2023-04-22 Thread Akihiko Odaki

The old eth_setup_vlan_headers has no user so remove it and rename
eth_setup_vlan_headers_ex.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/net/eth.h   | 9 +
 hw/net/net_tx_pkt.c | 2 +-
 net/eth.c   | 2 +-
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/include/net/eth.h b/include/net/eth.h
index 9f19c3a695..e8af5742be 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -351,16 +351,9 @@ eth_strip_vlan_ex(const struct iovec *iov, int iovcnt, size_t iovoff,
 uint16_t
 eth_get_l3_proto(const struct iovec *l2hdr_iov, int iovcnt, size_t l2hdr_len);
 
-void eth_setup_vlan_headers_ex(struct eth_header *ehdr, uint16_t vlan_tag,
+void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
 uint16_t vlan_ethtype, bool *is_new);
 
-static inline void
-eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
-bool *is_new)
-{
-eth_setup_vlan_headers_ex(ehdr, vlan_tag, ETH_P_VLAN, is_new);
-}
-
 
 uint8_t eth_get_gso_type(uint16_t l3_proto, uint8_t *l3_hdr, uint8_t l4proto);
 
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index cc36750c9b..ce6b102391 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -368,7 +368,7 @@ void net_tx_pkt_setup_vlan_header_ex(struct NetTxPkt *pkt,
 bool is_new;
 assert(pkt);
 
-eth_setup_vlan_headers_ex(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base,
+eth_setup_vlan_headers(pkt->vec[NET_TX_PKT_L2HDR_FRAG].iov_base,
 vlan, vlan_ethtype, _new);
 
 /* update l2hdrlen */
diff --git a/net/eth.c b/net/eth.c
index d7b30df79f..b6ff89c460 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -21,7 +21,7 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 
-void eth_setup_vlan_headers_ex(struct eth_header *ehdr, uint16_t vlan_tag,
+void eth_setup_vlan_headers(struct eth_header *ehdr, uint16_t vlan_tag,
 uint16_t vlan_ethtype, bool *is_new)
 {
 struct vlan_header *vhdr = PKT_GET_VLAN_HDR(ehdr);
-- 
2.40.0

[PATCH v3 19/47] igb: Always log status after building rx metadata

2023-04-22 Thread Akihiko Odaki

Without this change, the status flags may not be traced e.g. if checksum
offloading is disabled.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
---
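The change moves the trace call below the exit label so that early exits
are traced too. The pattern can be sketched as follows; the function, the
flag values, and the counting trace stub are hypothetical and only model
the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATUS_DD   (1u << 0) /* descriptor done */
#define SKETCH_STATUS_L4CS (1u << 1) /* L4 checksum computed */

int sketch_trace_count; /* number of times the trace stub ran */

void sketch_trace_status_flags(uint32_t flags)
{
    (void)flags;
    sketch_trace_count++;
}

void sketch_build_rx_metadata(bool has_l4, bool csum_offload,
                              uint32_t *status_flags)
{
    assert(status_flags);

    *status_flags = SKETCH_STATUS_DD;

    if (!has_l4) {
        goto func_exit;
    }
    if (!csum_offload) {
        goto func_exit;
    }
    *status_flags |= SKETCH_STATUS_L4CS;

func_exit:
    /* Placed after the label, so every path reaches it. */
    sketch_trace_status_flags(*status_flags);
}
```

With the trace call above the label, the two early-exit paths in this
sketch would return without logging anything.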
 hw/net/igb_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 1b69775fd6..167e1f949d 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1303,9 +1303,8 @@ igb_build_rx_metadata(IGBCore *core,
 trace_e1000e_rx_metadata_l4_cso_disabled();
 }
 
-trace_e1000e_rx_metadata_status_flags(*status_flags);
-
 func_exit:
+trace_e1000e_rx_metadata_status_flags(*status_flags);
 *status_flags = cpu_to_le32(*status_flags);
 }
 
-- 
2.40.0

[PATCH v3 44/47] igb: Notify only new interrupts

2023-04-22 Thread Akihiko Odaki

This follows the corresponding change for e1000e. This fixes:
tests/avocado/netdev-ethtool.py:NetDevEthtool.test_igb

Signed-off-by: Akihiko Odaki 
---
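The core of igb_raise_interrupts(), as visible in the hunk, is to sample
the enabled causes before the register write and report only the ones that
became pending because of it. A minimal sketch of that computation is
given here. The struct and function are hypothetical, and vector routing,
EICR handling, and throttling are left out.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_regs {
    uint32_t icr; /* interrupt cause register */
    uint32_t ims; /* interrupt mask set/read register */
};

/*
 * Set cause bits and return only the enabled causes that were not
 * already pending before this write.
 */
uint32_t sketch_raise_interrupts(struct sketch_regs *r, uint32_t causes)
{
    uint32_t old_causes;

    assert(r);

    old_causes = r->icr & r->ims;
    r->icr |= causes;
    return r->icr & r->ims & ~old_causes;
}
```

For example, with causes 0x1 already pending and all of 0x7 enabled,
raising 0x3 reports only 0x2, so the cause that was already notified is
not notified a second time.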
 hw/net/igb_core.c | 201 --
 hw/net/trace-events   |  11 +-
 .../org.centos/stream/8/x86_64/test-avocado   |   1 +
 tests/avocado/netdev-ethtool.py   |   4 -
 4 files changed, 87 insertions(+), 130 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 1519a90aa6..96b7335b31 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -94,10 +94,7 @@ static ssize_t
 igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
  bool has_vnet, bool *external_tx);
 
-static inline void
-igb_set_interrupt_cause(IGBCore *core, uint32_t val);
-
-static void igb_update_interrupt_state(IGBCore *core);
+static void igb_raise_interrupts(IGBCore *core, size_t index, uint32_t causes);
 static void igb_reset(IGBCore *core, bool sw);
 
 static inline void
@@ -913,8 +910,8 @@ igb_start_xmit(IGBCore *core, const IGB_TxRing *txr)
 }
 
 if (eic) {
-core->mac[EICR] |= eic;
-igb_set_interrupt_cause(core, E1000_ICR_TXDW);
+igb_raise_interrupts(core, EICR, eic);
+igb_raise_interrupts(core, ICR, E1000_ICR_TXDW);
 }
 
 net_tx_pkt_reset(txr->tx->tx_pkt, net_tx_pkt_unmap_frag_pci, d);
@@ -1686,6 +1683,7 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 {
 uint16_t queues = 0;
 uint32_t causes = 0;
+uint32_t ecauses = 0;
 union {
 L2Header l2_header;
 uint8_t octets[ETH_ZLEN];
@@ -1788,13 +1786,14 @@ igb_receive_internal(IGBCore *core, const struct iovec *iov, int iovcnt,
 causes |= E1000_ICS_RXDMT0;
 }
 
-core->mac[EICR] |= igb_rx_wb_eic(core, rxr.i->idx);
+ecauses |= igb_rx_wb_eic(core, rxr.i->idx);
 
 trace_e1000e_rx_written_to_guest(rxr.i->idx);
 }
 
 trace_e1000e_rx_interrupt_set(causes);
-igb_set_interrupt_cause(core, causes);
+igb_raise_interrupts(core, EICR, ecauses);
+igb_raise_interrupts(core, ICR, causes);
 
 return orig_size;
 }
@@ -1854,7 +1853,7 @@ void igb_core_set_link_status(IGBCore *core)
 }
 
 if (core->mac[STATUS] != old_status) {
-igb_set_interrupt_cause(core, E1000_ICR_LSC);
+igb_raise_interrupts(core, ICR, E1000_ICR_LSC);
 }
 }
 
@@ -1934,13 +1933,6 @@ igb_set_rx_control(IGBCore *core, int index, uint32_t val)
 }
 }
 
-static inline void
-igb_clear_ims_bits(IGBCore *core, uint32_t bits)
-{
-trace_e1000e_irq_clear_ims(bits, core->mac[IMS], core->mac[IMS] & ~bits);
-core->mac[IMS] &= ~bits;
-}
-
 static inline bool
 igb_postpone_interrupt(IGBIntrDelayTimer *timer)
 {
@@ -1963,9 +1955,8 @@ igb_eitr_should_postpone(IGBCore *core, int idx)
 return igb_postpone_interrupt(>eitr[idx]);
 }
 
-static void igb_send_msix(IGBCore *core)
+static void igb_send_msix(IGBCore *core, uint32_t causes)
 {
-uint32_t causes = core->mac[EICR] & core->mac[EIMS];
 int vector;
 
 for (vector = 0; vector < IGB_INTR_NUM; ++vector) {
@@ -1988,124 +1979,116 @@ igb_fix_icr_asserted(IGBCore *core)
 trace_e1000e_irq_fix_icr_asserted(core->mac[ICR]);
 }
 
-static void
-igb_update_interrupt_state(IGBCore *core)
+static void igb_raise_interrupts(IGBCore *core, size_t index, uint32_t causes)
 {
-uint32_t icr;
-uint32_t causes;
+uint32_t old_causes = core->mac[ICR] & core->mac[IMS];
+uint32_t old_ecauses = core->mac[EICR] & core->mac[EIMS];
+uint32_t raised_causes;
+uint32_t raised_ecauses;
 uint32_t int_alloc;
 
-icr = core->mac[ICR] & core->mac[IMS];
+trace_e1000e_irq_set(index << 2,
+ core->mac[index], core->mac[index] | causes);
+
+core->mac[index] |= causes;
 
 if (core->mac[GPIE] & E1000_GPIE_MSIX_MODE) {
-if (icr) {
-causes = 0;
-if (icr & E1000_ICR_DRSTA) {
-int_alloc = core->mac[IVAR_MISC] & 0xff;
-if (int_alloc & E1000_IVAR_VALID) {
-causes |= BIT(int_alloc & 0x1f);
-}
+raised_causes = core->mac[ICR] & core->mac[IMS] & ~old_causes;
+
+if (raised_causes & E1000_ICR_DRSTA) {
+int_alloc = core->mac[IVAR_MISC] & 0xff;
+if (int_alloc & E1000_IVAR_VALID) {
+core->mac[EICR] |= BIT(int_alloc & 0x1f);
 }
-/* Check if other bits (excluding the TCP Timer) are enabled. */
-if (icr & ~E1000_ICR_DRSTA) {
-int_alloc = (core->mac[IVAR_MISC] >> 8) & 0xff;
-if (int_alloc & E1000_IVAR_VALID) {
-causes |= BIT(int_alloc & 0x1f);
-}
-trace_e1000e_irq_add_msi_other(core->mac[EICR]);
+}
+/* Check if other bits (excluding the TCP Timer) are enabled. */
+if (raised_causes & ~E1000_ICR_DRSTA) {
+

[PATCH v3 28/47] e1000e: Rename a variable in e1000e_receive_internal()

2023-04-22 Thread Akihiko Odaki

Rename variable "n" to "causes", which properly represents the content
of the variable.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 27124bba07..0c0c45a6ce 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1650,7 +1650,7 @@ static ssize_t
 e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 bool has_vnet)
 {
-uint32_t n = 0;
+uint32_t causes = 0;
 uint8_t min_buf[ETH_ZLEN];
 struct iovec min_iov;
 size_t size, orig_size;
@@ -1723,32 +1723,32 @@ e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 
 /* Perform small receive detection (RSRPD) */
 if (total_size < core->mac[RSRPD]) {
-n |= E1000_ICS_SRPD;
+causes |= E1000_ICS_SRPD;
 }
 
 /* Perform ACK receive detection */
 if  (!(core->mac[RFCTL] & E1000_RFCTL_ACK_DIS) &&
  (e1000e_is_tcp_ack(core, core->rx_pkt))) {
-n |= E1000_ICS_ACK;
+causes |= E1000_ICS_ACK;
 }
 
 /* Check if receive descriptor minimum threshold hit */
 rdmts_hit = e1000e_rx_descr_threshold_hit(core, rxr.i);
-n |= e1000e_rx_wb_interrupt_cause(core, rxr.i->idx, rdmts_hit);
+causes |= e1000e_rx_wb_interrupt_cause(core, rxr.i->idx, rdmts_hit);
 
 trace_e1000e_rx_written_to_guest(rxr.i->idx);
 } else {
-n |= E1000_ICS_RXO;
+causes |= E1000_ICS_RXO;
 retval = 0;
 
 trace_e1000e_rx_not_written_to_guest(rxr.i->idx);
 }
 
-if (!e1000e_intrmgr_delay_rx_causes(core, )) {
-trace_e1000e_rx_interrupt_set(n);
-e1000e_set_interrupt_cause(core, n);
+if (!e1000e_intrmgr_delay_rx_causes(core, )) {
+trace_e1000e_rx_interrupt_set(causes);
+e1000e_set_interrupt_cause(core, causes);
 } else {
-trace_e1000e_rx_interrupt_delayed(n);
+trace_e1000e_rx_interrupt_delayed(causes);
 }
 
 return retval;
-- 
2.40.0

[PATCH v3 23/47] vmxnet3: Reset packet state after emptying Tx queue

2023-04-22 Thread Akihiko Odaki

Keeping Tx packet state after the transmit queue is emptied is unreliable,
as the state can be reset whenever migration happens.

Always reset Tx packet state after the queue is emptied.

Signed-off-by: Akihiko Odaki 
---
 hw/net/vmxnet3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 05f41b6dfa..18b9edfdb2 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -681,6 +681,8 @@ static void vmxnet3_process_tx_queue(VMXNET3State *s, int qidx)
  net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s));
 }
 }
+
+net_tx_pkt_reset(s->tx_pkt, net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s));
 }
 
 static inline void
@@ -1159,7 +1161,6 @@ static void vmxnet3_deactivate_device(VMXNET3State *s)
 {
 if (s->device_active) {
 VMW_CBPRN("Deactivating vmxnet3...");
-net_tx_pkt_reset(s->tx_pkt, net_tx_pkt_unmap_frag_pci, PCI_DEVICE(s));
 net_tx_pkt_uninit(s->tx_pkt);
 net_rx_pkt_uninit(s->rx_pkt);
 s->device_active = false;
-- 
2.40.0

[PATCH v3 24/47] igb: Add more definitions for Tx descriptor

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
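The igb_core.c hunks replace literal shifts with the named constants added
here. A small sketch of extracting fields from context descriptor words
with such constants follows. The SKETCH_* macros copy the shift amounts
that appear in this patch (16, 16 and 8); the function names and the
8-bit L4LEN mask are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ADVTXD_L4LEN_SHIFT  8  /* L4LEN position in mss_l4len_idx */
#define SKETCH_ADVTXD_MSS_SHIFT    16 /* MSS position in mss_l4len_idx */
#define SKETCH_TX_FLAGS_VLAN_SHIFT 16 /* VLAN tag in vlan_macip_lens */

/* Maximum segment size from the context descriptor. */
uint32_t sketch_ctx_mss(uint32_t mss_l4len_idx)
{
    return mss_l4len_idx >> SKETCH_ADVTXD_MSS_SHIFT;
}

/* L4 header length; the 8-bit width is an assumption of this sketch. */
uint32_t sketch_ctx_l4len(uint32_t mss_l4len_idx)
{
    return (mss_l4len_idx >> SKETCH_ADVTXD_L4LEN_SHIFT) & 0xff;
}

/* VLAN tag from the context descriptor. */
uint16_t sketch_ctx_vlan(uint32_t vlan_macip_lens)
{
    return (uint16_t)(vlan_macip_lens >> SKETCH_TX_FLAGS_VLAN_SHIFT);
}
```

Naming the shifts keeps call sites readable and ties each field to its
descriptor word.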
 hw/net/igb_regs.h | 32 +++-
 hw/net/igb_core.c |  4 ++--
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 21ee9a3b2d..eb995d8b2e 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -42,11 +42,6 @@ union e1000_adv_tx_desc {
 } wb;
 };
 
-#define E1000_ADVTXD_DTYP_CTXT  0x0020 /* Advanced Context Descriptor */
-#define E1000_ADVTXD_DTYP_DATA  0x0030 /* Advanced Data Descriptor */
-#define E1000_ADVTXD_DCMD_DEXT  0x2000 /* Descriptor Extension (1=Adv) */
-#define E1000_ADVTXD_DCMD_TSE   0x8000 /* TCP/UDP Segmentation Enable */
-
 #define E1000_ADVTXD_POTS_IXSM  0x0100 /* Insert TCP/UDP Checksum */
 #define E1000_ADVTXD_POTS_TXSM  0x0200 /* Insert TCP/UDP Checksum */
 
@@ -151,6 +146,10 @@ union e1000_adv_rx_desc {
 #define IGB_82576_VF_DEV_ID 0x10CA
 #define IGB_I350_VF_DEV_ID 0x1520
 
+/* VLAN info */
+#define IGB_TX_FLAGS_VLAN_MASK 0x
+#define IGB_TX_FLAGS_VLAN_SHIFT 16
+
 /* from igb/e1000_82575.h */
 
 #define E1000_MRQC_ENABLE_RSS_MQ 0x0002
@@ -160,6 +159,29 @@ union e1000_adv_rx_desc {
 #define E1000_MRQC_RSS_FIELD_IPV6_UDP   0x0080
 #define E1000_MRQC_RSS_FIELD_IPV6_UDP_EX 0x0100
 
+/* Adv Transmit Descriptor Config Masks */
+#define E1000_ADVTXD_MAC_TSTAMP   0x0008 /* IEEE1588 Timestamp packet */
+#define E1000_ADVTXD_DTYP_CTXT    0x0020 /* Advanced Context Descriptor */
+#define E1000_ADVTXD_DTYP_DATA    0x0030 /* Advanced Data Descriptor */
+#define E1000_ADVTXD_DCMD_EOP     0x0100 /* End of Packet */
+#define E1000_ADVTXD_DCMD_IFCS    0x0200 /* Insert FCS (Ethernet CRC) */
+#define E1000_ADVTXD_DCMD_RS      0x0800 /* Report Status */
+#define E1000_ADVTXD_DCMD_DEXT    0x2000 /* Descriptor extension (1=Adv) */
+#define E1000_ADVTXD_DCMD_VLE     0x4000 /* VLAN pkt enable */
+#define E1000_ADVTXD_DCMD_TSE     0x8000 /* TCP Seg enable */
+#define E1000_ADVTXD_PAYLEN_SHIFT 14 /* Adv desc PAYLEN shift */
+
+#define E1000_ADVTXD_MACLEN_SHIFT 9  /* Adv ctxt desc mac len shift */
+#define E1000_ADVTXD_TUCMD_L4T_UDP 0x  /* L4 Packet TYPE of UDP */
+#define E1000_ADVTXD_TUCMD_IPV4    0x0400  /* IP Packet Type: 1=IPv4 */
+#define E1000_ADVTXD_TUCMD_L4T_TCP 0x0800  /* L4 Packet TYPE of TCP */
+#define E1000_ADVTXD_TUCMD_L4T_SCTP 0x1000 /* L4 packet TYPE of SCTP */
+/* IPSec Encrypt Enable for ESP */
+#define E1000_ADVTXD_L4LEN_SHIFT 8  /* Adv ctxt L4LEN shift */
+#define E1000_ADVTXD_MSS_SHIFT  16  /* Adv ctxt MSS shift */
+/* Adv ctxt IPSec SA IDX mask */
+/* Adv ctxt IPSec ESP len mask */
+
 /* Additional Transmit Descriptor Control definitions */
 #define E1000_TXDCTL_QUEUE_ENABLE  0x0200 /* Enable specific Tx Queue */
 
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 4ac7e7af44..5fb2a38a6f 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -418,7 +418,7 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 {
 if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) {
 uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
-uint32_t mss = tx->ctx[idx].mss_l4len_idx >> 16;
+uint32_t mss = tx->ctx[idx].mss_l4len_idx >> E1000_ADVTXD_MSS_SHIFT;
 if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) {
 return false;
 }
@@ -612,7 +612,7 @@ igb_process_tx_desc(IGBCore *core,
 if (!tx->skip_cp && net_tx_pkt_parse(tx->tx_pkt)) {
 idx = (tx->first_olinfo_status >> 4) & 1;
 igb_tx_insert_vlan(core, queue_index, tx,
-tx->ctx[idx].vlan_macip_lens >> 16,
+tx->ctx[idx].vlan_macip_lens >> IGB_TX_FLAGS_VLAN_SHIFT,
 !!(tx->first_cmd_type_len & E1000_TXD_CMD_VLE));
 
 if (igb_tx_pkt_send(core, tx, queue_index)) {
-- 
2.40.0

[PATCH v3 13/47] hw/net/net_tx_pkt: Remove net_rx_pkt_get_l4_info

2023-04-22 Thread Akihiko Odaki

This function is not used.

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_rx_pkt.h | 9 -
 hw/net/net_rx_pkt.c | 5 -
 2 files changed, 14 deletions(-)

diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
index a06f5c2675..ce8dbdb284 100644
--- a/hw/net/net_rx_pkt.h
+++ b/hw/net/net_rx_pkt.h
@@ -119,15 +119,6 @@ eth_ip6_hdr_info *net_rx_pkt_get_ip6_info(struct NetRxPkt 
*pkt);
  */
 eth_ip4_hdr_info *net_rx_pkt_get_ip4_info(struct NetRxPkt *pkt);
 
-/**
- * fetches L4 header analysis results
- *
- * Return:  pointer to analysis results structure which is stored in internal
- *  packet area.
- *
- */
-eth_l4_hdr_info *net_rx_pkt_get_l4_info(struct NetRxPkt *pkt);
-
 typedef enum {
 NetPktRssIpV4,
 NetPktRssIpV4Tcp,
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 63be6e05ad..6125a063d7 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -236,11 +236,6 @@ eth_ip4_hdr_info *net_rx_pkt_get_ip4_info(struct NetRxPkt 
*pkt)
 return &pkt->ip4hdr_info;
 }
 
-eth_l4_hdr_info *net_rx_pkt_get_l4_info(struct NetRxPkt *pkt)
-{
-return &pkt->l4hdr_info;
-}
-
 static inline void
 _net_rx_rss_add_chunk(uint8_t *rss_input, size_t *bytes_written,
   void *ptr, size_t size)
-- 
2.40.0

[PATCH v3 04/47] igb: Fix Rx packet type encoding

2023-04-22 Thread Akihiko Odaki

igb's advanced descriptor uses a packet type encoding different from
the one used in e1000e's extended descriptor. Fix the logic to encode
the Rx packet type accordingly.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/igb_regs.h |  5 +
 hw/net/igb_core.c | 38 +++---
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index c5c5b3c3b8..21ee9a3b2d 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -641,6 +641,11 @@ union e1000_adv_rx_desc {
 
 #define E1000_STATUS_NUM_VFS_SHIFT 14
 
+#define E1000_ADVRXD_PKT_IP4 BIT(4)
+#define E1000_ADVRXD_PKT_IP6 BIT(6)
+#define E1000_ADVRXD_PKT_TCP BIT(8)
+#define E1000_ADVRXD_PKT_UDP BIT(9)
+
 static inline uint8_t igb_ivar_entry_rx(uint8_t i)
 {
 return i < 8 ? i * 4 : (i - 8) * 4 + 2;
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 464a41d0aa..dbd1192a8e 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1227,7 +1227,6 @@ igb_build_rx_metadata(IGBCore *core,
 struct virtio_net_hdr *vhdr;
 bool hasip4, hasip6;
 EthL4HdrProto l4hdr_proto;
-uint32_t pkt_type;
 
 *status_flags = E1000_RXD_STAT_DD;
 
@@ -1266,28 +1265,29 @@ igb_build_rx_metadata(IGBCore *core,
 trace_e1000e_rx_metadata_ack();
 }
 
-if (hasip6 && (core->mac[RFCTL] & E1000_RFCTL_IPV6_DIS)) {
-trace_e1000e_rx_metadata_ipv6_filtering_disabled();
-pkt_type = E1000_RXD_PKT_MAC;
-} else if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP ||
-   l4hdr_proto == ETH_L4_HDR_PROTO_UDP) {
-pkt_type = hasip4 ? E1000_RXD_PKT_IP4_XDP : E1000_RXD_PKT_IP6_XDP;
-} else if (hasip4 || hasip6) {
-pkt_type = hasip4 ? E1000_RXD_PKT_IP4 : E1000_RXD_PKT_IP6;
-} else {
-pkt_type = E1000_RXD_PKT_MAC;
-}
+if (pkt_info) {
+*pkt_info = rss_info->enabled ? rss_info->type : 0;
 
-trace_e1000e_rx_metadata_pkt_type(pkt_type);
+if (hasip4) {
+*pkt_info |= E1000_ADVRXD_PKT_IP4;
+}
 
-if (pkt_info) {
-if (rss_info->enabled) {
-*pkt_info = rss_info->type;
+if (hasip6) {
+*pkt_info |= E1000_ADVRXD_PKT_IP6;
 }
 
-*pkt_info |= (pkt_type << 4);
-} else {
-*status_flags |= E1000_RXD_PKT_TYPE(pkt_type);
+switch (l4hdr_proto) {
+case ETH_L4_HDR_PROTO_TCP:
+*pkt_info |= E1000_ADVRXD_PKT_TCP;
+break;
+
+case ETH_L4_HDR_PROTO_UDP:
+*pkt_info |= E1000_ADVRXD_PKT_UDP;
+break;
+
+default:
+break;
+}
 }
 
 if (hdr_info) {
-- 
2.40.0
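
The encoding change in this patch can be sketched in isolation. The
following is only an illustration, not the code from hw/net/igb_core.c:
the bit positions are the E1000_ADVRXD_PKT_* values added by the patch,
while build_pkt_info() and enum l4_proto are hypothetical names
introduced here. It shows that the advanced descriptor carries one flag
per detected header on top of the RSS type, instead of a single
enumerated packet type shifted into place.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the patch (E1000_ADVRXD_PKT_*). */
#define ADVRXD_PKT_IP4 (1u << 4)
#define ADVRXD_PKT_IP6 (1u << 6)
#define ADVRXD_PKT_TCP (1u << 8)
#define ADVRXD_PKT_UDP (1u << 9)

/* Hypothetical stand-in for QEMU's EthL4HdrProto. */
enum l4_proto { L4_NONE, L4_TCP, L4_UDP };

/*
 * Hypothetical helper: start from the RSS type (0 when RSS is disabled)
 * and OR in one flag for each header found in the packet.
 */
static uint16_t build_pkt_info(uint16_t rss_type, bool hasip4, bool hasip6,
                               enum l4_proto l4)
{
    uint16_t info = rss_type;

    if (hasip4) {
        info |= ADVRXD_PKT_IP4;
    }
    if (hasip6) {
        info |= ADVRXD_PKT_IP6;
    }
    switch (l4) {
    case L4_TCP:
        info |= ADVRXD_PKT_TCP;
        break;
    case L4_UDP:
        info |= ADVRXD_PKT_UDP;
        break;
    default:
        break;
    }
    return info;
}
```

Because the flags are independent bits, an IPv4 TCP packet sets both the
IP4 and the TCP bit rather than selecting a combined type value.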

[PATCH v3 17/47] e1000x: Rename TcpIpv6 into TcpIpv6Ex

2023-04-22 Thread Akihiko Odaki

e1000e and igb employ NetPktRssIpV6TcpEx for the RSS hash if the TcpIpv6
MRQC bit is set. Moreover, igb also has an MRQC bit for NetPktRssIpV6Tcp,
though it is not implemented yet. Rename it to TcpIpv6Ex to avoid
confusion.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/e1000x_regs.h | 24 
 hw/net/e1000e_core.c |  8 
 hw/net/igb_core.c|  8 
 hw/net/trace-events  |  2 +-
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/net/e1000x_regs.h b/hw/net/e1000x_regs.h
index 6d3c4c6d3a..13760c66d3 100644
--- a/hw/net/e1000x_regs.h
+++ b/hw/net/e1000x_regs.h
@@ -290,18 +290,18 @@
 #define E1000_RETA_IDX(hash)((hash) & (BIT(7) - 1))
 #define E1000_RETA_VAL(reta, hash)  (((uint8_t *)(reta))[E1000_RETA_IDX(hash)])
 
-#define E1000_MRQC_EN_TCPIPV4(mrqc) ((mrqc) & BIT(16))
-#define E1000_MRQC_EN_IPV4(mrqc)((mrqc) & BIT(17))
-#define E1000_MRQC_EN_TCPIPV6(mrqc) ((mrqc) & BIT(18))
-#define E1000_MRQC_EN_IPV6EX(mrqc)  ((mrqc) & BIT(19))
-#define E1000_MRQC_EN_IPV6(mrqc)((mrqc) & BIT(20))
-
-#define E1000_MRQ_RSS_TYPE_NONE (0)
-#define E1000_MRQ_RSS_TYPE_IPV4TCP  (1)
-#define E1000_MRQ_RSS_TYPE_IPV4 (2)
-#define E1000_MRQ_RSS_TYPE_IPV6TCP  (3)
-#define E1000_MRQ_RSS_TYPE_IPV6EX   (4)
-#define E1000_MRQ_RSS_TYPE_IPV6 (5)
+#define E1000_MRQC_EN_TCPIPV4(mrqc)   ((mrqc) & BIT(16))
+#define E1000_MRQC_EN_IPV4(mrqc)  ((mrqc) & BIT(17))
+#define E1000_MRQC_EN_TCPIPV6EX(mrqc) ((mrqc) & BIT(18))
+#define E1000_MRQC_EN_IPV6EX(mrqc)((mrqc) & BIT(19))
+#define E1000_MRQC_EN_IPV6(mrqc)  ((mrqc) & BIT(20))
+
+#define E1000_MRQ_RSS_TYPE_NONE   (0)
+#define E1000_MRQ_RSS_TYPE_IPV4TCP(1)
+#define E1000_MRQ_RSS_TYPE_IPV4   (2)
+#define E1000_MRQ_RSS_TYPE_IPV6TCPEX  (3)
+#define E1000_MRQ_RSS_TYPE_IPV6EX (4)
+#define E1000_MRQ_RSS_TYPE_IPV6   (5)
 
 #define E1000_ICR_ASSERTED BIT(31)
 #define E1000_EIAC_MASK0x01F0
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 743b36ddfb..481db41931 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -537,7 +537,7 @@ e1000e_rss_get_hash_type(E1000ECore *core, struct NetRxPkt 
*pkt)
 ip6info->rss_ex_dst_valid,
 ip6info->rss_ex_src_valid,
 core->mac[MRQC],
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC]),
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6(core->mac[MRQC]));
 
@@ -546,8 +546,8 @@ e1000e_rss_get_hash_type(E1000ECore *core, struct NetRxPkt 
*pkt)
   ip6info->rss_ex_src_valid))) {
 
 if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP &&
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC])) {
-return E1000_MRQ_RSS_TYPE_IPV6TCP;
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC])) {
+return E1000_MRQ_RSS_TYPE_IPV6TCPEX;
 }
 
 if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) {
@@ -581,7 +581,7 @@ e1000e_rss_calc_hash(E1000ECore *core,
 case E1000_MRQ_RSS_TYPE_IPV4TCP:
 type = NetPktRssIpV4Tcp;
 break;
-case E1000_MRQ_RSS_TYPE_IPV6TCP:
+case E1000_MRQ_RSS_TYPE_IPV6TCPEX:
 type = NetPktRssIpV6TcpEx;
 break;
 case E1000_MRQ_RSS_TYPE_IPV6:
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index d8fd0e1813..1b69775fd6 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -301,7 +301,7 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
 ip6info->rss_ex_dst_valid,
 ip6info->rss_ex_src_valid,
 core->mac[MRQC],
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC]),
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6EX(core->mac[MRQC]),
 E1000_MRQC_EN_IPV6(core->mac[MRQC]));
 
@@ -310,8 +310,8 @@ igb_rss_get_hash_type(IGBCore *core, struct NetRxPkt *pkt)
   ip6info->rss_ex_src_valid))) {
 
 if (l4hdr_proto == ETH_L4_HDR_PROTO_TCP &&
-E1000_MRQC_EN_TCPIPV6(core->mac[MRQC])) {
-return E1000_MRQ_RSS_TYPE_IPV6TCP;
+E1000_MRQC_EN_TCPIPV6EX(core->mac[MRQC])) {
+return E1000_MRQ_RSS_TYPE_IPV6TCPEX;
 }
 
 if (E1000_MRQC_EN_IPV6EX(core->mac[MRQC])) {
@@ -343,7 +343,7 @@ igb_rss_calc_hash(IGBCore *core, struct NetRxPkt *pkt, 
E1000E_RSSInfo *info)
 case E1000_MRQ_RSS_TYPE_IPV4TCP:
 type = NetPktRssIpV4Tcp;
 break;
-case E1000_MRQ_RSS_TYPE_IPV6TCP:
+case E1000_MRQ_RSS_TYPE_IPV6TCPEX:
 type = NetPktRssIpV6TcpEx;
 break;
 case

[PATCH v3 00/47] igb: Fix for DPDK

2023-04-22 Thread Akihiko Odaki

Based-on: <366bbcafdb6e0373f0deb105153768a8c0bded87.ca...@gmail.com>
("[PATCH 0/1] e1000e: Fix tx/rx counters")

This series has fixes and feature additions to pass the DPDK Test Suite
with igb. It also includes a few minor changes related to networking.

Patches [01, 10] are bug fixes.
Patches [11, 14] delete code which is unnecessary.
Patches [15, 33] are minor changes.
Patches [34, 45] implement new features.
Patches [46, 47] update documentation.

Although this series includes many patches, it is not necessary to land
them all at once. For example, only the bug fix patches may be applied first.

V2 -> V3:
- Fixed parameter name in hw/net/net_tx_pkt. (Philippe Mathieu-Daudé)
- Added patch "igb: Clear IMS bits when committing ICR access".
- Added patch "igb: Clear EICR bits for delayed MSI-X interrupts".
- Added patch "e1000e: Rename a variable in e1000e_receive_internal()".
- Added patch "igb: Rename a variable in igb_receive_internal()".
- Added patch "e1000e: Notify only new interrupts".
- Added patch "igb: Notify only new interrupts".

V1 -> V2:
- Dropped patch "Include the second VLAN tag in the buffer". The second
  VLAN tag is not used at that point and is unnecessary.
- Added patch "e1000x: Rename TcpIpv6 into TcpIpv6Ex".
- Split patch "hw/net/net_tx_pkt: Decouple from PCI".
  (Philippe Mathieu-Daudé)
- Added advanced Rx descriptor packet encoding definitions.
  (Sriram Yagnaraman)
- Added some constants to eth.h to derive packet oversize thresholds.
- Added IGB_TX_FLAGS_VLAN_SHIFT usage.
- Renamed patch "igb: Fix igb_mac_reg_init alignment".
  (Philippe Mathieu-Daudé)
- Fixed size check for packets with double VLAN. (Sriram Yagnaraman)
- Fixed timing to timestamp Tx packet.

Akihiko Odaki (47):
  hw/net/net_tx_pkt: Decouple implementation from PCI
  hw/net/net_tx_pkt: Decouple interface from PCI
  e1000x: Fix BPRC and MPRC
  igb: Fix Rx packet type encoding
  igb: Do not require CTRL.VME for tx VLAN tagging
  igb: Clear IMS bits when committing ICR access
  net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols()
  e1000e: Always copy ethernet header
  igb: Always copy ethernet header
  Fix references to igb Avocado test
  tests/avocado: Remove unused imports
  tests/avocado: Remove test_igb_nomsi_kvm
  hw/net/net_tx_pkt: Remove net_rx_pkt_get_l4_info
  net/eth: Rename eth_setup_vlan_headers_ex
  e1000x: Share more Rx filtering logic
  e1000x: Take CRC into consideration for size check
  e1000x: Rename TcpIpv6 into TcpIpv6Ex
  e1000e: Always log status after building rx metadata
  igb: Always log status after building rx metadata
  igb: Remove goto
  igb: Read DCMD.VLE of the first Tx descriptor
  e1000e: Reset packet state after emptying Tx queue
  vmxnet3: Reset packet state after emptying Tx queue
  igb: Add more definitions for Tx descriptor
  igb: Share common VF constants
  igb: Fix igb_mac_reg_init coding style alignment
  igb: Clear EICR bits for delayed MSI-X interrupts
  e1000e: Rename a variable in e1000e_receive_internal()
  igb: Rename a variable in igb_receive_internal()
  net/eth: Use void pointers
  net/eth: Always add VLAN tag
  hw/net/net_rx_pkt: Enforce alignment for eth_header
  tests/qtest/libqos/igb: Set GPIE.Multiple_MSIX
  igb: Implement MSI-X single vector mode
  igb: Use UDP for RSS hash
  igb: Implement Rx SCTP CSO
  igb: Implement Tx SCTP CSO
  igb: Strip the second VLAN tag for extended VLAN
  igb: Filter with the second VLAN tag for extended VLAN
  igb: Implement igb-specific oversize check
  igb: Implement Rx PTP2 timestamp
  igb: Implement Tx timestamp
  e1000e: Notify only new interrupts
  igb: Notify only new interrupts
  vmxnet3: Do not depend on PC
  MAINTAINERS: Add a reviewer for network packet abstractions
  docs/system/devices/igb: Note igb is tested for DPDK

 MAINTAINERS   |   3 +-
 docs/system/devices/igb.rst   |  14 +-
 hw/net/e1000e_core.h  |   2 -
 hw/net/e1000x_common.h|   9 +-
 hw/net/e1000x_regs.h  |  24 +-
 hw/net/igb_common.h   |  24 +-
 hw/net/igb_regs.h |  67 +-
 hw/net/net_rx_pkt.h   |  38 +-
 hw/net/net_tx_pkt.h   |  46 +-
 include/net/eth.h |  29 +-
 include/qemu/crc32c.h |   1 +
 hw/net/e1000.c|  41 +-
 hw/net/e1000e_core.c  | 282 +++
 hw/net/e1000x_common.c|  79 +-
 hw/net/igb.c  |  10 +-
 hw/net/igb_core.c | 711 ++
 hw/net/igbvf.c|   7 -
 hw/net/net_rx_pkt.c   | 107 ++-
 hw/net/net_tx_pkt.c   | 101 ++-
 hw/net/virtio-net.c   |   7 +-
 hw/net/vmxnet3.c  |  22 +-
 net/eth.c | 100

[PATCH v3 11/47] tests/avocado: Remove unused imports

2023-04-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 tests/avocado/netdev-ethtool.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py
index f7e9464184..8de118e313 100644
--- a/tests/avocado/netdev-ethtool.py
+++ b/tests/avocado/netdev-ethtool.py
@@ -7,7 +7,6 @@
 
 from avocado import skip
 from avocado_qemu import QemuSystemTest
-from avocado_qemu import exec_command, exec_command_and_wait_for_pattern
 from avocado_qemu import wait_for_console_pattern
 
 class NetDevEthtool(QemuSystemTest):
-- 
2.40.0

[PATCH v3 15/47] e1000x: Share more Rx filtering logic

2023-04-22 Thread Akihiko Odaki

This saves some code and enables tracepoint for e1000's VLAN filtering.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
---
 hw/net/e1000x_common.h |  4 +++-
 hw/net/e1000.c | 35 +--
 hw/net/e1000e_core.c   | 47 +-
 hw/net/e1000x_common.c | 44 +--
 hw/net/igb_core.c  | 41 +++-
 hw/net/trace-events|  4 ++--
 6 files changed, 56 insertions(+), 119 deletions(-)

diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h
index 0298e06283..be291684de 100644
--- a/hw/net/e1000x_common.h
+++ b/hw/net/e1000x_common.h
@@ -107,7 +107,9 @@ bool e1000x_rx_ready(PCIDevice *d, uint32_t *mac);
 
 bool e1000x_is_vlan_packet(const void *buf, uint16_t vet);
 
-bool e1000x_rx_group_filter(uint32_t *mac, const uint8_t *buf);
+bool e1000x_rx_vlan_filter(uint32_t *mac, const struct vlan_header *vhdr);
+
+bool e1000x_rx_group_filter(uint32_t *mac, const struct eth_header *ehdr);
 
 bool e1000x_hw_rx_enabled(uint32_t *mac);
 
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 18eb6d8876..aae5f0bdc0 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -804,36 +804,11 @@ start_xmit(E1000State *s)
 }
 
 static int
-receive_filter(E1000State *s, const uint8_t *buf, int size)
+receive_filter(E1000State *s, const void *buf)
 {
-uint32_t rctl = s->mac_reg[RCTL];
-int isbcast = is_broadcast_ether_addr(buf);
-int ismcast = is_multicast_ether_addr(buf);
-
-if (e1000x_is_vlan_packet(buf, le16_to_cpu(s->mac_reg[VET])) &&
-e1000x_vlan_rx_filter_enabled(s->mac_reg)) {
-uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci);
-uint32_t vfta =
-ldl_le_p((uint32_t *)(s->mac_reg + VFTA) +
- ((vid >> E1000_VFTA_ENTRY_SHIFT) & 
E1000_VFTA_ENTRY_MASK));
-if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) {
-return 0;
-}
-}
-
-if (!isbcast && !ismcast && (rctl & E1000_RCTL_UPE)) { /* promiscuous 
ucast */
-return 1;
-}
-
-if (ismcast && (rctl & E1000_RCTL_MPE)) {  /* promiscuous mcast */
-return 1;
-}
-
-if (isbcast && (rctl & E1000_RCTL_BAM)) {  /* broadcast enabled */
-return 1;
-}
-
-return e1000x_rx_group_filter(s->mac_reg, buf);
+return (!e1000x_is_vlan_packet(buf, s->mac_reg[VET]) ||
+e1000x_rx_vlan_filter(s->mac_reg, PKT_GET_VLAN_HDR(buf))) &&
+   e1000x_rx_group_filter(s->mac_reg, buf);
 }
 
 static void
@@ -949,7 +924,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec 
*iov, int iovcnt)
 return size;
 }
 
-if (!receive_filter(s, filter_buf, size)) {
+if (!receive_filter(s, filter_buf)) {
 return size;
 }
 
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index f3335194d8..743b36ddfb 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1034,48 +1034,11 @@ e1000e_rx_l4_cso_enabled(E1000ECore *core)
 }
 
 static bool
-e1000e_receive_filter(E1000ECore *core, const uint8_t *buf, int size)
+e1000e_receive_filter(E1000ECore *core, const void *buf)
 {
-uint32_t rctl = core->mac[RCTL];
-
-if (e1000x_is_vlan_packet(buf, core->mac[VET]) &&
-e1000x_vlan_rx_filter_enabled(core->mac)) {
-uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci);
-uint32_t vfta =
-ldl_le_p((uint32_t *)(core->mac + VFTA) +
- ((vid >> E1000_VFTA_ENTRY_SHIFT) & 
E1000_VFTA_ENTRY_MASK));
-if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) {
-trace_e1000e_rx_flt_vlan_mismatch(vid);
-return false;
-} else {
-trace_e1000e_rx_flt_vlan_match(vid);
-}
-}
-
-switch (net_rx_pkt_get_packet_type(core->rx_pkt)) {
-case ETH_PKT_UCAST:
-if (rctl & E1000_RCTL_UPE) {
-return true; /* promiscuous ucast */
-}
-break;
-
-case ETH_PKT_BCAST:
-if (rctl & E1000_RCTL_BAM) {
-return true; /* broadcast enabled */
-}
-break;
-
-case ETH_PKT_MCAST:
-if (rctl & E1000_RCTL_MPE) {
-return true; /* promiscuous mcast */
-}
-break;
-
-default:
-g_assert_not_reached();
-}
-
-return e1000x_rx_group_filter(core->mac, buf);
+return (!e1000x_is_vlan_packet(buf, core->mac[VET]) ||
+e1000x_rx_vlan_filter(core->mac, PKT_GET_VLAN_HDR(buf))) &&
+   e1000x_rx_group_filter(core->mac, buf);
 }
 
 static inline void
@@ -1736,7 +1699,7 @@ e1000e_receive_internal(E1000ECore *core, const struct 
iovec *iov, int iovcnt,
 net_rx_pkt_set_packet_type(core->rx_pkt,
 get_eth_packet_type(PKT_GET_ETH_HDR(min_buf)));
 
-if (!e1000e_receive_filter(core, min_buf, size)) {
+if (!e1000e_receive_filter(core, min_buf)) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
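
The VLAN filter table lookup that the removed open-coded blocks perform,
and that the shared e1000x_rx_vlan_filter() is meant to centralize, can
be sketched as follows. This is a minimal illustration rather than the
QEMU implementation: vlan_allowed() is a hypothetical name, the table is
passed as a plain array instead of being read from the register file,
and the three constants are assumed values for E1000_VFTA_ENTRY_SHIFT,
E1000_VFTA_ENTRY_MASK and E1000_VFTA_ENTRY_BIT_SHIFT_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check them against hw/net/e1000x_regs.h before relying on them. */
#define VFTA_ENTRY_SHIFT          5
#define VFTA_ENTRY_MASK           0x7F
#define VFTA_ENTRY_BIT_SHIFT_MASK 0x1F

/*
 * Hypothetical helper: the upper bits of the VLAN ID select one 32-bit
 * word of the 128-entry table, and the low five bits select the bit
 * within that word. The packet passes only if that bit is set.
 */
static bool vlan_allowed(const uint32_t vfta[128], uint16_t vid)
{
    uint32_t word = vfta[(vid >> VFTA_ENTRY_SHIFT) & VFTA_ENTRY_MASK];

    return (word >> (vid & VFTA_ENTRY_BIT_SHIFT_MASK)) & 1u;
}
```

With this layout, VLAN ID 100 maps to word 3, bit 4 of the table.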

[PATCH v3 08/47] e1000e: Always copy ethernet header

2023-04-22 Thread Akihiko Odaki

e1000e_receive_internal() used to check the iov length to determine
whether to copy the iovs to a contiguous buffer, but the check is flawed
in two ways:
- It does not ensure that iovcnt > 0.
- It does not take the virtio-net header into consideration.

The size of this copy is just 18 octets, which can be even less than
the code size required for the checks. This (wrong) optimization is
probably not worth it, so just remove it.

Fixes: 6f3fbe4ed0 ("net: Introduce e1000e device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index c2d864a504..f3335194d8 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1686,12 +1686,9 @@ static ssize_t
 e1000e_receive_internal(E1000ECore *core, const struct iovec *iov, int iovcnt,
 bool has_vnet)
 {
-static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4);
-
 uint32_t n = 0;
 uint8_t min_buf[ETH_ZLEN];
 struct iovec min_iov;
-uint8_t *filter_buf;
 size_t size, orig_size;
 size_t iov_ofs = 0;
 E1000E_RxRing rxr;
@@ -1714,7 +1711,6 @@ e1000e_receive_internal(E1000ECore *core, const struct 
iovec *iov, int iovcnt,
 net_rx_pkt_unset_vhdr(core->rx_pkt);
 }
 
-filter_buf = iov->iov_base + iov_ofs;
 orig_size = iov_size(iov, iovcnt);
 size = orig_size - iov_ofs;
 
@@ -1723,15 +1719,13 @@ e1000e_receive_internal(E1000ECore *core, const struct 
iovec *iov, int iovcnt,
 iov_to_buf(iov, iovcnt, iov_ofs, min_buf, size);
 memset(&min_buf[size], 0, sizeof(min_buf) - size);
 e1000x_inc_reg_if_not_full(core->mac, RUC);
-min_iov.iov_base = filter_buf = min_buf;
+min_iov.iov_base = min_buf;
 min_iov.iov_len = size = sizeof(min_buf);
 iovcnt = 1;
 iov = &min_iov;
 iov_ofs = 0;
-} else if (iov->iov_len < maximum_ethernet_hdr_len) {
-/* This is very unlikely, but may happen. */
-iov_to_buf(iov, iovcnt, iov_ofs, min_buf, maximum_ethernet_hdr_len);
-filter_buf = min_buf;
+} else {
+iov_to_buf(iov, iovcnt, iov_ofs, min_buf, ETH_HLEN + 4);
 }
 
 /* Discard oversized packets if !LPE and !SBP. */
@@ -1740,9 +1734,9 @@ e1000e_receive_internal(E1000ECore *core, const struct 
iovec *iov, int iovcnt,
 }
 
 net_rx_pkt_set_packet_type(core->rx_pkt,
-get_eth_packet_type(PKT_GET_ETH_HDR(filter_buf)));
+get_eth_packet_type(PKT_GET_ETH_HDR(min_buf)));
 
-if (!e1000e_receive_filter(core, filter_buf, size)) {
+if (!e1000e_receive_filter(core, min_buf, size)) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
 }
-- 
2.40.0
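
The unconditional copy can be illustrated with a small standalone
sketch. It is not QEMU's iov_to_buf(); struct seg and gather() are
hypothetical names used only here. The point is that a gather loop
which walks the segment list from a byte offset behaves correctly both
with an empty list and with a leading header (such as a virtio-net
header) that has to be skipped, which are the two cases the removed
length check got wrong.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iovec. */
struct seg {
    const void *base;
    size_t len;
};

/*
 * Copy up to len bytes into buf, starting at byte offset off of the
 * scatter list. Returns the number of bytes actually copied, which is
 * 0 for an empty list and may be short if the list runs out.
 */
static size_t gather(const struct seg *v, int cnt, size_t off,
                     void *buf, size_t len)
{
    size_t done = 0;

    for (int i = 0; i < cnt && done < len; i++) {
        if (off >= v[i].len) {
            /* The whole segment lies before the requested offset. */
            off -= v[i].len;
            continue;
        }

        size_t n = v[i].len - off;
        if (n > len - done) {
            n = len - done;
        }
        memcpy((char *)buf + done, (const char *)v[i].base + off, n);
        done += n;
        off = 0;
    }
    return done;
}
```

Copying 18 octets this way costs little, and the filter code can then
always read the Ethernet and VLAN headers from one contiguous buffer.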

Re: [PATCH v5] cxl-cdat:Fix open file not closed in ct3_load_cdat

2023-04-22 Thread Hao Zeng

hi Jonathan:

Thank you very much

Best regards
--- Hao


On Fri, 2023-04-21 at 14:14 +0100, Jonathan Cameron wrote:
> On Thu, 13 Apr 2023 20:23:58 +0800
> Hao Zeng  wrote:
> 
> > Open file descriptor not closed in error paths. Fix by replace
> > open coded handling of read of whole file into a buffer with
> > g_file_get_contents()
> > 
> > Fixes: aba578bdac ("hw/cxl: CDAT Data Object Exchange
> > implementation")
> > Signed-off-by: Zeng Hao 
> > Suggested-by: Philippe Mathieu-Daudé 
> > Suggested-by: Peter Maydell 
> > Suggested-by: Jonathan Cameron via 
> > 
> > ---
> > ChangeLog:
> >     v4-v5:
> >     fixes some style issues and keep the protection after using
> > g_free()
> >     v3-v4:
> >     Modify commit information,No code change.
> >     v2->v3:
> >     Submission of v3 on the basis of v2, based on Philippe
> > Mathieu-Daudé's suggestion
> >     "Pointless bzero in g_malloc0, however this code would be
> >  simplified using g_file_get_contents()."
> >     v1->v2:
> >     - Patch 1: No change in patch v1
> >     - Patch 2: Fix the check on the return value of fread() in
> > ct3_load_cdat
> > ---
> >  hw/cxl/cxl-cdat.c | 27 ---
> >  1 file changed, 8 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> > index 137abd0992..dd69366797 100644
> > --- a/hw/cxl/cxl-cdat.c
> > +++ b/hw/cxl/cxl-cdat.c
> > @@ -110,29 +110,18 @@ static void ct3_load_cdat(CDATObject *cdat,
> > Error **errp)
> >  g_autofree CDATEntry *cdat_st = NULL;
> >  uint8_t sum = 0;
> >  int num_ent;
> > -    int i = 0, ent = 1, file_size = 0;
> > +    int i = 0, ent = 1;
> > +    gsize file_size = 0;
> >  CDATSubHeader *hdr;
> > -    FILE *fp = NULL;
> > +    GError *error = NULL;
> >  
> >  /* Read CDAT file and create its cache */
> > -    fp = fopen(cdat->filename, "r");
> > -    if (!fp) {
> > -    error_setg(errp, "CDAT: Unable to open file");
> > +    if (!g_file_get_contents(cdat->filename, (gchar **)&cdat->buf,
> > + &file_size, &error)) {
> > +    error_setg(errp, "CDAT: File read failed: %s", error-
> > >message);
> > +    g_error_free(error);
> >  return;
> >  }
> > -
> > -    fseek(fp, 0, SEEK_END);
> > -    file_size = ftell(fp);
> > -    fseek(fp, 0, SEEK_SET);
> > -    cdat->buf = g_malloc0(file_size);
> > -
> > -    if (fread(cdat->buf, file_size, 1, fp) == 0) {
> > -    error_setg(errp, "CDAT: File read failed");
> > -    return;
> > -    }
> > -
> > -    fclose(fp);
> > -
> >  if (file_size < sizeof(CDATTableHeader)) {
> >  error_setg(errp, "CDAT: File too short");
> >  return;
> > @@ -219,6 +208,6 @@ void cxl_doe_cdat_release(CXLComponentState
> > *cxl_cstate)
> >    cdat->private);
> >  }
> >  if (cdat->buf) {
> 
> Checkpatch complains about this check being unnecessary. I'll drop
> the check
> and then pick up this patch as a precursor to the other stuff Peter
> pointed out in this
> area.
> 
> Thanks,
> 
> Jonathan
> 
> 
> > -    free(cdat->buf);
> > +    g_free(cdat->buf);
> >  }
> >  }
>

RE: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode

2023-04-22 Thread Zhang, Chen




> -Original Message-
> From: Daniel P. Berrangé 
> Sent: Saturday, April 22, 2023 1:14 AM
> To: qemu-devel@nongnu.org
> Cc: qemu-bl...@nongnu.org; Paolo Bonzini ;
> Thomas Huth ; John Snow ; Li
> Zhijian ; Juan Quintela ;
> Stefan Hajnoczi ; Zhang, Chen
> ; Laurent Vivier 
> Subject: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow
> mode
> 

In what kind of scenario will the qtest pass -m slow to g_test_init() to
trigger the slow mode?

Thanks
Chen

> From: Juan Quintela 
> 
> Signed-off-by: Juan Quintela 
> ---
>  tests/qtest/migration-test.c | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index
> 63bd8a1893..9ed178aa03 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1877,6 +1877,21 @@ static void test_validate_uuid_dst_not_set(void)
>  do_test_validate_uuid(, false);  }
> 
> +/*
> + * The way auto_converge works, we need to do too many passes to
> + * run this test.  Auto_converge logic is only run once every
> + * three iterations, so:
> + *
> + * - 3 iterations without auto_converge enabled
> + * - 3 iterations with pct = 5
> + * - 3 iterations with pct = 30
> + * - 3 iterations with pct = 55
> + * - 3 iterations with pct = 80
> + * - 3 iterations with pct = 95 (max(95, 80 + 25))
> + *
> + * To make things even worse, we need to run the initial stage at
> + * 3MB/s so we enter autoconverge even when host is (over)loaded.
> + */
>  static void test_migrate_auto_converge(void)  {
>  g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); @@ -
> 2660,8 +2675,12 @@ int main(int argc, char **argv)
> test_validate_uuid_src_not_set);
>  qtest_add_func("/migration/validate_uuid_dst_not_set",
> test_validate_uuid_dst_not_set);
> -
> -qtest_add_func("/migration/auto_converge",
> test_migrate_auto_converge);
> +/*
> + * See explanation why this test is slow on function definition
> + */
> +if (g_test_slow()) {
> +qtest_add_func("/migration/auto_converge",
> test_migrate_auto_converge);
> +}
>  qtest_add_func("/migration/multifd/tcp/plain/none",
> test_multifd_tcp_none);
>  /*
> --
> 2.40.0
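
The throttle schedule listed in the quoted comment can be reproduced
with a few lines of C. This is only a sketch of the arithmetic, not the
migration code: next_throttle_pct() is a hypothetical helper, and the
parameters (initial 5, increment 25, maximum 95) are read off the list
in the comment. It assumes the new percentage is capped at the maximum,
that is, the smaller of 95 and 80 + 25.

```c
/*
 * Hypothetical helper: given the current throttle percentage, return
 * the percentage used for the next group of iterations. A current
 * value of 0 means throttling has not started yet.
 */
static int next_throttle_pct(int current, int initial, int increment,
                             int max_pct)
{
    if (current == 0) {
        return initial;
    }

    int next = current + increment;

    /* Cap at the configured maximum. */
    return next < max_pct ? next : max_pct;
}
```

Starting from 0 this yields 5, 30, 55, 80 and then 95. With three
iterations per step, plus the three initial iterations without
throttling, the test needs about 18 iterations to reach the maximum,
which is why it is slow.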

RE: [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success

2023-04-22 Thread Zhang, Chen



> -Original Message-
> From: Daniel P. Berrangé 
> Sent: Saturday, April 22, 2023 1:14 AM
> To: qemu-devel@nongnu.org
> Cc: qemu-bl...@nongnu.org; Paolo Bonzini ;
> Thomas Huth ; John Snow ; Li
> Zhijian ; Juan Quintela ;
> Stefan Hajnoczi ; Zhang, Chen
> ; Laurent Vivier ; Daniel P.
> Berrangé 
> Subject: [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with
> qtest_qmp_assert_success
> 
> The qmp_discard_response method simply ignores the result of the QMP
> command, merely unref'ing the object. This is a bad idea for tests as it 
> leaves
> no trace if the QMP command unexpectedly failed. The
> qtest_qmp_assert_success method will validate that the QMP command
> returned without error, and if errors occur, it will print a message on the
> console aiding debugging.
> 
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Zhang Chen 

> ---
>  tests/qtest/ahci-test.c  | 31 ++--
>  tests/qtest/boot-order-test.c|  5 +
>  tests/qtest/fdc-test.c   | 15 +++---
>  tests/qtest/ide-test.c   |  5 +
>  tests/qtest/migration-test.c |  5 +
>  tests/qtest/test-filter-mirror.c |  5 +
>  tests/qtest/test-filter-redirector.c |  7 ++-
>  tests/qtest/virtio-blk-test.c| 24 ++---
>  8 files changed, 40 insertions(+), 57 deletions(-)
> 
> diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c index
> 1967cd5898..abab761c26 100644
> --- a/tests/qtest/ahci-test.c
> +++ b/tests/qtest/ahci-test.c
> @@ -36,9 +36,6 @@
>  #include "hw/pci/pci_ids.h"
>  #include "hw/pci/pci_regs.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(s, ...) qobject_unref(qtest_qmp(s, __VA_ARGS__))
> -
>  /* Test images sizes in MB */
>  #define TEST_IMAGE_SIZE_MB_LARGE (200 * 1024)  #define
> TEST_IMAGE_SIZE_MB_SMALL 64 @@ -1595,9 +1592,9 @@ static void
> test_atapi_tray(void)
>  rsp = qtest_qmp_receive(ahci->parent->qts);
>  qobject_unref(rsp);
> 
> -qmp_discard_response(ahci->parent->qts,
> - "{'execute': 'blockdev-remove-medium', "
> - "'arguments': {'id': 'cd0'}}");
> +qtest_qmp_assert_success(ahci->parent->qts,
> + "{'execute': 'blockdev-remove-medium', "
> + "'arguments': {'id': 'cd0'}}");
> 
>  /* Test the tray without a medium */
>  ahci_atapi_load(ahci, port);
> @@ -1607,16 +1604,18 @@ static void test_atapi_tray(void)
>  atapi_wait_tray(ahci, true);
> 
>  /* Re-insert media */
> -qmp_discard_response(ahci->parent->qts,
> - "{'execute': 'blockdev-add', "
> - "'arguments': {'node-name': 'node0', "
> -"'driver': 'raw', "
> -"'file': { 'driver': 'file', "
> -  "'filename': %s }}}", iso);
> -qmp_discard_response(ahci->parent->qts,
> - "{'execute': 'blockdev-insert-medium',"
> - "'arguments': { 'id': 'cd0', "
> - "'node-name': 'node0' }}");
> +qtest_qmp_assert_success(
> +ahci->parent->qts,
> +"{'execute': 'blockdev-add', "
> +"'arguments': {'node-name': 'node0', "
> +  "'driver': 'raw', "
> +  "'file': { 'driver': 'file', "
> +"'filename': %s }}}", iso);
> +qtest_qmp_assert_success(
> +ahci->parent->qts,
> +"{'execute': 'blockdev-insert-medium',"
> +"'arguments': { 'id': 'cd0', "
> +   "'node-name': 'node0' }}");
> 
>  /* Again, the event shows up first */
>  qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-close-tray', "
> diff --git a/tests/qtest/boot-order-test.c b/tests/qtest/boot-order-test.c
> index 0680d79d6d..8f2b6ef05a 100644
> --- a/tests/qtest/boot-order-test.c
> +++ b/tests/qtest/boot-order-test.c
> @@ -16,9 +16,6 @@
>  #include "qapi/qmp/qdict.h"
>  #include "standard-headers/linux/qemu_fw_cfg.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
> -
>  typedef struct {
>  const char *args;
>  uint64_t expected_boot;
> @@ -43,7 +40,7 @@ static void test_a_boot_order(const char *machine,
>machine ?: "", test_args);
>  actual = read_boot_order(qts);
>  g_assert_cmphex(actual, ==, expected_boot);
> -qmp_discard_response(qts, "{ 'execute': 'system_reset' }");
> +qtest_qmp_assert_success(qts, "{ 'execute': 'system_reset' }");
>  /*
>   * system_reset only requests reset.  We get a RESET event after
>   * the actual reset completes.  Need to wait for that.
> diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c index
>

RE: [PATCH v2 4/4] configure: add --disable-colo-filters option

2023-04-22 Thread Zhang, Chen



> -Original Message-
> From: Vladimir Sementsov-Ogievskiy 
> Sent: Friday, April 21, 2023 4:53 PM
> To: Zhang, Chen ; qemu-devel@nongnu.org
> Cc: qemu-bl...@nongnu.org; michael.r...@amd.com; arm...@redhat.com;
> ebl...@redhat.com; jasow...@redhat.com; quint...@redhat.com; Zhang,
> Hailiang ; phi...@linaro.org;
> th...@redhat.com; berra...@redhat.com; marcandre.lur...@redhat.com;
> pbonz...@redhat.com; d...@treblig.org; hre...@redhat.com;
> kw...@redhat.com; lizhij...@fujitsu.com
> Subject: Re: [PATCH v2 4/4] configure: add --disable-colo-filters option
> 
> On 21.04.23 05:22, Zhang, Chen wrote:
> >
> >
> >> -Original Message-
> >> From: Vladimir Sementsov-Ogievskiy 
> >> Sent: Thursday, April 20, 2023 7:26 PM
> >> To: Zhang, Chen ; qemu-devel@nongnu.org
> >> Cc: qemu-bl...@nongnu.org; michael.r...@amd.com;
> arm...@redhat.com;
> >> ebl...@redhat.com; jasow...@redhat.com; quint...@redhat.com;
> Zhang,
> >> Hailiang ; phi...@linaro.org;
> >> th...@redhat.com; berra...@redhat.com;
> marcandre.lur...@redhat.com;
> >> pbonz...@redhat.com; d...@treblig.org; hre...@redhat.com;
> >> kw...@redhat.com; lizhij...@fujitsu.com
> >> Subject: Re: [PATCH v2 4/4] configure: add --disable-colo-filters
> >> option
> >>
> >> On 20.04.23 12:09, Zhang, Chen wrote:
> >>>
> >>>
>  -Original Message-
>  From: Vladimir Sementsov-Ogievskiy 
>  Sent: Thursday, April 20, 2023 6:53 AM
>  To: qemu-devel@nongnu.org
>  Cc: qemu-bl...@nongnu.org; michael.r...@amd.com;
> >> arm...@redhat.com;
>  ebl...@redhat.com; jasow...@redhat.com; quint...@redhat.com;
> >> Zhang,
>  Hailiang ; phi...@linaro.org;
>  th...@redhat.com; berra...@redhat.com;
> >> marcandre.lur...@redhat.com;
>  pbonz...@redhat.com; d...@treblig.org; hre...@redhat.com;
>  kw...@redhat.com; Zhang, Chen ;
>  lizhij...@fujitsu.com; Vladimir Sementsov-Ogievskiy
>  
>  Subject: [PATCH v2 4/4] configure: add --disable-colo-filters
>  option
> 
>  Add option to not build COLO Proxy subsystem if it is not needed.
> >>>
> >>> I don't think we need to add the --disable-colo-filter option.
> >>> Net filters are just general infrastructure. As another example, COLO
> >>> also uses -chardev socket to connect the filters, yet there is no need
> >>> to add a --disable-colo-chardev option.
> >>> Please drop this patch.
> >>
> >> I don't follow your reasoning. Of course, we don't need any option
> >> like this, and can simply build all the components, including those we
> >> don't use. So "no need" is correct for any --disable-* option.
> >> Still, I think it's good to be able to exclude from the build the
> >> subsystems that you don't need / don't want to maintain or ship to your
> >> end users.
> >
> > Yes, I agree with your idea.
> >
> >>
> >> In MAINTAINERS these two filters are in the "COLO Proxy" subsystem. Is
> >> that correct? What's wrong with adding an option to not build this
> >> subsystem? I can rename it to --disable-colo-proxy.
> >
> > Historically, the COLO project contributed the QEMU filter framework,
> > filter-mirror/redirector, etc.
> > That the "COLO Proxy" subsystem covers these filters does not mean they
> > belong to COLO.
> > So it is unreasonable to tell users that they cannot use
> > filter-mirror/redirector for network debugging or other purposes because
> > they have not enabled COLO proxy.
> 
> But we don't say it to users, as these filters are enabled by default.
> 
> But I see your point. And looking at filter-mirror.c I see that it has no
> relation to COLO. I can't say the same about filter-rewriter.c.
> 
> So, the absolutely correct way would be to just have separate options
> 
> --disable-net-filter-mirror
> --disable-net-filter-rewriter
> 
> and for any other filter we want to be "disable-able", like options for block
> drivers (I mean --disable-parallels, --disable-qcow1, --disable-qed, etc for
> files describing these drivers in block/)
> 

Yes.

> 
> >
> >>
> >>> But for the COLO network part, there is still something to do:
> >>> you can add --disable-colo-proxy to not build net/colo-compare.c
> >>> if it is not needed.
> >>> This file is only for COLO and does not belong to the network filters.
> >>
> >> net/colo-compare.c is used only for COLO, which in turn is used
> >> only with CONFIG_REPLICATION enabled, see patch 3. So there is no reason
> >> to add a separate option for it; it should be disabled with
> >> --disable-replication.
> >
> > Yes, and as Lukas said, COLO is the only user for the filter-rewriter 
> > currently.
> 
> So, maybe simply move filter-rewriter.c under CONFIG_REPLICATION, if it's
> needed only for COLO?
> 

As far as I know, on the QEMU side, COLO is the only user of filter-rewriter.
But QEMU users (libvirt, etc.) may try to use it for other purposes.

> > It is more appropriate to add filter-rewriter replace the filter-mirror 
> > here.
> > I saw the patch 3, it is better to move it to this patch.
> 
> Hmm what do you mean? Both filter-rewriter and filter-mirror are now
> handled in this patch, so what to replace?

I

RE: [PATCH v2 3/4] build: move COLO under CONFIG_REPLICATION

2023-04-22 Thread Zhang, Chen



> -Original Message-
> From: Vladimir Sementsov-Ogievskiy 
> Sent: Friday, April 21, 2023 4:36 PM
> To: Zhang, Chen ; qemu-devel@nongnu.org
> Cc: qemu-bl...@nongnu.org; michael.r...@amd.com; arm...@redhat.com;
> ebl...@redhat.com; jasow...@redhat.com; quint...@redhat.com; Zhang,
> Hailiang ; phi...@linaro.org;
> th...@redhat.com; berra...@redhat.com; marcandre.lur...@redhat.com;
> pbonz...@redhat.com; d...@treblig.org; hre...@redhat.com;
> kw...@redhat.com; lizhij...@fujitsu.com
> Subject: Re: [PATCH v2 3/4] build: move COLO under CONFIG_REPLICATION
> 
> On 21.04.23 06:02, Zhang, Chen wrote:
> >
> >
> >> -Original Message-
> >> From: Vladimir Sementsov-Ogievskiy 
> >> Sent: Thursday, April 20, 2023 6:53 AM
> >> To: qemu-devel@nongnu.org
> >> Cc: qemu-bl...@nongnu.org; michael.r...@amd.com;
> arm...@redhat.com;
> >> ebl...@redhat.com; jasow...@redhat.com; quint...@redhat.com;
> Zhang,
> >> Hailiang ; phi...@linaro.org;
> >> th...@redhat.com; berra...@redhat.com;
> marcandre.lur...@redhat.com;
> >> pbonz...@redhat.com; d...@treblig.org; hre...@redhat.com;
> >> kw...@redhat.com; Zhang, Chen ;
> >> lizhij...@fujitsu.com; Vladimir Sementsov-Ogievskiy
> >> 
> >> Subject: [PATCH v2 3/4] build: move COLO under CONFIG_REPLICATION
> >>
> >> We don't allow to use x-colo capability when replication is not
> >> configured. So, no reason to build COLO when replication is disabled,
> >> it's unusable in this case.
> >
> > Yes, you are right about the current status, because the COLO best
> > practice is replication + COLO live migration + COLO proxy.
> > But that doesn't mean it has to be done this way in all scenarios, as I
> > explained in V1.
> > The better way is to first allow using the x-colo capability, and to split
> > this patch into two config options: --disable-replication and
> > --disable-x-colo.
> >
> 
> But what for? We for sure don't have such scenarios now (COLO without
> replication), as that has not been allowed since 7e934f5b27eee1b0d7 (by
> you and David).
> 
> If you think we need such scenario, I think it should be a separate series
> which reverts 7e934f5b27eee1b0d7 and adds corresponding test and
> probably documentation.

The patch 7e934f5b27eee1b0d7 says it is for the current independent disk mode,
while what we talked about before is the shared disk mode.
Thinking again about the COLO shared disk mode, this feature still needs some
enabling work.
So this looks OK for now; we can separate the build options when enabling COLO
shared disk mode.

Thanks
Chen

> 
> 
> --
> Best regards,
> Vladimir

Re: [PATCH 13/13] hw/ide: Extract bmdma_clear_status()

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

Extract bmdma_clear_status() mirroring bmdma_cmd_writeb().

Signed-off-by: Bernhard Beschow 
---
include/hw/ide/pci.h |  1 +
hw/ide/cmd646.c  |  2 +-
hw/ide/pci.c |  7 +++
hw/ide/piix.c|  2 +-
hw/ide/sii3112.c | 12 +---
hw/ide/via.c |  2 +-
hw/ide/trace-events  |  1 +
7 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 81e0370202..6a286ad307 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -59,6 +59,7 @@ struct PCIIDEState {
void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops);
void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
+void bmdma_clear_status(BMDMAState *bm, uint32_t val);
void pci_ide_create_devs(PCIDevice *dev);

#endif
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index b9d005a357..973c3ff0dc 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -144,7 +144,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
cmd646_update_irq(pci_dev);
break;
case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
break;
case 3:
if (bm == &bm->pci_dev->bmdma[0]) {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 3539b162b7..4aa06be7c6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -318,6 +318,13 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
bm->cmd = val & 0x09;
}

+void bmdma_clear_status(BMDMAState *bm, uint32_t val)
+{
+trace_bmdma_update_status(val);
+
+bm->status = (val & 0x60) | (bm->status & BM_STATUS_DMAING) | (bm->status & ~val & 0x06);
+}
+
static uint64_t bmdma_addr_read(void *opaque, hwaddr addr,
unsigned width)
{
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 406a67fa0f..9eab615e35 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,7 +76,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
bmdma_cmd_writeb(bm, val);
break;
case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
break;
}
}
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 373c0dd1ee..1180ff55e7 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -66,7 +66,7 @@ static void sii3112_bmdma_write(void *opaque, hwaddr addr,
uint64_t val, unsigned int size)
{
BMDMAState *bm = opaque;
-SiI3112PCIState *d = SII3112_PCI(bm->pci_dev);
+SiI3112PCIState *s = SII3112_PCI(bm->pci_dev);


Also, renaming the local variable is an unrelated change. It could be a
separate patch, but wasn't it added in the previous patch? Why wasn't it
done there already?


Regards,
BALATON Zoltan


int i = (bm == &bm->pci_dev->bmdma[0]) ? 0 : 1;

trace_sii3112_bmdma_write(size, addr, val);
@@ -75,10 +75,10 @@ static void sii3112_bmdma_write(void *opaque, hwaddr addr,
bmdma_cmd_writeb(bm, val);
break;
case 0x01:
-d->regs[i].swdata = val & 0x3f;
+s->regs[i].swdata = val & 0x3f;
break;
case 0x02:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 6);
+bmdma_clear_status(bm, val);
break;
default:
break;
@@ -160,8 +160,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
d->regs[0].swdata = val & 0x3f;
break;
case 0x12:
-d->i.bmdma[0].status = (val & 0x60) | (d->i.bmdma[0].status & 1) |
-   (d->i.bmdma[0].status & ~val & 6);
+bmdma_clear_status(&d->i.bmdma[0], val);
break;
case 0x18:
bmdma_cmd_writeb(&d->i.bmdma[1], val);
@@ -170,8 +169,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
d->regs[1].swdata = val & 0x3f;
break;
case 0x1a:
-d->i.bmdma[1].status = (val & 0x60) | (d->i.bmdma[1].status & 1) |
-   (d->i.bmdma[1].status & ~val & 6);
+bmdma_clear_status(&d->i.bmdma[1], val);
break;
case 0x100:
d->regs[0].scontrol = val & 0xfff;
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 35dd97e49b..afb97f302a 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -75,7 +75,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
bmdma_cmd_writeb(bm, val);
break;
case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
break;
default:;
}
diff --git a/hw/ide/trace-events b/hw/ide/trace-events
index a479525e38..d219c64b61 100644
--- a/hw/ide/trace-events
+++ b/hw/ide/trace-events
@@ -30,6 +30,7 @@ bmdma_write_cmd646(uint64_t addr, uint64_t val) "bmdma: writeb 
0x%"PRIx64" : 0x%
# pci.c
bmdma_reset(void) ""
bmdma_cmd_writeb(uint32_t val) "val: 0x%08x"
+bmdma_update_status(uint32_t val) "val: 0x%08x"
bmdma_addr_read(uint64_t data) "data:
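
For readers following this thread: the status update that the patch factors
into bmdma_clear_status() can be read as three bit groups that are handled
differently on a guest write. The sketch below is only an illustration, not
the QEMU code. bmdma_status_after_write() is a hypothetical helper, and
BM_STATUS_RW and BM_STATUS_W1C are assumed labels for the literal masks 0x60
and 0x06 used in the patch. Only BM_STATUS_DMAING appears in the patch itself;
its value 0x01 is inferred from the "bm->status & 1" expression it replaces.

```c
#include <stdint.h>

/* Bit 0: kept as-is, a guest write cannot change it. */
#define BM_STATUS_DMAING 0x01
/* Bits 1-2 (assumed label): writing 1 clears the bit, writing 0 keeps it. */
#define BM_STATUS_W1C    0x06
/* Bits 5-6 (assumed label): plain read/write bits taken from the write. */
#define BM_STATUS_RW     0x60

/*
 * Hypothetical helper: returns the status register value after the guest
 * writes 'val', using the same expression as the patch.
 */
static uint8_t bmdma_status_after_write(uint8_t status, uint32_t val)
{
    return (val & BM_STATUS_RW)
         | (status & BM_STATUS_DMAING)
         | (status & ~val & BM_STATUS_W1C);
}
```

Under this reading, writing 0x04 to a status of 0x07 leaves 0x03: bit 2 is
cleared by the write, bit 1 stays set, and bit 0 is unaffected.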

Re: [PATCH 13/13] hw/ide: Extract bmdma_clear_status()

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

Extract bmdma_clear_status() mirroring bmdma_cmd_writeb().


Is adding a trace point useful? This is called from places that already
have traces, so I don't think we need another separate trace point here.
Also, the names don't match (bmdma_clear_status() vs.
trace_bmdma_update_status()); maybe rename the function to
bmdma_update_status instead, as that better describes what it does.


Regards,
BALATON Zoltan


Signed-off-by: Bernhard Beschow 
---
include/hw/ide/pci.h |  1 +
hw/ide/cmd646.c  |  2 +-
hw/ide/pci.c |  7 +++
hw/ide/piix.c|  2 +-
hw/ide/sii3112.c | 12 +---
hw/ide/via.c |  2 +-
hw/ide/trace-events  |  1 +
7 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 81e0370202..6a286ad307 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -59,6 +59,7 @@ struct PCIIDEState {
void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops);
void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
+void bmdma_clear_status(BMDMAState *bm, uint32_t val);
void pci_ide_create_devs(PCIDevice *dev);

#endif
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index b9d005a357..973c3ff0dc 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -144,7 +144,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
cmd646_update_irq(pci_dev);
break;
case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
break;
case 3:
if (bm == &bm->pci_dev->bmdma[0]) {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 3539b162b7..4aa06be7c6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -318,6 +318,13 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
bm->cmd = val & 0x09;
}

+void bmdma_clear_status(BMDMAState *bm, uint32_t val)
+{
+trace_bmdma_update_status(val);
+
+bm->status = (val & 0x60) | (bm->status & BM_STATUS_DMAING) | (bm->status & ~val & 0x06);
+}
+
static uint64_t bmdma_addr_read(void *opaque, hwaddr addr,
unsigned width)
{
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 406a67fa0f..9eab615e35 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,7 +76,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
bmdma_cmd_writeb(bm, val);
break;
case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
break;
}
}
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 373c0dd1ee..1180ff55e7 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -66,7 +66,7 @@ static void sii3112_bmdma_write(void *opaque, hwaddr addr,
uint64_t val, unsigned int size)
{
BMDMAState *bm = opaque;
-SiI3112PCIState *d = SII3112_PCI(bm->pci_dev);
+SiI3112PCIState *s = SII3112_PCI(bm->pci_dev);
int i = (bm == &bm->pci_dev->bmdma[0]) ? 0 : 1;

trace_sii3112_bmdma_write(size, addr, val);
@@ -75,10 +75,10 @@ static void sii3112_bmdma_write(void *opaque, hwaddr addr,
bmdma_cmd_writeb(bm, val);
break;
case 0x01:
-d->regs[i].swdata = val & 0x3f;
+s->regs[i].swdata = val & 0x3f;
break;
case 0x02:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 6);
+bmdma_clear_status(bm, val);
break;
default:
break;
@@ -160,8 +160,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
d->regs[0].swdata = val & 0x3f;
break;
case 0x12:
-d->i.bmdma[0].status = (val & 0x60) | (d->i.bmdma[0].status & 1) |
-   (d->i.bmdma[0].status & ~val & 6);
+bmdma_clear_status(&d->i.bmdma[0], val);
break;
case 0x18:
bmdma_cmd_writeb(&d->i.bmdma[1], val);
@@ -170,8 +169,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
d->regs[1].swdata = val & 0x3f;
break;
case 0x1a:
-d->i.bmdma[1].status = (val & 0x60) | (d->i.bmdma[1].status & 1) |
-   (d->i.bmdma[1].status & ~val & 6);
+bmdma_clear_status(&d->i.bmdma[1], val);
break;
case 0x100:
d->regs[0].scontrol = val & 0xfff;
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 35dd97e49b..afb97f302a 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -75,7 +75,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
bmdma_cmd_writeb(bm, val);
break;
case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
break;
default:;
}
diff --git a/hw/ide/trace-events b/hw/ide/trace-events
index a479525e38..d219c64b61 100644
--- a/hw/ide/trace-events
+++ b/hw/ide/trace-events
@@ -30,6 +30,7 @@ bmdma_write_cmd646(uint64_t addr, uint64_t val) "bmdma: writeb 
0x%"PRIx64" : 0x%
# pci.c
bmdma_reset(void) ""
bmdma_cmd_writeb(uint32_t

Re: [PATCH 11/13] hw/ide/sii3112: Reuse PCIIDEState::{cmd,data}_ops

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

Allows to unexport pci_ide_{cmd,data}_le_ops and models TYPE_SII3112_PCI as a
standard-compliant PCI IDE device.

Signed-off-by: Bernhard Beschow 
---
include/hw/ide/pci.h |  2 --
hw/ide/pci.c |  4 ++--
hw/ide/sii3112.c | 50 
3 files changed, 20 insertions(+), 36 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 5025df5b82..dbb4b13161 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -62,6 +62,4 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
extern MemoryRegionOps bmdma_addr_ioport_ops;
void pci_ide_create_devs(PCIDevice *dev);

-extern const MemoryRegionOps pci_ide_cmd_le_ops;
-extern const MemoryRegionOps pci_ide_data_le_ops;
#endif
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index b2fcc00a64..97ccc75aa6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -60,7 +60,7 @@ static void pci_ide_ctrl_write(void *opaque, hwaddr addr,
ide_ctrl_write(bus, addr + 2, data);
}

-const MemoryRegionOps pci_ide_cmd_le_ops = {
+static const MemoryRegionOps pci_ide_cmd_le_ops = {
.read = pci_ide_status_read,
.write = pci_ide_ctrl_write,
.endianness = DEVICE_LITTLE_ENDIAN,
@@ -98,7 +98,7 @@ static void pci_ide_data_write(void *opaque, hwaddr addr,
}
}

-const MemoryRegionOps pci_ide_data_le_ops = {
+static const MemoryRegionOps pci_ide_data_le_ops = {
.read = pci_ide_data_read,
.write = pci_ide_data_write,
.endianness = DEVICE_LITTLE_ENDIAN,
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 0af897a9ef..9cf920369f 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -88,21 +88,9 @@ static uint64_t sii3112_reg_read(void *opaque, hwaddr addr,
val |= (d->regs[1].confstat & (1UL << 11) ? (1 << 4) : 0);
val |= (uint32_t)d->i.bmdma[1].status << 16;
break;
-case 0x80 ... 0x87:
-val = pci_ide_data_le_ops.read(&d->i.bus[0], addr - 0x80, size);
-break;
-case 0x8a:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[0], 2, size);
-break;
case 0xa0:
val = d->regs[0].confstat;
break;
-case 0xc0 ... 0xc7:
-val = pci_ide_data_le_ops.read(&d->i.bus[1], addr - 0xc0, size);
-break;
-case 0xca:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[1], 2, size);
-break;
case 0xe0:
val = d->regs[1].confstat;
break;
@@ -171,18 +159,6 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
case 0x0c ... 0x0f:
bmdma_addr_ioport_ops.write(&d->i.bmdma[1], addr - 12, val, size);
break;
-case 0x80 ... 0x87:
-pci_ide_data_le_ops.write(&d->i.bus[0], addr - 0x80, val, size);
-break;
-case 0x8a:
-pci_ide_cmd_le_ops.write(&d->i.bus[0], 2, val, size);
-break;
-case 0xc0 ... 0xc7:
-pci_ide_data_le_ops.write(&d->i.bus[1], addr - 0xc0, val, size);
-break;
-case 0xca:
-pci_ide_cmd_le_ops.write(&d->i.bus[1], 2, val, size);
-break;
case 0x100:
d->regs[0].scontrol = val & 0xfff;
if (val & 1) {
@@ -259,6 +235,11 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
pci_config_set_interrupt_pin(dev->config, 1);
pci_set_byte(dev->config + PCI_CACHE_LINE_SIZE, 8);

+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, >data_ops[0]);
+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, >cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, >data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, >cmd_ops[1]);
+
/* BAR5 is in PCI memory space */
memory_region_init_io(>mmio, OBJECT(d), _reg_ops, d,
 "sii3112.bar5", 0x200);
@@ -266,17 +247,22 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)

/* BAR0-BAR4 are PCI I/O space aliases into BAR5 */


This patch breaks the above comment, but I think you should not mess with
BAR0-4 at all and should leave them aliased into BAR5. These have the same
registers mirrored, and some guests access them via the memory-mapped BAR5
while others prefer the I/O-mapped BAR0-4, so removing these from BAR5 would
break some guests. If you want to remove something from BAR5 and map
subregions implementing those instead, then I think only BAR5 needs to be
changed. Otherwise I'm not getting what is happening here, so a more detailed
commit message would be needed.


Was this tested? A minimal test might be booting AROS and MorphOS on 
sam460ex.


Regards,
BALATON Zoltan


mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", >mmio, 0x80, 8);
-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", >data_ops[0], 0,
+ memory_region_size(>data_ops[0]));
+memory_region_add_subregion_overlap(>mmio, 0x80, mr, 1);
mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", >mmio, 0x88, 4);
-

Re: [PATCH 01/11] Signed-off-by: Karim Taha

2023-04-22 Thread Karim Taha

I made a fork on gitlab and pushed a branch at
https://gitlab.com/Kariiem/qemu/-/tree/gsoc23-task3/ .

On Sat, Apr 22, 2023 at 1:18 AM Warner Losh  wrote:

> Usually this means pushing a branch off of master to a service like github
> or gitlab, and then
> posting a URL with where to get it.
>
> Warner
>
> On Fri, Apr 21, 2023 at 4:40 PM Karim Taha 
> wrote:
>
>> It was sent with git-publish, what do you mean by pointing to a branch?
>>
>> On Fri, Apr 21, 2023 at 7:22 PM Alex Bennée 
>> wrote:
>>
>>>
>>> Karim Taha  writes:
>>>
>>> > On Fri, Apr 21, 2023 at 9:17 AM Daniel P. Berrangé <
>>> berra...@redhat.com> wrote:
>>> >
>>> >  On Fri, Apr 21, 2023 at 07:22:45AM +0200, Karim Taha wrote:
>>> >  > From: Warner Losh 
>>> >  >
>>> >  > Allow guest_base to be initialized on 64-bit hosts, the initial
>>> value is used by g2h_untagged function
>>> >  defined in include/exec/cpu_ldst.h
>>> >
>>> >  This commit message is all incorrectly structured I'm afraid.
>>> >
>>> >  There needs to be a short 1 line summary, then a blank line,
>>> >  then the full commit description text, then a blank line,
>>> >  then the Signed-off-by tag(s).
>>> >
>>> >  Also if you're sending work done by Warner (as the From
>>> >  tag suggests), then we would expect to see Warner's own
>>> >  Signed-off-by tag, in addition to your own Signed-off-by.
>>> 
>>> >
>>> > Alright, thanks for the commit formatting tips. I re-sent the patch
>>> > series with my Signed-off-by tag and the author's Signed-off-by tag
>>> > as well.
>>>
>>> Hmm something has gone wrong. Was this sent with a plain git-send-email
>>> or using a tool like git-publish?
>>>
>>> Can you point to a branch?
>>>
>>> >
>>> > Best regards,
>>> > Karim
>>>
>>>
>>> --
>>> Alex Bennée
>>> Virtualisation Tech Lead @ Linaro
>>>
>>

Re: [PATCH 02/13] hw/ide/via: Implement ISA IRQ routing

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

Am 22. April 2023 17:23:56 UTC schrieb BALATON Zoltan :

On Sat, 22 Apr 2023, Bernhard Beschow wrote:

The VIA south bridge allows the legacy IDE interrupts to be routed to four
different ISA interrupts. This can be configured through the 0x4a register in
the PCI configuration space of the ISA function. The default routing matches
the legacy ISA IRQs, that is 14 and 15.


On VT8231 0x4a is PCI Master Arbitration Control, IDE interrupt Routing is 0x4c 
and only documents 14/15 as valid values.


In the datasheet titled "VT8231 South Bridge", preliminary revision 0.8, 
Oct. 29, 1999, page 60, the "IDE Interrupt Routing" register is located 
at offset 0x4a and offers the same four interrupts in the same order as 
in the code. Are we looking at the same datasheet?


Apparently not. The one I have says: Revision 2.32, May 10, 2004. It looks
more authoritative than a preliminary one.



Not sure any guest would actually change this or 0x4a and if that could cause 
problems but you may need to handle this somehow. (Apart from testing with 
MorphOS with -kernel you should really be testing with pegasos2.rom with 
MorphOS and Linux, e.g. Debian 8.11 netinstall iso is known to boot.)


I've tested extensively with an x86 Linux guest on my pc-via branch which 
worked flawlessly.


That does not substitute for testing Linux on pegasos2, though, because there
are some hacks in the Linux kernel to handle some peculiarities of the
pegasos2, including via ide on that machine, and that can only be fully
tested with pegasos2.rom and PPC Linux.


As mentioned in the commit message the default routing of the chipset 
matches legacy behavior, that is interrupts 14 and 15. This is reflected 
by assigning [0x4a] = 4 in the code and that is how the code behaved 
before.


And that's the only allowed value on VT8231, other bits are listed as 
reserved so I wonder if we want to model this at all if no guest is 
touching it anyway. So you could also just drop that part and keep it hard 
mapped to 14-15 as it is now, mentioning the config reg in a comment if we 
ever find a guest that needs it.


Regards,
BALATON Zoltan
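
To make the "[0x4a] = 4" default discussed above concrete, the lookup from the
via_isa_set_ide_irq() hunk of the patch under discussion can be tried in
isolation. This is a minimal sketch under assumptions, not the device model:
via_ide_routed_irq() is a hypothetical helper, the two-bit-per-channel layout
is taken from the patch's own expression, and whether real VT8231 hardware
implements this register at 0x4a (or allows values other than 14/15) is
exactly what the two datasheet revisions in this thread disagree on.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's lookup: each IDE channel owns a
 * two-bit field in the routing register, which selects one of four ISA IRQs.
 */
static int via_ide_routed_irq(uint8_t routing_reg, unsigned channel)
{
    static const uint8_t irqs[] = { 14, 15, 10, 11 };

    assert(channel < 2);  /* only primary (0) and secondary (1) channels */
    return irqs[(routing_reg >> (channel * 2)) & 0x3];
}
```

With the default register value 4, channel 0 selects field value 0 (IRQ 14)
and channel 1 selects field value 1 (IRQ 15), which is the legacy routing
that the code used before the patch.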

Re: [RFC PATCH 12/13] HACK: use memory region API to inject memory to guest

2023-04-22 Thread Peter Maydell

On Fri, 21 Apr 2023 at 02:13, Gurchetan Singh
 wrote:

> Though the api does make an exception:
>
> "There is an exception to the above rule: it is okay to call
> object_unparent at any time for an alias or a container region. It is
> therefore also okay to create or destroy alias and container regions
> dynamically during a device’s lifetime."
>
> I believe we are trying to create a container subregion, but that's
> still failing?

> @@ -671,6 +677,14 @@ rutabaga_cmd_resource_map_blob(VirtIOGPU *g,
>  result = rutabaga_resource_map(rutabaga, mblob.resource_id, );
>  CHECK_RESULT(result, cmd);
>
> +memory_region_transaction_begin();
> +memory_region_init_ram_device_ptr(>region, OBJECT(g), NULL,
> +  mapping.size, (void *)mapping.ptr);

This isn't a container MemoryRegion -- it is a RAM MR. That is,
accesses to it are backed by a lump of host memory (viz, the
one provided here via the mapping.ptr). A container MR is one
which provides no backing mechanism (neither host RAM, nor
MMIO read/write callbacks), and whose contents are purely
those of any other MemoryRegions that you add to it via
memory_region_add_subregion(). So the exception listed in the
API docs does not apply here.

-- PMM
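
The distinction drawn above can be restated as a toy model. This is only an
illustration of the classification described in the mail, under stated
assumptions: the toy_* types and functions are hypothetical and are NOT
QEMU's MemoryRegion API, and the real rules live in the memory API
documentation quoted earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are not QEMU's MemoryRegion API. */
enum toy_kind { TOY_CONTAINER, TOY_ALIAS, TOY_RAM, TOY_MMIO };

struct toy_region {
    void *host_ptr;                    /* non-NULL: backed by host memory */
    bool has_io_callbacks;             /* true: backed by read/write callbacks */
    const struct toy_region *alias_of; /* non-NULL: window onto another region */
};

/* A region with any backing mechanism of its own is not a container. */
static enum toy_kind toy_classify(const struct toy_region *r)
{
    if (r->host_ptr) {
        return TOY_RAM;
    }
    if (r->has_io_callbacks) {
        return TOY_MMIO;
    }
    if (r->alias_of) {
        return TOY_ALIAS;
    }
    return TOY_CONTAINER;
}

/* The documented exception covers only alias and container regions. */
static bool toy_may_unparent_any_time(const struct toy_region *r)
{
    enum toy_kind kind = toy_classify(r);

    return kind == TOY_ALIAS || kind == TOY_CONTAINER;
}
```

In this model, a region created around a host pointer (as in the quoted hunk,
which passes mapping.ptr) classifies as RAM, so the exception does not cover
it.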

Re: [PATCH 05/13] hw/ide: Extract pci_ide_class_init()

2023-04-22 Thread Bernhard Beschow




Am 22. April 2023 17:34:30 UTC schrieb BALATON Zoltan :
>On Sat, 22 Apr 2023, Bernhard Beschow wrote:
>> Resolves redundant code in every PCI IDE device model.
>
>This patch could be broken up a bit more as it seems to do unrelated changes. 
>Such as setting DEVICE_CATEGORY_STORAGE in a different way could be a separate 
>patch to make it simpler to review.

Okay, I'll slice this patch in the next iteration, moving each assignment 
separately.

Best regards,
Bernhard

>
>Regards,
>BALATON Zoltan
>
>> ---
>> include/hw/ide/pci.h |  1 -
>> hw/ide/cmd646.c  | 15 ---
>> hw/ide/pci.c | 25 -
>> hw/ide/piix.c| 19 ---
>> hw/ide/sii3112.c |  3 ++-
>> hw/ide/via.c | 15 ---
>> 6 files changed, 26 insertions(+), 52 deletions(-)
>> 
>> diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
>> index 74c127e32f..7bc4e53d02 100644
>> --- a/include/hw/ide/pci.h
>> +++ b/include/hw/ide/pci.h
>> @@ -61,7 +61,6 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
>> extern MemoryRegionOps bmdma_addr_ioport_ops;
>> void pci_ide_create_devs(PCIDevice *dev);
>> 
>> -extern const VMStateDescription vmstate_ide_pci;
>> extern const MemoryRegionOps pci_ide_cmd_le_ops;
>> extern const MemoryRegionOps pci_ide_data_le_ops;
>> #endif
>> diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
>> index a094a6e12a..9aabf80e52 100644
>> --- a/hw/ide/cmd646.c
>> +++ b/hw/ide/cmd646.c
>> @@ -301,17 +301,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, 
>> Error **errp)
>> }
>> }
>> 
>> -static void pci_cmd646_ide_exitfn(PCIDevice *dev)
>> -{
>> -PCIIDEState *d = PCI_IDE(dev);
>> -unsigned i;
>> -
>> -for (i = 0; i < 2; ++i) {
>> -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
>> -memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
>> -}
>> -}
>> -
>> static Property cmd646_ide_properties[] = {
>> DEFINE_PROP_UINT32("secondary", PCIIDEState, secondary, 0),
>> DEFINE_PROP_END_OF_LIST(),
>> @@ -323,17 +312,13 @@ static void cmd646_ide_class_init(ObjectClass *klass, 
>> void *data)
>> PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> 
>> dc->reset = cmd646_reset;
>> -dc->vmsd = &vmstate_ide_pci;
>> k->realize = pci_cmd646_ide_realize;
>> -k->exit = pci_cmd646_ide_exitfn;
>> k->vendor_id = PCI_VENDOR_ID_CMD;
>> k->device_id = PCI_DEVICE_ID_CMD_646;
>> k->revision = 0x07;
>> -k->class_id = PCI_CLASS_STORAGE_IDE;
>> k->config_read = cmd646_pci_config_read;
>> k->config_write = cmd646_pci_config_write;
>> device_class_set_props(dc, cmd646_ide_properties);
>> -set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>> }
>> 
>> static const TypeInfo cmd646_ide_info = {
>> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
>> index 67e0998ff0..8bea92e394 100644
>> --- a/hw/ide/pci.c
>> +++ b/hw/ide/pci.c
>> @@ -467,7 +467,7 @@ static int ide_pci_post_load(void *opaque, int 
>> version_id)
>> return 0;
>> }
>> 
>> -const VMStateDescription vmstate_ide_pci = {
>> +static const VMStateDescription vmstate_ide_pci = {
>> .name = "ide",
>> .version_id = 3,
>> .minimum_version_id = 0,
>> @@ -530,11 +530,34 @@ static void pci_ide_init(Object *obj)
>> qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
>> }
>> 
>> +static void pci_ide_exitfn(PCIDevice *dev)
>> +{
>> +PCIIDEState *d = PCI_IDE(dev);
>> +unsigned i;
>> +
>> +for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {
>> +memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
>> +memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
>> +}
>> +}
>> +
>> +static void pci_ide_class_init(ObjectClass *klass, void *data)
>> +{
>> +DeviceClass *dc = DEVICE_CLASS(klass);
>> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +
>> +dc->vmsd = &vmstate_ide_pci;
>> +k->exit = pci_ide_exitfn;
>> +k->class_id = PCI_CLASS_STORAGE_IDE;
>> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>> +}
>> +
>> static const TypeInfo pci_ide_type_info = {
>> .name = TYPE_PCI_IDE,
>> .parent = TYPE_PCI_DEVICE,
>> .instance_size = sizeof(PCIIDEState),
>> .instance_init = pci_ide_init,
>> +.class_init = pci_ide_class_init,
>> .abstract = true,
>> .interfaces = (InterfaceInfo[]) {
>> { INTERFACE_CONVENTIONAL_PCI_DEVICE },
>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>> index a32f7ccece..4e6ca99123 100644
>> --- a/hw/ide/piix.c
>> +++ b/hw/ide/piix.c
>> @@ -159,8 +159,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
>> **errp)
>> bmdma_setup_bar(d);
>> pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
>> 
>> -vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
>> -
>> for (unsigned i = 0; i < 2; i++) {
>> if (!pci_piix_init_bus(d, i, errp)) {
>> return;
>> @@ -168,17 +166,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
>> **errp)
>>

Re: [PATCH 02/13] hw/ide/via: Implement ISA IRQ routing

2023-04-22 Thread Bernhard Beschow




Am 22. April 2023 17:23:56 UTC schrieb BALATON Zoltan :
>On Sat, 22 Apr 2023, Bernhard Beschow wrote:
>> The VIA south bridge allows the legacy IDE interrupts to be routed to four
>> different ISA interrupts. This can be configured through the 0x4a register in
>> the PCI configuration space of the ISA function. The default routing matches
>> the legacy ISA IRQs, that is 14 and 15.
>
>On VT8231 0x4a is PCI Master Arbitration Control, IDE interrupt Routing is 
>0x4c and only documents 14/15 as valid values.

In the datasheet titled "VT8231 South Bridge", preliminary revision 0.8, Oct. 
29, 1999, page 60, the "IDE Interrupt Routing" register is located at offset 
0x4a and offers the same four interrupts in the same order as in the code. Are 
we looking at the same datasheet?

>Not sure any guest would actually change this or 0x4a and if that could cause 
>problems but you may need to handle this somehow. (Apart from testing with 
>MorphOS with -kernel you should really be testing with pegasos2.rom with 
>MorphOS and Linux, e.g. Debian 8.11 netinstall iso is known to boot.)

I've tested extensively with an x86 Linux guest on my pc-via branch which 
worked flawlessly.

As mentioned in the commit message the default routing of the chipset matches 
legacy behavior, that is interrupts 14 and 15. This is reflected by assigning 
[0x4a] = 4 in the code and that is how the code behaved before.

Best regards,
Bernhard

>
>Regards,
>BALATON Zoltan
>
>> Implement this missing piece of the VIA south bridge.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>> hw/ide/via.c  |  6 --
>> hw/isa/vt82c686.c | 17 +
>> 2 files changed, 21 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/ide/via.c b/hw/ide/via.c
>> index 177baea9a7..0caae52276 100644
>> --- a/hw/ide/via.c
>> +++ b/hw/ide/via.c
>> @@ -31,6 +31,7 @@
>> #include "sysemu/dma.h"
>> #include "hw/isa/vt82c686.h"
>> #include "hw/ide/pci.h"
>> +#include "hw/irq.h"
>> #include "trace.h"
>> 
>> static uint64_t bmdma_read(void *opaque, hwaddr addr,
>> @@ -104,7 +105,8 @@ static void bmdma_setup_bar(PCIIDEState *d)
>> 
>> static void via_ide_set_irq(void *opaque, int n, int level)
>> {
>> -PCIDevice *d = PCI_DEVICE(opaque);
>> +PCIIDEState *s = opaque;
>> +PCIDevice *d = PCI_DEVICE(s);
>> 
>> if (level) {
>> d->config[0x70 + n * 8] |= 0x80;
>> @@ -112,7 +114,7 @@ static void via_ide_set_irq(void *opaque, int n, int 
>> level)
>> d->config[0x70 + n * 8] &= ~0x80;
>> }
>> 
>> -via_isa_set_irq(pci_get_function_0(d), 14 + n, level);
>> +qemu_set_irq(s->isa_irq[n], level);
>> }
>> 
>> static void via_ide_reset(DeviceState *dev)
>> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
>> index ca89119ce0..c7e29bb46a 100644
>> --- a/hw/isa/vt82c686.c
>> +++ b/hw/isa/vt82c686.c
>> @@ -568,9 +568,19 @@ static const VMStateDescription vmstate_via = {
>> }
>> };
>> 
>> +static void via_isa_set_ide_irq(void *opaque, int n, int level)
>> +{
>> +static const uint8_t irqs[] = { 14, 15, 10, 11 };
>> +ViaISAState *s = opaque;
>> +uint8_t irq = irqs[(s->dev.config[0x4a] >> (n * 2)) & 0x3];
>> +
>> +qemu_set_irq(s->isa_irqs_in[irq], level);
>> +}
>> +
>> static void via_isa_init(Object *obj)
>> {
>> ViaISAState *s = VIA_ISA(obj);
>> +DeviceState *dev = DEVICE(s);
>> 
>> object_initialize_child(obj, "rtc", >rtc, TYPE_MC146818_RTC);
>> object_initialize_child(obj, "ide", >ide, TYPE_VIA_IDE);
>> @@ -578,6 +588,8 @@ static void via_isa_init(Object *obj)
>> object_initialize_child(obj, "uhci2", >uhci[1], 
>> TYPE_VT82C686B_USB_UHCI);
>> object_initialize_child(obj, "ac97", >ac97, TYPE_VIA_AC97);
>> object_initialize_child(obj, "mc97", >mc97, TYPE_VIA_MC97);
>> +
>> +qdev_init_gpio_in_named(dev, via_isa_set_ide_irq, "ide", 
>> ARRAY_SIZE(s->ide.isa_irq));
>> }
>> 
>> static const TypeInfo via_isa_info = {
>> @@ -692,6 +704,10 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
>> if (!qdev_realize(DEVICE(>ide), BUS(pci_bus), errp)) {
>> return;
>> }
>> +for (i = 0; i < 2; i++) {
>> +qdev_connect_gpio_out(DEVICE(>ide), i,
>> +  qdev_get_gpio_in_named(DEVICE(s), "ide", i));
>> +}
>> 
>> /* Functions 2-3: USB Ports */
>> for (i = 0; i < ARRAY_SIZE(s->uhci); i++) {
>> @@ -814,6 +830,7 @@ static void vt8231_isa_reset(DeviceState *dev)
>>  PCI_COMMAND_MASTER | PCI_COMMAND_SPECIAL);
>> pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM);
>> 
>> +pci_conf[0x4a] = 0x04; /* IDE interrupt Routing */
>> pci_conf[0x58] = 0x40; /* Miscellaneous Control 0 */
>> pci_conf[0x67] = 0x08; /* Fast IR Config */
>> pci_conf[0x6b] = 0x01; /* Fast IR I/O Base */
>>

Re: [PATCH 08/13] hw/ide: Rename PCIIDEState::*_bar attributes

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

The attributes represent memory regions containing operations which are mapped
by the device models into PCI BARs. Reflect this by changing the suffix to
"_ops".

Note that in a few commits piix will also use the {cmd,data}_ops but won't map
them into BARs. This further suggests that the "_bar" suffix doesn't match
very well.


I'm not sure about this. Ops is typically used for read/write functions of 
an io MemoryRegion, while these are typically regions of a PCI IDE 
controller that are mapped via BARs, so calling them bar looks OK to me. 
This patch seems to be just code churn, so I'd drop it if there's no 
good reason to keep it.


Regards,
BALATON Zoltan


Signed-off-by: Bernhard Beschow 
---
include/hw/ide/pci.h |  6 +++---
hw/ide/cmd646.c  | 10 +-
hw/ide/pci.c | 18 +-
hw/ide/piix.c|  2 +-
hw/ide/via.c | 10 +-
5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 597c77c7ad..5025df5b82 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -51,9 +51,9 @@ struct PCIIDEState {
BMDMAState bmdma[2];
qemu_irq isa_irq[2];
uint32_t secondary; /* used only for cmd646 */
-MemoryRegion bmdma_bar;
-MemoryRegion cmd_bar[2];
-MemoryRegion data_bar[2];
+MemoryRegion bmdma_ops;
+MemoryRegion cmd_ops[2];
+MemoryRegion data_ops[2];
};

void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 85716aaf17..b9d005a357 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -251,13 +251,13 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error 
**errp)
dev->wmask[MRDMODE] = 0x0;
dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1;

-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);
-pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[0]);
+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[1]);

bmdma_init_ops(d, &cmd646_bmdma_ops);
-pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
+pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops);

/* TODO: RST# value should be 0 */
pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index a9194313bd..b2fcc00a64 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -527,15 +527,15 @@ void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps 
*bmdma_ops)
{
size_t i;

-memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16);
+memory_region_init(&d->bmdma_ops, OBJECT(d), "bmdma-container", 16);
for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) {
BMDMAState *bm = &d->bmdma[i];

memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, 
"bmdma-ops", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
+memory_region_add_subregion(&d->bmdma_ops, i * 8, &bm->extra_io);
memory_region_init_io(&bm->addr_ioport, OBJECT(d), 
&bmdma_addr_ioport_ops, bm,
  "bmdma-ioport-ops", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, 
&bm->addr_ioport);
+memory_region_add_subregion(&d->bmdma_ops, i * 8 + 4, 
&bm->addr_ioport);
}
}

@@ -543,14 +543,14 @@ static void pci_ide_init(Object *obj)
{
PCIIDEState *d = PCI_IDE(obj);

-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
+memory_region_init_io(&d->data_ops[0], OBJECT(d), &pci_ide_data_le_ops,
  &d->bus[0], "pci-ide0-data-ops", 8);
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
+memory_region_init_io(&d->cmd_ops[0], OBJECT(d), &pci_ide_cmd_le_ops,
  &d->bus[0], "pci-ide0-cmd-ops", 4);

-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
+memory_region_init_io(&d->data_ops[1], OBJECT(d), &pci_ide_data_le_ops,
  &d->bus[1], "pci-ide1-data-ops", 8);
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
+memory_region_init_io(&d->cmd_ops[1], OBJECT(d), &pci_ide_cmd_le_ops,
  &d->bus[1], "pci-ide1-cmd-ops", 4);

qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
@@ -562,8 +562,8 @@ static void pci_ide_exitfn(PCIDevice *dev)
unsigned i;

for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
+memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].extra_io);
+

Re: [PATCH 05/13] hw/ide: Extract pci_ide_class_init()

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

Resolves redundant code in every PCI IDE device model.


This patch could be broken up a bit more as it seems to do unrelated 
changes. Such as setting DEVICE_CATEGORY_STORAGE in a different way could 
be a separate patch to make it simpler to review.


Regards,
BALATON Zoltan


---
include/hw/ide/pci.h |  1 -
hw/ide/cmd646.c  | 15 ---
hw/ide/pci.c | 25 -
hw/ide/piix.c| 19 ---
hw/ide/sii3112.c |  3 ++-
hw/ide/via.c | 15 ---
6 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 74c127e32f..7bc4e53d02 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -61,7 +61,6 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
extern MemoryRegionOps bmdma_addr_ioport_ops;
void pci_ide_create_devs(PCIDevice *dev);

-extern const VMStateDescription vmstate_ide_pci;
extern const MemoryRegionOps pci_ide_cmd_le_ops;
extern const MemoryRegionOps pci_ide_data_le_ops;
#endif
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a094a6e12a..9aabf80e52 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -301,17 +301,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error 
**errp)
}
}

-static void pci_cmd646_ide_exitfn(PCIDevice *dev)
-{
-PCIIDEState *d = PCI_IDE(dev);
-unsigned i;
-
-for (i = 0; i < 2; ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
-}
-}
-
static Property cmd646_ide_properties[] = {
DEFINE_PROP_UINT32("secondary", PCIIDEState, secondary, 0),
DEFINE_PROP_END_OF_LIST(),
@@ -323,17 +312,13 @@ static void cmd646_ide_class_init(ObjectClass *klass, 
void *data)
PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);

dc->reset = cmd646_reset;
-dc->vmsd = &vmstate_ide_pci;
k->realize = pci_cmd646_ide_realize;
-k->exit = pci_cmd646_ide_exitfn;
k->vendor_id = PCI_VENDOR_ID_CMD;
k->device_id = PCI_DEVICE_ID_CMD_646;
k->revision = 0x07;
-k->class_id = PCI_CLASS_STORAGE_IDE;
k->config_read = cmd646_pci_config_read;
k->config_write = cmd646_pci_config_write;
device_class_set_props(dc, cmd646_ide_properties);
-set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
}

static const TypeInfo cmd646_ide_info = {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 67e0998ff0..8bea92e394 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -467,7 +467,7 @@ static int ide_pci_post_load(void *opaque, int version_id)
return 0;
}

-const VMStateDescription vmstate_ide_pci = {
+static const VMStateDescription vmstate_ide_pci = {
.name = "ide",
.version_id = 3,
.minimum_version_id = 0,
@@ -530,11 +530,34 @@ static void pci_ide_init(Object *obj)
qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
}

+static void pci_ide_exitfn(PCIDevice *dev)
+{
+PCIIDEState *d = PCI_IDE(dev);
+unsigned i;
+
+for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {
+memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
+memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
+}
+}
+
+static void pci_ide_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+dc->vmsd = &vmstate_ide_pci;
+k->exit = pci_ide_exitfn;
+k->class_id = PCI_CLASS_STORAGE_IDE;
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+}
+
static const TypeInfo pci_ide_type_info = {
.name = TYPE_PCI_IDE,
.parent = TYPE_PCI_DEVICE,
.instance_size = sizeof(PCIIDEState),
.instance_init = pci_ide_init,
+.class_init = pci_ide_class_init,
.abstract = true,
.interfaces = (InterfaceInfo[]) {
{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a32f7ccece..4e6ca99123 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -159,8 +159,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
bmdma_setup_bar(d);
pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);

-vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
-
for (unsigned i = 0; i < 2; i++) {
if (!pci_piix_init_bus(d, i, errp)) {
return;
@@ -168,17 +166,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
}
}

-static void pci_piix_ide_exitfn(PCIDevice *dev)
-{
-PCIIDEState *d = PCI_IDE(dev);
-unsigned i;
-
-for (i = 0; i < 2; ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
-}
-}
-
/* NOTE: for the PIIX3, the IRQs and IOports are hardcoded */
static void piix3_ide_class_init(ObjectClass *klass, void *data)
{
@@ -187,11 +174,8 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)

dc->reset = piix_ide_reset;
k->realize =

Re: [PATCH 04/13] hw/ide: Extract IDEBus assignment into bmdma_init()

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

Every invocation of bmdma_init() is followed by `d->bmdma[i].bus = &d->bus[i]`.
Resolve this redundancy by extracting it into bmdma_init().

Signed-off-by: Bernhard Beschow 


Reviewed-by: BALATON Zoltan
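
For readers without the tree at hand, the refactoring described in the commit message can be sketched in isolation. This is only an illustration under assumptions: the *Sketch types and bmdma_init_sketch() are hypothetical stand-ins, not the QEMU definitions; only the parameter order mirrors the bmdma_init() prototype in include/hw/ide/pci.h.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the QEMU types. */
typedef struct IDEBusSketch { int unused; } IDEBusSketch;
typedef struct PCIIDESketch PCIIDESketch;

typedef struct BMDMASketch {
    IDEBusSketch *bus;     /* the assignment every caller used to repeat */
    PCIIDESketch *pci_dev;
} BMDMASketch;

struct PCIIDESketch {
    IDEBusSketch bus[2];
    BMDMASketch bmdma[2];
};

/*
 * After the change the init function records the bus itself, so callers
 * no longer need a separate "d->bmdma[i].bus = &d->bus[i]" statement.
 */
static void bmdma_init_sketch(IDEBusSketch *bus, BMDMASketch *bm,
                              PCIIDESketch *d)
{
    bm->bus = bus;
    bm->pci_dev = d;
}
```

A caller then only loops over the channels and calls bmdma_init_sketch(&d->bus[i], &d->bmdma[i], d).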

[PATCH 7/7] Added safe_syscalls for time functions.

2023-04-22 Thread Ajeets6

From: Warner Losh 

+Added safe_syscalls

Signed-off-by: Warner Losh 
Signed-off-by: Ajeets6 
---
 bsd-user/freebsd/os-syscall.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 8fd6eb05cb..3d56aff0fd 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -49,6 +49,16 @@
 /* *BSD dependent syscall shims */
 #include "os-time.h"
 
+/* used in os-time */
+safe_syscall2(int, nanosleep, const struct timespec *, rqtp, struct timespec *,
+rmtp);
+safe_syscall4(int, clock_nanosleep, clockid_t, clock_id, int, flags,
+const struct timespec *, rqtp, struct timespec *, rmtp);
+
+safe_syscall6(int, kevent, int, kq, const struct kevent *, changelist,
+int, nchanges, struct kevent *, eventlist, int, nevents,
+const struct timespec *, timeout);
+
 /* I/O */
 safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
 safe_syscall4(int, openat, int, fd, const char *, path, int, flags, mode_t,
-- 
2.34.1

[PATCH 1/7] Create os-time.c and add t2h_freebsd_timeval

2023-04-22 Thread Ajeets6

From: Stacey Son 

os-time.c contains various functions to convert FreeBSD-specific time
structures between host and guest formats.

Signed-off-by: Ajeets6 
Signed-off-by: Stacey Son 
---
 bsd-user/freebsd/os-time.c | 41 ++
 1 file changed, 41 insertions(+)
 create mode 100644 bsd-user/freebsd/os-time.c

diff --git a/bsd-user/freebsd/os-time.c b/bsd-user/freebsd/os-time.c
new file mode 100644
index 0000000000..ec9f59ded7
--- /dev/null
+++ b/bsd-user/freebsd/os-time.c
@@ -0,0 +1,41 @@
+/*
+ *  FreeBSD time related system call helpers
+ *
+ *  Copyright (c) 2013-15 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * FreeBSD time conversion functions
+ */
+#include 
+#include 
+#include 
+#include "qemu.h"
+
+
+abi_long t2h_freebsd_timeval(struct timeval *tv, abi_ulong target_tv_addr)
+{
+struct target_freebsd_timeval *target_tv;
+
+if (!lock_user_struct(VERIFY_READ, target_tv, target_tv_addr, 0)) {
+return -TARGET_EFAULT;
+}
+__get_user(tv->tv_sec, &target_tv->tv_sec);
+__get_user(tv->tv_usec, &target_tv->tv_usec);
+unlock_user_struct(target_tv, target_tv_addr, 1);
+
+return 0;
+}
\ No newline at end of file
-- 
2.34.1
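
As background on what a host/guest time conversion involves, the following is a rough sketch that is independent of QEMU. It assumes a hypothetical guest layout of two 64-bit big-endian fields; struct guest_timespec and the helper names are invented for illustration, whereas the patch itself relies on lock_user_struct(), __get_user() and __put_user().

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical guest layout: two 64-bit big-endian fields. */
struct guest_timespec {
    uint8_t tv_sec[8];
    uint8_t tv_nsec[8];
};

/* Assemble a 64-bit value from big-endian bytes, whatever the host order. */
static int64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return (int64_t)v;
}

/* Store a 64-bit value as big-endian bytes. */
static void store_be64(uint8_t *p, int64_t value)
{
    uint64_t v = (uint64_t)value;
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Guest to host, the direction t2h_* helpers cover. */
static void guest_to_host_timespec(struct timespec *ts,
                                   const struct guest_timespec *g)
{
    ts->tv_sec = (time_t)load_be64(g->tv_sec);
    ts->tv_nsec = (long)load_be64(g->tv_nsec);
}

/* Host to guest, the direction h2t_* helpers cover. */
static void host_to_guest_timespec(struct guest_timespec *g,
                                   const struct timespec *ts)
{
    store_be64(g->tv_sec, (int64_t)ts->tv_sec);
    store_be64(g->tv_nsec, (int64_t)ts->tv_nsec);
}
```

Converting field by field, rather than copying the struct, is what lets the host and guest differ in byte order and field width.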

[PATCH 4/7] Added clock_gettime(2) and clock_getres(2)

2023-04-22 Thread Ajeets6

From: Stacey Son 

+Added clock_gettime(2) which gets the time
+Added clock_getres(2) which finds the resolution of the specified
clock

Signed-off-by: Ajeets6 
Signed-off-by: Stacey Son 
---
 bsd-user/freebsd/os-time.h | 32 
 1 file changed, 32 insertions(+)

diff --git a/bsd-user/freebsd/os-time.h b/bsd-user/freebsd/os-time.h
index 29d2c8d02a..f76744e808 100644
--- a/bsd-user/freebsd/os-time.h
+++ b/bsd-user/freebsd/os-time.h
@@ -63,3 +63,35 @@ static inline abi_long do_freebsd_clock_nanosleep(abi_long 
arg1, abi_long arg2,
 
 return ret;
 }
+
+/* clock_gettime(2) */
+static inline abi_long do_freebsd_clock_gettime(abi_long arg1, abi_long arg2)
+{
+abi_long ret;
+struct timespec ts;
+
+ret = get_errno(clock_gettime(arg1, &ts));
+if (!is_error(ret)) {
+if (h2t_freebsd_timespec(arg2, &ts)) {
+return -TARGET_EFAULT;
+}
+}
+
+return ret;
+}
+
+/* clock_getres(2) */
+static inline abi_long do_freebsd_clock_getres(abi_long arg1, abi_long arg2)
+{
+abi_long ret;
+struct timespec ts;
+
+ret = get_errno(clock_getres(arg1, &ts));
+if (!is_error(ret)) {
+if (h2t_freebsd_timespec(arg2, &ts)) {
+return -TARGET_EFAULT;
+}
+}
+
+return ret;
+}
\ No newline at end of file
-- 
2.34.1

[PATCH 2/7] Create os-time.h and modify os-time.c

2023-04-22 Thread Ajeets6

From: Stacey Son 

+add nanosleep(2) in os-time.h
+add t2h_freebsd_timespec and h2t_freebsd_timespec time conversion
functions in os-time.c
-remove t2h_freebsd_timeval in os-time.c
Co-Authored-By: Kyle Evans 
Signed-off-by: Ajeets6 
Signed-off-by: Kyle Evans 
Signed-off-by: Stacey Son 
---
 bsd-user/freebsd/os-time.c | 29 ++---
 bsd-user/freebsd/os-time.h | 44 ++
 2 files changed, 65 insertions(+), 8 deletions(-)
 create mode 100644 bsd-user/freebsd/os-time.h

diff --git a/bsd-user/freebsd/os-time.c b/bsd-user/freebsd/os-time.c
index ec9f59ded7..e71eed6519 100644
--- a/bsd-user/freebsd/os-time.c
+++ b/bsd-user/freebsd/os-time.c
@@ -20,22 +20,35 @@
 /*
  * FreeBSD time conversion functions
  */
-#include 
+#include "qemu/osdep.h"
 #include 
-#include 
 #include "qemu.h"
 
 
-abi_long t2h_freebsd_timeval(struct timeval *tv, abi_ulong target_tv_addr)
+abi_long t2h_freebsd_timespec(struct timespec *ts, abi_ulong target_ts_addr)
 {
-struct target_freebsd_timeval *target_tv;
+struct target_freebsd_timespec *target_ts;
 
-if (!lock_user_struct(VERIFY_READ, target_tv, target_tv_addr, 0)) {
+if (!lock_user_struct(VERIFY_READ, target_ts, target_ts_addr, 0)) {
 return -TARGET_EFAULT;
 }
-__get_user(tv->tv_sec, &target_tv->tv_sec);
-__get_user(tv->tv_usec, &target_tv->tv_usec);
-unlock_user_struct(target_tv, target_tv_addr, 1);
+__get_user(ts->tv_sec, &target_ts->tv_sec);
+__get_user(ts->tv_nsec, &target_ts->tv_nsec);
+unlock_user_struct(target_ts, target_ts_addr, 1);
+
+return 0;
+}
+
+abi_long h2t_freebsd_timespec(abi_ulong target_ts_addr, struct timespec *ts)
+{
+struct target_freebsd_timespec *target_ts;
+
+if (!lock_user_struct(VERIFY_WRITE, target_ts, target_ts_addr, 0)) {
+return -TARGET_EFAULT;
+}
+__put_user(ts->tv_sec, &target_ts->tv_sec);
+__put_user(ts->tv_nsec, &target_ts->tv_nsec);
+unlock_user_struct(target_ts, target_ts_addr, 1);
 
 return 0;
 }
\ No newline at end of file
diff --git a/bsd-user/freebsd/os-time.h b/bsd-user/freebsd/os-time.h
new file mode 100644
index 0000000000..18c9e1dd12
--- /dev/null
+++ b/bsd-user/freebsd/os-time.h
@@ -0,0 +1,44 @@
+/*
+ *  FreeBSD time related system call shims
+ *
+ *  Copyright (c) 2013-2015 Stacey Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef FREEBSD_OS_TIME_H
+#define FREEBSD_OS_TIME_H
+
+
+#include "qemu.h"
+
+
+
+
+/* nanosleep(2) */
+static inline abi_long do_freebsd_nanosleep(abi_long arg1, abi_long arg2)
+{
+abi_long ret;
+struct timespec req, rem;
+
+ret = t2h_freebsd_timespec(&req, arg1);
+if (!is_error(ret)) {
+ret = get_errno(safe_nanosleep(&req, &rem));
+if (ret == -TARGET_EINTR && arg2) {
+h2t_freebsd_timespec(arg2, &rem);
+}
+}
+
+return ret;
+}
-- 
2.34.1

[PATCH 6/7] Added case statements for time functions

2023-04-22 Thread Ajeets6

From: Warner Losh 

+Added cases for nanosleep(2), clock_nanosleep(2), clock_gettime(2) and
clock_getres(2)
+Updated meson.build

Signed-off-by: Warner Losh 
Signed-off-by: Ajeets6 
---
 bsd-user/freebsd/meson.build  |  1 +
 bsd-user/freebsd/os-syscall.c | 20 
 2 files changed, 21 insertions(+)

diff --git a/bsd-user/freebsd/meson.build b/bsd-user/freebsd/meson.build
index f87c788e84..2eb0ed4d96 100644
--- a/bsd-user/freebsd/meson.build
+++ b/bsd-user/freebsd/meson.build
@@ -1,4 +1,5 @@
 bsd_user_ss.add(files(
   'os-sys.c',
   'os-syscall.c',
+  'os-time.c',
 ))
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index c8f998ecec..8fd6eb05cb 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -46,6 +46,8 @@
 
 #include "bsd-file.h"
 #include "bsd-proc.h"
+/* *BSD dependent syscall shims */
+#include "os-time.h"
 
 /* I/O */
 safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
@@ -507,6 +509,24 @@ static abi_long freebsd_syscall(void *cpu_env, int num, 
abi_long arg1,
 case TARGET_FREEBSD_NR_sysarch: /* sysarch(2) */
 ret = do_freebsd_sysarch(cpu_env, arg1, arg2);
 break;
+ /*
+ * time related system calls.
+ */
+case TARGET_FREEBSD_NR_nanosleep: /* nanosleep(2) */
+ret = do_freebsd_nanosleep(arg1, arg2);
+break;
+
+case TARGET_FREEBSD_NR_clock_nanosleep: /* clock_nanosleep(2) */
+ret = do_freebsd_clock_nanosleep(arg1, arg2, arg3, arg4);
+break;
+
+case TARGET_FREEBSD_NR_clock_gettime: /* clock_gettime(2) */
+ret = do_freebsd_clock_gettime(arg1, arg2);
+break;
+
+case TARGET_FREEBSD_NR_clock_getres: /* clock_getres(2) */
+ret = do_freebsd_clock_getres(arg1, arg2);
+break;
 
 default:
 qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
-- 
2.34.1
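
The dispatch pattern this patch extends can be shown in a reduced form. The sketch below is an assumption-laden illustration: the syscall numbers, the SK_ENOSYS value and the sk_* shims are placeholders invented here, not the TARGET_FREEBSD_NR_* constants or the do_freebsd_* functions from the series.

```c
/* Hypothetical syscall numbers; the real values come from the target's
 * syscall table, not from this sketch. */
enum {
    SK_NR_NANOSLEEP = 1,
    SK_NR_CLOCK_GETTIME = 2,
};

#define SK_ENOSYS 78 /* placeholder error number for the sketch */

/* Stand-in shims: each returns a distinct tag so the routing is visible. */
static long sk_do_nanosleep(long arg1, long arg2)
{
    (void)arg1;
    (void)arg2;
    return 100;
}

static long sk_do_clock_gettime(long arg1, long arg2)
{
    (void)arg1;
    (void)arg2;
    return 200;
}

/* One case per syscall number; unknown numbers fall through to default. */
static long sk_syscall(int num, long arg1, long arg2)
{
    long ret;

    switch (num) {
    case SK_NR_NANOSLEEP:
        ret = sk_do_nanosleep(arg1, arg2);
        break;
    case SK_NR_CLOCK_GETTIME:
        ret = sk_do_clock_gettime(arg1, arg2);
        break;
    default:
        ret = -SK_ENOSYS;
        break;
    }

    return ret;
}
```

Keeping each shim in a header and leaving the switch as a thin router is what lets new syscalls be added with a three-line case, as the hunk above does.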

[PATCH 5/7] Created qemu-os.h for function prototype

2023-04-22 Thread Ajeets6

From: Stacey Son 

+Added t2h_freebsd_timespec and h2t_freebsd_timespec function prototypes in
qemu-os.h
+included qemu-os.h in os-time.c and os-time.h

Signed-off-by: Stacey Son 
Signed-off-by: Ajeets6 
---
 bsd-user/freebsd/os-time.c |  2 ++
 bsd-user/freebsd/os-time.h |  5 -
 bsd-user/freebsd/qemu-os.h | 30 ++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 bsd-user/freebsd/qemu-os.h

diff --git a/bsd-user/freebsd/os-time.c b/bsd-user/freebsd/os-time.c
index e71eed6519..5c88e1f13d 100644
--- a/bsd-user/freebsd/os-time.c
+++ b/bsd-user/freebsd/os-time.c
@@ -23,6 +23,8 @@
 #include "qemu/osdep.h"
 #include 
 #include "qemu.h"
+#include "qemu-os.h"
+
 
 
 abi_long t2h_freebsd_timespec(struct timespec *ts, abi_ulong target_ts_addr)
diff --git a/bsd-user/freebsd/os-time.h b/bsd-user/freebsd/os-time.h
index f76744e808..bd995c8a7b 100644
--- a/bsd-user/freebsd/os-time.h
+++ b/bsd-user/freebsd/os-time.h
@@ -22,6 +22,7 @@
 
 
 #include "qemu.h"
+#include "qemu-os.h"
 
 
 
@@ -94,4 +95,6 @@ static inline abi_long do_freebsd_clock_getres(abi_long arg1, 
abi_long arg2)
 }
 
 return ret;
-}
\ No newline at end of file
+}
+
+#endif /* FREEBSD_OS_TIME_H */
diff --git a/bsd-user/freebsd/qemu-os.h b/bsd-user/freebsd/qemu-os.h
new file mode 100644
index 0000000000..0c502ff0e5
--- /dev/null
+++ b/bsd-user/freebsd/qemu-os.h
@@ -0,0 +1,30 @@
+/*
+ *  FreeBSD conversion extern declarations
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_OS_H
+#define QEMU_OS_H
+
+#include 
+
+/* os-time.c */
+abi_long t2h_freebsd_timespec(struct timespec *ts, abi_ulong target_ts_addr);
+abi_long h2t_freebsd_timespec(abi_ulong target_ts_addr, struct timespec *ts);
+
+
+#endif /* QEMU_OS_H */
\ No newline at end of file
-- 
2.34.1

[PATCH 3/7] Add clock_nanosleep

2023-04-22 Thread Ajeets6

From: Kyle Evans 

+Add clock_nanosleep(2)
Provides a sleep interval in nanoseconds and allows choosing which clock
to measure it against.
Signed-off-by: Ajeets6 
Signed-off-by: Kyle Evans 
---
 bsd-user/freebsd/os-time.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/bsd-user/freebsd/os-time.h b/bsd-user/freebsd/os-time.h
index 18c9e1dd12..29d2c8d02a 100644
--- a/bsd-user/freebsd/os-time.h
+++ b/bsd-user/freebsd/os-time.h
@@ -42,3 +42,24 @@ static inline abi_long do_freebsd_nanosleep(abi_long arg1, 
abi_long arg2)
 
 return ret;
 }
+/* clock_nanosleep(2) */
+static inline abi_long do_freebsd_clock_nanosleep(abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4)
+{
+struct timespec req, rem;
+abi_long ret;
+int clkid, flags;
+
+clkid = arg1;
+/* XXX Translate? */
+flags = arg2;
+ret = t2h_freebsd_timespec(&req, arg3);
+if (!is_error(ret)) {
+ret = get_errno(safe_clock_nanosleep(clkid, flags, &req, &rem));
+if (ret == -TARGET_EINTR && arg4) {
+h2t_freebsd_timespec(arg4, &rem);
+}
+}
+
+return ret;
+}
-- 
2.34.1
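
For context on the host call being wrapped, here is a small sketch of plain POSIX clock_nanosleep(2) usage on the host side. It is not the patch's approach: the patch hands the remaining time back to the guest on EINTR, while the hypothetical sleep_fully() below simply retries with the remainder. Note that clock_nanosleep() returns the error number directly rather than setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: perform a relative sleep on the given clock and
 * restart it with the remaining time whenever a signal interrupts it.
 * Returns 0 on success or a positive error number such as EINVAL.
 */
static int sleep_fully(clockid_t clock, struct timespec req)
{
    struct timespec rem;
    int err;

    /* flags == 0 requests a relative sleep, so rem is filled on EINTR. */
    while ((err = clock_nanosleep(clock, 0, &req, &rem)) == EINTR) {
        req = rem;
    }

    return err;
}
```

An emulator cannot retry like this on the guest's behalf, because the guest expects to see EINTR and the remainder itself, which is why the shim copies rem out through h2t_freebsd_timespec() instead.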

Re: [PATCH 02/13] hw/ide/via: Implement ISA IRQ routing

2023-04-22 Thread BALATON Zoltan


On Sat, 22 Apr 2023, Bernhard Beschow wrote:

The VIA south bridge allows the legacy IDE interrupts to be routed to four
different ISA interrupts. This can be configured through the 0x4a register in
the PCI configuration space of the ISA function. The default routing matches
the legacy ISA IRQs, that is 14 and 15.


On VT8231 0x4a is PCI Master Arbitration Control, IDE interrupt Routing is 
0x4c and only documents 14/15 as valid values. Not sure any guest would 
actually change this or 0x4a and if that could cause problems but you may 
need to handle this somehow. (Apart from testing with MorphOS with -kernel 
you should really be testing with pegasos2.rom with MorphOS and Linux, 
e.g. Debian 8.11 netinstall iso is known to boot.)


Regards,
BALATON Zoltan


Implement this missing piece of the VIA south bridge.

Signed-off-by: Bernhard Beschow 
---
hw/ide/via.c  |  6 --
hw/isa/vt82c686.c | 17 +
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 177baea9a7..0caae52276 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -31,6 +31,7 @@
#include "sysemu/dma.h"
#include "hw/isa/vt82c686.h"
#include "hw/ide/pci.h"
+#include "hw/irq.h"
#include "trace.h"

static uint64_t bmdma_read(void *opaque, hwaddr addr,
@@ -104,7 +105,8 @@ static void bmdma_setup_bar(PCIIDEState *d)

static void via_ide_set_irq(void *opaque, int n, int level)
{
-PCIDevice *d = PCI_DEVICE(opaque);
+PCIIDEState *s = opaque;
+PCIDevice *d = PCI_DEVICE(s);

if (level) {
d->config[0x70 + n * 8] |= 0x80;
@@ -112,7 +114,7 @@ static void via_ide_set_irq(void *opaque, int n, int level)
d->config[0x70 + n * 8] &= ~0x80;
}

-via_isa_set_irq(pci_get_function_0(d), 14 + n, level);
+qemu_set_irq(s->isa_irq[n], level);
}

static void via_ide_reset(DeviceState *dev)
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index ca89119ce0..c7e29bb46a 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -568,9 +568,19 @@ static const VMStateDescription vmstate_via = {
}
};

+static void via_isa_set_ide_irq(void *opaque, int n, int level)
+{
+static const uint8_t irqs[] = { 14, 15, 10, 11 };
+ViaISAState *s = opaque;
+uint8_t irq = irqs[(s->dev.config[0x4a] >> (n * 2)) & 0x3];
+
+qemu_set_irq(s->isa_irqs_in[irq], level);
+}
+
static void via_isa_init(Object *obj)
{
ViaISAState *s = VIA_ISA(obj);
+DeviceState *dev = DEVICE(s);

object_initialize_child(obj, "rtc", &s->rtc, TYPE_MC146818_RTC);
object_initialize_child(obj, "ide", &s->ide, TYPE_VIA_IDE);
@@ -578,6 +588,8 @@ static void via_isa_init(Object *obj)
object_initialize_child(obj, "uhci2", &s->uhci[1], TYPE_VT82C686B_USB_UHCI);
object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97);
object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97);
+
+qdev_init_gpio_in_named(dev, via_isa_set_ide_irq, "ide", 
ARRAY_SIZE(s->ide.isa_irq));
}

static const TypeInfo via_isa_info = {
@@ -692,6 +704,10 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
if (!qdev_realize(DEVICE(&s->ide), BUS(pci_bus), errp)) {
return;
}
+for (i = 0; i < 2; i++) {
+qdev_connect_gpio_out(DEVICE(&s->ide), i,
+  qdev_get_gpio_in_named(DEVICE(s), "ide", i));
+}

/* Functions 2-3: USB Ports */
for (i = 0; i < ARRAY_SIZE(s->uhci); i++) {
@@ -814,6 +830,7 @@ static void vt8231_isa_reset(DeviceState *dev)
 PCI_COMMAND_MASTER | PCI_COMMAND_SPECIAL);
pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM);

+pci_conf[0x4a] = 0x04; /* IDE interrupt Routing */
pci_conf[0x58] = 0x40; /* Miscellaneous Control 0 */
pci_conf[0x67] = 0x08; /* Fast IR Config */
pci_conf[0x6b] = 0x01; /* Fast IR I/O Base */

Re: [RFC PATCH 00/13] gfxstream + rutabaga_gfx: a surprising delight or startling epiphany?

2023-04-22 Thread Akihiko Odaki


On 2023/04/22 8:54, Gurchetan Singh wrote:

On Fri, Apr 21, 2023 at 9:02 AM Stefan Hajnoczi  wrote:


On Thu, 20 Apr 2023 at 21:13, Gurchetan Singh
 wrote:


From: Gurchetan Singh 

Rationale:

- gfxstream [a] is good for the Android Emulator/upstream QEMU
   alignment
- Wayland passthrough [b] via the cross-domain context type is good
   for Linux on Linux display virtualization
- rutabaga_gfx [c] sits on top of gfxstream, cross-domain and even
   virglrenderer
- This series ports rutabaga_gfx to QEMU


What are rutabaga_gfx and gfxstream? Can you explain where they sit in the
stack and how they build on or complement virtio-gpu and
virglrenderer?


rutabaga_gfx and gfxstream are both libraries that implement the
virtio-gpu protocol.  There's a document available in the Gitlab issue
to see where they fit in the stack [a].

gfxstream grew out of the Android Emulator's need to virtualize
graphics for app developers.  There's a short history of gfxstream in
patch 10.  It complements virglrenderer in that it's a bit more
cross-platform and targets different use cases -- more detail here
[b].  The ultimate goal is to ditch out-of-tree kernel drivers in the
Android Emulator and adopt virtio, and porting gfxstream to QEMU would
speed up that transition.


I wonder what the motivation is for maintaining gfxstream instead of 
switching to virglrenderer/venus.




rutabaga_gfx is a much smaller Rust library that sits on top of
gfxstream and even virglrenderer, but does a few extra things.  It
implements the cross-domain context type, which provides Wayland
passthrough.  This helps virtio-gpu by providing more modern display
virtualization.  For example, Microsoft for WSL2 also uses a similar
technique [c], but I believe it is not virtio-based nor open-source.


The guest side components of WSLg are open-source, but the host side is 
not: https://github.com/microsoft/wslg
It also uses DirectX for acceleration, so it's not really portable 
outside Windows.



With this, we can have the same open-source Wayland passthrough
solution on crosvm, QEMU and even Fuchsia [d].  Also, there might be
an additional small Rust context type for security-sensitive use cases
in the future -- rutabaga_gfx wouldn't compile its gfxstream bindings
(since it's C++ based) in such cases.

Both gfxstream and rutabaga_gfx are a part of the virtio spec [e] now too.

[a] https://gitlab.com/qemu-project/qemu/-/issues/1611
[b] https://lists.gnu.org/archive/html/qemu-devel/2023-03/msg04271.html
[c] https://www.youtube.com/watch?v=EkNBsBx501Q
[d] https://fuchsia-review.googlesource.com/c/fuchsia/+/778764
[e] https://github.com/oasis-tcs/virtio-spec/blob/master/device-types/gpu/description.tex#L533



Stefan

Re: [PATCH v2 08/10] hw/ide: Let ide_init_ioport() take a MemoryRegion argument instead of ISADevice

2023-04-22 Thread Bernhard Beschow




On 5 February 2023 at 22:02:02 UTC, Mark Cave-Ayland wrote:
>On 26/01/2023 21:17, Bernhard Beschow wrote:
>
>> Both callers to ide_init_ioport() have access to the I/O memory region
>> of the ISA bus, so can pass it directly. This allows ide_init_ioport()
>> to directly call portio_list_init().
>> 
>> Note, now the callers become the owner of the PortioList.
>> 
>> Inspired-by: <20210518215545.1793947-10-phi...@redhat.com>
>>'hw/ide: Let ide_init_ioport() take an ISA bus argument instead of device'
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   include/hw/ide/internal.h |  3 ++-
>>   hw/ide/ioport.c   | 15 ---
>>   hw/ide/isa.c  |  4 +++-
>>   hw/ide/piix.c |  8 ++--
>>   4 files changed, 19 insertions(+), 11 deletions(-)
>> 
>> diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
>> index 42c49414f4..c3e4d192fa 100644
>> --- a/include/hw/ide/internal.h
>> +++ b/include/hw/ide/internal.h
>> @@ -628,7 +628,8 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
>> IDEDriveKind kind,
>>  int chs_trans, Error **errp);
>>   void ide_init2(IDEBus *bus, qemu_irq irq);
>>   void ide_exit(IDEState *s);
>> -void ide_init_ioport(IDEBus *bus, ISADevice *isa, int iobase, int iobase2);
>> +void ide_init_ioport(IDEBus *bus, MemoryRegion *address_space_io, Object 
>> *owner,
>> + int iobase, int iobase2);
>>   void ide_register_restart_cb(IDEBus *bus);
>> void ide_exec_cmd(IDEBus *bus, uint32_t val);
>> diff --git a/hw/ide/ioport.c b/hw/ide/ioport.c
>> index b613ff3bba..00e9baf0d1 100644
>> --- a/hw/ide/ioport.c
>> +++ b/hw/ide/ioport.c
>> @@ -50,15 +50,16 @@ static const MemoryRegionPortio ide_portio2_list[] = {
>>   PORTIO_END_OF_LIST(),
>>   };
>>   -void ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2)
>> +void ide_init_ioport(IDEBus *bus, MemoryRegion *address_space_io, Object 
>> *owner,
>> + int iobase, int iobase2)
>>   {
>> -/* ??? Assume only ISA and PCI configurations, and that the PCI-ISA
>> -   bridge has been setup properly to always register with ISA.  */
>> -isa_register_portio_list(dev, &bus->portio_list,
>> - iobase, ide_portio_list, bus, "ide");
>> +assert(address_space_io);
>> +
>> +portio_list_init(&bus->portio_list, owner, ide_portio_list, bus, "ide",
>> + address_space_io, iobase);
>> if (iobase2) {
>> -isa_register_portio_list(dev, &bus->portio2_list,
>> - iobase2, ide_portio2_list, bus, "ide");
>> +portio_list_init(&bus->portio2_list, owner, ide_portio2_list, bus,
>> + "ide", address_space_io, iobase2);
>>   }
>>   }
>> diff --git a/hw/ide/isa.c b/hw/ide/isa.c
>> index 8bedbd13f1..cab5d0a07a 100644
>> --- a/hw/ide/isa.c
>> +++ b/hw/ide/isa.c
>> @@ -72,9 +72,11 @@ static void isa_ide_realizefn(DeviceState *dev, Error 
>> **errp)
>>   {
>>   ISADevice *isadev = ISA_DEVICE(dev);
>>   ISAIDEState *s = ISA_IDE(dev);
>> +ISABus *isabus = isa_bus_from_device(isadev);
>> ide_bus_init(&s->bus, sizeof(s->bus), dev, 0, 2);
>> -ide_init_ioport(&s->bus, isadev, s->iobase, s->iobase2);
>> +ide_init_ioport(&s->bus, isabus->address_space_io, OBJECT(dev),
>> +s->iobase, s->iobase2);
>>   s->irq = isa_get_irq(isadev, s->isairq);
>>   ide_init2(&s->bus, s->irq);
>>   vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_isa, s);
>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>> index f0d95761ac..236b5b7416 100644
>> --- a/hw/ide/piix.c
>> +++ b/hw/ide/piix.c
>> @@ -29,6 +29,7 @@
>> #include "qemu/osdep.h"
>>   #include "hw/pci/pci.h"
>> +#include "hw/pci/pci_bus.h"
>>   #include "migration/vmstate.h"
>>   #include "qapi/error.h"
>>   #include "qemu/module.h"
>> @@ -143,8 +144,11 @@ static void pci_piix_init_ports(PCIIDEState *d, ISABus 
>> *isa_bus)
>>   {0x1f0, 0x3f6, 14},
>>   {0x170, 0x376, 15},
>>   };
>> +PCIBus *pci_bus = pci_get_bus(&d->parent_obj);
>>   int i;
>>   +assert(pci_bus);
>> +
>>   if (isa_bus) {
>>   d->isa_irqs[0] = isa_bus->irqs[port_info[0].isairq];
>>   d->isa_irqs[1] = isa_bus->irqs[port_info[1].isairq];
>> @@ -154,8 +158,8 @@ static void pci_piix_init_ports(PCIIDEState *d, ISABus 
>> *isa_bus)
>> for (i = 0; i < 2; i++) {
>>   ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
>> -ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
>> -port_info[i].iobase2);
>> +ide_init_ioport(&d->bus[i], pci_bus->address_space_io, OBJECT(d),
>> +port_info[i].iobase, port_info[i].iobase2);
>>   ide_init2(&d->bus[i], qdev_get_gpio_in(DEVICE(d), i));
>> bmdma_init(&d->bus[i], &d->bmdma[i], d);
>
>Again, given that I suspect ioports are specific to x86 I'd be inclined to 
>leave this as a reference to ISA. I could see there being a function

Re: [RFC PATCH 12/13] HACK: use memory region API to inject memory to guest

2023-04-22 Thread Akihiko Odaki


On 2023/04/21 10:12, Gurchetan Singh wrote:

I just copied the patches that have been floating around that do
this, but it doesn't seem to robustly work.  This current
implementation is probably good enough to run vkcube or simple
apps, but whenever a test starts to aggressively map/unmap memory,
things do explode on the QEMU side.

A simple way to reproduce is run:

./deqp-vk --deqp-case=deqp-vk --deqp-case=dEQP-VK.memory.mapping.suballocation.*

You should get stack traces that sometimes look like this:

0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737316304448) 
at ./nptl/pthread_kill.c:44
1  __pthread_kill_internal (signo=6, threadid=140737316304448) at 
./nptl/pthread_kill.c:78
2  __GI___pthread_kill (threadid=140737316304448, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
3  0x77042476 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
4  0x770287f3 in __GI_abort () at ./stdlib/abort.c:79
5  0x770896f6 in __libc_message (action=action@entry=do_abort, 
fmt=fmt@entry=0x771dbb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
6  0x770a0d7c in malloc_printerr (str=str@entry=0x771de7b0 "double free 
or corruption (out)") at ./malloc/malloc.c:5664
7  0x770a2ef0 in _int_free (av=0x77219c80 , p=0x57793e00, 
have_lock=) at ./malloc/malloc.c:4588
8  0x770a54d3 in __GI___libc_free (mem=) at 
./malloc/malloc.c:3391
9  0x55d65e7e in phys_section_destroy (mr=0x57793e10) at 
../softmmu/physmem.c:1003
10 0x55d65ed0 in phys_sections_free (map=0x56d4b410) at 
../softmmu/physmem.c:1011
11 0x55d69578 in address_space_dispatch_free (d=0x56d4b400) at 
../softmmu/physmem.c:2430
12 0x55d58412 in flatview_destroy (view=0x572bb090) at 
../softmmu/memory.c:292
13 0x5600fd23 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:284
14 0x560026d4 in qemu_thread_start (args=0x569cafa0) at 
../util/qemu-thread-posix.c:541
15 0x77094b43 in start_thread (arg=) at 
./nptl/pthread_create.c:442
16 0x77126a00 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

or this:

0x55e1dc80 in object_unref (objptr=0x6d656d3c6b6e696c) at 
../qom/object.c:1198
1198g_assert(obj->ref > 0);
(gdb) bt
0  0x55e1dc80 in object_unref (objptr=0x6d656d3c6b6e696c) at 
../qom/object.c:1198
1  0x55d5cca5 in memory_region_unref (mr=0x572b9e20) at 
../softmmu/memory.c:1799
2  0x55d65e47 in phys_section_destroy (mr=0x572b9e20) at 
../softmmu/physmem.c:998
3  0x55d65ec7 in phys_sections_free (map=0x588365c0) at 
../softmmu/physmem.c:1011
4  0x55d6956f in address_space_dispatch_free (d=0x588365b0) at 
../softmmu/physmem.c:2430
5  0x55d58409 in flatview_destroy (view=0x58836570) at 
../softmmu/memory.c:292
6  0x5600fd1a in call_rcu_thread (opaque=0x0) at ../util/rcu.c:284
7  0x560026cb in qemu_thread_start (args=0x569cafa0) at 
../util/qemu-thread-posix.c:541
8  0x77094b43 in start_thread (arg=) at 
./nptl/pthread_create.c:442
9  0x77126a00 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The reason seems to be that memory regions are handled on a different
thread than the virtio-gpu thread, and that inevitably leads to
races.  The memory region docs [a] generally seem to discourage this:

"In order to do this, as a general rule do not create or destroy
  memory regions dynamically during a device’s lifetime, and only
  call object_unparent() in the memory region owner’s instance_finalize
  callback. The dynamically allocated data structure that contains
  the memory region then should obviously be freed in the
  instance_finalize callback as well."

Though instance_finalize is called before device destruction, so
storing the memory until then is unlikely to be an option.  The
tests do pass when virtio-gpu doesn't free the memory, but
progressively the guest becomes slower and then OOMs.

The API does make an exception, though:

"There is an exception to the above rule: it is okay to call
object_unparent at any time for an alias or a container region. It is
therefore also okay to create or destroy alias and container regions
dynamically during a device’s lifetime."

I believe we are trying to create a container subregion, but that's
still failing?  Are we doing it right?  Can any memory region experts
here help out?  The other relevant patch in this series
is "virtio-gpu: hostmem".


Perhaps dma_memory_map() is what you want?



[a] https://qemu.readthedocs.io/en/latest/devel/memory.html

Signed-off-by: Gurchetan Singh 
---
  hw/display/virtio-gpu-rutabaga.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/hw/display/virtio-gpu-rutabaga.c b/hw/display/virtio-gpu-rutabaga.c
index 5fd1154198..196267aac2 100644
--- a/hw/display/virtio-gpu-rutabaga.c
+++ b/hw/display/virtio-gpu-rutabaga.c
@@ -159,6 +159,12 @@ static int32_t

Re: [RFC PATCH 10/13] gfxstream + rutabaga: add initial support for gfxstream

2023-04-22 Thread Akihiko Odaki


On 2023/04/21 10:12, Gurchetan Singh wrote:

This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator for more than a decade, which is the main developer
platform.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or MacOS environments.

gfxstream has traditionally been maintained by a single engineer, and
from 2015 to 2021, the iron throne passed to Frank Yang.  Just to
name a few accomplishments in a reign filled with many of them: newer
versions of GLES, address space graphics, snapshot support and CTS
compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS and especially TTM so
a simple guest to host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properly, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.  One thing is for sure, though -- it'll be filled with
friendship and magic!

Technical details:
   - Very simple initial display integration: just used Pixman
   - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
 calls

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927

Signed-off-by: Gurchetan Singh 
---
  hw/display/virtio-gpu-rutabaga.c | 995 +++
  1 file changed, 995 insertions(+)
  create mode 100644 hw/display/virtio-gpu-rutabaga.c

diff --git a/hw/display/virtio-gpu-rutabaga.c b/hw/display/virtio-gpu-rutabaga.c
new file mode 100644
index 00..5fd1154198
--- /dev/null
+++ b/hw/display/virtio-gpu-rutabaga.c
@@ -0,0 +1,995 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/iov.h"
+#include "trace.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-pixman.h"
+#include "hw/virtio/virtio-iommu.h"
+
+#include 
+
+static int virtio_gpu_rutabaga_init(VirtIOGPU *g);
+
+#define GET_VIRTIO_GPU_GL(x)  \
+VirtIOGPUGL *virtio_gpu = VIRTIO_GPU_GL(x);   \


It's confusing to name a VirtIOGPUGL pointer derived from a VirtIOGPU 
pointer "virtio_gpu". "gl", the name used in virtio-gpu-gl.c is less 
confusing though it's a bit strange considering the VirtIOGPU pointer is 
named "g".


I also doubt this macro (and following GET_RUTABAGA()) makes sense. It's 
confusing that they declare a variable, and it's not really saving code 
either as the content of each macro is just one line. But other people 
may prefer these macros to stay.



+
+#define GET_RUTABAGA(x)   \
+struct rutabaga *rutabaga = (struct rutabaga *)(x->rutabaga); \


Wrap x in parentheses. Also, you don't need to cast from (void *) since
this is C.
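
To illustrate the two points above, here is a minimal sketch. The struct
layouts, the macro names and the helper functions are hypothetical
stand-ins, not the code from the patch; only the hazard itself (an
unparenthesized macro parameter, as in "x->rutabaga" and "!condition")
comes from the quoted macros.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real device and library types. */
struct rutabaga { int id; };
struct gpu_gl { void *rutabaga; };

/*
 * Parenthesizing the parameter keeps the macro correct for any pointer
 * expression. The unparenthesized form, x->rutabaga, would expand
 * "base + 1" to "base + 1->rutabaga", which does not compile.
 */
#define RUTABAGA_OF(x) ((x)->rutabaga)

/* No cast needed: in C, void * converts implicitly to struct rutabaga *. */
struct rutabaga *second_rutabaga(struct gpu_gl *base)
{
    struct rutabaga *r = RUTABAGA_OF(base + 1);
    return r;
}

/* Same hazard in a condition: "!cond" binds to the first operand only. */
#define FAILED_UNSAFE(cond) (!cond)
#define FAILED_SAFE(cond) (!(cond))

int check_failed_unsafe(int a, int b)
{
    return FAILED_UNSAFE(a && b);   /* expands to (!a && b) */
}

int check_failed_safe(int a, int b)
{
    return FAILED_SAFE(a && b);     /* expands to (!(a && b)) */
}
```

With a = 0 and b = 0 the condition "a && b" is false, so a check should
report failure: the parenthesized macro does, while the unparenthesized
one evaluates "!0 && 0" and silently reports success.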



+
+#define CHECK(condition, cmd) \
+do {  \
+if (!condition) { \
+qemu_log_mask(LOG_GUEST_ERROR, "CHECK_RESULT failed in %s() %s:"  \


This macro is named CHECK but it

Re: [PATCH v3 00/18] hw/ide: Untangle ISA/PCI abuses of ide_init_ioport()

2023-04-22 Thread Bernhard Beschow




On 2 March 2023 at 22:40:40 UTC, "Philippe Mathieu-Daudé" wrote:
>Since v2: rebased
>
>I'm posting this series as is so as not to block Bernhard's PIIX
>cleanup work. I don't have any code changes planned, but may eventually
>reword / improve commit descriptions.
>
>Tested commit after commit to be sure it is bisectable. Sadly
>this was before Zoltan & Thomas reported a problem with commit
>bb98e0f59c ("hw/isa/vt82c686: Remove intermediate IRQ forwarder").
>
>Background thread:
>https://lore.kernel.org/qemu-devel/5095dffc-309b-6c72-d255-8cdaa6fd3...@ilande.co.uk/

Hi,

I've just sent yet another proposal which might make some of the renamings
done in this series appear unnecessary.

Best regards,
Bernhard

>
>Philippe Mathieu-Daudé (18):
>  hw/ide/piix: Expose output IRQ as properties for late object
>population
>  hw/ide/piix: Allow using PIIX3-IDE as standalone PCI function
>  hw/i386/pc_piix: Wire PIIX3 IDE ouput IRQs to ISA bus IRQs 14/15
>  hw/isa/piix4: Wire PIIX4 IDE ouput IRQs to ISA bus IRQs 14/15
>  hw/ide: Rename ISA specific ide_init_ioport -> ide_bus_init_ioport_isa
>  hw/ide/piix: Ensure IDE output IRQs are wired at realization
>  hw/isa: Deprecate isa_get_irq() in favor of isa_bus_get_irq()
>  hw/ide: Introduce generic ide_init_ioport()
>  hw/ide/piix: Use generic ide_bus_init_ioport()
>  hw/isa: Ensure isa_register_portio_list() do not get NULL ISA device
>  hw/isa: Simplify isa_address_space[_io]()
>  hw/isa: Reduce 'isabus' singleton scope to isa_bus_new()
>  exec/ioport: Factor portio_list_register_flush_coalesced() out
>  exec/ioport: Factor portio_list_register() out
>  hw/southbridge/piix: Use OBJECT_DECLARE_SIMPLE_TYPE() macro
>  hw/isa/piix: Batch register QOM types using DEFINE_TYPES() macro
>  hw/isa/piix: Unify QOM type name of PIIX ISA function
>  hw/isa/piix: Unify PIIX-ISA QOM type names using qdev aliases
>
> hw/audio/adlib.c  |  4 +--
> hw/display/qxl.c  |  7 ++--
> hw/display/vga.c  |  9 +++--
> hw/dma/i82374.c   |  7 ++--
> hw/i386/pc_piix.c | 13 +---
> hw/ide/ioport.c   | 15 +++--
> hw/ide/isa.c  |  2 +-
> hw/ide/piix.c | 54 +++---
> hw/isa/isa-bus.c  | 36 
> hw/isa/piix3.c| 63 +++
> hw/isa/piix4.c| 12 ---
> hw/mips/malta.c   |  2 +-
> hw/watchdog/wdt_ib700.c   |  4 +--
> include/exec/ioport.h | 15 +
> include/hw/ide/internal.h |  3 +-
> include/hw/ide/isa.h  |  3 ++
> include/hw/ide/piix.h |  4 +++
> include/hw/isa/isa.h  |  3 +-
> include/hw/southbridge/piix.h | 14 
> softmmu/ioport.c  | 48 +++---
> softmmu/qdev-monitor.c|  3 ++
> 21 files changed, 190 insertions(+), 131 deletions(-)
>

[PATCH 03/13] hw/isa/vt82c686: Remove via_isa_set_irq()

2023-04-22 Thread Bernhard Beschow

Now that via_isa_set_irq() is unused it can be removed.

Signed-off-by: Bernhard Beschow 
---
 include/hw/isa/vt82c686.h | 2 --
 hw/isa/vt82c686.c | 6 --
 2 files changed, 8 deletions(-)

diff --git a/include/hw/isa/vt82c686.h b/include/hw/isa/vt82c686.h
index da1722daf2..b6e95b2851 100644
--- a/include/hw/isa/vt82c686.h
+++ b/include/hw/isa/vt82c686.h
@@ -34,6 +34,4 @@ struct ViaAC97State {
 uint32_t ac97_cmd;
 };
 
-void via_isa_set_irq(PCIDevice *d, int n, int level);
-
 #endif
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index c7e29bb46a..a69888a396 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -604,12 +604,6 @@ static const TypeInfo via_isa_info = {
 },
 };
 
-void via_isa_set_irq(PCIDevice *d, int n, int level)
-{
-ViaISAState *s = VIA_ISA(d);
-qemu_set_irq(s->isa_irqs_in[n], level);
-}
-
 static void via_isa_request_i8259_irq(void *opaque, int irq, int level)
 {
 ViaISAState *s = opaque;
-- 
2.40.0

[PATCH 12/13] hw/ide/sii3112: Reuse PCIIDEState::bmdma_ops

2023-04-22 Thread Bernhard Beschow

Allows unexporting bmdma_addr_ioport_ops and models TYPE_SII3112_PCI as a
standard-compliant PCI IDE device.

Signed-off-by: Bernhard Beschow 
---
 include/hw/ide/pci.h |  1 -
 hw/ide/pci.c |  2 +-
 hw/ide/sii3112.c | 94 ++--
 hw/ide/trace-events  |  6 ++-
 4 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index dbb4b13161..81e0370202 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -59,7 +59,6 @@ struct PCIIDEState {
 void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
 void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops);
 void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
-extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
 
 #endif
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 97ccc75aa6..3539b162b7 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -342,7 +342,7 @@ static void bmdma_addr_write(void *opaque, hwaddr addr,
 bm->addr |= ((data & mask) << shift) & ~3;
 }
 
-MemoryRegionOps bmdma_addr_ioport_ops = {
+static MemoryRegionOps bmdma_addr_ioport_ops = {
 .read = bmdma_addr_read,
 .write = bmdma_addr_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 9cf920369f..373c0dd1ee 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -34,47 +34,73 @@ struct SiI3112PCIState {
 SiI3112Regs regs[2];
 };
 
-/* The sii3112_reg_read and sii3112_reg_write functions implement the
- * Internal Register Space - BAR5 (section 6.7 of the data sheet).
- */
-
-static uint64_t sii3112_reg_read(void *opaque, hwaddr addr,
-unsigned int size)
+static uint64_t sii3112_bmdma_read(void *opaque, hwaddr addr, unsigned int 
size)
 {
-SiI3112PCIState *d = opaque;
+BMDMAState *bm = opaque;
+SiI3112PCIState *d = SII3112_PCI(bm->pci_dev);
+int i = (bm == &bm->pci_dev->bmdma[0]) ? 0 : 1;
 uint64_t val;
 
 switch (addr) {
 case 0x00:
-val = d->i.bmdma[0].cmd;
+val = bm->cmd;
 break;
 case 0x01:
-val = d->regs[0].swdata;
+val = d->regs[i].swdata;
 break;
 case 0x02:
-val = d->i.bmdma[0].status;
+val = bm->status;
 break;
 case 0x03:
 val = 0;
 break;
-case 0x04 ... 0x07:
-val = bmdma_addr_ioport_ops.read(&d->i.bmdma[0], addr - 4, size);
-break;
-case 0x08:
-val = d->i.bmdma[1].cmd;
+default:
+val = 0;
 break;
-case 0x09:
-val = d->regs[1].swdata;
+}
+trace_sii3112_bmdma_read(size, addr, val);
+return val;
+}
+
+static void sii3112_bmdma_write(void *opaque, hwaddr addr,
+uint64_t val, unsigned int size)
+{
+BMDMAState *bm = opaque;
+SiI3112PCIState *d = SII3112_PCI(bm->pci_dev);
+int i = (bm == &bm->pci_dev->bmdma[0]) ? 0 : 1;
+
+trace_sii3112_bmdma_write(size, addr, val);
+switch (addr) {
+case 0x00:
+bmdma_cmd_writeb(bm, val);
 break;
-case 0x0a:
-val = d->i.bmdma[1].status;
+case 0x01:
+d->regs[i].swdata = val & 0x3f;
 break;
-case 0x0b:
-val = 0;
+case 0x02:
+bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 6);
 break;
-case 0x0c ... 0x0f:
-val = bmdma_addr_ioport_ops.read(&d->i.bmdma[1], addr - 12, size);
+default:
 break;
+}
+}
+
+static const MemoryRegionOps sii3112_bmdma_ops = {
+.read = sii3112_bmdma_read,
+.write = sii3112_bmdma_write,
+};
+
+/* The sii3112_reg_read and sii3112_reg_write functions implement the
+ * Internal Register Space - BAR5 (section 6.7 of the data sheet).
+ */
+
+static uint64_t sii3112_reg_read(void *opaque, hwaddr addr,
+unsigned int size)
+{
+SiI3112PCIState *d = opaque;
+uint64_t val;
+
+switch (addr) {
 case 0x10:
 val = d->i.bmdma[0].cmd;
 val |= (d->regs[0].confstat & (1UL << 11) ? (1 << 4) : 0); /*SATAINT0*/
@@ -127,38 +153,26 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 
 trace_sii3112_write(size, addr, val);
 switch (addr) {
-case 0x00:
 case 0x10:
 bmdma_cmd_writeb(&d->i.bmdma[0], val);
 break;
-case 0x01:
 case 0x11:
 d->regs[0].swdata = val & 0x3f;
 break;
-case 0x02:
 case 0x12:
 d->i.bmdma[0].status = (val & 0x60) | (d->i.bmdma[0].status & 1) |
(d->i.bmdma[0].status & ~val & 6);
 break;
-case 0x04 ... 0x07:
-bmdma_addr_ioport_ops.write(&d->i.bmdma[0], addr - 4, val, size);
-break;
-case 0x08:
 case 0x18:
 bmdma_cmd_writeb(&d->i.bmdma[1], val);
 break;
-case 0x09:
 case 0x19:
 d->regs[1].swdata = val & 0x3f;
 break;
-case 0x0a:
 case 0x1a:
 d->i.bmdma[1].status =

[PATCH 05/13] hw/ide: Extract pci_ide_class_init()

2023-04-22 Thread Bernhard Beschow

Removes redundant code from every PCI IDE device model.
---
 include/hw/ide/pci.h |  1 -
 hw/ide/cmd646.c  | 15 ---
 hw/ide/pci.c | 25 -
 hw/ide/piix.c| 19 ---
 hw/ide/sii3112.c |  3 ++-
 hw/ide/via.c | 15 ---
 6 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 74c127e32f..7bc4e53d02 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -61,7 +61,6 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
 extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
 
-extern const VMStateDescription vmstate_ide_pci;
 extern const MemoryRegionOps pci_ide_cmd_le_ops;
 extern const MemoryRegionOps pci_ide_data_le_ops;
 #endif
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a094a6e12a..9aabf80e52 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -301,17 +301,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error 
**errp)
 }
 }
 
-static void pci_cmd646_ide_exitfn(PCIDevice *dev)
-{
-PCIIDEState *d = PCI_IDE(dev);
-unsigned i;
-
-for (i = 0; i < 2; ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
-}
-}
-
 static Property cmd646_ide_properties[] = {
 DEFINE_PROP_UINT32("secondary", PCIIDEState, secondary, 0),
 DEFINE_PROP_END_OF_LIST(),
@@ -323,17 +312,13 @@ static void cmd646_ide_class_init(ObjectClass *klass, 
void *data)
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
 dc->reset = cmd646_reset;
-dc->vmsd = &vmstate_ide_pci;
 k->realize = pci_cmd646_ide_realize;
-k->exit = pci_cmd646_ide_exitfn;
 k->vendor_id = PCI_VENDOR_ID_CMD;
 k->device_id = PCI_DEVICE_ID_CMD_646;
 k->revision = 0x07;
-k->class_id = PCI_CLASS_STORAGE_IDE;
 k->config_read = cmd646_pci_config_read;
 k->config_write = cmd646_pci_config_write;
 device_class_set_props(dc, cmd646_ide_properties);
-set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 }
 
 static const TypeInfo cmd646_ide_info = {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 67e0998ff0..8bea92e394 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -467,7 +467,7 @@ static int ide_pci_post_load(void *opaque, int version_id)
 return 0;
 }
 
-const VMStateDescription vmstate_ide_pci = {
+static const VMStateDescription vmstate_ide_pci = {
 .name = "ide",
 .version_id = 3,
 .minimum_version_id = 0,
@@ -530,11 +530,34 @@ static void pci_ide_init(Object *obj)
 qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
 }
 
+static void pci_ide_exitfn(PCIDevice *dev)
+{
+PCIIDEState *d = PCI_IDE(dev);
+unsigned i;
+
+for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {
+memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
+memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
+}
+}
+
+static void pci_ide_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+dc->vmsd = &vmstate_ide_pci;
+k->exit = pci_ide_exitfn;
+k->class_id = PCI_CLASS_STORAGE_IDE;
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+}
+
 static const TypeInfo pci_ide_type_info = {
 .name = TYPE_PCI_IDE,
 .parent = TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIIDEState),
 .instance_init = pci_ide_init,
+.class_init = pci_ide_class_init,
 .abstract = true,
 .interfaces = (InterfaceInfo[]) {
 { INTERFACE_CONVENTIONAL_PCI_DEVICE },
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a32f7ccece..4e6ca99123 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -159,8 +159,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
 bmdma_setup_bar(d);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
 
-vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
-
 for (unsigned i = 0; i < 2; i++) {
 if (!pci_piix_init_bus(d, i, errp)) {
 return;
@@ -168,17 +166,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
 }
 }
 
-static void pci_piix_ide_exitfn(PCIDevice *dev)
-{
-PCIIDEState *d = PCI_IDE(dev);
-unsigned i;
-
-for (i = 0; i < 2; ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
-}
-}
-
 /* NOTE: for the PIIX3, the IRQs and IOports are hardcoded */
 static void piix3_ide_class_init(ObjectClass *klass, void *data)
 {
@@ -187,11 +174,8 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)
 
 dc->reset = piix_ide_reset;
 k->realize = pci_piix_ide_realize;
-k->exit = pci_piix_ide_exitfn;
 k->vendor_id = PCI_VENDOR_ID_INTEL;
 k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
-k->class_id = PCI_CLASS_STORAGE_IDE;
-

[PATCH 02/13] hw/ide/via: Implement ISA IRQ routing

2023-04-22 Thread Bernhard Beschow

The VIA south bridge allows the legacy IDE interrupts to be routed to four
different ISA interrupts. This can be configured through the 0x4a register in
the PCI configuration space of the ISA function. The default routing matches
the legacy ISA IRQs, that is 14 and 15.

Implement this missing piece of the VIA south bridge.
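
As a worked example of the routing described above, the following
sketch decodes the routing register into an ISA IRQ number. The
function name and signature are hypothetical; the IRQ table
{ 14, 15, 10, 11 }, the two-bits-per-channel layout and the reset value
0x04 are taken from the diff below.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map an IDE channel (0 or 1) to an ISA IRQ, given the value of the
 * routing register at offset 0x4a of the ISA function's PCI config
 * space. Each channel owns a two-bit field that indexes the table.
 */
uint8_t via_ide_route_irq(uint8_t reg_4a, unsigned channel)
{
    static const uint8_t irqs[] = { 14, 15, 10, 11 };

    assert(channel < 2);
    return irqs[(reg_4a >> (channel * 2)) & 0x3];
}
```

With the reset value 0x04, channel 0 selects field value 0 (IRQ 14) and
channel 1 selects field value 1 (IRQ 15), which is the legacy routing.
Writing 0x0e instead would route the two channels to IRQs 10 and 11.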

Signed-off-by: Bernhard Beschow 
---
 hw/ide/via.c  |  6 --
 hw/isa/vt82c686.c | 17 +
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 177baea9a7..0caae52276 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -31,6 +31,7 @@
 #include "sysemu/dma.h"
 #include "hw/isa/vt82c686.h"
 #include "hw/ide/pci.h"
+#include "hw/irq.h"
 #include "trace.h"
 
 static uint64_t bmdma_read(void *opaque, hwaddr addr,
@@ -104,7 +105,8 @@ static void bmdma_setup_bar(PCIIDEState *d)
 
 static void via_ide_set_irq(void *opaque, int n, int level)
 {
-PCIDevice *d = PCI_DEVICE(opaque);
+PCIIDEState *s = opaque;
+PCIDevice *d = PCI_DEVICE(s);
 
 if (level) {
 d->config[0x70 + n * 8] |= 0x80;
@@ -112,7 +114,7 @@ static void via_ide_set_irq(void *opaque, int n, int level)
 d->config[0x70 + n * 8] &= ~0x80;
 }
 
-via_isa_set_irq(pci_get_function_0(d), 14 + n, level);
+qemu_set_irq(s->isa_irq[n], level);
 }
 
 static void via_ide_reset(DeviceState *dev)
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index ca89119ce0..c7e29bb46a 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -568,9 +568,19 @@ static const VMStateDescription vmstate_via = {
 }
 };
 
+static void via_isa_set_ide_irq(void *opaque, int n, int level)
+{
+static const uint8_t irqs[] = { 14, 15, 10, 11 };
+ViaISAState *s = opaque;
+uint8_t irq = irqs[(s->dev.config[0x4a] >> (n * 2)) & 0x3];
+
+qemu_set_irq(s->isa_irqs_in[irq], level);
+}
+
 static void via_isa_init(Object *obj)
 {
 ViaISAState *s = VIA_ISA(obj);
+DeviceState *dev = DEVICE(s);
 
 object_initialize_child(obj, "rtc", &s->rtc, TYPE_MC146818_RTC);
 object_initialize_child(obj, "ide", &s->ide, TYPE_VIA_IDE);
@@ -578,6 +588,8 @@ static void via_isa_init(Object *obj)
 object_initialize_child(obj, "uhci2", &s->uhci[1],
TYPE_VT82C686B_USB_UHCI);
 object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97);
 object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97);
+
+qdev_init_gpio_in_named(dev, via_isa_set_ide_irq, "ide", 
ARRAY_SIZE(s->ide.isa_irq));
 }
 
 static const TypeInfo via_isa_info = {
@@ -692,6 +704,10 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
 if (!qdev_realize(DEVICE(&s->ide), BUS(pci_bus), errp)) {
 return;
 }
+for (i = 0; i < 2; i++) {
+qdev_connect_gpio_out(DEVICE(&s->ide), i,
+  qdev_get_gpio_in_named(DEVICE(s), "ide", i));
+}
 
 /* Functions 2-3: USB Ports */
 for (i = 0; i < ARRAY_SIZE(s->uhci); i++) {
@@ -814,6 +830,7 @@ static void vt8231_isa_reset(DeviceState *dev)
  PCI_COMMAND_MASTER | PCI_COMMAND_SPECIAL);
 pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM);
 
+pci_conf[0x4a] = 0x04; /* IDE interrupt Routing */
 pci_conf[0x58] = 0x40; /* Miscellaneous Control 0 */
 pci_conf[0x67] = 0x08; /* Fast IR Config */
 pci_conf[0x6b] = 0x01; /* Fast IR I/O Base */
-- 
2.40.0

[PATCH 06/13] hw/ide: Extract bmdma_init_ops()

2023-04-22 Thread Bernhard Beschow

There are three private copies of bmdma_setup_bar() with small adaptations.
Consolidate them into one public implementation.

While at it rename the function to bmdma_init_ops() to reflect that the memory
regions being initialized represent BMDMA operations. The actual mapping as a
PCI BAR is still performed separately in each device.

Note that the bmdma_bar attribute will be renamed in a separate commit.
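
For orientation, the container layout that the consolidated function
sets up can be sketched as a small decoder. The type and function names
are hypothetical; the 16-byte container, the per-channel stride of 8 and
the two 4-byte subregions at offsets i * 8 and i * 8 + 4 are taken from
the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Which 4-byte subregion of a channel an offset falls into. */
enum bmdma_region {
    BMDMA_OPS,   /* device-specific ops, mapped at i * 8 */
    BMDMA_ADDR   /* address register ops, mapped at i * 8 + 4 */
};

struct bmdma_slot {
    size_t channel;            /* index into the bmdma[] array */
    enum bmdma_region region;  /* subregion within that channel */
    size_t offset;             /* byte offset within the subregion */
};

/* Decode an offset into the 16-byte BMDMA container region. */
struct bmdma_slot bmdma_decode(size_t bar_offset)
{
    struct bmdma_slot slot;

    assert(bar_offset < 16);
    slot.channel = bar_offset / 8;
    slot.region = (bar_offset % 8) < 4 ? BMDMA_OPS : BMDMA_ADDR;
    slot.offset = bar_offset % 4;
    return slot;
}
```

For instance, offset 6 decodes to channel 0, address subregion, byte 2,
and offset 9 decodes to channel 1, device-specific subregion, byte 1.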

Signed-off-by: Bernhard Beschow 
---
 include/hw/ide/pci.h |  1 +
 hw/ide/cmd646.c  | 20 +---
 hw/ide/pci.c | 16 
 hw/ide/piix.c| 19 +--
 hw/ide/via.c | 19 +--
 5 files changed, 20 insertions(+), 55 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 7bc4e53d02..597c77c7ad 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -57,6 +57,7 @@ struct PCIIDEState {
 };
 
 void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
+void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops);
 void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
 extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 9aabf80e52..6fd09fe74e 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -161,24 +161,6 @@ static const MemoryRegionOps cmd646_bmdma_ops = {
 .write = bmdma_write,
 };
 
-static void bmdma_setup_bar(PCIIDEState *d)
-{
-BMDMAState *bm;
-int i;
-
-memory_region_init(&d->bmdma_bar, OBJECT(d), "cmd646-bmdma", 16);
-for(i = 0;i < 2; i++) {
-bm = &d->bmdma[i];
-memory_region_init_io(&bm->extra_io, OBJECT(d), &cmd646_bmdma_ops, bm,
-  "cmd646-bmdma-bus", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
-memory_region_init_io(&bm->addr_ioport, OBJECT(d),
-  &bmdma_addr_ioport_ops, bm,
-  "cmd646-bmdma-ioport", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4,
&bm->addr_ioport);
-}
-}
-
 static void cmd646_update_irq(PCIDevice *pd)
 {
 int pci_level;
@@ -285,7 +267,7 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
   &d->bus[1], "cmd646-cmd1", 4);
 pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
 
-bmdma_setup_bar(d);
+bmdma_init_ops(d, &cmd646_bmdma_ops);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
 
 /* TODO: RST# value should be 0 */
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 8bea92e394..65ed6f7f72 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -523,6 +523,22 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d)
 bm->pci_dev = d;
 }
 
+void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops)
+{
+size_t i;
+
+memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16);
+for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) {
+BMDMAState *bm = &d->bmdma[i];
+
+memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, "bmdma-ops", 4);
+memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
+memory_region_init_io(&bm->addr_ioport, OBJECT(d), &bmdma_addr_ioport_ops, bm,
+  "bmdma-ioport-ops", 4);
+memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
+}
+}
+
 static void pci_ide_init(Object *obj)
 {
 PCIIDEState *d = PCI_IDE(obj);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 4e6ca99123..5611473d37 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -86,23 +86,6 @@ static const MemoryRegionOps piix_bmdma_ops = {
 .write = bmdma_write,
 };
 
-static void bmdma_setup_bar(PCIIDEState *d)
-{
-int i;
-
-memory_region_init(&d->bmdma_bar, OBJECT(d), "piix-bmdma-container", 16);
-for(i = 0;i < 2; i++) {
-BMDMAState *bm = &d->bmdma[i];
-
-memory_region_init_io(&bm->extra_io, OBJECT(d), &piix_bmdma_ops, bm,
-  "piix-bmdma", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
-memory_region_init_io(&bm->addr_ioport, OBJECT(d),
-  &bmdma_addr_ioport_ops, bm, "bmdma", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
-}
-}
-
 static void piix_ide_reset(DeviceState *dev)
 {
 PCIIDEState *d = PCI_IDE(dev);
@@ -156,7 +139,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 
 pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
 
-bmdma_setup_bar(d);
+bmdma_init_ops(d, &piix_bmdma_ops);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
 
 for (unsigned i = 0; i < 2; i++) {
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 287143a005..40704e2857 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -86,23 +86,6 @@ static const MemoryRegionOps via_bmdma_ops = {
 .write = bmdma_write,
 };
 
-static void bmdma_setup_bar(PCIIDEState *d)
-{
-int i;
-
-memory_region_init(&d->bmdma_bar,
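
For readers following the refactoring, the region layout that bmdma_init_ops() builds can be sketched as plain offset arithmetic, independent of QEMU. The sketch assumes only what the patch shows: a 16-byte container holding, per channel i, a 4-byte "bmdma-ops" window at i * 8 and a 4-byte "bmdma-ioport-ops" window at i * 8 + 4. The names bmdma_window and bmdma_locate() are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the 16-byte BMDMA container; not QEMU code. */
enum { BMDMA_CONTAINER_SIZE = 16, BMDMA_WINDOW_SIZE = 4 };

struct bmdma_window {
    unsigned channel;   /* 0 = primary, 1 = secondary */
    bool is_addr_port;  /* false: "bmdma-ops", true: "bmdma-ioport-ops" */
    size_t offset;      /* byte offset inside the 4-byte window */
};

/* Map an offset inside the container to the subregion that would handle it. */
struct bmdma_window bmdma_locate(size_t container_offset)
{
    struct bmdma_window w;

    assert(container_offset < BMDMA_CONTAINER_SIZE);
    /* Each channel owns 8 bytes: ops window first, address window second. */
    w.channel = (unsigned)(container_offset / 8);
    w.is_addr_port = (container_offset % 8) >= BMDMA_WINDOW_SIZE;
    w.offset = container_offset % BMDMA_WINDOW_SIZE;
    return w;
}
```

In this model, container offset 10 resolves to channel 1, the ops window, byte 2, which is the offset the device-specific bmdma_write() handlers treat as the status register.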

[PATCH 04/13] hw/ide: Extract IDEBus assignment into bmdma_init()

2023-04-22 Thread Bernhard Beschow

Every invocation of bmdma_init() is followed by `d->bmdma[i].bus = &d->bus[i]`.
Resolve this redundancy by extracting it into bmdma_init().

Signed-off-by: Bernhard Beschow 
---
 hw/ide/cmd646.c  | 1 -
 hw/ide/pci.c | 1 +
 hw/ide/piix.c| 1 -
 hw/ide/sii3112.c | 1 -
 hw/ide/via.c | 1 -
 5 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index a68357c1c5..a094a6e12a 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -297,7 +297,6 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
 ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
-d->bmdma[i].bus = &d->bus[i];
 ide_bus_register_restart_cb(&d->bus[i]);
 }
 }
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 942e216b9b..67e0998ff0 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -519,6 +519,7 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d)
 bus->dma = &bm->dma;
 bm->irq = bus->irq;
 bus->irq = qemu_allocate_irq(bmdma_irq, bm, 0);
+bm->bus = bus;
 bm->pci_dev = d;
 }
 
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..a32f7ccece 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -144,7 +144,6 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
 ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
-d->bmdma[i].bus = &d->bus[i];
 ide_bus_register_restart_cb(&d->bus[i]);
 
 return true;
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index f9becdff8e..5dd3d03c29 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -287,7 +287,6 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
 ide_bus_init_output_irq(&s->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&s->bus[i], &s->bmdma[i], s);
-s->bmdma[i].bus = &s->bus[i];
 ide_bus_register_restart_cb(&s->bus[i]);
 }
 }
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 0caae52276..91253fa4ef 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -196,7 +196,6 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
 ide_bus_init_output_irq(&d->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
-d->bmdma[i].bus = &d->bus[i];
 ide_bus_register_restart_cb(&d->bus[i]);
 }
 }
-- 
2.40.0

[PATCH 07/13] hw/ide: Extract pci_ide_{cmd, data}_le_ops initialization into base class constructor

2023-04-22 Thread Bernhard Beschow

There is redundant code in cmd646 and via which can be extracted into the base
class. In the case of piix and sii3112 this is currently unnecessary but
shouldn't interfere since the memory regions aren't mapped by those devices.
A few commits later this will change, i.e. those device models will also make
use of these memory regions.

Signed-off-by: Bernhard Beschow 
---
 hw/ide/cmd646.c | 11 ---
 hw/ide/pci.c| 10 ++
 hw/ide/via.c| 11 ---
 3 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 6fd09fe74e..85716aaf17 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -251,20 +251,9 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
 dev->wmask[MRDMODE] = 0x0;
 dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1;
 
-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[0], "cmd646-data0", 8);
 pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);
-
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[0], "cmd646-cmd0", 4);
 pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-
-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[1], "cmd646-data1", 8);
 pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[1], "cmd646-cmd1", 4);
 pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
 
 bmdma_init_ops(d, &cmd646_bmdma_ops);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 65ed6f7f72..a9194313bd 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -543,6 +543,16 @@ static void pci_ide_init(Object *obj)
 {
 PCIIDEState *d = PCI_IDE(obj);
 
+memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
+  &d->bus[0], "pci-ide0-data-ops", 8);
+memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
+  &d->bus[0], "pci-ide0-cmd-ops", 4);
+
+memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
+  &d->bus[1], "pci-ide1-data-ops", 8);
+memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
+  &d->bus[1], "pci-ide1-cmd-ops", 4);
+
 qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
 }
 
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 40704e2857..704a8024cb 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -154,20 +154,9 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
 dev->wmask[PCI_INTERRUPT_LINE] = 0;
 dev->wmask[PCI_CLASS_PROG] = 5;
 
-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[0], "via-ide0-data", 8);
 pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);
-
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[0], "via-ide0-cmd", 4);
 pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-
-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
-  &d->bus[1], "via-ide1-data", 8);
 pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
-  &d->bus[1], "via-ide1-cmd", 4);
 pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
 
 bmdma_init_ops(d, &via_bmdma_ops);
-- 
2.40.0

[PATCH 11/13] hw/ide/sii3112: Reuse PCIIDEState::{cmd,data}_ops

2023-04-22 Thread Bernhard Beschow

This allows unexporting pci_ide_{cmd,data}_le_ops and models TYPE_SII3112_PCI
as a standard-compliant PCI IDE device.

Signed-off-by: Bernhard Beschow 
---
 include/hw/ide/pci.h |  2 --
 hw/ide/pci.c |  4 ++--
 hw/ide/sii3112.c | 50 
 3 files changed, 20 insertions(+), 36 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 5025df5b82..dbb4b13161 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -62,6 +62,4 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
 extern MemoryRegionOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev);
 
-extern const MemoryRegionOps pci_ide_cmd_le_ops;
-extern const MemoryRegionOps pci_ide_data_le_ops;
 #endif
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index b2fcc00a64..97ccc75aa6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -60,7 +60,7 @@ static void pci_ide_ctrl_write(void *opaque, hwaddr addr,
 ide_ctrl_write(bus, addr + 2, data);
 }
 
-const MemoryRegionOps pci_ide_cmd_le_ops = {
+static const MemoryRegionOps pci_ide_cmd_le_ops = {
 .read = pci_ide_status_read,
 .write = pci_ide_ctrl_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
@@ -98,7 +98,7 @@ static void pci_ide_data_write(void *opaque, hwaddr addr,
 }
 }
 
-const MemoryRegionOps pci_ide_data_le_ops = {
+static const MemoryRegionOps pci_ide_data_le_ops = {
 .read = pci_ide_data_read,
 .write = pci_ide_data_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 0af897a9ef..9cf920369f 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -88,21 +88,9 @@ static uint64_t sii3112_reg_read(void *opaque, hwaddr addr,
 val |= (d->regs[1].confstat & (1UL << 11) ? (1 << 4) : 0);
 val |= (uint32_t)d->i.bmdma[1].status << 16;
 break;
-case 0x80 ... 0x87:
-val = pci_ide_data_le_ops.read(&d->i.bus[0], addr - 0x80, size);
-break;
-case 0x8a:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[0], 2, size);
-break;
 case 0xa0:
 val = d->regs[0].confstat;
 break;
-case 0xc0 ... 0xc7:
-val = pci_ide_data_le_ops.read(&d->i.bus[1], addr - 0xc0, size);
-break;
-case 0xca:
-val = pci_ide_cmd_le_ops.read(&d->i.bus[1], 2, size);
-break;
 case 0xe0:
 val = d->regs[1].confstat;
 break;
@@ -171,18 +159,6 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 case 0x0c ... 0x0f:
 bmdma_addr_ioport_ops.write(&d->i.bmdma[1], addr - 12, val, size);
 break;
-case 0x80 ... 0x87:
-pci_ide_data_le_ops.write(&d->i.bus[0], addr - 0x80, val, size);
-break;
-case 0x8a:
-pci_ide_cmd_le_ops.write(&d->i.bus[0], 2, val, size);
-break;
-case 0xc0 ... 0xc7:
-pci_ide_data_le_ops.write(&d->i.bus[1], addr - 0xc0, val, size);
-break;
-case 0xca:
-pci_ide_cmd_le_ops.write(&d->i.bus[1], 2, val, size);
-break;
 case 0x100:
 d->regs[0].scontrol = val & 0xfff;
 if (val & 1) {
@@ -259,6 +235,11 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
 pci_config_set_interrupt_pin(dev->config, 1);
 pci_set_byte(dev->config + PCI_CACHE_LINE_SIZE, 8);
 
+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, >data_ops[0]);
+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, >cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, >data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, >cmd_ops[1]);
+
 /* BAR5 is in PCI memory space */
 memory_region_init_io(>mmio, OBJECT(d), _reg_ops, d,
  "sii3112.bar5", 0x200);
@@ -266,17 +247,22 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
 
 /* BAR0-BAR4 are PCI I/O space aliases into BAR5 */
 mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", >mmio, 0x80, 8);
-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar0", >data_ops[0], 0,
+ memory_region_size(>data_ops[0]));
+memory_region_add_subregion_overlap(>mmio, 0x80, mr, 1);
 mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", >mmio, 0x88, 4);
-pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar1", >cmd_ops[0], 0,
+ memory_region_size(>cmd_ops[0]));
+memory_region_add_subregion_overlap(>mmio, 0x88, mr, 1);
 mr = g_new(MemoryRegion, 1);
-memory_region_init_alias(mr, OBJECT(d), "sii3112.bar2", >mmio, 0xc0, 8);
-pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, mr);
+memory_region_init_alias(mr, OBJECT(d), "sii3112.bar2", >data_ops[1], 0,
+ memory_region_size(>data_ops[1]));
+memory_region_add_subregion_overlap(>mmio, 0xc0, mr, 1);
 mr =

[PATCH 08/13] hw/ide: Rename PCIIDEState::*_bar attributes

2023-04-22 Thread Bernhard Beschow

The attributes represent memory regions containing operations which are mapped
by the device models into PCI BARs. Reflect this by changing the suffix to
"_ops".

Note that in a few commits piix will also use the {cmd,data}_ops but won't map
them into BARs. This further suggests that the "_bar" suffix doesn't match
very well.

Signed-off-by: Bernhard Beschow 
---
 include/hw/ide/pci.h |  6 +++---
 hw/ide/cmd646.c  | 10 +-
 hw/ide/pci.c | 18 +-
 hw/ide/piix.c|  2 +-
 hw/ide/via.c | 10 +-
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 597c77c7ad..5025df5b82 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -51,9 +51,9 @@ struct PCIIDEState {
 BMDMAState bmdma[2];
 qemu_irq isa_irq[2];
 uint32_t secondary; /* used only for cmd646 */
-MemoryRegion bmdma_bar;
-MemoryRegion cmd_bar[2];
-MemoryRegion data_bar[2];
+MemoryRegion bmdma_ops;
+MemoryRegion cmd_ops[2];
+MemoryRegion data_ops[2];
 };
 
 void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 85716aaf17..b9d005a357 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -251,13 +251,13 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
 dev->wmask[MRDMODE] = 0x0;
 dev->w1cmask[MRDMODE] = MRDMODE_INTR_CH0 | MRDMODE_INTR_CH1;
 
-pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[0]);
-pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[0]);
-pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_bar[1]);
-pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_bar[1]);
+pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[0]);
+pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[0]);
+pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &d->data_ops[1]);
+pci_register_bar(dev, 3, PCI_BASE_ADDRESS_SPACE_IO, &d->cmd_ops[1]);
 
 bmdma_init_ops(d, &cmd646_bmdma_ops);
-pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
+pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops);
 
 /* TODO: RST# value should be 0 */
 pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index a9194313bd..b2fcc00a64 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -527,15 +527,15 @@ void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops)
 {
 size_t i;
 
-memory_region_init(&d->bmdma_bar, OBJECT(d), "bmdma-container", 16);
+memory_region_init(&d->bmdma_ops, OBJECT(d), "bmdma-container", 16);
 for (i = 0; i < ARRAY_SIZE(d->bmdma); i++) {
 BMDMAState *bm = &d->bmdma[i];
 
 memory_region_init_io(&bm->extra_io, OBJECT(d), bmdma_ops, bm, "bmdma-ops", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
+memory_region_add_subregion(&d->bmdma_ops, i * 8, &bm->extra_io);
 memory_region_init_io(&bm->addr_ioport, OBJECT(d), &bmdma_addr_ioport_ops, bm,
   "bmdma-ioport-ops", 4);
-memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
+memory_region_add_subregion(&d->bmdma_ops, i * 8 + 4, &bm->addr_ioport);
 }
 }
 
@@ -543,14 +543,14 @@ static void pci_ide_init(Object *obj)
 {
 PCIIDEState *d = PCI_IDE(obj);
 
-memory_region_init_io(&d->data_bar[0], OBJECT(d), &pci_ide_data_le_ops,
+memory_region_init_io(&d->data_ops[0], OBJECT(d), &pci_ide_data_le_ops,
   &d->bus[0], "pci-ide0-data-ops", 8);
-memory_region_init_io(&d->cmd_bar[0], OBJECT(d), &pci_ide_cmd_le_ops,
+memory_region_init_io(&d->cmd_ops[0], OBJECT(d), &pci_ide_cmd_le_ops,
   &d->bus[0], "pci-ide0-cmd-ops", 4);
 
-memory_region_init_io(&d->data_bar[1], OBJECT(d), &pci_ide_data_le_ops,
+memory_region_init_io(&d->data_ops[1], OBJECT(d), &pci_ide_data_le_ops,
   &d->bus[1], "pci-ide1-data-ops", 8);
-memory_region_init_io(&d->cmd_bar[1], OBJECT(d), &pci_ide_cmd_le_ops,
+memory_region_init_io(&d->cmd_ops[1], OBJECT(d), &pci_ide_cmd_le_ops,
   &d->bus[1], "pci-ide1-cmd-ops", 4);
 
 qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
@@ -562,8 +562,8 @@ static void pci_ide_exitfn(PCIDevice *dev)
 unsigned i;
 
 for (i = 0; i < ARRAY_SIZE(d->bmdma); ++i) {
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].extra_io);
-memory_region_del_subregion(&d->bmdma_bar, &d->bmdma[i].addr_ioport);
+memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].extra_io);
+memory_region_del_subregion(&d->bmdma_ops, &d->bmdma[i].addr_ioport);
 }
 }
 
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 5611473d37..6942b484f9 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -140,7 +140,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
 
 bmdma_init_ops(d, &piix_bmdma_ops);
-

[PATCH 01/13] hw/ide/pci: Expose legacy interrupts as GPIOs

2023-04-22 Thread Bernhard Beschow

Exposing the legacy IDE interrupts as GPIOs allows them to be connected in the
parent device through qdev_connect_gpio_out(), i.e. without accessing private
data of TYPE_PCI_IDE.

Signed-off-by: Bernhard Beschow 
---
 hw/ide/pci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index fc9224bbc9..942e216b9b 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -522,10 +522,18 @@ void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState 
*d)
 bm->pci_dev = d;
 }
 
+static void pci_ide_init(Object *obj)
+{
+PCIIDEState *d = PCI_IDE(obj);
+
+qdev_init_gpio_out(DEVICE(d), d->isa_irq, ARRAY_SIZE(d->isa_irq));
+}
+
 static const TypeInfo pci_ide_type_info = {
 .name = TYPE_PCI_IDE,
 .parent = TYPE_PCI_DEVICE,
 .instance_size = sizeof(PCIIDEState),
+.instance_init = pci_ide_init,
 .abstract = true,
 .interfaces = (InterfaceInfo[]) {
 { INTERFACE_CONVENTIONAL_PCI_DEVICE },
-- 
2.40.0

[PATCH 00/13] Clean up PCI IDE device models

2023-04-22 Thread Bernhard Beschow

This series is yet another attempt to clean up the PCI IDE models. It is mainly
inspired by Mark's invaluable input from previous discussions. In particular,
this series attempts to follow the "PCI IDE controller specification" more closely. As
a side effect, it also resolves usage of the isabus global in PIIX. Last but not
least it fixes the VIA IDE controller to not depend on its south bridge which
fixes a circular dependency.

The series is structured as follows: The first three commits resolve a circular
dependency between the VIA IDE controller and its south bridge, thereby
implementing legacy PCI IDE interrupt routing which was missing so far. The next
five patches factor out common code into the PCI IDE base class. The next two
patches resolve usage of the isabus global in PIIX by reusing now common code
from the base class. The same is then done for the SIL3112 controller. Finally,
a small convenience function is introduced which should hide some implementation
details in the PCI IDE base class.

Testing done:
* `make check`
* `make check-avocado`
* `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device \
   ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso
   -kernel morphos-3.17/boot.img`
   The machine booted successfully and a startup sound was audible.
* `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device \
   ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso
   -kernel morphos-3.17/boot.img`
   The machine booted successfully and applications could be started.
* qemu-system-x86_64 was used for hours during work

Bernhard Beschow (13):
  hw/ide/pci: Expose legacy interrupts as GPIOs
  hw/ide/via: Implement ISA IRQ routing
  hw/isa/vt82c686: Remove via_isa_set_irq()
  hw/ide: Extract IDEBus assignment into bmdma_init()
  hw/ide: Extract pci_ide_class_init()
  hw/ide: Extract bmdma_init_ops()
  hw/ide: Extract pci_ide_{cmd,data}_le_ops initialization into base
class constructor
  hw/ide: Rename PCIIDEState::*_bar attributes
  hw/ide/piix: Disuse isa_get_irq()
  hw/ide/piix: Reuse PCIIDEState::{cmd,data}_ops
  hw/ide/sii3112: Reuse PCIIDEState::{cmd,data}_ops
  hw/ide/sii3112: Reuse PCIIDEState::bmdma_ops
  hw/ide: Extract bmdma_clear_status()

 include/hw/ide/pci.h  |  12 ++-
 include/hw/isa/vt82c686.h |   2 -
 hw/ide/cmd646.c   |  59 ++-
 hw/ide/pci.c  |  73 ++-
 hw/ide/piix.c |  88 --
 hw/ide/sii3112.c  | 150 +++---
 hw/ide/via.c  |  64 +++-
 hw/isa/vt82c686.c |  23 --
 hw/ide/trace-events   |   7 +-
 9 files changed, 221 insertions(+), 257 deletions(-)

-- 
2.40.0

[PATCH 13/13] hw/ide: Extract bmdma_clear_status()

2023-04-22 Thread Bernhard Beschow

Extract bmdma_clear_status() mirroring bmdma_cmd_writeb().

Signed-off-by: Bernhard Beschow 
---
 include/hw/ide/pci.h |  1 +
 hw/ide/cmd646.c  |  2 +-
 hw/ide/pci.c |  7 +++
 hw/ide/piix.c|  2 +-
 hw/ide/sii3112.c | 12 +---
 hw/ide/via.c |  2 +-
 hw/ide/trace-events  |  1 +
 7 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 81e0370202..6a286ad307 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -59,6 +59,7 @@ struct PCIIDEState {
 void bmdma_init(IDEBus *bus, BMDMAState *bm, PCIIDEState *d);
 void bmdma_init_ops(PCIIDEState *d, const MemoryRegionOps *bmdma_ops);
 void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val);
+void bmdma_clear_status(BMDMAState *bm, uint32_t val);
 void pci_ide_create_devs(PCIDevice *dev);
 
 #endif
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index b9d005a357..973c3ff0dc 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -144,7 +144,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
 cmd646_update_irq(pci_dev);
 break;
 case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
 break;
 case 3:
 if (bm == &bm->pci_dev->bmdma[0]) {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 3539b162b7..4aa06be7c6 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -318,6 +318,13 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
 bm->cmd = val & 0x09;
 }
 
+void bmdma_clear_status(BMDMAState *bm, uint32_t val)
+{
+trace_bmdma_update_status(val);
+
+bm->status = (val & 0x60) | (bm->status & BM_STATUS_DMAING) | (bm->status & ~val & 0x06);
+}
+
 static uint64_t bmdma_addr_read(void *opaque, hwaddr addr,
 unsigned width)
 {
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 406a67fa0f..9eab615e35 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,7 +76,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
 bmdma_cmd_writeb(bm, val);
 break;
 case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
 break;
 }
 }
diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 373c0dd1ee..1180ff55e7 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -66,7 +66,7 @@ static void sii3112_bmdma_write(void *opaque, hwaddr addr,
 uint64_t val, unsigned int size)
 {
 BMDMAState *bm = opaque;
-SiI3112PCIState *d = SII3112_PCI(bm->pci_dev);
+SiI3112PCIState *s = SII3112_PCI(bm->pci_dev);
 int i = (bm == &bm->pci_dev->bmdma[0]) ? 0 : 1;
 
 trace_sii3112_bmdma_write(size, addr, val);
@@ -75,10 +75,10 @@ static void sii3112_bmdma_write(void *opaque, hwaddr addr,
 bmdma_cmd_writeb(bm, val);
 break;
 case 0x01:
-d->regs[i].swdata = val & 0x3f;
+s->regs[i].swdata = val & 0x3f;
 break;
 case 0x02:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 6);
+bmdma_clear_status(bm, val);
 break;
 default:
 break;
@@ -160,8 +160,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 d->regs[0].swdata = val & 0x3f;
 break;
 case 0x12:
-d->i.bmdma[0].status = (val & 0x60) | (d->i.bmdma[0].status & 1) |
-   (d->i.bmdma[0].status & ~val & 6);
+bmdma_clear_status(&d->i.bmdma[0], val);
 break;
 case 0x18:
 bmdma_cmd_writeb(&d->i.bmdma[1], val);
@@ -170,8 +169,7 @@ static void sii3112_reg_write(void *opaque, hwaddr addr,
 d->regs[1].swdata = val & 0x3f;
 break;
 case 0x1a:
-d->i.bmdma[1].status = (val & 0x60) | (d->i.bmdma[1].status & 1) |
-   (d->i.bmdma[1].status & ~val & 6);
+bmdma_clear_status(&d->i.bmdma[1], val);
 break;
 case 0x100:
 d->regs[0].scontrol = val & 0xfff;
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 35dd97e49b..afb97f302a 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -75,7 +75,7 @@ static void bmdma_write(void *opaque, hwaddr addr,
 bmdma_cmd_writeb(bm, val);
 break;
 case 2:
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bmdma_clear_status(bm, val);
 break;
 default:;
 }
diff --git a/hw/ide/trace-events b/hw/ide/trace-events
index a479525e38..d219c64b61 100644
--- a/hw/ide/trace-events
+++ b/hw/ide/trace-events
@@ -30,6 +30,7 @@ bmdma_write_cmd646(uint64_t addr, uint64_t val) "bmdma: writeb 0x%"PRIx64" : 0x%
 # pci.c
 bmdma_reset(void) ""
 bmdma_cmd_writeb(uint32_t val) "val: 0x%08x"
+bmdma_update_status(uint32_t val) "val: 0x%08x"
 bmdma_addr_read(uint64_t data) "data: 0x%016"PRIx64
 bmdma_addr_write(uint64_t data) "data: 0x%016"PRIx64
 
-- 
2.40.0
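
The expression factored out into bmdma_clear_status() combines three kinds of status bits, which is easier to see in isolation. The following is a minimal standalone sketch of the same bit arithmetic, not the QEMU function itself: bits 5 and 6 are taken directly from the written value, bit 0 is preserved regardless of the write, and bits 1 and 2 are cleared by writing 1. BM_STATUS_DMAING appears in the patch; the other macro names and bmdma_status_after_write() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BM_STATUS_DMAING 0x01  /* preserved: not changed by the write */
#define BM_STATUS_W1C    0x06  /* hypothetical name: bits 1-2, write 1 to clear */
#define BM_STATUS_RW     0x60  /* hypothetical name: bits 5-6, plain read/write */

/* Compute the new status byte from the old one and the value written. */
uint8_t bmdma_status_after_write(uint8_t status, uint32_t val)
{
    uint32_t rw   = val & BM_STATUS_RW;              /* taken from the write */
    uint32_t keep = status & BM_STATUS_DMAING;       /* untouched by the write */
    uint32_t w1c  = status & ~val & BM_STATUS_W1C;   /* a 1 in val clears the bit */

    return (uint8_t)(rw | keep | w1c);
}
```

For example, with status 0x07 a write of 0x04 clears only bit 2 and yields 0x03, while a write of 0x01 to a status of 0x00 leaves it at 0x00 because bit 0 cannot be set through this register.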

[PATCH 10/13] hw/ide/piix: Reuse PCIIDEState::{cmd,data}_ops

2023-04-22 Thread Bernhard Beschow

Now that PCIIDEState::{cmd,data}_ops are initialized in the base class
constructor there is an opportunity for PIIX to reuse these attributes. This
resolves usage of ide_init_ioport() which would fall back internally to using
the isabus global due to NULL being passed as ISADevice by PIIX.

Signed-off-by: Bernhard Beschow 
---
 hw/ide/piix.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a3a15dc7db..406a67fa0f 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -104,34 +104,32 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
 }
 
-static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus,
-  Error **errp)
+static void pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus)
 {
 static const struct {
 int iobase;
 int iobase2;
 int isairq;
 } port_info[] = {
-{0x1f0, 0x3f6, 14},
-{0x170, 0x376, 15},
+{0x1f0, 0x3f4, 14},
+{0x170, 0x374, 15},
 };
-int ret;
+MemoryRegion *address_space_io = pci_address_space_io(PCI_DEVICE(d));
 
 ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
-ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
-  port_info[i].iobase2);
-if (ret) {
-error_setg_errno(errp, -ret, "Failed to realize %s port %u",
- object_get_typename(OBJECT(d)), i);
-return false;
-}
+memory_region_add_subregion(address_space_io, port_info[i].iobase,
+&d->data_ops[i]);
+/*
+ * PIIX forwards the last byte of cmd_ops to ISA. Model this using a low
+ * prio so competing memory regions take precedence.
+ */
+memory_region_add_subregion_overlap(address_space_io, port_info[i].iobase2,
+&d->cmd_ops[i], -1);
 ide_bus_init_output_irq(&d->bus[i],
 isa_bus_get_irq(isa_bus, port_info[i].isairq));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
 ide_bus_register_restart_cb(&d->bus[i]);
-
-return true;
 }
 
 static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
@@ -160,9 +158,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 }
 
 for (unsigned i = 0; i < 2; i++) {
-if (!pci_piix_init_bus(d, i, isa_bus, errp)) {
-return;
-}
+pci_piix_init_bus(d, i, isa_bus);
 }
 }
 
-- 
2.40.0
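
The change of iobase2 from 0x3f6/0x376 to 0x3f4/0x374 follows from mapping the whole 4-byte command block instead of a single port. As a rough sketch of the arithmetic, assuming (as the former sii3112 code that calls the cmd ops with offset 2 suggests) that the control register sits at offset 2 of that block: the legacy control port is base + 2 and the byte the patch comment says is forwarded to ISA is base + 3. The structure and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the port_info table in the patch; names are made up. */
struct legacy_ports {
    uint16_t data_base;
    uint16_t cmd_base;
    int isairq;
};

static const struct legacy_ports legacy_port_info[2] = {
    { 0x1f0, 0x3f4, 14 },
    { 0x170, 0x374, 15 },
};

enum { CMD_BLOCK_SIZE = 4, CMD_CTRL_OFFSET = 2 };

/* I/O port of the control register inside the 4-byte command block. */
uint16_t legacy_ctrl_port(unsigned channel)
{
    assert(channel < 2);
    return (uint16_t)(legacy_port_info[channel].cmd_base + CMD_CTRL_OFFSET);
}

/* Last byte of the command block, which competing regions may override. */
uint16_t legacy_cmd_last_byte(unsigned channel)
{
    assert(channel < 2);
    return (uint16_t)(legacy_port_info[channel].cmd_base + CMD_BLOCK_SIZE - 1);
}
```

Under these assumptions the control ports still end up at 0x3f6 and 0x376, matching the values that were previously passed to ide_init_ioport().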

[PATCH 09/13] hw/ide/piix: Disuse isa_get_irq()

2023-04-22 Thread Bernhard Beschow

isa_get_irq() asks for an ISADevice which piix-ide doesn't provide.
Passing a NULL pointer works but causes the isabus global to be used
then. By fishing out TYPE_ISA_BUS from the QOM tree it is possible to
achieve the same as using isa_get_irq().

This is an alternative solution to commit 9405d87be25d 'hw/ide: Fix
crash when plugging a piix3-ide device into the x-remote machine' which
allows for cleaning up the ISA API while keeping PIIX IDE functions
user-creatable.

Signed-off-by: Bernhard Beschow 
---
 hw/ide/piix.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 6942b484f9..a3a15dc7db 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -104,7 +104,8 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
 }
 
-static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
+static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus *isa_bus,
+  Error **errp)
 {
 static const struct {
 int iobase;
@@ -124,7 +125,8 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
  object_get_typename(OBJECT(d)), i);
 return false;
 }
-ide_bus_init_output_irq(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq));
+ide_bus_init_output_irq(&d->bus[i],
+isa_bus_get_irq(isa_bus, port_info[i].isairq));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
 ide_bus_register_restart_cb(&d->bus[i]);
@@ -136,14 +138,29 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
 uint8_t *pci_conf = dev->config;
+ISABus *isa_bus;
+bool ambiguous;
 
 pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
 
 bmdma_init_ops(d, &piix_bmdma_ops);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_ops);
 
+isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, &ambiguous));
+if (ambiguous) {
+error_setg(errp,
+   "More than one ISA bus found while %s supports only one",
+   object_get_typename(OBJECT(d)));
+return;
+}
+if (!isa_bus) {
+error_setg(errp, "No ISA bus found while %s requires one",
+   object_get_typename(OBJECT(d)));
+return;
+}
+
 for (unsigned i = 0; i < 2; i++) {
-if (!pci_piix_init_bus(d, i, errp)) {
+if (!pci_piix_init_bus(d, i, isa_bus, errp)) {
 return;
 }
 }
-- 
2.40.0
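
The lookup in pci_piix_ide_realize() distinguishes three outcomes: exactly one ISA bus, none, or more than one. A self-contained sketch of that "find the unique object of a type" pattern is shown below. It does not use QOM; struct object and find_unique_by_type() are hypothetical stand-ins that only mirror how the patch consumes the result (a NULL return plus an ambiguous flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a QOM object; only the type name matters here. */
struct object {
    const char *type_name;
};

/*
 * Return the only object of the given type, or NULL when there is none or
 * more than one. *ambiguous tells the two NULL cases apart.
 */
const struct object *find_unique_by_type(const struct object *objs, size_t n,
                                         const char *type_name,
                                         bool *ambiguous)
{
    const struct object *found = NULL;

    *ambiguous = false;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(objs[i].type_name, type_name) != 0) {
            continue;
        }
        if (found) {
            /* Second match: the caller cannot pick one safely. */
            *ambiguous = true;
            return NULL;
        }
        found = &objs[i];
    }
    return found;
}
```

A caller would check the ambiguous flag first and the NULL return second, which is the order the two error_setg() branches in the patch use.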

Re: [RFC PATCH 08/13] gfxstream + rutabaga prep: added need defintions, fields, and options

2023-04-22 Thread Akihiko Odaki


On 2023/04/21 10:12, Gurchetan Singh wrote:

This modifies the common virtio-gpu.h file to have the fields and
definitions needed by gfxstream/rutabaga.  It also modifies VirtioGPUGL
to have the runtime options needed by rutabaga.  They are:

- a colon separated list of capset names, defined in the virtio spec
- a wayland socket path to enable guest Wayland passthrough

The command to run these would be:

-device virtio-vga-gl,capset_names=gfxstream:cross-domain, \
 wayland_socket_path=/run/user/1000/wayland-0,hostmem=8G  \


It would be nice if it automatically determined the socket path according to:
https://wayland.freedesktop.org/docs/html/apb.html#Client-classwl__display_1af048371dfef7577bd39a3c04b78d0374

Documentation on setting up cross-domain, like what you can find in the
crosvm book, would also be helpful:

https://crosvm.dev/book/devices/wayland.html



Signed-off-by: Gurchetan Singh 
---
  hw/display/virtio-gpu-gl.c | 2 ++
  include/hw/virtio/virtio-gpu.h | 8 
  2 files changed, 10 insertions(+)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 547e697333..15270b0c8a 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -29,6 +29,8 @@ static void virtio_gpu_gl_device_realize(DeviceState *qdev, Error **errp)
  static Property virtio_gpu_gl_properties[] = {
  DEFINE_PROP_BIT("stats", VirtIOGPU, parent_obj.conf.flags,
  VIRTIO_GPU_FLAG_STATS_ENABLED, false),
+DEFINE_PROP_STRING("capset_names", VirtIOGPUGL, capset_names),
+DEFINE_PROP_STRING("wayland_socket_path", VirtIOGPUGL, wayland_socket_path),
  DEFINE_PROP_END_OF_LIST(),
  };
  
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h

index 421733d751..a35ade3608 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -94,6 +94,7 @@ enum virtio_gpu_base_conf_flags {
  VIRTIO_GPU_FLAG_DMABUF_ENABLED,
  VIRTIO_GPU_FLAG_BLOB_ENABLED,
  VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
+VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
  };
  
  #define virtio_gpu_virgl_enabled(_cfg) \

@@ -106,6 +107,8 @@ enum virtio_gpu_base_conf_flags {
  (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
  #define virtio_gpu_blob_enabled(_cfg) \
  (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_rutabaga_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
  #define virtio_gpu_hostmem_enabled(_cfg) \
  (_cfg.hostmem > 0)
  #define virtio_gpu_context_init_enabled(_cfg) \
@@ -217,6 +220,11 @@ struct VirtIOGPUGL {
  
  bool renderer_inited;

  bool renderer_reset;
+
+char *capset_names;
+char *wayland_socket_path;
+uint32_t num_capsets;
+void *rutabaga;


I prefer to have a line:
struct rutabaga;

in virtio-gpu.h and use it here. It may be a bit weird to have 
such a declaration in a renderer-independent file, but it's practically 
harmless, prevents something rogue from being assigned to the member, 
and allows the variable to be used without annoying casts.
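
A minimal sketch of that suggestion, assuming only that the renderer's type is named struct rutabaga as proposed. struct gl_state_sketch and gl_state_set_renderer() are hypothetical stand-ins, not the real VirtIOGPUGL definition.

```c
#include <stddef.h>

/* Forward declaration only: the layout stays private to the renderer code. */
struct rutabaga;

/* Hypothetical, trimmed-down stand-in for struct VirtIOGPUGL. */
struct gl_state_sketch {
    char *capset_names;
    struct rutabaga *rutabaga;      /* typed, but still opaque here */
};

/*
 * Only a struct rutabaga pointer is accepted. Passing an unrelated pointer
 * type draws a compiler diagnostic, which a void * member would not.
 */
void gl_state_set_renderer(struct gl_state_sketch *s, struct rutabaga *r)
{
    s->rutabaga = r;
}
```

The header never needs the full definition of struct rutabaga; only the file that actually dereferences the pointer has to include the renderer's own header.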



  };
  
  struct VhostUserGPU {

Re: [PATCH] async: Suppress GCC13 false positive in aio_bh_poll()

2023-04-22 Thread Stefan Hajnoczi

On Thu, 20 Apr 2023 at 16:31, Cédric Le Goater  wrote:
>
> From: Cédric Le Goater 
>
> GCC13 reports an error :
>
> ../util/async.c: In function ‘aio_bh_poll’:
> include/qemu/queue.h:303:22: error: storing the address of local variable 
> ‘slice’ in ‘*ctx.bh_slice_list.sqh_last’ [-Werror=dangling-pointer=]
>   303 | (head)->sqh_last = &(elm)->field.sqe_next;
>   \
>   | ~^~~~
> ../util/async.c:169:5: note: in expansion of macro ‘QSIMPLEQ_INSERT_TAIL’
>   169 | QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next);
>   | ^~~~
> ../util/async.c:161:17: note: ‘slice’ declared here
>   161 | BHListSlice slice;
>   | ^
> ../util/async.c:161:17: note: ‘ctx’ declared here
>
> But the local variable 'slice' is removed from the global context list
> in the following loop of the same routine. Add a pragma to silence GCC.
>
> Cc: Stefan Hajnoczi 
> Cc: Paolo Bonzini 
> Cc: Daniel P. Berrangé 
> Signed-off-by: Cédric Le Goater 
> ---
>  util/async.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/util/async.c b/util/async.c
> index 21016a1ac7..856e1a8a33 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -164,7 +164,21 @@ int aio_bh_poll(AioContext *ctx)
>
>  /* Synchronizes with QSLIST_INSERT_HEAD_ATOMIC in aio_bh_enqueue().  */
>  QSLIST_MOVE_ATOMIC(&slice.bh_list, &ctx->bh_list);
> +
> +/*
> + * GCC13 [-Werror=dangling-pointer=] complains that the local variable
> + * 'slice' is being stored in the global 'ctx->bh_slice_list' but the
> + * list is emptied before this function returns.
> + */
> +#if !defined(__clang__)
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wpragmas"
> +#pragma GCC diagnostic ignored "-Wdangling-pointer="
> +#endif
>  QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next);
> +#if !defined(__clang__)
> +#pragma GCC diagnostic pop
> +#endif

Reviewed-by: Stefan Hajnoczi
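
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the same idea: a local object is linked into a longer-lived list, used, and unlinked again before the function returns, with the store wrapped in the pragma sequence from the patch. All names (struct node_sketch, pending, sum_with_local_node) are hypothetical, and whether a given GCC version actually warns on this reduced example is not claimed.

```c
#include <stddef.h>

struct node_sketch {
    struct node_sketch *next;
    int value;
};

/* Long-lived list head, playing the role of ctx->bh_slice_list. */
struct node_sketch *pending;

int sum_with_local_node(int value)
{
    struct node_sketch local = { NULL, value };
    int total = 0;

    /*
     * Ignoring -Wpragmas first keeps compilers that do not know
     * -Wdangling-pointer= from complaining about the pragma itself.
     */
#if !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpragmas"
#pragma GCC diagnostic ignored "-Wdangling-pointer="
#endif
    /* The address of a local is stored in an object that outlives it. */
    local.next = pending;
    pending = &local;
#if !defined(__clang__)
#pragma GCC diagnostic pop
#endif

    for (struct node_sketch *n = pending; n != NULL; n = n->next) {
        total += n->value;
    }

    /* Unlink before returning, so nothing dangles afterwards. */
    pending = local.next;
    return total;
}
```

The push/pop pair keeps the suppression limited to the one store that is known to be safe, rather than disabling the warning for the whole file.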

[PATCH v4 6/7] target/riscv: Make the short cut really work in pmp_hart_has_privs

2023-04-22 Thread Weiwei Li

Return the result directly in the short cut, since we needn't perform the
following checks on the PMP entries if there are no PMP rules.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 0cef9e3e1d..b0f1b0a715 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -319,6 +319,7 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
allowed_privs, mode)) {
 ret = MAX_RISCV_PMPS;
 }
+return ret;
 }
 
 if (size == 0) {
-- 
2.25.1

[PATCH v4 1/7] target/riscv: Update pmp_get_tlb_size()

2023-04-22 Thread Weiwei Li

PMP entries before the matched PMP entry (including the matched PMP entry)
may cover only part of the TLB page, so different regions in that page may
allow different RWX privs. For example, with PMP0 (0x8008~0x800F,
R) and PMP1 (0x80001000~0x80001FFF, RWX), a write access to 0x8000 will
match PMP1. However, we cannot cache the translation result in the TLB, since
this would let a write access to 0x8008 bypass the check of PMP0. So we
should check all of them instead of only the matched PMP entry in
pmp_get_tlb_size() and set the tlb_size to 1 in this case.
Set tlb_size to TARGET_PAGE_SIZE if PMP is not supported or there are no PMP rules.
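
The decision described above can be sketched outside of QEMU roughly as follows. This is only an illustration under stated assumptions: a fixed 4 KiB page, a plain array of regions in priority order, and hypothetical names (struct pmp_region_sketch, tlb_size_sketch, PAGE_SIZE_SKETCH) that do not exist in the tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096

/* Hypothetical region: inclusive start/end address plus an enable flag. */
struct pmp_region_sketch {
    uint64_t sa;
    uint64_t ea;
    bool enabled;
};

/*
 * Returns PAGE_SIZE_SKETCH if one answer is valid for the whole page that
 * contains addr, or 1 if the translation must not be cached.
 */
uint64_t tlb_size_sketch(const struct pmp_region_sketch *r, size_t n,
                         uint64_t addr)
{
    uint64_t tlb_sa = addr & ~(uint64_t)(PAGE_SIZE_SKETCH - 1);
    uint64_t tlb_ea = tlb_sa + PAGE_SIZE_SKETCH - 1;

    for (size_t i = 0; i < n; i++) {
        if (!r[i].enabled) {
            continue;
        }
        /* The first entry touching the page decides the result. */
        if (r[i].sa <= tlb_sa && r[i].ea >= tlb_ea) {
            return PAGE_SIZE_SKETCH;        /* covers the whole page */
        }
        if ((r[i].sa >= tlb_sa && r[i].sa <= tlb_ea) ||
            (r[i].ea >= tlb_sa && r[i].ea <= tlb_ea)) {
            return 1;                       /* covers only part of it */
        }
    }

    /* No entry touches the page: one answer holds for all of it. */
    return PAGE_SIZE_SKETCH;
}
```

With a higher-priority region covering only a few bytes of a page and a lower-priority one covering the whole page, the sketch returns 1 for any address in that page, which is the case this patch is about.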

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu_helper.c |  7 ++---
 target/riscv/pmp.c| 64 ++-
 target/riscv/pmp.h|  3 +-
 3 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 433ea529b0..075fc0538a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -703,11 +703,8 @@ static int get_physical_address_pmp(CPURISCVState *env, 
int *prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if ((tlb_size != NULL) && pmp_index != MAX_RISCV_PMPS) {
-target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
-target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
-
-*tlb_size = pmp_get_tlb_size(env, pmp_index, tlb_sa, tlb_ea);
+if (tlb_size != NULL) {
+*tlb_size = pmp_get_tlb_size(env, addr);
 }
 
 return TRANSLATE_SUCCESS;
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1f5aca42e8..ad20a319c1 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -601,28 +601,62 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
 }
 
 /*
- * Calculate the TLB size if the start address or the end address of
- * PMP entry is presented in the TLB page.
+ * Calculate the TLB size. If the PMP rules may make different regions in
+ * the TLB page of 'addr' allow different RWX privs, set the size to 1
+ * (to make the translation result uncached in the TLB and only be used for
+ * a single translation). Set the size to TARGET_PAGE_SIZE otherwise.
  */
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea)
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr)
 {
-target_ulong pmp_sa = env->pmp_state.addr[pmp_index].sa;
-target_ulong pmp_ea = env->pmp_state.addr[pmp_index].ea;
+target_ulong pmp_sa;
+target_ulong pmp_ea;
+target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
+target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
+int i;
 
-if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
+/*
+ * If PMP is not supported or there is no PMP rule, which means the allowed
+ * RWX privs of the page will not be affected by PMP or PMP will provide the
+ * same option (disallow accesses or allow default RWX privs) for all
+ * addresses, set the size to TARGET_PAGE_SIZE.
+ */
+if (!riscv_cpu_cfg(env)->pmp || !pmp_get_num_rules(env)) {
 return TARGET_PAGE_SIZE;
-} else {
+}
+
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+if (pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg) == PMP_AMATCH_OFF) {
+continue;
+}
+
+pmp_sa = env->pmp_state.addr[i].sa;
+pmp_ea = env->pmp_state.addr[i].ea;
+
 /*
- * At this point we have a tlb_size that is the smallest possible size
- * That fits within a TARGET_PAGE_SIZE and the PMP region.
- *
- * If the size is less then TARGET_PAGE_SIZE we drop the size to 1.
- * This means the result isn't cached in the TLB and is only used for
- * a single translation.
+ * Only the first PMP entry that covers (all or part of) the TLB
+ * page really matters:
+ * If it covers the whole page, set the size to TARGET_PAGE_SIZE.
+ * The following PMP entries have lower priority and will not affect
+ * the allowed RWX privs of the page.
+ * If it covers only part of the TLB page, set the size to 1 since
+ * the allowed RWX privs for the covered region may be different from
+ * other regions of the page.
  */
-return 1;
+if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
+return TARGET_PAGE_SIZE;
+} else if ((pmp_sa >= tlb_sa && pmp_sa <= tlb_ea) ||
+   (pmp_ea >= tlb_sa && pmp_ea <= tlb_ea)) {
+return 1;
+}
 }
+
+/*
+ * If no PMP entry covers any region of the TLB page, similar to the above
+ * case that there is no PMP rule, PMP will provide the same option
+ * (disallow accesses or allow default RWX privs) for the whole page,
+ * set the size to TARGET_PAGE_SIZE.
+ */
+return TARGET_PAGE_SIZE;
 }
 
 /*
diff --git

[PATCH v4 7/7] target/riscv: Separate pmp_update_rule() in pmpcfg_csr_write

2023-04-22 Thread Weiwei Li

Use pmp_update_rule_addr() and pmp_update_rule_nums() separately to
update rule nums only once for each pmpcfg_csr_write. Then we can also
move tlb_flush into pmp_update_rule_nums().

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index b0f1b0a715..5b765a9807 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -121,7 +121,7 @@ static bool pmp_write_cfg(CPURISCVState *env, uint32_t 
pmp_index, uint8_t val)
 qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
 } else if (env->pmp_state.pmp[pmp_index].cfg_reg != val) {
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
-pmp_update_rule(env, pmp_index);
+pmp_update_rule_addr(env, pmp_index);
 return true;
 }
 } else {
@@ -207,6 +207,8 @@ void pmp_update_rule_nums(CPURISCVState *env)
 env->pmp_state.num_rules++;
 }
 }
+
+tlb_flush(env_cpu(env));
 }
 
 /*
@@ -492,7 +494,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 
 /* If PMP permission of any addr has been changed, flush TLB pages. */
 if (modified) {
-tlb_flush(env_cpu(env));
+pmp_update_rule_nums(env);
 }
 }
 
@@ -545,7 +547,6 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 if (env->pmp_state.pmp[addr_index].addr_reg != val) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
-tlb_flush(env_cpu(env));
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.25.1

[PATCH v4 4/7] target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes

2023-04-22 Thread Weiwei Li

The TLB needn't be flushed when pmpcfg/pmpaddr don't change.
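
A minimal standalone sketch of the pattern, assuming a register holding four 8-bit config fields. The names (cfg_regs_sketch, write_cfg_sketch, csr_write_sketch) are hypothetical, and flush_count_sketch merely stands in for a call to tlb_flush().

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CFG_SKETCH 4

uint8_t cfg_regs_sketch[NUM_CFG_SKETCH];
unsigned flush_count_sketch;        /* counts would-be TLB flushes */

/* Returns true only if the stored value actually changed. */
bool write_cfg_sketch(unsigned index, uint8_t val)
{
    if (index >= NUM_CFG_SKETCH || cfg_regs_sketch[index] == val) {
        return false;
    }
    cfg_regs_sketch[index] = val;
    return true;
}

/* Writes all fields, then flushes at most once, and only on a change. */
void csr_write_sketch(uint32_t val)
{
    bool modified = false;

    for (unsigned i = 0; i < NUM_CFG_SKETCH; i++) {
        modified |= write_cfg_sketch(i, (val >> (8 * i)) & 0xff);
    }

    if (modified) {
        flush_count_sketch++;
    }
}
```

Accumulating the per-field results with |= means a write that rewrites identical values costs no flush, while a write that changes several fields still costs exactly one.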

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/pmp.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 9ae3bfea22..0cef9e3e1d 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -26,7 +26,7 @@
 #include "trace.h"
 #include "exec/exec-all.h"
 
-static void pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
+static bool pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
   uint8_t val);
 static uint8_t pmp_read_cfg(CPURISCVState *env, uint32_t addr_index);
 static void pmp_update_rule(CPURISCVState *env, uint32_t pmp_index);
@@ -83,7 +83,7 @@ static inline uint8_t pmp_read_cfg(CPURISCVState *env, 
uint32_t pmp_index)
  * Accessor to set the cfg reg for a specific PMP/HART
  * Bounds checks and relevant lock bit.
  */
-static void pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
+static bool pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
 {
 if (pmp_index < MAX_RISCV_PMPS) {
 bool locked = true;
@@ -119,14 +119,17 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t 
pmp_index, uint8_t val)
 
 if (locked) {
 qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
-} else {
+} else if (env->pmp_state.pmp[pmp_index].cfg_reg != val) {
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
 pmp_update_rule(env, pmp_index);
+return true;
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpcfg write - out of bounds\n");
 }
+
+return false;
 }
 
 static void pmp_decode_napot(target_ulong a, target_ulong *sa,
@@ -477,16 +480,19 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 int i;
 uint8_t cfg_val;
 int pmpcfg_nums = 2 << riscv_cpu_mxl(env);
+bool modified = false;
 
 trace_pmpcfg_csr_write(env->mhartid, reg_index, val);
 
 for (i = 0; i < pmpcfg_nums; i++) {
 cfg_val = (val >> 8 * i)  & 0xff;
-pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
+modified |= pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
 }
 
 /* If PMP permission of any addr has been changed, flush TLB pages. */
-tlb_flush(env_cpu(env));
+if (modified) {
+tlb_flush(env_cpu(env));
+}
 }
 
 
@@ -535,9 +541,11 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 }
 
 if (!pmp_is_locked(env, addr_index)) {
-env->pmp_state.pmp[addr_index].addr_reg = val;
-pmp_update_rule(env, addr_index);
-tlb_flush(env_cpu(env));
+if (env->pmp_state.pmp[addr_index].addr_reg != val) {
+env->pmp_state.pmp[addr_index].addr_reg = val;
+pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
+}
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1

[PATCH v4 5/7] accel/tcg: Uncache the host address for instruction fetch when tlb size < 1

2023-04-22 Thread Weiwei Li

When a PMP entry overlaps part of the page, we'll set the tlb_size to 1, which
makes the address in the TLB entry be set with TLB_INVALID_MASK, so the next
access will again go through tlb_fill. However, this does not work in
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be
cached, and the following instructions can use this host address directly,
which may lead to a bypass of the PMP-related checks.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1542.
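
The added check can be illustrated with a small standalone sketch. It assumes the entry stores its size as a log2 value, so a size of 1 corresponds to 0, and a 4 KiB target page; struct tlb_entry_sketch, code_host_ptr_sketch and TARGET_PAGE_BITS_SKETCH are hypothetical names, not the cputlb.c definitions.

```c
#include <stddef.h>

#define TARGET_PAGE_BITS_SKETCH 12

/* Hypothetical, reduced view of a filled TLB entry. */
struct tlb_entry_sketch {
    unsigned lg_page_size;      /* log2 of the size the entry is valid for */
    void *host;                 /* host address backing the page, or NULL */
};

/*
 * Returns the host pointer only when the entry is valid for a whole page.
 * Otherwise returns NULL, so the caller cannot keep a cached host address
 * and must go through the slow, fully checked path again.
 */
void *code_host_ptr_sketch(const struct tlb_entry_sketch *e)
{
    if (e->host == NULL) {
        return NULL;
    }
    if (e->lg_page_size < TARGET_PAGE_BITS_SKETCH) {
        return NULL;
    }
    return e->host;
}
```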

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e984a98dc4..efa0cb67c9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1696,6 +1696,11 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
 if (p == NULL) {
 return -1;
 }
+
+if (full->lg_page_size < TARGET_PAGE_BITS) {
+return -1;
+}
+
 if (hostp) {
 *hostp = p;
 }
-- 
2.25.1

[PATCH v4 3/7] target/riscv: Flush TLB when pmpaddr is updated

2023-04-22 Thread Weiwei Li

TLB should be flushed not only for pmpcfg csr changes, but also for
pmpaddr csr changes.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/pmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index ad20a319c1..9ae3bfea22 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -537,6 +537,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 if (!pmp_is_locked(env, addr_index)) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1

[PATCH v4 0/7] target/riscv: Fix PMP related problem

2023-04-22 Thread Weiwei Li

This patchset tries to fix the PMP bypass problem reported in issue 
https://gitlab.com/qemu-project/qemu/-/issues/1542:

The TLB entry will be cached if the matched PMP entry covers the whole page. However, PMP 
entries with higher priority may cover part of the page (without matching the 
access address), which means different regions in this page may have different 
permissions. In that case it cannot be cached either (patch 1).

Writing to pmpaddr didn't trigger a TLB flush (patch 3).

We set the tlb_size to 1 to make TLB_INVALID_MASK set, so that the next 
access will again go through tlb_fill. However, this does not work in 
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be 
cached, and the following instructions can use this host address directly, which 
may lead to a bypass of the PMP-related checks (patch 5).

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-pmp-fix-v4

v4:

Update comments for Patch 1, and move partial check code from Patch 2 to Patch 1

Restore log message change in Patch 2

Update commit message and the way to improve the problem in Patch 6


v3:

Ignore disabled PMP entry in pmp_get_tlb_size() in Patch 1

Drop Patch 5, since the tb jmp cache has been flushed in tlb_flush, so flushing tb 
seems unnecessary.

Fix commit message problems in Patch 8 (Patch 7 in new patchset)


v2:

Update commit message for patch 1

Add default tlb_size when pmp is disabled or there are no rules, and only get the 
tlb size when translation succeeds in patch 2

Update get_page_addr_code_hostp instead of probe_access_internal to fix the 
cached host address for instruction fetch in patch 6

Add patch 7 to make the short cut really work in pmp_hart_has_privs

Add patch 8 to use pmp_update_rule_addr() and pmp_update_rule_nums() separately


Weiwei Li (7):
  target/riscv: Update pmp_get_tlb_size()
  target/riscv: Move pmp_get_tlb_size apart from
get_physical_address_pmp
  target/riscv: Flush TLB when pmpaddr is updated
  target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes
  accel/tcg: Uncache the host address for instruction fetch when tlb
size < 1
  target/riscv: Make the short cut really work in pmp_hart_has_privs
  target/riscv: Separate pmp_update_rule() in pmpcfg_csr_write

 accel/tcg/cputlb.c|  5 +++
 target/riscv/cpu_helper.c | 19 +++-
 target/riscv/pmp.c| 91 +--
 target/riscv/pmp.h|  3 +-
 4 files changed, 80 insertions(+), 38 deletions(-)

-- 
2.25.1
