[PATCH v4 0/7] RTC: New logic to emulate RTC
Changes in v4:
  Rebase to latest head.
  Change in patch 6: set the timer to one second before the target alarm
  when the AF bit is clear.
  In version 3, to keep UF, AF and UIP in sync, the timer kept running
  whenever UF or AF was clear. This is a little ugly, especially when a
  userspace program is using the alarm, because then we cannot achieve any
  power saving. In this version, when the AF bit is cleared, we set the
  timer to one second before the alarm. With this change, we avoid the
  unnecessary timer and still keep UF, AF and UIP in sync.
  Please help to review patch 6.

Changes in v3:
  Rebase to latest head.
  Remove the logic to update the time format when the DM bit changes.
  Allow migration from the old version.
  Fix the inconsistency when reading UF and UIP.

Changes in v2:
  Add UIP check logic.
  Add logic so that the next second tick occurs exactly 500ms after the
  divider is reset.

The current RTC emulation uses periodic timers (2 timers per second) to
update the RTC clock. This prevents the CPU from staying in a deep C-state
for long periods. Our experience shows that Pkg C6 residency is reduced by
6% when running 64 idle guests. The following patches stop the two periodic
timers and update the RTC clock only when the guest tries to read it.

---
Yang Zhang (7):
  RTC: Remove the logic to update time format when DM bit changed
  RTC: Update the RTC clock only when reading it
  RTC: Add UIP(update in progress) check logic
  RTC: Set internal millisecond register to 500ms when reset divider
  RTC: Add RTC update-ended interrupt support
  RTC: Add alarm support
  RTC: Allow to migrate from old version

 hw/mc146818rtc.c | 617 -
 1 files changed, 465 insertions(+), 152 deletions(-)

best regards
yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 1/7] RTC: Remove the logic to update time format when DM bit changed
Changing the DM (data mode) and 24/12 control bits does not affect the
internal registers. The bits only indicate which format those registers
use, so there is no need to update the time format when they are modified.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 10 +-
 1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index a46fdfc..9b49cbc 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -252,15 +252,7 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                     rtc_set_time(s);
                 }
             }
-            if (((s->cmos_data[RTC_REG_B] ^ data) & (REG_B_DM | REG_B_24H)) &&
-                !(data & REG_B_SET)) {
-                /* If the time format has changed and not in set mode,
-                   update the registers immediately. */
-                s->cmos_data[RTC_REG_B] = data;
-                rtc_copy_date(s);
-            } else {
-                s->cmos_data[RTC_REG_B] = data;
-            }
+            s->cmos_data[RTC_REG_B] = data;
             rtc_timer_update(s, qemu_get_clock_ns(rtc_clock));
             break;
         case RTC_REG_C:
--
1.7.1
[PATCH v4 2/7] RTC: Update the RTC clock only when reading it
There is no need to use two periodic timers to update the RTC time. With
this patch, the time is updated only when the guest reads it.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 207 +-
 1 files changed, 66 insertions(+), 141 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 9b49cbc..82a5b8a 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -44,6 +44,9 @@
 # define DPRINTF_C(format, ...)      do { } while (0)
 #endif

+#define USEC_PER_SEC    1000000L
+#define NS_PER_USEC     1000L
+
 #define RTC_REINJECT_ON_ACK_COUNT 20

 #define RTC_SECONDS             0
@@ -85,6 +88,8 @@ typedef struct RTCState {
     uint8_t cmos_data[128];
     uint8_t cmos_index;
     struct tm current_tm;
+    int64_t offset_sec;
+    int32_t offset_usec;
     int32_t base_year;
     qemu_irq irq;
     qemu_irq sqw_irq;
@@ -92,21 +97,29 @@ typedef struct RTCState {
     /* periodic timer */
     QEMUTimer *periodic_timer;
     int64_t next_periodic_time;
-    /* second update */
-    int64_t next_second_time;
     uint16_t irq_reinject_on_ack_count;
     uint32_t irq_coalesced;
     uint32_t period;
     QEMUTimer *coalesced_timer;
-    QEMUTimer *second_timer;
-    QEMUTimer *second_timer2;
     Notifier clock_reset_notifier;
     LostTickPolicy lost_tick_policy;
     Notifier suspend_notifier;
 } RTCState;

 static void rtc_set_time(RTCState *s);
-static void rtc_copy_date(RTCState *s);
+static void rtc_calibrate_time(RTCState *s);
+static void rtc_set_cmos(RTCState *s);
+
+static uint64_t get_guest_rtc_us(RTCState *s)
+{
+    int64_t host_usec, offset_usec, guest_usec;
+
+    host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+    offset_usec = s->offset_sec * USEC_PER_SEC + s->offset_usec;
+    guest_usec = host_usec + offset_usec;
+
+    return guest_usec;
+}

 #ifdef TARGET_I386
 static void rtc_coalesced_timer_update(RTCState *s)
@@ -207,6 +220,20 @@ static void rtc_periodic_timer(void *opaque)
     }
 }

+static void rtc_set_offset(RTCState *s)
+{
+    struct tm *tm = &s->current_tm;
+    int64_t host_usec, guest_sec, guest_usec;
+
+    host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+
+    guest_sec = mktimegm(tm);
+    guest_usec = guest_sec * USEC_PER_SEC;
+
+    s->offset_sec = (guest_usec - host_usec) / USEC_PER_SEC;
+    s->offset_usec = (guest_usec - host_usec) % USEC_PER_SEC;
+}
+
 static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
 {
     RTCState *s = opaque;
@@ -233,6 +260,7 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
             /* if in set mode, do not update the time */
             if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
                 rtc_set_time(s);
+                rtc_set_offset(s);
             }
             break;
         case RTC_REG_A:
@@ -243,6 +271,11 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
             break;
         case RTC_REG_B:
             if (data & REG_B_SET) {
+                /* update cmos to when the rtc was stopping */
+                if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+                    rtc_calibrate_time(s);
+                    rtc_set_cmos(s);
+                }
                 /* set mode: reset UIP mode */
                 s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
                 data &= ~REG_B_UIE;
@@ -250,6 +283,7 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                 /* if disabling set mode, update the time */
                 if (s->cmos_data[RTC_REG_B] & REG_B_SET) {
                     rtc_set_time(s);
+                    rtc_set_offset(s);
                 }
             }
             s->cmos_data[RTC_REG_B] = data;
@@ -305,7 +339,7 @@ static void rtc_set_time(RTCState *s)
     rtc_change_mon_event(tm);
 }

-static void rtc_copy_date(RTCState *s)
+static void rtc_set_cmos(RTCState *s)
 {
     const struct tm *tm = &s->current_tm;
     int year;
@@ -331,122 +365,16 @@ static void rtc_copy_date(RTCState *s)
     s->cmos_data[RTC_YEAR] = rtc_to_bcd(s, year);
 }

-/* month is between 0 and 11. */
-static int get_days_in_month(int month, int year)
-{
-    static const int days_tab[12] = {
-        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
-    };
-    int d;
-    if ((unsigned )month >= 12)
-        return 31;
-    d = days_tab[month];
-    if (month == 1) {
-        if ((year % 4) == 0 && ((year % 100) != 0 || (year % 400) == 0))
-            d++;
-    }
-    return d;
-}
-
-/* update 'tm' to the next second */
-static void rtc_next_second(struct tm *tm)
+static void rtc_calibrate_time(RTCState *s)
 {
-    int days_in_month;
-
-    tm->tm_sec++;
-    if
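The offset scheme introduced by patch 2/7 can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the code from the patch: `struct rtc_offset`, `rtc_offset_set()` and `rtc_guest_usec()` are hypothetical names, and the host clock is passed in as a plain microsecond value instead of being read from QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/* Hypothetical stand-in for the two offset fields added to RTCState. */
struct rtc_offset {
    int64_t offset_sec;
    int64_t offset_usec;
};

/* Record the guest/host difference at the moment the guest sets the time. */
static struct rtc_offset rtc_offset_set(int64_t guest_sec, int64_t host_usec)
{
    struct rtc_offset off;
    int64_t diff;

    assert(host_usec >= 0);
    diff = guest_sec * USEC_PER_SEC - host_usec;
    off.offset_sec = diff / USEC_PER_SEC;   /* truncates toward zero */
    off.offset_usec = diff % USEC_PER_SEC;  /* same sign as diff */
    return off;
}

/* Derive the guest time on demand; no timer has to tick in between. */
static int64_t rtc_guest_usec(const struct rtc_offset *off, int64_t host_usec)
{
    return host_usec + off->offset_sec * USEC_PER_SEC + off->offset_usec;
}
```

Because the guest time is a pure function of the host clock and the stored offset, an idle guest that never reads the RTC costs no wakeups, which is the power-saving argument made in the cover letter.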
[PATCH v4 3/7] RTC: Add UIP(update in progress) check logic
The UIP (update in progress) bit is set while the RTC is updating. The
update cycle begins 244us after UIP is set, and UIP is cleared when the
update ends.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 82a5b8a..6ebb8f6 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -377,6 +377,21 @@ static void rtc_calibrate_time(RTCState *s)
     s->current_tm = *ret;
 }

+static int update_in_progress(RTCState *s)
+{
+    int64_t guest_usec;
+
+    if (s->cmos_data[RTC_REG_B] & REG_B_SET) {
+        return 0;
+    }
+    guest_usec = get_guest_rtc_us(s);
+    /* UIP bit will be set at last 244us of every second. */
+    if ((guest_usec % USEC_PER_SEC) >= (USEC_PER_SEC - 244)) {
+        return 1;
+    }
+    return 0;
+}
+
 static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
 {
     RTCState *s = opaque;
@@ -402,6 +417,9 @@ static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
             break;
         case RTC_REG_A:
             ret = s->cmos_data[s->cmos_index];
+            if (update_in_progress(s)) {
+                ret |= REG_A_UIP;
+            }
             break;
         case RTC_REG_C:
             ret = s->cmos_data[s->cmos_index];
--
1.7.1
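For readers who want to check the UIP window arithmetic of patch 3/7 outside of QEMU, here is a minimal standalone sketch. `uip_active()` is a hypothetical helper that takes the guest time and the SET bit as plain arguments; the 244us figure is the one quoted in the patch description.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC    1000000LL
#define UIP_WINDOW_USEC 244  /* figure taken from the patch description */

/*
 * Returns 1 when guest_usec falls within the last 244us of a second and
 * set mode is off, 0 otherwise.
 */
static int uip_active(int64_t guest_usec, int set_mode)
{
    assert(guest_usec >= 0);
    if (set_mode) {
        return 0;  /* updates are inhibited while the SET bit is on */
    }
    return (guest_usec % USEC_PER_SEC) >= (USEC_PER_SEC - UIP_WINDOW_USEC);
}
```

The window therefore starts at sub-second position 999756us and ends when the second rolls over.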
[PATCH v4 4/7] RTC: Set internal millisecond register to 500ms when reset divider
The first update cycle begins one-half second after the divider reset is
removed.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 38 +-
 1 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 6ebb8f6..5e7fbb5 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -110,6 +110,8 @@ static void rtc_set_time(RTCState *s);
 static void rtc_calibrate_time(RTCState *s);
 static void rtc_set_cmos(RTCState *s);

+static int32_t divider_reset;
+
 static uint64_t get_guest_rtc_us(RTCState *s)
 {
     int64_t host_usec, offset_usec, guest_usec;
@@ -220,16 +222,24 @@ static void rtc_periodic_timer(void *opaque)
     }
 }

-static void rtc_set_offset(RTCState *s)
+static void rtc_set_offset(RTCState *s, int32_t start_usec)
 {
     struct tm *tm = &s->current_tm;
-    int64_t host_usec, guest_sec, guest_usec;
+    int64_t host_usec, guest_sec, guest_usec, offset_usec, old_guest_usec;

     host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+    offset_usec = s->offset_sec * USEC_PER_SEC + s->offset_usec;
+    old_guest_usec = (host_usec + offset_usec) % USEC_PER_SEC;

     guest_sec = mktimegm(tm);
-    guest_usec = guest_sec * USEC_PER_SEC;

+    /* start_usec equal 0 means rtc internal millisecond is
+     * same with before */
+    if (start_usec == 0) {
+        guest_usec = guest_sec * USEC_PER_SEC + old_guest_usec;
+    } else {
+        guest_usec = guest_sec * USEC_PER_SEC + start_usec;
+    }
     s->offset_sec = (guest_usec - host_usec) / USEC_PER_SEC;
     s->offset_usec = (guest_usec - host_usec) % USEC_PER_SEC;
 }
@@ -260,10 +270,22 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
             /* if in set mode, do not update the time */
             if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
                 rtc_set_time(s);
-                rtc_set_offset(s);
+                rtc_set_offset(s, 0);
             }
             break;
         case RTC_REG_A:
+            /* when the divider reset is removed, the first update cycle
+             * begins one-half second later*/
+            if (((s->cmos_data[RTC_REG_A] & 0x60) == 0x60) &&
+                ((data & 0x70) >> 4) <= 2) {
+                divider_reset = 1;
+                if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+                    rtc_calibrate_time(s);
+                    rtc_set_offset(s, 500000);
+                    s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+                    divider_reset = 0;
+                }
+            }
             /* UIP bit is read only */
             s->cmos_data[RTC_REG_A] = (data & ~REG_A_UIP) |
                 (s->cmos_data[RTC_REG_A] & REG_A_UIP);
@@ -283,7 +305,13 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                 /* if disabling set mode, update the time */
                 if (s->cmos_data[RTC_REG_B] & REG_B_SET) {
                     rtc_set_time(s);
-                    rtc_set_offset(s);
+                    if (divider_reset == 1) {
+                        rtc_set_offset(s, 500000);
+                        s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+                        divider_reset = 0;
+                    } else {
+                        rtc_set_offset(s, 0);
+                    }
                 }
             }
             s->cmos_data[RTC_REG_B] = data;
--
1.7.1
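The effect of pinning the sub-second position to 500ms in patch 4/7 can be checked with a small standalone sketch. The helpers `offset_for()` and `usec_to_next_tick()` are hypothetical and use a single combined microsecond offset rather than the separate `offset_sec`/`offset_usec` fields of the patch; the host clock is again a plain argument.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/*
 * Offset (guest minus host, in us) that makes the guest clock read
 * guest_sec plus subsec_usec at host time host_usec.
 */
static int64_t offset_for(int64_t guest_sec, int64_t subsec_usec,
                          int64_t host_usec)
{
    assert(subsec_usec >= 0 && subsec_usec < USEC_PER_SEC);
    return guest_sec * USEC_PER_SEC + subsec_usec - host_usec;
}

/* Microseconds until the guest clock reaches its next full second. */
static int64_t usec_to_next_tick(int64_t offset_usec, int64_t host_usec)
{
    int64_t guest_usec = host_usec + offset_usec;

    assert(guest_usec >= 0);
    return USEC_PER_SEC - (guest_usec % USEC_PER_SEC);
}
```

Calling `offset_for()` with a sub-second value of 500000 right after the divider reset is released gives a first tick exactly half a second later, whatever the host clock happens to read.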
[PATCH v4 5/7] RTC: Add RTC update-ended interrupt support
Use a timer to emulate the update cycle. When the update cycle ends and
UIE is set, raise an interrupt. The timer runs only when UF or AF is
cleared.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 86 ++
 1 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 5e7fbb5..fae049e 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -97,6 +97,11 @@ typedef struct RTCState {
     /* periodic timer */
     QEMUTimer *periodic_timer;
     int64_t next_periodic_time;
+    /* update-ended timer */
+    QEMUTimer *update_timer;
+    QEMUTimer *update_timer2;
+    uint64_t next_update_time;
+    uint32_t use_timer;
     uint16_t irq_reinject_on_ack_count;
     uint32_t irq_coalesced;
     uint32_t period;
@@ -157,7 +162,8 @@ static void rtc_coalesced_timer(void *opaque)
 }
 #endif

-static void rtc_timer_update(RTCState *s, int64_t current_time)
+/* handle periodic timer */
+static void periodic_timer_update(RTCState *s, int64_t current_time)
 {
     int period_code, period;
     int64_t cur_clock, next_irq_clock;
@@ -195,7 +201,7 @@ static void rtc_periodic_timer(void *opaque)
 {
     RTCState *s = opaque;

-    rtc_timer_update(s, s->next_periodic_time);
+    periodic_timer_update(s, s->next_periodic_time);
     s->cmos_data[RTC_REG_C] |= REG_C_PF;
     if (s->cmos_data[RTC_REG_B] & REG_B_PIE) {
         s->cmos_data[RTC_REG_C] |= REG_C_IRQF;
@@ -222,6 +228,58 @@ static void rtc_periodic_timer(void *opaque)
     }
 }

+/* handle update-ended timer */
+static void check_update_timer(RTCState *s)
+{
+    uint64_t next_update_time, expire_time;
+    uint64_t guest_usec;
+    qemu_del_timer(s->update_timer);
+    qemu_del_timer(s->update_timer2);
+
+    if (!((s->cmos_data[RTC_REG_C] & (REG_C_UF | REG_C_AF)) ==
+            (REG_C_UF | REG_C_AF)) && !(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+        s->use_timer = 1;
+        guest_usec = get_guest_rtc_us(s) % USEC_PER_SEC;
+        if (guest_usec >= (USEC_PER_SEC - 244)) {
+            /* RTC is in update cycle when enabling UIE */
+            s->cmos_data[RTC_REG_A] |= REG_A_UIP;
+            next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
+            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+            qemu_mod_timer(s->update_timer2, expire_time);
+        } else {
+            next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
+            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+            s->next_update_time = expire_time;
+            qemu_mod_timer(s->update_timer, expire_time);
+        }
+    } else {
+        s->use_timer = 0;
+    }
+}
+
+static void rtc_update_timer(void *opaque)
+{
+    RTCState *s = opaque;
+
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+        s->cmos_data[RTC_REG_A] |= REG_A_UIP;
+        qemu_mod_timer(s->update_timer2, s->next_update_time + 244000UL);
+    }
+}
+
+static void rtc_update_timer2(void *opaque)
+{
+    RTCState *s = opaque;
+
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+        s->cmos_data[RTC_REG_C] |= REG_C_UF;
+        s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+        s->cmos_data[RTC_REG_C] |= REG_C_IRQF;
+        qemu_irq_raise(s->irq);
+    }
+    check_update_timer(s);
+}
+
 static void rtc_set_offset(RTCState *s, int32_t start_usec)
 {
     struct tm *tm = &s->current_tm;
@@ -283,13 +341,14 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                     rtc_calibrate_time(s);
                     rtc_set_offset(s, 500000);
                     s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+                    check_update_timer(s);
                     divider_reset = 0;
                 }
             }
             /* UIP bit is read only */
             s->cmos_data[RTC_REG_A] = (data & ~REG_A_UIP) |
                 (s->cmos_data[RTC_REG_A] & REG_A_UIP);
-            rtc_timer_update(s, qemu_get_clock_ns(rtc_clock));
+            periodic_timer_update(s, qemu_get_clock_ns(rtc_clock));
             break;
         case RTC_REG_B:
             if (data & REG_B_SET) {
@@ -315,7 +374,8 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                 }
             }
             s->cmos_data[RTC_REG_B] = data;
-            rtc_timer_update(s, qemu_get_clock_ns(rtc_clock));
+            periodic_timer_update(s, qemu_get_clock_ns(rtc_clock));
+            check_update_timer(s);
             break;
         case RTC_REG_C:
         case RTC_REG_D:
@@ -445,7 +505,7 @@ static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
             break;
         case RTC_REG_A:
             ret = s->cmos_data[s->cmos_index];
-            if (update_in_progress(s)) {
+            if ((s->use_timer == 0) && update_in_progress(s)) {
                 ret |= REG_A_UIP;
             }
             break;
@@ -453,6 +513,12 @@ static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
[PATCH v4 6/7] RTC: Add alarm support
Changes in this patch: set the timer to one second before the target alarm
when the AF bit is clear.

In version 3, to keep UF, AF and UIP in sync, the timer kept running
whenever UF or AF was clear. This is a little ugly, especially when a
userspace program is using the alarm, because then we cannot achieve any
power saving. In this version, when the AF bit is cleared, we set the
timer to one second before the alarm. With this change, we avoid the
unnecessary timer and still keep UF, AF and UIP in sync.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 274 ++
 1 files changed, 255 insertions(+), 19 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index fae049e..c03606f 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -46,6 +46,11 @@

 #define USEC_PER_SEC    1000000L
 #define NS_PER_USEC     1000L
+#define NS_PER_SEC      1000000000ULL
+#define SEC_PER_MIN     60
+#define SEC_PER_HOUR    3600
+#define MIN_PER_HOUR    60
+#define HOUR_PER_DAY    24

 #define RTC_REINJECT_ON_ACK_COUNT 20

@@ -114,6 +119,8 @@ typedef struct RTCState {
 static void rtc_set_time(RTCState *s);
 static void rtc_calibrate_time(RTCState *s);
 static void rtc_set_cmos(RTCState *s);
+static inline int rtc_from_bcd(RTCState *s, int a);
+static uint64_t get_next_alarm(RTCState *s);

 static int32_t divider_reset;

@@ -232,29 +239,47 @@ static void rtc_periodic_timer(void *opaque)
 static void check_update_timer(RTCState *s)
 {
     uint64_t next_update_time, expire_time;
-    uint64_t guest_usec;
+    uint64_t guest_usec, next_alarm_sec;
+
     qemu_del_timer(s->update_timer);
     qemu_del_timer(s->update_timer2);

-    if (!((s->cmos_data[RTC_REG_C] & (REG_C_UF | REG_C_AF)) ==
-            (REG_C_UF | REG_C_AF)) && !(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
-        s->use_timer = 1;
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
         guest_usec = get_guest_rtc_us(s) % USEC_PER_SEC;
-        if (guest_usec >= (USEC_PER_SEC - 244)) {
-            /* RTC is in update cycle when enabling UIE */
-            s->cmos_data[RTC_REG_A] |= REG_A_UIP;
-            next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
-            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
-            qemu_mod_timer(s->update_timer2, expire_time);
-        } else {
-            next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
-            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
-            s->next_update_time = expire_time;
-            qemu_mod_timer(s->update_timer, expire_time);
+        /* if UF is clear, reprogram to next second */
+        if (!(s->cmos_data[RTC_REG_C] & REG_C_UF)) {
+program_next_second:
+            s->use_timer = 1;
+            if (guest_usec >= (USEC_PER_SEC - 244)) {
+                /* RTC is in update cycle when enabling UIE */
+                s->cmos_data[RTC_REG_A] |= REG_A_UIP;
+                next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
+                expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+                qemu_mod_timer(s->update_timer2, expire_time);
+            } else {
+                next_update_time = (USEC_PER_SEC - guest_usec - 244)
+                    * NS_PER_USEC;
+                expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+                s->next_update_time = expire_time;
+                qemu_mod_timer(s->update_timer, expire_time);
+            }
+            return ;
+        } else if (!(s->cmos_data[RTC_REG_C] & REG_C_AF)) {
+            /* UF is set, but AF is clear. Program to one second
+             * earlier before target alarm*/
+            next_alarm_sec = get_next_alarm(s);
+            if (next_alarm_sec == 1) {
+                goto program_next_second;
+            } else {
+                next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
+                next_update_time += (next_alarm_sec - 1) * NS_PER_SEC;
+                expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+                s->next_update_time = expire_time;
+                qemu_mod_timer(s->update_timer2, expire_time);
+            }
         }
-    } else {
-        s->use_timer = 0;
     }
+    s->use_timer = 0;
 }

 static void rtc_update_timer(void *opaque)
@@ -267,15 +292,215 @@ static void rtc_update_timer(void *opaque)
     }
 }

+static inline uint8_t convert_hour(RTCState *s, uint8_t hour)
+{
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_24H)) {
+        hour %= 12;
+        if (s->cmos_data[RTC_HOURS] & 0x80) {
+            hour += 12;
+        }
+    }
+    return hour;
+}
+
+static uint64_t get_next_alarm(RTCState *s)
+{
+    int32_t alarm_sec, alarm_min, alarm_hour, cur_hour, cur_min, cur_sec;
+    int32_t hour, min;
+    uint64_t
[PATCH v4 7/7] RTC: Allow to migrate from old version
The new logic is compatible with the old one, so it should not block
migration from an old version. However, the new version cannot migrate to
the old one.

Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
---
 hw/mc146818rtc.c | 48
 1 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index c03606f..61ac3c3 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -827,11 +827,51 @@ static int rtc_post_load(void *opaque, int version_id)
     return 0;
 }

+static int rtc_load_old(QEMUFile *f, void *opaque, int version_id)
+{
+    RTCState *s = opaque;
+
+    if (version_id > 2) {
+        return -EINVAL;
+    }
+
+    qemu_get_buffer(f, s->cmos_data, sizeof(s->cmos_data));
+    /* dummy load for compatibility */
+    qemu_get_byte(f);  /* cmos_index */
+    qemu_get_be32(f);  /* tm_sec */
+    qemu_get_be32(f);  /* tm_min */
+    qemu_get_be32(f);  /* tm_hour */
+    qemu_get_be32(f);  /* tm_wday */
+    qemu_get_be32(f);  /* tm_mday */
+    qemu_get_be32(f);  /* tm_mon */
+    qemu_get_be32(f);  /* tm_year */
+    qemu_get_be64(f);  /* periodic_timer */
+    qemu_get_be64(f);  /* next_periodic_time */
+    qemu_get_be64(f);  /* next_second_time */
+    qemu_get_be64(f);  /* second_timer */
+    qemu_get_be64(f);  /* second_timer2 */
+    qemu_get_be32(f);  /* irq_coalesced */
+    qemu_get_be32(f);  /* period */
+
+
+    rtc_set_date_from_host(&s->dev);
+    periodic_timer_update(s, qemu_get_clock_ns(rtc_clock));
+    check_update_timer(s);
+
+#ifdef TARGET_I386
+    if (s->lost_tick_policy == LOST_TICK_SLEW) {
+        rtc_coalesced_timer_update(s);
+    }
+#endif
+    return 0;
+}
+
 static const VMStateDescription vmstate_rtc = {
     .name = "mc146818rtc",
-    .version_id = 2,
-    .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
+    .version_id = 3,
+    .minimum_version_id = 3,
+    .minimum_version_id_old = 2,
+    .load_state_old = rtc_load_old,
     .post_load = rtc_post_load,
     .fields = (VMStateField []) {
         VMSTATE_BUFFER(cmos_data, RTCState),
@@ -969,7 +1009,7 @@ static int rtc_initfn(ISADevice *dev)
     memory_region_init_io(&s->io, &cmos_ops, s, "rtc", 2);
     isa_register_ioport(dev, &s->io, base);

-    qdev_set_legacy_instance_id(&dev->qdev, base, 2);
+    qdev_set_legacy_instance_id(&dev->qdev, base, 3);
     qemu_register_reset(rtc_reset, s);

     object_property_add(OBJECT(s), "date", "struct tm",
--
1.7.1
Re: Questing regarding KVM Guest PMU
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov <g...@redhat.com> wrote:

On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:

I guess things are working fine with perf. But why not with oprofile?

Looks like it. I never tried oprofile. Will try to reproduce your problem
and see what oprofile is doing.

I am using ubuntu 10.04 with the 2.6.32-21-generic kernel as guest, and
oprofile 0.9.6. I have also tried to capture kvm-events (perf patch) in
the host while running oprofile and perf in the guest. Please see the
attachment. I ran the tests in three cases for around 5 secs each. There
are more MSR reads and writes in the perf case, which I think is normal.
However, there are very few MSR reads and writes with oprofile. Also, the
number of NMI exceptions is too high in the oprofile case.

Which host kernel are you using? Try latest kvm.git and check if you see
something unusual in dmesg.

Currently running 3.3.0-rc5. Will try with the latest source from kvm git
and let you know.

--
Gleb.
[net-next PATCH v0 0/5] Series short description
This series is a follow-up to this thread:

http://www.spinics.net/lists/netdev/msg191360.html

This adds two NTF_XXX bits to signal whether the PF_BRIDGE netlink command
should be parsed by the embedded bridge or the SW bridge. The insight here
is that the SW bridge is always the master device (NTF_MASTER) and the
embedded bridge is the lower device (NTF_LOWERDEV). Without either flag
set, the command is parsed by the SW bridge to support existing tooling.

To make this work correctly I added three new ndo ops

  ndo_fdb_add
  ndo_fdb_del
  ndo_fdb_dump

to add, delete, and dump FDB entries. These operations can be used by
drivers to program embedded NICs or by software bridges. We have at least
three SW bridges now: net/bridge, openvswitch, and macvlan. And three
variants of embedded bridges: SR-IOV devices, multi-function devices and
Distributed Switch Architecture (DSA). I think, at least in this case,
adding netdevice ops is the cleanest way to implement this. I thought
about notifier hooks and other methods, but this seems to be the simplest.

I've tested these three scenarios: embedded bridge only, SW bridge only,
and embedded bridge together with SW bridge. These are working on the
Intel 82599 devices with this patch series. I am also working on a patch
for the macvlan drivers. I'll submit that as an RFC shortly; so far I only
have the passthru mode wired up.

Thanks to Stephen, Ben, and Jamal for bearing with me and for the feedback
on the last round of patches.

As always any comments/feedback appreciated!

---
John Fastabend (5):
      ixgbe: allow RAR table to be updated in promisc mode
      ixgbe: enable FDB netdevice ops
      net: add fdb generic dump routine
      net: addr_list: add exclusive dev_uc_add
      net: add generic PF_BRIDGE:RTM_XXX FDB hooks

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   80 +-
 include/linux/neighbour.h                     |    3
 include/linux/netdevice.h                     |   27 +++
 include/linux/rtnetlink.h                     |    4 +
 net/bridge/br_device.c                        |    3
 net/bridge/br_fdb.c                           |  128
 net/bridge/br_netlink.c                       |   12 --
 net/bridge/br_private.h                       |   15 ++
 net/core/dev_addr_lists.c                     |   19 ++
 net/core/rtnetlink.c                          |  194 +
 10 files changed, 363 insertions(+), 122 deletions(-)

--
Signature
[net-next PATCH v0 2/5] net: addr_list: add exclusive dev_uc_add
This adds a dev_uc_add_excl() call similar to the original dev_uc_add()
except it sets the global bit. With this change the reference count will
not be bumped and -EEXIST will be returned if a duplicate address exists.

This is useful for drivers that support SR-IOV and want to manage the
unicast lists.

Signed-off-by: John Fastabend <john.r.fastab...@intel.com>
---
 include/linux/netdevice.h |    1 +
 net/core/dev_addr_lists.c |   19 +++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4208901..5e43cec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2571,6 +2571,7 @@ extern int dev_addr_init(struct net_device *dev);

 /* Functions used for unicast addresses handling */
 extern int dev_uc_add(struct net_device *dev, unsigned char *addr);
+extern int dev_uc_add_excl(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_del(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_sync(struct net_device *to, struct net_device *from);
 extern void dev_uc_unsync(struct net_device *to, struct net_device *from);
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 29c07fe..c7d27ad 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -377,6 +377,25 @@ EXPORT_SYMBOL(dev_addr_del_multiple);
  */

 /**
+ *	dev_uc_add_excl - Add a global secondary unicast address
+ *	@dev: device
+ *	@addr: address to add
+ */
+int dev_uc_add_excl(struct net_device *dev, unsigned char *addr)
+{
+	int err;
+
+	netif_addr_lock_bh(dev);
+	err = __hw_addr_add_ex(&dev->uc, addr, dev->addr_len,
+			       NETDEV_HW_ADDR_T_UNICAST, true);
+	if (!err)
+		__dev_set_rx_mode(dev);
+	netif_addr_unlock_bh(dev);
+	return err;
+}
+EXPORT_SYMBOL(dev_uc_add_excl);
+
+/**
  *	dev_uc_add - Add a secondary unicast address
  *	@dev: device
  *	@addr: address to add
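The difference between a reference-counted add and the exclusive add described in patch 2/5 can be illustrated with a small userspace model. Nothing below is kernel code: `struct toy_list`, `toy_addr_add()` and the fixed table size are assumptions made for this sketch, and only the -EEXIST versus refcount-bump behaviour is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ADDR_LEN  6
#define MAX_ADDRS 8

/* Toy stand-ins for the kernel's hardware address list (hypothetical). */
struct toy_addr {
    unsigned char addr[ADDR_LEN];
    int refcount;
};

struct toy_list {
    struct toy_addr entries[MAX_ADDRS];
    int count;
};

/*
 * Add addr to the list. With exclusive set, a duplicate is reported as
 * -EEXIST and the refcount is left alone; otherwise the refcount of the
 * existing entry is bumped.
 */
static int toy_addr_add(struct toy_list *l, const unsigned char *addr,
                        int exclusive)
{
    int i;

    assert(l != NULL && addr != NULL);
    for (i = 0; i < l->count; i++) {
        if (memcmp(l->entries[i].addr, addr, ADDR_LEN) == 0) {
            if (exclusive) {
                return -EEXIST;
            }
            l->entries[i].refcount++;
            return 0;
        }
    }
    if (l->count == MAX_ADDRS) {
        return -ENOMEM;
    }
    memcpy(l->entries[l->count].addr, addr, ADDR_LEN);
    l->entries[l->count].refcount = 1;
    l->count++;
    return 0;
}
```

Reporting the duplicate instead of silently bumping a counter lets the caller decide whether an already-present address is an error, which is what a driver managing its own unicast filter table needs.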
[net-next PATCH v0 3/5] net: add fdb generic dump routine
This adds a generic dump routine drivers can call. It should be sufficient
to handle any bridging model that uses the unicast address list. This
should be most SR-IOV enabled NICs.

Signed-off-by: John Fastabend <john.r.fastab...@intel.com>
---
 net/core/rtnetlink.c |   56 ++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8c3278a..35ee2d6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2082,6 +2082,62 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	return err;
 }

+/**
+ * ndo_dflt_fdb_dump: default netdevice operation to dump an FDB table.
+ * @nlh: netlink message header
+ * @dev: netdevice
+ *
+ * Default netdevice operation to dump the existing unicast address list.
+ * Returns zero on success.
+ */
+int ndo_dflt_fdb_dump(struct sk_buff *skb,
+		      struct netlink_callback *cb,
+		      struct net_device *dev,
+		      int idx)
+{
+	struct netdev_hw_addr *ha;
+	struct nlmsghdr *nlh;
+	struct ndmsg *ndm;
+	u32 pid, seq;
+
+	pid = NETLINK_CB(cb->skb).pid;
+	seq = cb->nlh->nlmsg_seq;
+
+	netif_addr_lock_bh(dev);
+	list_for_each_entry(ha, &dev->uc.list, list) {
+		if (idx < cb->args[0])
+			goto skip;
+
+		nlh = nlmsg_put(skb, pid, seq,
+				RTM_NEWNEIGH, sizeof(*ndm), NLM_F_MULTI);
+		if (!nlh)
+			break;
+
+		ndm = nlmsg_data(nlh);
+		ndm->ndm_family  = AF_BRIDGE;
+		ndm->ndm_pad1    = 0;
+		ndm->ndm_pad2    = 0;
+		ndm->ndm_flags   = NTF_LOWERDEV;
+		ndm->ndm_type    = 0;
+		ndm->ndm_ifindex = dev->ifindex;
+		ndm->ndm_state   = NUD_PERMANENT;
+
+		NLA_PUT(skb, NDA_LLADDR, ETH_ALEN, ha->addr);
+
+		nlmsg_end(skb, nlh);
+skip:
+		++idx;
+	}
+	netif_addr_unlock_bh(dev);
+
+	return idx;
+nla_put_failure:
+	netif_addr_unlock_bh(dev);
+	nlmsg_cancel(skb, nlh);
+	return idx;
+}
+EXPORT_SYMBOL(ndo_dflt_fdb_dump);
+
 static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	int idx = 0;
[net-next PATCH v0 4/5] ixgbe: enable FDB netdevice ops
Enable FDB ops on ixgbe when in SR-IOV mode.

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   59 +
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1d8f9f8..32adb4f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7586,6 +7586,62 @@ static int ixgbe_set_features(struct net_device *netdev,
 
 }
 
+static int ixgbe_ndo_fdb_add(struct ndmsg *ndm,
+			     struct net_device *dev,
+			     unsigned char *addr,
+			     u16 flags)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int err = -EOPNOTSUPP;
+
+	if (ndm->ndm_state & NUD_PERMANENT) {
+		pr_info("%s: FDB only supports static addresses\n",
+			ixgbe_driver_name);
+		return -EINVAL;
+	}
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		err = dev_uc_add_excl(dev, addr);
+
+	/* Only return duplicate errors if NLM_F_EXCL is set */
+	if (err == -EEXIST && !(flags & NLM_F_EXCL))
+		err = 0;
+
+	return err;
+}
+
+static int ixgbe_ndo_fdb_del(struct ndmsg *ndm,
+			     struct net_device *dev,
+			     unsigned char *addr)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int err = -EOPNOTSUPP;
+
+	if (ndm->ndm_state & NUD_PERMANENT) {
+		pr_info("%s: FDB only supports static addresses\n",
+			ixgbe_driver_name);
+		return -EINVAL;
+	}
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		err = dev_uc_del(dev, addr);
+
+	return err;
+}
+
+static int ixgbe_ndo_fdb_dump(struct sk_buff *skb,
+			      struct netlink_callback *cb,
+			      struct net_device *dev,
+			      int idx)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		idx = ndo_dflt_fdb_dump(skb, cb, dev, idx);
+
+	return idx;
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7620,6 +7676,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #endif /* IXGBE_FCOE */
 	.ndo_set_features	= ixgbe_set_features,
 	.ndo_fix_features	= ixgbe_fix_features,
+	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
+	.ndo_fdb_del		= ixgbe_ndo_fdb_del,
+	.ndo_fdb_dump		= ixgbe_ndo_fdb_dump,
 };
 
 static void __devinit ixgbe_probe_vf(struct ixgbe_adapter *adapter,
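The error handling at the end of ixgbe_ndo_fdb_add() can be tried outside the kernel. What follows is a minimal user-space sketch, not driver code: fdb_add_result() and SKETCH_NLM_F_EXCL are hypothetical names, the flag value is a placeholder, sriov_enabled models the IXGBE_FLAG_SRIOV_ENABLED test, and add_rc stands in for whatever dev_uc_add_excl() returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder for illustration only; not taken from the kernel headers. */
#define SKETCH_NLM_F_EXCL 0x200u

/*
 * Hypothetical stand-in for the tail of ixgbe_ndo_fdb_add().
 * sriov_enabled: models (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
 * add_rc:        models the return value of dev_uc_add_excl()
 * flags:         netlink message flags passed down by the caller
 */
static int fdb_add_result(bool sriov_enabled, int add_rc, unsigned flags)
{
    int err = -EOPNOTSUPP;          /* default when SR-IOV is off */

    if (sriov_enabled)
        err = add_rc;

    /* A duplicate is only an error if the caller asked for exclusivity. */
    if (err == -EEXIST && !(flags & SKETCH_NLM_F_EXCL))
        err = 0;

    return err;
}
```

With SR-IOV disabled the result is always -EOPNOTSUPP; with it enabled, adding an address that already exists succeeds quietly unless the exclusive flag is set.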
[net-next PATCH v0 5/5] ixgbe: allow RAR table to be updated in promisc mode
This allows RAR table updates while in promiscuous mode. With SR-IOV enabled it is valuable to allow the RAR table to be updated even when in promisc mode to configure forwarding.

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 32adb4f..d1925b5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3400,16 +3400,17 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 		}
 		ixgbe_vlan_filter_enable(adapter);
 		hw->addr_ctrl.user_set_promisc = false;
-		/*
-		 * Write addresses to available RAR registers, if there is not
-		 * sufficient space to store all the addresses then enable
-		 * unicast promiscuous mode
-		 */
-		count = ixgbe_write_uc_addr_list(netdev);
-		if (count < 0) {
-			fctrl |= IXGBE_FCTRL_UPE;
-			vmolr |= IXGBE_VMOLR_ROPE;
-		}
+	}
+
+	/*
+	 * Write addresses to available RAR registers, if there is not
+	 * sufficient space to store all the addresses then enable
+	 * unicast promiscuous mode
+	 */
+	count = ixgbe_write_uc_addr_list(netdev);
+	if (count < 0) {
+		fctrl |= IXGBE_FCTRL_UPE;
+		vmolr |= IXGBE_VMOLR_ROPE;
 	}
 
 	if (adapter->num_vfs) {
Re: Questing regarding KVM Guest PMU
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
> On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
> > On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:
> > > I guess things are working fine with perf. But why not with oprofile ?
> > Looks like it. I never tried oprofile. Will try to reproduce your problem and see what oprofile is doing.
>
> I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and oprofile 0.9.6. Also, I have tried to capture kvm-events ( perf patch ) in host while running oprofile and perf in guest. Please see the attachment. I have run the tests in three cases for around 5 secs each. There are more MSR reads and writes in case of perf, which I think is normal. However, there are very few MSR reads and writes with oprofile. Also, the number of NMI exceptions is too high in case of oprofile.
>
> > Which host kernel are you using? Try latest kvm.git and check if you see something unusual in dmesg.
>
> Currently running 3.3.0-rc5. Will try with the latest source from kvm git and let you know.

Thanks, there were some fixes that didn't make it into 3.3. The rdpmc instruction emulation fix is one of them. If oprofile uses it this can explain the problem.

--
	Gleb.
[net-next PATCH v0 1/5] net: add generic PF_BRIDGE:RTM_ FDB hooks
Forgot to change the title resending with a title that won't be dropped by netdev and kvm mailing lists. And updated my local repo so it won't happen again. --- This adds two new flags NTF_MASTER and NTF_LOWERDEV that can now be used to specify where PF_BRIDGE netlink commands should be sent. NTF_MASTER sends the commands to the 'dev-master' device for parsing. Typically this will be the linux net/bridge, macvlan, or open-vswitch devices. Also without any flags set the command will be handled by the master device as well so that current user space tools continue to work as expected. The NTF_LOWERDEV flag will push the PF_BRIDGE commands to the device. In the basic example below the commands are then parsed and programmed in the embedded bridge. Note if both NTF_LOWERDEV and NTF_MASTER bits are set then the command will be sent both to 'dev-master' and 'dev' this allows user space to easily keep the embedded bridge and software bridge in sync. To support this new net device ops were added to call into the device and the existing bridging code was refactored to use these. There should be no change from user space. A basic setup with a SR-IOV enabled NIC looks like this, veth0 veth2 | | | bridge0 | software bridging / / ethx.y ethx VF PF \ \ propagate FDB entries to HW \ \ | Embedded Bridge | hardware offloaded switching In this case the embedded bridge must be managed to allow 'veth0' to communicate with 'ethx.y' correctly. At present drivers managing the embedded bridge either send frames onto the network which then get dropped by the switch OR the embedded bridge will flood these frames. With this patch we have a mechanism to manage the embedded bridge correctly from user space. This example is specific to SR-IOV but replacing the VF with another PF or dropping this into the DSA framework generates similar management issues. 
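The routing rule described above can be written down as a small decision function. This is only an illustrative sketch: fdb_route() and struct fdb_targets are hypothetical names, the two flag values are the ones this patch adds to include/linux/neighbour.h, and "no flags set" is read here as "neither of the two routing flags is set".

```c
#include <assert.h>
#include <stdbool.h>

/* Values as added to include/linux/neighbour.h by this patch. */
#define NTF_LOWERDEV 0x02
#define NTF_MASTER   0x04

/* Hypothetical result type: which device(s) should parse the command. */
struct fdb_targets {
    bool to_master;    /* hand the command to 'dev->master' */
    bool to_lowerdev;  /* hand the command to 'dev' itself  */
};

static struct fdb_targets fdb_route(unsigned char ndm_flags)
{
    struct fdb_targets t;
    bool lower  = (ndm_flags & NTF_LOWERDEV) != 0;
    bool master = (ndm_flags & NTF_MASTER) != 0;

    t.to_lowerdev = lower;
    /* NTF_MASTER, or neither routing flag, selects the master device
     * so that existing user space tools keep working. */
    t.to_master = master || !lower;
    return t;
}
```

Setting both flags yields both targets, which is the case the description uses to keep the embedded bridge and the software bridge in sync.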
Examples session using the 'br'[1] tool to add, dump and then delete a mac address with a new embedded option and enabled ixgbe driver: # br fdb add 22:35:19:ac:60:59 dev eth3 # br fdb portmac addrflags veth0 22:35:19:ac:60:58 static veth0 9a:5f:81:f7:f6:ec local eth300:1b:21:55:23:59 local eth322:35:19:ac:60:59 static veth0 22:35:19:ac:60:57 static #br fdb add 22:35:19:ac:60:59 embedded dev eth3 #br fdb portmac addrflags veth0 22:35:19:ac:60:58 static veth0 9a:5f:81:f7:f6:ec local eth300:1b:21:55:23:59 local eth322:35:19:ac:60:59 static veth0 22:35:19:ac:60:57 static eth322:35:19:ac:60:59 local embedded #br fdb del 22:35:19:ac:60:59 embedded dev eth3 I added a couple lines to 'br' to set the flags correctly is all. It is my opinion that the merit of this patch is now embedded and SW bridges can both be modeled correctly in user space using very nearly the same message passing. [1] 'br' tool was published as an RFC here and will be renamed 'bridge' http://patchwork.ozlabs.org/patch/117664/ Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for valuable feedback, suggestions, and review. Signed-off-by: John Fastabend john.r.fastab...@intel.com --- include/linux/neighbour.h |3 + include/linux/netdevice.h | 26 include/linux/rtnetlink.h |4 + net/bridge/br_device.c|3 + net/bridge/br_fdb.c | 128 ++ net/bridge/br_netlink.c | 12 net/bridge/br_private.h | 15 - net/core/rtnetlink.c | 138 + 8 files changed, 217 insertions(+), 112 deletions(-) diff --git a/include/linux/neighbour.h b/include/linux/neighbour.h index b188f68..3a94409 100644 --- a/include/linux/neighbour.h +++ b/include/linux/neighbour.h @@ -33,6 +33,9 @@ enum { #define NTF_PROXY 0x08/* == ATF_PUBL */ #define NTF_ROUTER 0x80 +#define NTF_LOWERDEV 0x02 +#define NTF_MASTER 0x04 + /* * Neighbor Cache Entry States. 
*/ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 4535a4e..4208901 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -54,6 +54,7 @@ #include net/netprio_cgroup.h #include linux/netdev_features.h +#include linux/neighbour.h struct netpoll_info; struct phy_device; @@ -904,6 +905,19 @@ struct netdev_fcoe_hbainfo { * feature set might be less than what was returned by ndo_fix_features()). * Must return 0 or -errno if it changed dev-features itself. * + * int (*ndo_fdb_add)(struct ndmsg *ndm, struct net_device *dev, + * unsigned char *addr, u16 flags) + * Adds an FDB entry to dev for addr. The ndmsg contains flags to indicate + * if the dev-master FDB
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
At 03/08/2012 03:57 PM, Wen Congyang Wrote:
> We can know the guest is panicked when the guest runs on xen. But we do not have such feature on kvm.
>
> Another purpose of this feature is: management app(for example: libvirt) can do auto dump when the guest is crashed. If management app does not do auto dump, the guest's user can do dump by hand if he sees the guest is panicked.
>
> I touch the hypervisor instead of using virtio-serial, because
> 1. it is simple
> 2. the virtio-serial is an optional device, and the guest may not have such device.
>
> Changes from v2 to v3:
> 1. correct spelling
>
> Changes from v1 to v2:
> 1. split up host and guest-side changes
> 2. introduce new request flag to avoid changing return values.

Hi all:

we need this feature, but we have not decided how to implement it. We have two solutions:
1. use vmcall
2. use virtio-serial.

I will not change this patch set before we decide how to do it. Can we make a decision in the next few days?

Thanks
Wen Congyang
Re: [PTACH] libcacard: fix VCARD_ATTR_PREFIX macro
Hi,

Alon Levy al...@redhat.com wrote:
> Thanks for the patch, but I think you are using a not up to date tree, it's fixed by:
>
> commit 0202181245297a9e847c05f4a18623219d95e93e
> Author: Hans de Goede hdego...@redhat.com
> Date: Fri Mar 2 16:49:44 2012 +0100
>
>     libcacard: Fix compilation with gcc-4.7
>
> (same fix).

My fault, sorry for the noise.

Regards,
Marcel
Re: [PATCH/RFC] kvm/powerpc: Add new ioctl to retreive support page sizes and encodings
On 03/18/2012 10:47 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2012-03-18 at 12:23 +0200, Avi Kivity wrote:
> > -ENODOCS
>
> What kind of docs do you expect ? Where ?

Documentation/virtual/kvm/api.txt.

> I don't see any of the other private ioctls we use on ppc documented either...

Please send patches.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()
On 03/16/2012 05:44 PM, Takuya Yoshikawa wrote:
> On Fri, 16 Mar 2012 16:28:56 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
> > Thanks for your explanation, maybe you are right, i do not know migration much.
> >
> > What i worried about is, you have changed the behaviour of GET_DIRTY_LOG, in the current one, it can get all the dirty pages when it is called; after your change, GET_DIRTY_LOG can get a empty dirty bitmap but dirty page exists.
>
> The current code also see the same situation because nothing prevents the guest from writing to pages before GET_DIRTY_LOG returns and the userspace checks the bitmap. Everything is running.

The current code is under the protection of s-rcu: IIRC, it always holds s-rcu when writing a guest page and setting the dirty bit, which means the dirty page is logged either in the old dirty_bitmap or in the current memslot->dirty_bitmap. Yes?
Re: [PATCH v5 2/3] virtio-scsi: add error handling
On Sun, Feb 05, 2012 at 12:16:01PM +0100, Paolo Bonzini wrote: This commit adds basic error handling to the virtio-scsi HBA device. Task management functions are sent synchronously via the control virtqueue. Cc: linux-scsi linux-s...@vger.kernel.org Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: kvm@vger.kernel.org Acked-by: Pekka Enberg penb...@kernel.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- v3-v4: fixed 32-bit compilation; adjusted call to virtscsi_kick_cmd v2-v3: added mempool, used GFP_NOIO instead of GFP_ATOMIC, formatting fixes v1-v2: use scmd_printk drivers/scsi/virtio_scsi.c | 73 +++- 1 files changed, 72 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 3f87ae0..68104cd 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -29,6 +29,7 @@ /* Command queue element */ struct virtio_scsi_cmd { struct scsi_cmnd *sc; + struct completion *comp; union { struct virtio_scsi_cmd_req cmd; struct virtio_scsi_ctrl_tmf_req tmf; @@ -168,11 +169,12 @@ static void virtscsi_req_done(struct virtqueue *vq) virtscsi_vq_done(vq, virtscsi_complete_cmd); }; -/* These are still stubs. */ static void virtscsi_complete_free(void *buf) { struct virtio_scsi_cmd *cmd = buf; + if (cmd-comp) + complete_all(cmd-comp); mempool_free(cmd, virtscsi_cmd_pool); } @@ -306,12 +308,81 @@ out: return ret; } +static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) +{ + DECLARE_COMPLETION_ONSTACK(comp); + int ret; + + cmd-comp = comp; + ret = virtscsi_kick_cmd(vscsi, vscsi-ctrl_vq, cmd, +sizeof cmd-req.tmf, sizeof cmd-resp.tmf, +GFP_NOIO); + if (ret 0) + return FAILED; + + wait_for_completion(comp); + if (cmd-resp.tmf.response != VIRTIO_SCSI_S_OK + cmd-resp.tmf.response != VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) + return FAILED; + + return SUCCESS; +} Hi Paolo, This is against v5. 
From 34ef5e64fc205044e4326fcc5dcf2aa6b219763a Mon Sep 17 00:00:00 2001 From: Hu Tao hu...@cn.fujitsu.com Date: Mon, 19 Mar 2012 15:58:22 +0800 Subject: [PATCH] fix two problems in tmf This patch fix two problems in tmf: 1. race in virtscsi_tmf that the cmd may have been already freed when waking up from the completion 2. cmd leak if virtscsi_kick_cmd fails. Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- drivers/scsi/virtio_scsi.c | 17 ++--- 1 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index efccd72..3b8a6e6 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -175,7 +175,8 @@ static void virtscsi_complete_free(void *buf) if (cmd-comp) complete_all(cmd-comp); - mempool_free(cmd, virtscsi_cmd_pool); + else + mempool_free(cmd, virtscsi_cmd_pool); } static void virtscsi_ctrl_done(struct virtqueue *vq) @@ -311,21 +312,23 @@ out: static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd) { DECLARE_COMPLETION_ONSTACK(comp); - int ret; + int ret = FAILED; cmd-comp = comp; ret = virtscsi_kick_cmd(vscsi, vscsi-ctrl_vq, cmd, sizeof cmd-req.tmf, sizeof cmd-resp.tmf, GFP_NOIO); if (ret 0) - return FAILED; + goto failed; wait_for_completion(comp); - if (cmd-resp.tmf.response != VIRTIO_SCSI_S_OK - cmd-resp.tmf.response != VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) - return FAILED; + if (cmd-resp.tmf.response == VIRTIO_SCSI_S_OK || + cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED) + ret = SUCCESS; - return SUCCESS; +failed: + mempool_free(cmd, virtscsi_cmd_pool); + return ret; } static int virtscsi_device_reset(struct scsi_cmnd *sc) -- 1.7.1
[PATCH] pci-assign: Fall back to host-side MSI if INTx sharing fails
If the host or the device does not support INTx sharing, retry the IRQ assignment with host-side MSI support enabled but warn about potential consequences. This allows to preserve the previous behavior where we defaulted to MSI and did not support INTx sharing at all.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Detecting if the user actually specified prefer_msi=off as property of pci-assign is non-trivial. So I decided to go for the retry approach, ignoring potential user requests. The warning should attract the attention.

 hw/device-assignment.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 89823f1..c953713 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -835,6 +835,7 @@ static int assign_irq(AssignedDevice *dev)
         dev->irq_requested_type = 0;
     }
 
+retry:
     assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
         dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
@@ -844,6 +845,17 @@ static int assign_irq(AssignedDevice *dev)
 
     r = kvm_assign_irq(kvm_state, &assigned_irq_data);
     if (r < 0) {
+        if (r == -EIO && !(dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK) &&
+            dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+            /* Retry with host-side MSI. There might be an IRQ conflict and
+             * either the kernel or the device doesn't support sharing. */
+            fprintf(stderr,
+                    "Host-side INTx sharing not supported, "
+                    "using MSI instead.\n"
+                    "Some devices do not to work properly in this mode.\n");
+            dev->features |= ASSIGNED_DEVICE_PREFER_MSI_MASK;
+            goto retry;
+        }
         fprintf(stderr, "Failed to assign irq for \"%s\": %s\n",
                 dev->dev.qdev.id, strerror(-r));
         fprintf(stderr, "Perhaps you are assigning a device "
--
1.7.3.4
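The retry approach can be modelled in a few lines of plain C. The sketch below is an assumption-laden illustration rather than QEMU code: assign_fn is a hypothetical callback standing in for kvm_assign_irq(), and the two stub functions only simulate a host that rejects INTx.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical callback standing in for kvm_assign_irq(): 0 or -errno. */
typedef int (*assign_fn)(bool use_msi);

/*
 * Try the assignment; if INTx fails with -EIO, MSI was not already
 * preferred and the device has an MSI capability, switch the
 * preference to MSI and try again. Any other result is final.
 */
static int assign_irq_with_fallback(assign_fn assign, bool *prefer_msi,
                                    bool has_msi_cap)
{
    for (;;) {
        bool use_msi = *prefer_msi && has_msi_cap;
        int r = assign(use_msi);

        if (r == -EIO && !*prefer_msi && has_msi_cap) {
            fprintf(stderr, "INTx sharing not supported, using MSI instead\n");
            *prefer_msi = true;   /* the retry condition is now false */
            continue;
        }
        return r;
    }
}

/* Stub: INTx assignment conflicts, MSI works. */
static int stub_intx_conflict(bool use_msi)
{
    return use_msi ? 0 : -EIO;
}

/* Stub: everything fails with -EIO. */
static int stub_always_eio(bool use_msi)
{
    (void)use_msi;
    return -EIO;
}
```

Because the preference is flipped before retrying, the loop runs at most twice, which mirrors why the `goto retry` in the patch cannot spin forever.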
Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd
On Fri, Mar 16, 2012 at 04:59:35PM +0800, Amos Kong wrote: On 14/03/12 19:46, Stefan Hajnoczi wrote: On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivitya...@redhat.com wrote: On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote: On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivitya...@redhat.com wrote: On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote: On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivitya...@redhat.com wrote: On 03/13/2012 12:42 PM, Amos Kong wrote: Boot up guest with 232 virtio-blk disk, qemu will abort for fail to allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(), and check if available ioeventfd exists. If not, virtio-pci will fallback to userspace, and don't use ioeventfd for io notification. How about an alternative way of solving this, within the memory core: trap those writes in qemu and write to the ioeventfd yourself. This way ioeventfds work even without kvm: core: create eventfd core: install handler for memory address that writes to ioeventfd kvm (optional): install kernel handler for ioeventfd Can you give some detail about this? I'm not familiar with Memory API. btw, can we fix this problem by replacing abort() by a error note? virtio-pci will auto fallback to userspace. diff --git a/kvm-all.c b/kvm-all.c index 3c6b4f0..cf23dbf 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -749,7 +749,8 @@ static void kvm_mem_ioeventfd_add(MemoryRegionSection *section, r = kvm_set_ioeventfd_mmio_long(fd, section-offset_within_address_space, data, true); if (r 0) { -abort(); +fprintf(stderr, %s: unable to map ioeventfd: %s.\nFallback to +userspace (slower).\n, __func__, strerror(-r)); The challenge is propagating the error code. If virtio-pci.c doesn't know that ioeventfd has failed, then it's not possible to fall back to a userspace handler. I believe Avi's suggestion is to put the fallback code into the KVM memory API implementation so that virtio-pci.c doesn't need to know that ioeventfd failed at all. 
Stefan
Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()
On Mon, 19 Mar 2012 17:34:49 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
> The current code is under the protection of s-rcu: IIRC, it always holds s-rcu when writing a guest page and setting the dirty bit, which means the dirty page is logged either in the old dirty_bitmap or in the current memslot->dirty_bitmap. Yes?

Yes.

I just wanted to explain that getting a clear dirty bitmap from GET_DIRTY_LOG does not mean there is no dirty page: it just means that nothing was logged when we updated the bitmap in get_dirty_log(). We cannot know if anything happened between the bitmap update and the result check in userspace.

So even when we get a clear bitmap, we need to stop the VCPU threads and then do GET_DIRTY_LOG once more for live migration.

The important thing is that every bit set by mark_page_dirty() can be found at some time in the future, including the final GET_DIRTY_LOG.

	Takuya
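The point about the final pass can be shown with a toy model. Nothing below is KVM code: struct toy_slot and the toy_* functions are hypothetical, the slot holds only 64 pages, and there is no real concurrency; the sketch only illustrates that a clear snapshot says nothing about writes that happen afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one slot of 64 pages, one dirty bit per page. */
struct toy_slot {
    uint64_t dirty;
};

/* Stand-in for mark_page_dirty(): record a guest write to page n. */
static void toy_mark_dirty(struct toy_slot *s, unsigned n)
{
    s->dirty |= (uint64_t)1 << (n % 64);
}

/* Stand-in for GET_DIRTY_LOG: return the logged bits and reset the log. */
static uint64_t toy_get_dirty_log(struct toy_slot *s)
{
    uint64_t snapshot = s->dirty;

    s->dirty = 0;
    return snapshot;
}

/*
 * Final stage of a toy migration. The caller must have stopped the
 * "guest" first; only then is one more query guaranteed to pick up
 * every bit that was ever set.
 */
static uint64_t toy_final_pass(struct toy_slot *s, bool vcpus_stopped)
{
    assert(vcpus_stopped);
    return toy_get_dirty_log(s);
}
```

A typical sequence is: query (clear result), guest writes page 3, stop the guest, run the final pass, which then reports page 3.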
Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd
On 03/16/2012 10:59 AM, Amos Kong wrote:
> Can you give some detail about this? I'm not familiar with Memory API.

Well there's a huge amount of detail needed here. The idea is that memory_region_add_eventfd() will always work, with or without kvm, and even if kvm is enabled but we run out of ioeventfds.

One way to do this is to implement core_eventfd_add() in exec.c. This is unlikely to be easy however.

> btw, can we fix this problem by replacing abort() by an error note? virtio-pci will auto fallback to userspace.

But other users will silently break, need to audit all other users of ioeventfd, for example ivshmem.

--
error compiling committee.c: too many arguments to function
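The "always works" idea can be sketched as a registration that picks a delivery path instead of failing. Everything here is hypothetical and is not the QEMU memory API: struct toy_eventfd_pool, toy_kernel_register() and toy_add_eventfd() are invented names, and the -ENOSPC error for an exhausted pool is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>

/* Which side ends up signalling the eventfd on a guest write. */
enum notify_path {
    NOTIFY_KERNEL,     /* in-kernel ioeventfd handles the write    */
    NOTIFY_USERSPACE   /* write is trapped and signalled by the VMM */
};

/* Hypothetical pool of in-kernel ioeventfd slots. */
struct toy_eventfd_pool {
    int kernel_slots_left;
};

/* Models the kernel-side registration; assumed to fail when full. */
static int toy_kernel_register(struct toy_eventfd_pool *p)
{
    if (p->kernel_slots_left <= 0)
        return -ENOSPC;
    p->kernel_slots_left--;
    return 0;
}

/*
 * Never fails from the caller's point of view: use the kernel path
 * when a slot is available, otherwise fall back to a userspace handler.
 */
static enum notify_path toy_add_eventfd(struct toy_eventfd_pool *p)
{
    if (toy_kernel_register(p) == 0)
        return NOTIFY_KERNEL;
    return NOTIFY_USERSPACE;
}
```

In this shape the device model never sees the allocation failure, which is the property that would spare callers such as virtio-pci or ivshmem from handling it individually.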
Re: performance trouble
On Fri, Mar 16, 2012 at 11:13:31AM +0100, David Cure wrote:
> hello,
>
> sorry for the delay,
>
> On Thu, Feb 23, 2012 at 10:38:07AM +0200, Gleb Natapov wrote:
> > Ah, I guess the reason is that it records events only of IO thread. You need to trace all vcpu threads too. Not sure trace-cmd allows more than one -P option though.
>
> I managed to set up the physical server with only one VM running the slow function and took a trace while the slow function ran.
>
> I uploaded the traces to http://www.roullier.net/Report/ :
> o report.txt.3.1.gz : with kernel 3.1
> o report.txt.3.2.gz : with kernel 3.2
> o report.txt.vhost-net-3.1.gz : with kernel 3.1 and vhost-net
> o report.txt.vhost-net.3.2.gz : with kernel 3.2 and vhost-net
>
> With 3.2 + vhost-net we have 10.5s (as a reminder, 8s with vmware esxi 4).

Can you run the same test on Linux guest and on Windows vm with 1 cpu? I see a lot of IPIs between vcpus.

--
	Gleb.
Re: [PATCH v4 7/7] RTC:Allow to migrate from old version
On 03/19/2012 08:14 AM, Zhang, Yang Z wrote: The new logic is compatible with old. So it should not block migrate from old version. But new version cannot migrate to old. +static int rtc_load_old(QEMUFile *f, void *opaque, int version_id) +{ +RTCState *s = opaque; + +if (version_id 2) { +return -EINVAL; +} + +qemu_get_buffer(f, s-cmos_data, sizeof(s-cmos_data)); +/* dummy load for compatibility */ +qemu_get_byte(f); /* cmos_index */ +qemu_get_be32(f); /* tm_sec */ +qemu_get_be32(f); /* tm_min */ +qemu_get_be32(f); /* tm_hour */ +qemu_get_be32(f); /* tm_wday */ +qemu_get_be32(f); /* tm_mday */ +qemu_get_be32(f); /* tm_mon */ +qemu_get_be32(f); /* tm_year */ +qemu_get_be64(f); /* periodic_timer */ +qemu_get_be64(f); /* next_periodic_time */ +qemu_get_be64(f); /* next_second_time */ +qemu_get_be64(f); /* second_timer */ +qemu_get_be64(f); /* second_timer2 */ +qemu_get_be32(f); /* irq_coalesced */ +qemu_get_be32(f); /* period */ + Why don't you just convert the data to the new in-memory format? Then you don't need a version update. + +rtc_set_date_from_host(s-dev); If the guest is intentionally running with an incorrect date, this breaks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/3] virtio-scsi: add error handling
On 19/03/2012 10:55, Hu Tao wrote:
> +	int ret = FAILED;
>
>  	cmd->comp = &comp;
>  	ret = virtscsi_kick_cmd(vscsi, vscsi->ctrl_vq, cmd,
>  				sizeof cmd->req.tmf, sizeof cmd->resp.tmf,
>  				GFP_NOIO);
>  	if (ret < 0)
> -		return FAILED;
> +		goto failed;

This will return the errno, not FAILED. I have already fixed this up locally, though I've been lazy on actually sending out the fix. I'll do this today.

Paolo

>  	wait_for_completion(&comp);
> -	if (cmd->resp.tmf.response != VIRTIO_SCSI_S_OK &&
> -	    cmd->resp.tmf.response != VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
> -		return FAILED;
> +	if (cmd->resp.tmf.response == VIRTIO_SCSI_S_OK ||
> +	    cmd->resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
> +		ret = SUCCESS;
>
> -	return SUCCESS;
> +failed:
> +	mempool_free(cmd, virtscsi_cmd_pool);
> +	return ret;
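The problem pointed out above comes from reusing one variable for two kinds of value. The following user-space sketch reproduces the control flow with placeholders: SKETCH_SUCCESS and SKETCH_FAILED are arbitrary stand-ins for the SCSI midlayer constants, kick_rc models the return value of virtscsi_kick_cmd(), and tmf_fixed() shows only one possible shape of a fix, not necessarily the one that was applied.

```c
#include <assert.h>
#include <errno.h>

/* Arbitrary placeholders; the kernel defines its own SUCCESS/FAILED. */
enum { SKETCH_SUCCESS = 1, SKETCH_FAILED = 2 };

/* Same flow as the quoted patch: ret is overwritten by the submit result. */
static int tmf_buggy(int kick_rc, int response_ok)
{
    int ret = SKETCH_FAILED;

    ret = kick_rc;              /* initial FAILED value is lost here */
    if (ret < 0)
        goto failed;            /* returns a negative errno */

    if (response_ok)
        ret = SKETCH_SUCCESS;
failed:
    return ret;
}

/* One possible fix: keep the submit result out of the return variable. */
static int tmf_fixed(int kick_rc, int response_ok)
{
    int ret = SKETCH_FAILED;

    if (kick_rc < 0)
        goto out;

    if (response_ok)
        ret = SKETCH_SUCCESS;
out:
    return ret;
}
```

In the buggy variant a failed submit leaks the errno to the caller, while the fixed variant only ever returns one of the two status codes.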
Re: kvm-ppc while I'm on vacation
On Sun, Mar 18, 2012 at 11:10:43PM +0100, Alexander Graf wrote:
> Hence I asked Paul to take on temporary maintainership of the kvm-ppc tree for the next 3 weeks. During that time, he'll be allowed to send pull requests to Avi and Marcelo and is obliged to fix the build whenever it breaks :).

Thanks, Alex.

Avi, Marcelo, what do you plan to do with Alex's existing pull request? Are you about to do that pull?

Thanks,
Paul.
Re: kvm-ppc while I'm on vacation
On 03/19/2012 01:56 PM, Paul Mackerras wrote:
> On Sun, Mar 18, 2012 at 11:10:43PM +0100, Alexander Graf wrote:
> > Hence I asked Paul to take on temporary maintainership of the kvm-ppc tree for the next 3 weeks. During that time, he'll be allowed to send pull requests to Avi and Marcelo and is obliged to fix the build whenever it breaks :).
>
> Thanks, Alex.
>
> Avi, Marcelo, what do you plan to do with Alex's existing pull request? Are you about to do that pull?

It looks good to me, I'll pull it if Marcelo doesn't beat me to it.

--
error compiling committee.c: too many arguments to function
Re: question regarding intel_idle inside kvm
On 03/16/2012 12:19 AM, Daniel Lezcano wrote:
> Hi all,
>
> I recently did some modification in the cpuidle core and the patches were merged to linux-next. Someone reported a problem with the intel_idle cpuidle driver. I tried to reproduce the problem with kvm but the kernel fails to initialize the driver because the intel_idle_init function fails in the processor probe.
>
> After digging a bit, I found it fails at:
>
> drivers/idle/intel_idle.c
>
> static int intel_idle_probe(void)
> {
> 	...
> 	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> 		return -ENODEV;
> 		^
> 	...
> }
>
> I assumed the virtualized processor does not support this, so I specified the -cpu host because the host was running the intel_idle driver. But the driver still fails in kvm.
>
> I was wondering why that happens ? Does anyone have an idea of this problem ?

intel_idle() uses mwait, which kvm does not virtualize (it's very expensive to do so and brings no benefits).

--
error compiling committee.c: too many arguments to function
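The check quoted in the question can be reduced to a single comparison. This is a user-space sketch with assumed names: toy_idle_probe() is hypothetical, and SKETCH_CPUID_MWAIT_LEAF is set to 5 on the assumption that it matches the CPUID leaf describing MONITOR/MWAIT.

```c
#include <assert.h>
#include <errno.h>

/* Assumed value for this sketch: CPUID leaf 5 describes MONITOR/MWAIT. */
#define SKETCH_CPUID_MWAIT_LEAF 5u

/*
 * Hypothetical reduction of the quoted probe: refuse to load when the
 * highest basic CPUID leaf reported by the (virtual) CPU is below the
 * MWAIT leaf.
 */
static int toy_idle_probe(unsigned max_basic_leaf)
{
    if (max_basic_leaf < SKETCH_CPUID_MWAIT_LEAF)
        return -ENODEV;
    return 0;
}
```

A guest CPU model that reports a maximum leaf of 4, for instance, would take the -ENODEV path regardless of what the host CPU supports.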
Re: question regarding intel_idle inside kvm
On 03/19/2012 01:31 PM, Avi Kivity wrote:
> On 03/16/2012 12:19 AM, Daniel Lezcano wrote:
> > Hi all,
> >
> > I recently did some modification in the cpuidle core and the patches were merged to linux-next. Someone reported a problem with the intel_idle cpuidle driver. I tried to reproduce the problem with kvm but the kernel fails to initialize the driver because the intel_idle_init function fails in the processor probe.
> >
> > After digging a bit, I found it fails at:
> >
> > drivers/idle/intel_idle.c
> >
> > static int intel_idle_probe(void)
> > {
> > 	...
> > 	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> > 		return -ENODEV;
> > 		^
> > 	...
> > }
> >
> > I assumed the virtualized processor does not support this, so I specified the -cpu host because the host was running the intel_idle driver. But the driver still fails in kvm.
> >
> > I was wondering why that happens ? Does anyone have an idea of this problem ?
>
> intel_idle() uses mwait, which kvm does not virtualize (it's very expensive to do so and brings no benefits).

Ok, thanks for the information. I was afraid of that :/

I will go for a real host then :)

Thanks !

  -- Daniel
[PATCH v4 0/2] support to migrate with IPv6 address
These patches make tcp migration use the helper functions in qemu-sockets.c to support IPv6 migration.

Changes from v1:
- split different changes to small patches, it will be easier to review
- fixed some problem according to Kevin's comment

Changes from v2:
- fix issue of returning real error
- set s->fd to -1 when parse fails, won't call migrate_fd_error()

Changes from v3:
- try to use helper functions in qemu-sockets.c
---

Amos Kong (2):
      qemu-socket: change inet_connect() to to support nonblock socket
      use inet_listen()/inet_connect() to support ipv6 migration

 migration-tcp.c |   75 +++
 nbd.c           |    2 +
 qemu-char.c     |    2 +
 qemu-sockets.c  |   73 ++
 qemu_socket.h   |    4 +--
 ui/vnc.c        |    2 +
 6 files changed, 82 insertions(+), 76 deletions(-)

--
		Amos Kong
[PATCH v4 1/2] qemu-socket: change inet_connect() to to support nonblock socket
Change inet_connect(const char *str, int socktype) to inet_connect(const char *str, bool block, int *sock_err), socktype is unused, block is used to assign if set socket to block/nonblock, sock_err is used to restore socket error. Connect's successful for nonblock socket when following errors are returned: -EINPROGRESS or -EWOULDBLOCK -WSAEALREADY or -WSAEINVAL (win32) Also change the wrap function inet_connect_opts(QemuOpts *opts) to inet_connect_opts(QemuOpts *opts, int *sock_err). Add a bool entry(block) for dummy_opts to tag block type. Change nbd, vnc to use new interface. Signed-off-by: Amos Kong ak...@redhat.com --- nbd.c |2 +- qemu-char.c|2 +- qemu-sockets.c | 73 qemu_socket.h |4 ++- ui/vnc.c |2 +- 5 files changed, 62 insertions(+), 21 deletions(-) diff --git a/nbd.c b/nbd.c index 567e94e..ad4de06 100644 --- a/nbd.c +++ b/nbd.c @@ -146,7 +146,7 @@ int tcp_socket_outgoing(const char *address, uint16_t port) int tcp_socket_outgoing_spec(const char *address_and_port) { -return inet_connect(address_and_port, SOCK_STREAM); +return inet_connect(address_and_port, true, NULL); } int tcp_socket_incoming(const char *address, uint16_t port) diff --git a/qemu-char.c b/qemu-char.c index bb9e3f5..d3543ea 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2443,7 +2443,7 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts *opts) if (is_listen) { fd = inet_listen_opts(opts, 0); } else { -fd = inet_connect_opts(opts); +fd = inet_connect_opts(opts, NULL); } } if (fd 0) { diff --git a/qemu-sockets.c b/qemu-sockets.c index 6bcb8e3..8ed45f8 100644 --- a/qemu-sockets.c +++ b/qemu-sockets.c @@ -51,6 +51,9 @@ static QemuOptsList dummy_opts = { },{ .name = ipv6, .type = QEMU_OPT_BOOL, +},{ +.name = block, +.type = QEMU_OPT_BOOL, }, { /* end if list */ } }, @@ -194,14 +197,15 @@ listen: return slisten; } -int inet_connect_opts(QemuOpts *opts) +int inet_connect_opts(QemuOpts *opts, int *sock_err) { struct addrinfo ai,*res,*e; const char *addr; const char *port; char 
uaddr[INET6_ADDRSTRLEN+1]; char uport[33]; -int sock,rc; +int sock, rc, err; +bool block; memset(ai,0, sizeof(ai)); ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG; @@ -210,9 +214,11 @@ int inet_connect_opts(QemuOpts *opts) addr = qemu_opt_get(opts, host); port = qemu_opt_get(opts, port); +block = qemu_opt_get_bool(opts, block, 0); if (addr == NULL || port == NULL) { fprintf(stderr, inet_connect: host and/or port not specified\n); -return -1; +err = -EINVAL; +goto err; } if (qemu_opt_get_bool(opts, ipv4, 0)) @@ -224,7 +230,8 @@ int inet_connect_opts(QemuOpts *opts) if (0 != (rc = getaddrinfo(addr, port, ai, res))) { fprintf(stderr,getaddrinfo(%s,%s): %s\n, addr, port, gai_strerror(rc)); - return -1; +err = -EINVAL; +goto err; } for (e = res; e != NULL; e = e-ai_next) { @@ -241,21 +248,52 @@ int inet_connect_opts(QemuOpts *opts) continue; } setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(void*)on,sizeof(on)); - +if (!block) { +socket_set_nonblock(sock); +} /* connect to peer */ -if (connect(sock,e-ai_addr,e-ai_addrlen) 0) { -if (NULL == e-ai_next) -fprintf(stderr, %s: connect(%s,%s,%s,%s): %s\n, __FUNCTION__, -inet_strfamily(e-ai_family), -e-ai_canonname, uaddr, uport, strerror(errno)); -closesocket(sock); -continue; +do { +err = 0; +if (connect(sock, e-ai_addr, e-ai_addrlen) 0) { +err = -socket_error(); +if (block) { +break; +} +} +} while (err == -EINTR); + +if (err = 0) { +goto success; +} else if (!block (err == -EINPROGRESS || err == -EWOULDBLOCK)) { +goto success; +#ifdef _WIN32 +} else if (!block (sock_err == -WSAEALREADY || + sock_err == -WSAEINVAL)) { +goto success; +#endif } -freeaddrinfo(res); -return sock; + +if (NULL == e-ai_next) { +fprintf(stderr, %s: connect(%s,%s,%s,%s): %s\n, __func__, +inet_strfamily(e-ai_family), +e-ai_canonname, uaddr, uport, strerror(errno)); +} +closesocket(sock); } freeaddrinfo(res); + +err: +if (sock_err) { +*sock_err = err; +} return -1; + +success: +freeaddrinfo(res); +if (sock_err) { +*sock_err = err; +} +return sock; }
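The success rule described in the commit message (a non-blocking connect counts as started when it reports EINPROGRESS or EWOULDBLOCK) can be sketched in Python as follows. This is a minimal illustration, not QEMU code: `connect_status` and `inet_connect_sketch` are hypothetical names, errno values are positive here (the patch stores them negated), and the win32-only codes are left out.

```python
import errno
import socket

# Error codes that mean "connection attempt started" for a non-blocking
# socket, mirroring the EINPROGRESS/EWOULDBLOCK case in the patch.
IN_PROGRESS = {errno.EINPROGRESS, errno.EWOULDBLOCK}


def connect_status(err, block):
    """Classify a connect() result as 'connected', 'in_progress' or 'failed'.

    err is 0 on success or a positive errno value (hypothetical helper).
    """
    if err == 0:
        return "connected"
    if not block and err in IN_PROGRESS:
        return "in_progress"
    return "failed"


def inet_connect_sketch(host, port, block=True):
    """Try each resolved address; return (sock, err), sock is None on failure."""
    last_err = errno.EINVAL
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.setblocking(block)
        err = sock.connect_ex(addr)  # returns an errno instead of raising
        if connect_status(err, block) != "failed":
            return sock, err
        last_err = err
        sock.close()
    return None, last_err
```

As in the patch, the caller gets both the socket and the error code, so it can tell "connected" from "still connecting" and register a write handler in the second case.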
[PATCH v4 2/2] use inet_listen()/inet_connect() to support ipv6 migration
Use the helper functions in qemu-sockets.c for TCP migration; they already
support IPv6 addresses.

For IPv6, brackets are mandatory if you specify a port.
Following RFC 5952, the recommended format is:
  [2312::8274]:5200

Test status: succeeded
  listen side: qemu-kvm  -incoming tcp:[2312::8274]:5200
  client side: qemu-kvm ...
               (qemu) migrate -d tcp:[2312::8274]:5200

Signed-off-by: Amos Kong ak...@redhat.com
---
 migration-tcp.c |   75 +++
 1 files changed, 20 insertions(+), 55 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 35a5781..6c66c7a 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -81,43 +81,31 @@ static void tcp_wait_for_connect(void *opaque)
 
 int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
 {
-    struct sockaddr_in addr;
-    int ret;
-
-    ret = parse_host_port(&addr, host_port);
-    if (ret < 0) {
-        return ret;
-    }
+    int sock_err;
 
     s->get_error = socket_errno;
     s->write = socket_write;
     s->close = tcp_close;
 
-    s->fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s->fd == -1) {
-        DPRINTF("Unable to open socket");
-        return -socket_error();
-    }
-
-    socket_set_nonblock(s->fd);
+    s->fd = inet_connect(host_port, false, &sock_err);
 
-    do {
-        ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
-        if (ret == -1) {
-            ret = -socket_error();
-        }
-        if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
-            qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
-            return 0;
+    if (sock_err == -EINPROGRESS || sock_err == -EWOULDBLOCK) {
+        DPRINTF("connect in progress");
+        qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
+#ifdef _WIN32
+    } else if (sock_err == -WSAEALREADY || sock_err == -WSAEINVAL) {
+        DPRINTF("connect in progress");
+        qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
+#endif
+    } else if (sock_err < 0) {
+        DPRINTF("connect failed: %s\n", strerror(-sock_err));
+        if (s->fd != -1) {
+            migrate_fd_error(s);
         }
-    } while (ret == -EINTR);
-
-    if (ret < 0) {
-        DPRINTF("connect failed\n");
-        migrate_fd_error(s);
-        return ret;
+        return sock_err;
+    } else {
+        migrate_fd_connect(s);
     }
-    migrate_fd_connect(s);
     return 0;
 }
 
@@ -157,38 +145,15 @@ out2:
 
 int tcp_start_incoming_migration(const char *host_port)
 {
-    struct sockaddr_in addr;
-    int val;
     int s;
 
-    DPRINTF("Attempting to start an incoming migration\n");
-
-    if (parse_host_port(&addr, host_port) < 0) {
-        fprintf(stderr, "invalid host/port combination: %s\n", host_port);
-        return -EINVAL;
-    }
-
-    s = qemu_socket(PF_INET, SOCK_STREAM, 0);
-    if (s == -1) {
-        return -socket_error();
-    }
-
-    val = 1;
-    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
-        goto err;
-    }
-    if (listen(s, 1) == -1) {
-        goto err;
+    s = inet_listen(host_port, NULL, 256, SOCK_STREAM, 0);
+    if (s < 0) {
+        return s;
     }
 
     qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL,
                          (void *)(intptr_t)s);
 
     return 0;
-
-err:
-    close(s);
-    return -socket_error();
 }
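To illustrate why brackets are mandatory for an IPv6 address that carries a port, here is a small Python sketch of the address format from the commit message. `parse_host_port` below is a hypothetical helper written for this note; it is not the parser QEMU uses.

```python
def parse_host_port(spec):
    """Split 'host:port' or '[ipv6]:port' into (host, port).

    Hypothetical helper for illustration; it is not QEMU's parser.
    """
    if spec.startswith("["):
        end = spec.find("]")
        if end == -1 or spec[end + 1:end + 2] != ":":
            raise ValueError("expected '[addr]:port': %r" % spec)
        host, port = spec[1:end], spec[end + 2:]
    else:
        host, sep, port = spec.rpartition(":")
        if not sep or ":" in host:
            # A bare IPv6 address is ambiguous: which colon starts the port?
            raise ValueError("IPv6 addresses need brackets: %r" % spec)
    if not port.isdigit():
        raise ValueError("invalid port in %r" % spec)
    return host, int(port)
```

With this rule, `[2312::8274]:5200` splits cleanly, while the unbracketed `2312::8274:5200` is rejected instead of being guessed at.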
[PATCH 1/3][Autotest][virt] autotest.base_utils: Move virt.utils.Thread -> base_utils.InterruptedThread
It is necessary for adding syncdata class. Signed-off-by: Jiří Župka jzu...@redhat.com --- client/common_lib/base_barrier.py |2 +- client/common_lib/base_utils.py| 65 ++ .../kvm/tests/migration_with_file_transfer.py |6 +- client/tests/kvm/tests/migration_with_reboot.py|4 +- client/tests/kvm/tests/nic_bonding.py |9 ++- client/tests/kvm/tests/vmstop.py |6 +- client/virt/tests/nic_promisc.py |5 +- client/virt/tests/nicdriver_unload.py |4 +- client/virt/tests/ntttcp.py|2 +- client/virt/virt_test_utils.py |2 +- client/virt/virt_utils.py | 69 +--- 11 files changed, 88 insertions(+), 86 deletions(-) diff --git a/client/common_lib/base_barrier.py b/client/common_lib/base_barrier.py index d20916a..df4da49 100644 --- a/client/common_lib/base_barrier.py +++ b/client/common_lib/base_barrier.py @@ -50,7 +50,7 @@ class listen_server(object): sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) sock.bind((self.address, self.port)) -sock.listen(10) +sock.listen(100) return sock diff --git a/client/common_lib/base_utils.py b/client/common_lib/base_utils.py index 972d18a..c40e5dc 100644 --- a/client/common_lib/base_utils.py +++ b/client/common_lib/base_utils.py @@ -817,6 +817,71 @@ def run_parallel(commands, timeout=None, ignore_status=False, return [bg_job.result for bg_job in bg_jobs] +class InterruptedThread(Thread): + +Run a function in a background thread. + +def __init__(self, target, args=(), kwargs={}): + +Initialize the instance. + +@param target: Function to run in the thread. +@param args: Arguments to pass to target. +@param kwargs: Keyword arguments to pass to target. + +Thread.__init__(self) +self._target = target +self._args = args +self._kwargs = kwargs + + +def run(self): + +Run target (passed to the constructor). No point in calling this +function directly. Call start() to make this function run in a new +thread. 
+ +self._e = None +self._retval = None +try: +try: +self._retval = self._target(*self._args, **self._kwargs) +except Exception: +self._e = sys.exc_info() +raise +finally: +# Avoid circular references (start() may be called only once so +# it's OK to delete these) +del self._target, self._args, self._kwargs + + +def join(self, timeout=None, suppress_exception=False): + +Join the thread. If target raised an exception, re-raise it. +Otherwise, return the value returned by target. + +@param timeout: Timeout value to pass to threading.Thread.join(). +@param suppress_exception: If True, don't re-raise the exception. + +Thread.join(self, timeout) +try: +if self._e: +if not suppress_exception: +# Because the exception was raised in another thread, we +# need to explicitly insert the current context into it +s = error.exception_context(self._e[1]) +s = error.join_contexts(error.get_context(), s) +error.set_exception_context(self._e[1], s) +raise self._e[0], self._e[1], self._e[2] +else: +return self._retval +finally: +# Avoid circular references (join() may be called multiple times +# so we can't delete these) +self._e = None +self._retval = None + + @deprecated def run_bg(command): Function deprecated. Please use BgJob class instead. 
diff --git a/client/tests/kvm/tests/migration_with_file_transfer.py b/client/tests/kvm/tests/migration_with_file_transfer.py index 075148d..073b87e 100644 --- a/client/tests/kvm/tests/migration_with_file_transfer.py +++ b/client/tests/kvm/tests/migration_with_file_transfer.py @@ -56,13 +56,13 @@ def run_migration_with_file_transfer(test, params, env): error.context(transferring file to guest while migrating, logging.info) -bg = virt_utils.Thread(vm.copy_files_to, (host_path, guest_path), - dict(verbose=True, timeout=transfer_timeout)) +bg = utils.InterruptedThread(vm.copy_files_to, (host_path, guest_path), + dict(verbose=True, timeout=transfer_timeout)) run_and_migrate(bg) error.context(transferring file back to host while migrating, logging.info) -bg
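For readers on current Python, the idea behind InterruptedThread (run a callable in the background, then return its value or re-raise its exception from join()) can be sketched as below. This is a simplified Python 3 rendering, not the Autotest code: the attribute names `_fn`, `_fn_args`, `_fn_kwargs` and `_exc` are assumptions chosen to avoid clashing with threading.Thread internals, and the Autotest error-context handling is omitted.

```python
import threading


class InterruptedThread(threading.Thread):
    """Run a function in a background thread and hand back its result."""

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self._exc = None
        self._retval = None

    def run(self):
        try:
            self._retval = self._fn(*self._fn_args, **self._fn_kwargs)
        except Exception as exc:  # remember it so join() can re-raise
            self._exc = exc

    def join(self, timeout=None, suppress_exception=False):
        super().join(timeout)
        exc, self._exc = self._exc, None
        if exc is not None and not suppress_exception:
            raise exc  # the traceback travels with the exception object
        return self._retval
```

Unlike the Python 2 version in the patch, the sketch does not re-raise inside the worker thread, so a failing target does not print a traceback on its own; the error only surfaces through join().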
[PATCH 3/3][Autotest][virt] virt.virt_utils: Add framework for multihost migration.
The multihost migration framework makes it easy to migrate a guest under
load between multiple hosts. This patch also replaces the old multihost
migration tests with versions that use the framework.

The multihost migration framework takes care of:
 - preparing the environment before migration
 - preparing the guest for migration on the source and destination hosts
 - starting the guest
 - starting the work on the guest
 - migrating between hosts
 - checking the work on the destination host
 - closing the guest
 - postprocessing the environment after migration

The framework also allows multiple independent migrations to be started at
the same time, with multiple hosts and different work on the guests.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/tests/kvm/multi_host.srv                |   94 +++--
 client/tests/kvm/tests/cpuflags.py             |  203 ---
 client/tests/kvm/tests/migration_multi_host.py |  105 +-
 client/virt/base.cfg.sample                    |    3 +
 client/virt/subtests.cfg.sample                |   16 +-
 client/virt/virt_env_process.py                |   15 +-
 client/virt/virt_utils.py                      |  506 
 client/virt/virt_vm.py                         |   51 +++
 8 files changed, 726 insertions(+), 267 deletions(-)

diff --git a/client/tests/kvm/multi_host.srv b/client/tests/kvm/multi_host.srv
index a4bb20f..c661253 100644
--- a/client/tests/kvm/multi_host.srv
+++ b/client/tests/kvm/multi_host.srv
@@ -37,14 +37,22 @@ def generate_mac_address():
     return mac
 
 
-def run(pair):
-    logging.info("KVM test running on source host [%s] and destination "
-                 "host [%s]\n", pair[0], pair[1])
-
-    source = hosts.create_host(pair[0])
-    dest = hosts.create_host(pair[1])
-    source_at = autotest_remote.Autotest(source)
-    dest_at = autotest_remote.Autotest(dest)
+def run(machines):
+    logging.info("KVM test running on hosts %s\n", machines)
+    class Machines(object):
+        def __init__(self, host):
+            self.host = host
+            self.at = None
+            self.params = None
+            self.control = None
+
+    _hosts = {}
+    for machine in machines:
+        _hosts[machine] = Machines(hosts.create_host(machine))
+
+    ats = []
+    for host in _hosts.itervalues():
+        host.at = autotest_remote.Autotest(host.host)
 
     cfg_file = os.path.join(KVM_DIR, "multi-host-tests.cfg")
 
@@ -56,7
+64,9 @@ def run(pair): parser.parse_file(cfg_file) test_dicts = parser.get_dicts() -source_control_file = dest_control_file = +ips = [] +for host in _hosts.itervalues(): +host.control = testname = kvm bindir = os.path.join(job.testdir, testname) job.install_pkg(testname, 'test', bindir) @@ -64,21 +74,29 @@ job.install_pkg(testname, 'test', bindir) kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests', 'kvm') sys.path.append(kvm_test_dir) +ips.append(host.host.ip) import sys for params in test_dicts: -params['srchost'] = source.ip -params['dsthost'] = dest.ip +params['hosts'] = ips + +params['not_preprocess'] = yes +for vm in params.get(vms).split(): +for nic in params.get('nics',).split(): +params['nic_mac_%s_%s' % (nic, vm)] = generate_mac_address() -for nic in params.get('nics',).split(): -params['nic_mac_%s' % nic] = generate_mac_address() +params['mater_images_clone'] = image1 +params['kill_vm'] = yes -source_params = params.copy() -source_params['role'] = source +s_host = _hosts[machines[0]] +s_host.params = params.copy() +s_host.params['clone_master'] = yes +s_host.params['hostid'] = machines[0] -dest_params = params.copy() -dest_params['role'] = destination -dest_params['migration_mode'] = tcp +for machine, host in _hosts.items()[1:]: +host.params = params.copy() +host.params['clone_master'] = no +host.params['hostid'] = machine # Report the parameters we've received print Test parameters: @@ -87,27 +105,31 @@ sys.path.append(kvm_test_dir) for key in keys: logging.debug(%s = %s, key, params[key]) -source_control_file += (job.run_test('kvm', tag='%s', params=%s) % -(source_params['shortname'], source_params)) -dest_control_file += (job.run_test('kvm', tag='%s', params=%s) % - (dest_params['shortname'], dest_params)) +for host in _hosts.itervalues(): +host.control += (job.run_test('kvm', tag='%s', params=%s) % + (host.params['shortname'], host.params)) -logging.info('Source control file:\n%s', source_control_file) -logging.info('Destination control 
file:\n%s', dest_control_file) -dest_command = subcommand(dest_at.run, - [dest_control_file, dest.hostname]) +
[PATCH 5/5] qemu-kvm: i8254: Reorganize i8254-kvm code
Include i8254-kvm.c instead of building it as a separate module. This allows to reduce the diff to upstream and will help with merging the latter. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.target |1 - hw/i8254-kvm.c | 12 hw/i8254.c | 43 +-- hw/i8254.h | 49 ++--- 4 files changed, 47 insertions(+), 58 deletions(-) diff --git a/Makefile.target b/Makefile.target index 5f6b963..24386a4 100644 --- a/Makefile.target +++ b/Makefile.target @@ -251,7 +251,6 @@ obj-i386-y += testdev.o obj-i386-y += acpi.o acpi_piix4.o obj-i386-y += i8254.o -obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o # shared objects diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c index 7316111..e2aa4d6 100644 --- a/hw/i8254-kvm.c +++ b/hw/i8254-kvm.c @@ -21,14 +21,8 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ -#include hw.h -#include pc.h -#include isa.h -#include qemu-timer.h -#include i8254.h -#include qemu-kvm.h -extern VMStateDescription vmstate_pit; +#ifdef CONFIG_KVM_PIT static void kvm_pit_pre_save(void *opaque) { @@ -103,7 +97,7 @@ static void dummy_timer(void *opaque) { } -void kvm_pit_init(PITState *pit) +static void qemu_kvm_pit_init(PITState *pit) { PITChannelState *s; @@ -116,3 +110,5 @@ void kvm_pit_init(PITState *pit) vmstate_pit.post_load = kvm_pit_post_load; return; } + +#endif diff --git a/hw/i8254.c b/hw/i8254.c index ca24ab9..a8e20cb 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -30,8 +30,45 @@ //#define DEBUG_PIT +#define RW_STATE_LSB 1 +#define RW_STATE_MSB 2 +#define RW_STATE_WORD0 3 +#define RW_STATE_WORD1 4 + +typedef struct PITChannelState { +int count; /* can be 65536 */ +uint16_t latched_count; +uint8_t count_latched; +uint8_t status_latched; +uint8_t status; +uint8_t read_state; +uint8_t write_state; +uint8_t write_latch; +uint8_t rw_mode; +uint8_t mode; +uint8_t bcd; /* not supported */ +uint8_t gate; /* timer start */ +int64_t count_load_time; +/* 
irq handling */ +int64_t next_transition_time; +QEMUTimer *irq_timer; +qemu_irq irq; +uint32_t irq_disabled; +} PITChannelState; + +typedef struct PITState { +ISADevice dev; +MemoryRegion ioports; +uint32_t iobase; +PITChannelState channels[3]; +} PITState; + static void pit_irq_timer_update(PITChannelState *s, int64_t current_time); +#ifdef CONFIG_KVM_PIT +static void qemu_kvm_pit_init(PITState *pit); +#endif + static int pit_get_count(PITChannelState *s) { uint64_t d; @@ -412,7 +449,7 @@ static int pit_load_old(QEMUFile *f, void *opaque, int version_id) return 0; } -VMStateDescription vmstate_pit = { +static VMStateDescription vmstate_pit = { .name = i8254, .version_id = 3, .minimum_version_id = 2, @@ -482,7 +519,7 @@ static int pit_initfn(ISADevice *dev) #ifdef CONFIG_KVM_PIT if (kvm_enabled() kvm_irqchip_in_kernel()) { -kvm_pit_init(pit); +qemu_kvm_pit_init(pit); return 0; } #endif @@ -531,3 +568,5 @@ static void pit_register_types(void) } type_init(pit_register_types) + +#include i8254-kvm.c diff --git a/hw/i8254.h b/hw/i8254.h index 3313662..47b2570 100644 --- a/hw/i8254.h +++ b/hw/i8254.h @@ -25,55 +25,10 @@ #ifndef HW_I8254_H #define HW_I8254_H +#include hw.h +#include isa.h #include kvm.h -#define PIT_SAVEVM_NAME i8254 -#define PIT_SAVEVM_VERSION 2 - -#define RW_STATE_LSB 1 -#define RW_STATE_MSB 2 -#define RW_STATE_WORD0 3 -#define RW_STATE_WORD1 4 - -#define PIT_FLAGS_HPET_LEGACY 1 - -typedef struct PITChannelState { -int count; /* can be 65536 */ -uint16_t latched_count; -uint8_t count_latched; -uint8_t status_latched; -uint8_t status; -uint8_t read_state; -uint8_t write_state; -uint8_t write_latch; -uint8_t rw_mode; -uint8_t mode; -uint8_t bcd; /* not supported */ -uint8_t gate; /* timer start */ -int64_t count_load_time; -/* irq handling */ -int64_t next_transition_time; -QEMUTimer *irq_timer; -qemu_irq irq; -uint32_t irq_disabled; -} PITChannelState; - -struct PITState { -ISADevice dev; -MemoryRegion ioports; -uint32_t iobase; -PITChannelState 
channels[3];
-};
-
-void pit_save(QEMUFile *f, void *opaque);
-
-int pit_load(QEMUFile *f, void *opaque, int version_id);
-
-typedef struct PITState PITState;
-
-/* i8254-kvm.c */
-void kvm_pit_init(PITState *pit);
-
 #define PIT_FREQ 1193182
 
 typedef struct PITChannelInfo {
-- 
1.7.3.4
[PATCH 2/5] qemu-kvm: i8254: Reset broken pit_load_old to upstream version
pit_load_old is only called with version_id == 1, but PIT_SAVEVM_VERSION
is 2. So this function is broken in qemu_kvm for ages, and also the dummy
qemu_get_be32 is pointless. Revert to upstream.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/hw/i8254.c b/hw/i8254.c
index 8925139..7089832 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -390,10 +390,9 @@ static int pit_load_old(QEMUFile *f, void *opaque, int version_id)
     PITChannelState *s;
     int i;
 
-    if (version_id != PIT_SAVEVM_VERSION)
+    if (version_id != 1)
         return -EINVAL;
 
-    (void)qemu_get_be32(f);
     for(i = 0; i < 3; i++) {
         s = &pit->channels[i];
         s->count=qemu_get_be32(f);
-- 
1.7.3.4
[PATCH 3/5] qemu-kvm: i8254: Revert pit_load_count to upstream version
pit_irq_timer_update now checks generically if a channel IRQ is disabled,
so we can drop the hacks from qemu-kvm.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254.c |   19 +++
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/hw/i8254.c b/hw/i8254.c
index 7089832..befad05 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -187,18 +187,13 @@ void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info)
     info->out = pit_get_out(s, qemu_get_clock_ns(vm_clock));
 }
 
-static inline void pit_load_count(PITState *s, int val, int chan)
+static inline void pit_load_count(PITChannelState *s, int val)
 {
     if (val == 0)
         val = 0x10000;
-    s->channels[chan].count_load_time = qemu_get_clock_ns(vm_clock);
-    s->channels[chan].count = val;
-#ifdef TARGET_I386
-    if (chan == 0 && s->channels[0].irq_disabled) {
-        return;
-    }
-#endif
-    pit_irq_timer_update(&s->channels[chan], s->channels[chan].count_load_time);
+    s->count_load_time = qemu_get_clock_ns(vm_clock);
+    s->count = val;
+    pit_irq_timer_update(s, s->count_load_time);
 }
 
 /* if already latched, do not latch again */
@@ -260,17 +255,17 @@ static void pit_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             switch(s->write_state) {
             default:
             case RW_STATE_LSB:
-                pit_load_count(pit, val, addr);
+                pit_load_count(s, val);
                 break;
             case RW_STATE_MSB:
-                pit_load_count(pit, val << 8, addr);
+                pit_load_count(s, val << 8);
                 break;
             case RW_STATE_WORD0:
                 s->write_latch = val;
                 s->write_state = RW_STATE_WORD1;
                 break;
             case RW_STATE_WORD1:
-                pit_load_count(pit, s->write_latch | (val << 8), addr);
+                pit_load_count(s, s->write_latch | (val << 8));
                 s->write_state = RW_STATE_WORD0;
                 break;
             }
-- 
1.7.3.4
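The per-channel write path touched by this patch can be modelled in a few lines of Python. This is only a sketch of the state machine visible in the diff context: the `Channel` class is hypothetical, timers and IRQ updates are left out, and it assumes the usual 16-bit counter convention that a programmed value of 0 selects the maximum count (the PITChannelState comment elsewhere in the series notes that count "can be 65536").

```python
# Write-state values as defined in hw/i8254.c elsewhere in this series.
RW_STATE_LSB, RW_STATE_MSB, RW_STATE_WORD0, RW_STATE_WORD1 = 1, 2, 3, 4


class Channel:
    """Just enough PIT channel state to show counter writes."""

    def __init__(self, write_state):
        self.write_state = write_state
        self.write_latch = 0
        self.count = None

    def load_count(self, val):
        # A programmed value of 0 selects the maximum count, 65536.
        self.count = 0x10000 if val == 0 else val

    def write(self, val):
        val &= 0xFF  # one byte arrives per port write
        if self.write_state == RW_STATE_MSB:
            self.load_count(val << 8)
        elif self.write_state == RW_STATE_WORD0:
            self.write_latch = val  # low byte; wait for the high byte
            self.write_state = RW_STATE_WORD1
        elif self.write_state == RW_STATE_WORD1:
            self.load_count(self.write_latch | (val << 8))
            self.write_state = RW_STATE_WORD0
        else:  # RW_STATE_LSB and the default case
            self.load_count(val)
```

Because the helper now works on a single channel, the caller no longer needs to pass the whole PIT plus a channel index, which is the simplification the patch makes in pit_load_count().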
[PATCH 4/5] qemu-kvm: i8254: Drop bogus irq_disabled clearing in pit_reset
The IRQ output line is reset along with the HPET (via signaling a new
state on the corresponding GPIO line), not the PIT itself.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/i8254.c b/hw/i8254.c
index befad05..ca24ab9 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -432,9 +432,6 @@ static void pit_reset(DeviceState *dev)
     PITChannelState *s;
     int i;
 
-#ifdef TARGET_I386
-    pit->channels[0].irq_disabled = 0;
-#endif
     for(i = 0;i < 3; i++) {
         s = &pit->channels[i];
         s->mode = 3;
-- 
1.7.3.4
[PATCH 1/5] qemu-kvm: i8254/pcspk: Remove unused bits to make PC speaker kvm-aware
Due to old-style-only creation of the in-kernel PIT (became broken long ago during refactorings), the kernel always handled the speaker port in qemu-kvm for a long while. Thus all bits that try to make the user space speaker emulating kvm-aware are actually unused. Upstream will come with fully-working speaker emulation even while the in-kernel PIT is enabled, so let's drop the dead bits from qemu-kvm to ease merging with upstream. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.objs |2 +- Makefile.target |4 ++-- hw/i8254.c | 44 hw/pcspk.c |6 -- 4 files changed, 3 insertions(+), 53 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index c33c0f2..39791ac 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -212,7 +212,7 @@ hw-obj-$(CONFIG_SERIAL) += serial.o hw-obj-$(CONFIG_PARALLEL) += parallel.o # Moved back to Makefile.target due to #include qemu-kvm.h: #hw-obj-$(CONFIG_I8254) += i8254.o -#hw-obj-$(CONFIG_PCSPK) += pcspk.o +hw-obj-$(CONFIG_PCSPK) += pcspk.o hw-obj-$(CONFIG_PCKBD) += pckbd.o hw-obj-$(CONFIG_USB_UHCI) += usb-uhci.o hw-obj-$(CONFIG_USB_OHCI) += usb-ohci.o diff --git a/Makefile.target b/Makefile.target index ae04331..5f6b963 100644 --- a/Makefile.target +++ b/Makefile.target @@ -250,7 +250,7 @@ obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o obj-i386-y += testdev.o obj-i386-y += acpi.o acpi_piix4.o -obj-i386-y += pcspk.o i8254.o +obj-i386-y += i8254.o obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o @@ -308,7 +308,7 @@ obj-lm32-y += milkymist-vgafb.o obj-lm32-y += framebuffer.o obj-mips-y = mips_r4k.o mips_jazz.o mips_malta.o mips_mipssim.o -obj-mips-y += pcspk.o i8254.o +obj-mips-y += i8254.o obj-mips-y += acpi.o acpi_piix4.o obj-mips-y += mips_addr.o mips_timer.o mips_int.o obj-mips-y += gt64xxx.o mc146818rtc.o diff --git a/hw/i8254.c b/hw/i8254.c index 33d94e1..8925139 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -176,55 +176,11 @@ void pit_set_gate(ISADevice 
*dev, int channel, int val) s-gate = val; } -#ifdef CONFIG_KVM_PIT -static void kvm_get_pit_ch2(ISADevice *dev, -struct kvm_pit_state *inkernel_state) -{ -struct PITState *pit = DO_UPCAST(struct PITState, dev, dev); -struct kvm_pit_state pit_state; - -if (kvm_enabled() kvm_irqchip_in_kernel()) { -kvm_get_pit(kvm_state, pit_state); -pit-channels[2].mode = pit_state.channels[2].mode; -pit-channels[2].count = pit_state.channels[2].count; -pit-channels[2].count_load_time = pit_state.channels[2].count_load_time; -pit-channels[2].gate = pit_state.channels[2].gate; -if (inkernel_state) { -memcpy(inkernel_state, pit_state, sizeof(*inkernel_state)); -} -} -} - -#if 0 -static void kvm_set_pit_ch2(ISADevice *dev, -struct kvm_pit_state *inkernel_state) -{ -struct PITState *pit = DO_UPCAST(struct PITState, dev, dev); - -if (kvm_enabled() kvm_irqchip_in_kernel()) { -inkernel_state-channels[2].mode = pit-channels[2].mode; -inkernel_state-channels[2].count = pit-channels[2].count; -inkernel_state-channels[2].count_load_time = -pit-channels[2].count_load_time; -inkernel_state-channels[2].gate = pit-channels[2].gate; -kvm_set_pit(kvm_state, inkernel_state); -} -} -#endif -#else -static inline void kvm_get_pit_ch2(ISADevice *dev, - struct kvm_pit_state *inkernel_state) { } -static inline void kvm_set_pit_ch2(ISADevice *dev, - struct kvm_pit_state *inkernel_state) { } -#endif - void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info) { PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = pit-channels[channel]; -kvm_get_pit_ch2(dev, NULL); - info-gate = s-gate; info-mode = s-mode; info-initial_count = s-count; diff --git a/hw/pcspk.c b/hw/pcspk.c index bb25ffb..e430324 100644 --- a/hw/pcspk.c +++ b/hw/pcspk.c @@ -29,7 +29,6 @@ #include qemu-timer.h #include i8254.h #include pcspk.h -#include qemu-kvm.h #define PCSPK_BUF_LEN 1792 #define PCSPK_SAMPLE_RATE 32000 @@ -141,9 +140,6 @@ static void pcspk_io_write(void *opaque, target_phys_addr_t addr, 
uint64_t val,
 {
     PCSpkState *s = opaque;
     const int gate = val & 1;
-    PITChannelInfo ch;
-
-    pit_get_channel_info(s->pit, 2, &ch);
 
     s->data_on = (val >> 1) & 1;
     pit_set_gate(s->pit, 2, gate);
@@ -152,8 +148,6 @@ static void pcspk_io_write(void *opaque, target_phys_addr_t addr, uint64_t val,
             s->play_pos = 0;
         AUD_set_active_out(s->voice, gate & s->data_on);
     }
-
-    /* kvm_set_pit_ch2(s->pit, inkernel_state); ?? */
 }
 
 static const MemoryRegionOps pcspk_io_ops = {
-- 
1.7.3.4
[PATCH 0/5] qemu-kvm: Prepare kvm PIT for upstream merge
Some preparation patches to arrange qemu-kvm for merging in latest qemu
with its own in-kernel PIT support. Later on, we can switch to that
version without losing features on the way, even just temporarily.

Jan Kiszka (5):
  qemu-kvm: i8254/pcspk: Remove unused bits to make PC speaker kvm-aware
  qemu-kvm: i8254: Reset broken pit_load_old to upstream version
  qemu-kvm: i8254: Revert pit_load_count to upstream version
  qemu-kvm: i8254: Drop bogus irq_disabled clearing in pit_reset
  qemu-kvm: i8254: Reorganize i8254-kvm code

 Makefile.objs   |    2 +-
 Makefile.target |    5 +-
 hw/i8254-kvm.c  |   12 ++
 hw/i8254.c      |  112 ---
 hw/i8254.h      |   49 +---
 hw/pcspk.c      |    6 ---
 6 files changed, 58 insertions(+), 128 deletions(-)

-- 
1.7.3.4
Re: [PATCH] pci-assign: Fall back to host-side MSI if INTx sharing fails
On Mon, 2012-03-19 at 10:56 +0100, Jan Kiszka wrote:
> If the host or the device does not support INTx sharing, retry the IRQ
> assignment with host-side MSI support enabled but warn about potential
> consequences. This allows to preserve the previous behavior where we
> defaulted to MSI and did not support INTx sharing at all.
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> ---
>
> Detecting if the user actually specified prefer_msi=off as property of
> pci-assign is non-trivial. So I decided to go for the retry approach,
> ignoring potential user requests. The warning should attract the
> attention.
>
>  hw/device-assignment.c |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 89823f1..c953713 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -835,6 +835,7 @@ static int assign_irq(AssignedDevice *dev)
>          dev->irq_requested_type = 0;
>      }
>
> +retry:
>      assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
>      if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
>          dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
> @@ -844,6 +845,17 @@ static int assign_irq(AssignedDevice *dev)
>
>      r = kvm_assign_irq(kvm_state, &assigned_irq_data);
>      if (r < 0) {
> +        if (r == -EIO && !(dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK) &&
> +            dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
> +            /* Retry with host-side MSI. There might be an IRQ conflict and
> +             * either the kernel or the device doesn't support sharing. */
> +            fprintf(stderr,
> +                    "Host-side INTx sharing not supported, "
> +                    "using MSI instead.\n"
> +                    "Some devices do not to work properly in this mode.\n");
> +            dev->features |= ASSIGNED_DEVICE_PREFER_MSI_MASK;
> +            goto retry;
> +        }
>          fprintf(stderr, "Failed to assign irq for \"%s\": %s\n",
>                  dev->dev.qdev.id, strerror(-r));
>          fprintf(stderr, "Perhaps you are assigning a device "

Acked-by: Alex Williamson alex.william...@redhat.com
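The retry approach in the quoted patch boils down to "try INTx first, and on -EIO fall back to MSI once if the device can do it". A rough Python model of that control flow is shown below; `assign_irq` and the `try_assign` callback are hypothetical stand-ins for the QEMU function and kvm_assign_irq(), not real APIs.

```python
import errno


def assign_irq(try_assign, prefer_msi, msi_capable, warn=print):
    """Return (result, used_msi) after at most one MSI fallback.

    try_assign(use_msi) is a hypothetical callback that returns 0 on
    success or a negative errno value.
    """
    while True:
        use_msi = prefer_msi and msi_capable
        r = try_assign(use_msi)
        if r == -errno.EIO and not prefer_msi and msi_capable:
            # INTx sharing failed: warn, switch to host-side MSI, retry.
            warn("Host-side INTx sharing not supported, using MSI instead.")
            prefer_msi = True
            continue
        return r, use_msi
```

The loop cannot spin forever: once prefer_msi is set, the fallback condition is false and the second result is returned as is, which matches the single `goto retry` in the patch.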
[PATCH RFC] virtio-pci: add MMIO property
Currently virtio-pci is specified so that configuration of the
device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
However, Linux guests happen to use ioread/iowrite/iomap primitives
for access, and these work uniformly across memory/io BARs.

While PCI IO accesses are faster than MMIO on x86 kvm,
MMIO might be helpful on other systems which don't
implement PIO or where PIO is slower than MMIO.

Add a property to make it possible to tweak the BAR type.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

This is harmless by default but causes segfaults in memory.c
when enabled. Thus an RFC until I figure out what's wrong.

---
 hw/virtio-pci.c |   16 ++++++++++++++--
 hw/virtio-pci.h |    4 ++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 28498ec..6f338d2 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -655,6 +655,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 {
     uint8_t *config;
     uint32_t size;
+    uint8_t bar0_type;
 
     proxy->vdev = vdev;
 
@@ -684,8 +685,14 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 
     memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                           "virtio-pci", size);
-    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
-                     &proxy->bar);
+
+    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    } else {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
+    }
+
+    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
 
     if (!kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
@@ -823,6 +830,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -856,6 +864,7 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer, TX_TIMER_INTERVAL),
     DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
     DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -888,6 +897,7 @@ static Property virtio_serial_properties[] = {
     DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
     DEFINE_PROP_UINT32("max_ports", VirtIOPCIProxy, serial.max_virtserial_ports, 31),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -915,6 +925,7 @@ static TypeInfo virtio_serial_info = {
 
 static Property virtio_balloon_properties[] = {
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -969,6 +980,7 @@ static int virtio_scsi_exit_pci(PCIDevice *pci_dev)
 static Property virtio_scsi_properties[] = {
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index e560428..e6a8861 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -24,6 +24,10 @@
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
 
+/* Some guests don't support port IO. Use MMIO instead. */
+#define VIRTIO_PCI_FLAG_USE_MMIO_BIT 2
+#define VIRTIO_PCI_FLAG_USE_MMIO   (1 << VIRTIO_PCI_FLAG_USE_MMIO_BIT)
+
 typedef struct {
     PCIDevice pci_dev;
     VirtIODevice *vdev;
-- 
1.7.9.111.gf3fb0
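To make the flag arithmetic in the patch above concrete, here is a minimal standalone sketch of the BAR 0 type selection. The TOY_ names are hypothetical stand-ins for the VIRTIO_PCI_FLAG_* and PCI_BASE_ADDRESS_SPACE_* macros; the bit positions mirror the patch, and the 0x01/0x00 values are the I/O and memory space indicators that PCI keeps in bit 0 of a BAR. It models only the decision, not QEMU's pci_register_bar().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits defined in hw/virtio-pci.h. */
#define TOY_FLAG_USE_IOEVENTFD_BIT 1
#define TOY_FLAG_USE_IOEVENTFD     (1u << TOY_FLAG_USE_IOEVENTFD_BIT)
#define TOY_FLAG_USE_MMIO_BIT      2
#define TOY_FLAG_USE_MMIO          (1u << TOY_FLAG_USE_MMIO_BIT)

/* Bit 0 of a PCI BAR: 1 selects I/O space, 0 selects memory space. */
#define TOY_BAR_SPACE_IO     0x01
#define TOY_BAR_SPACE_MEMORY 0x00

/* Pick the BAR 0 type from the proxy flags, as the patch does. */
static uint8_t toy_bar0_type(uint32_t flags)
{
    /* The two flag bits must not overlap, or one property would flip both. */
    assert((TOY_FLAG_USE_MMIO & TOY_FLAG_USE_IOEVENTFD) == 0);

    if (flags & TOY_FLAG_USE_MMIO) {
        return TOY_BAR_SPACE_MEMORY;
    }
    return TOY_BAR_SPACE_IO;
}
```

With the property left at its default (bit clear), toy_bar0_type() keeps returning the I/O type, which matches the "harmless by default" remark in the patch description.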
Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling
Hi, in a minute I'll send a new version of the MSR_IA32_FEATURE_CONTROL
patch for nested VMX; I just wanted to reply first to your comments so
you'll know what to expect:

On Wed, Mar 07, 2012, Avi Kivity wrote about "Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling":
> On 03/07/2012 05:58 PM, Nadav Har'El wrote:
> > +	u64 msr_ia32_feature_control;
> >  };
>
> Need to add to the list of exported MSRs so it can be live migrated
> (msrs_to_save).

Did this.

> The variable itself should live in vcpu->arch, even if some bits are
> vendor specific.

But not this. I understand what you explained about vmx.c being for Intel
*hosts*, not about emulating Intel *guests*, but I do think that since
none of the bits in this MSR are relevant on AMD hosts (which don't do
nested VMX), it isn't useful to support this MSR outside vmx.c.

So I left this variable in vmx->nested. As I noted earlier, svm.c did
exactly the same thing (nested.vm_cr_msr), so at least there's symmetry
here.

> > @@ -1999,7 +2000,7 @@ static int vmx_get_vmx_msr(struct kvm_vc
> >  	switch (msr_index) {
> >  	case MSR_IA32_FEATURE_CONTROL:
> > -		*pdata = 0;
> > +		*pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
> >  		break;
>
> In a separate patch, please move this outside vmx_get_vmx_msr(). It's
> not a vmx msr.

Done, but not split into two patches: The patch removes the old case in
vmx_get_vmx_msr() (and also removes vmx_set_vmx_msr() entirely) and
instead adds the case in vmx_get_msr() and vmx_set_msr().

> > +#define VMXON_NEEDED_FEATURES \
> > +	(FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
>
> Use const u64 instead of #define please, it jars my eyes.

I would, if Linux coding style allowed declaring variables in the middle
of blocks. Unfortunately it doesn't, so I left this #define. I don't think
it's that bad.

-- 
Nadav Har'El                        | Monday, Mar 19 2012,
n...@math.technion.ac.il            |-
Phone +972-523-790466, ICQ 13349191 | A conscience does not prevent sin. It
http://nadav.harel.org.il           | only prevents you from enjoying it.
[PATCH] qemu-kvm: Drop installation of self-built optionroms
All corresponding binaries are now in pc-bios, so we can remove this
diff to upstream.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 908954c..1bc3cb0 100644
--- a/Makefile
+++ b/Makefile
@@ -294,12 +294,7 @@ endif
 ifneq ($(BLOBS),)
 	$(INSTALL_DIR) $(DESTDIR)$(datadir)
 	set -e; for x in $(BLOBS); do \
-	    if [ -f $(SRC_PATH)/pc-bios/$$x ];then \
 		$(INSTALL_DATA) $(SRC_PATH)/pc-bios/$$x $(DESTDIR)$(datadir); \
-	    fi \
-	    ; if [ -f pc-bios/optionrom/$$x ];then \
-		$(INSTALL_DATA) pc-bios/optionrom/$$x $(DESTDIR)$(datadir); \
-	    fi \
 	done
 endif
 	$(INSTALL_DIR) $(DESTDIR)$(datadir)/keymaps
-- 
1.7.3.4
Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling
The existing code emulated the guest's use of the IA32_FEATURE_CONTROL MSR
in a way that was enough to run nested VMX guests, but did not fully
conform to the VMX specification, and in particular did not allow a guest
BIOS to prevent the guest OS from using VMX by setting the lock bit on this
MSR.

This patch emulates this MSR better, allowing the guest to lock it, and
verifying its setting on VMXON. Also make sure that this MSR (and of course,
VMXON state) is reset on guest vcpu reset (via SIPI).

Signed-off-by: Nadav Har'El <n...@il.ibm.com>
Reported-by: Julian Stecklina <j...@alien8.de>
---
 arch/x86/kvm/vmx.c |   43 +++++++++++++++++++++++--------------------
 arch/x86/kvm/x86.c |    3 ++-
 2 files changed, 25 insertions(+), 21 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2012-03-19 18:34:24.0 +0200
+++ .after/arch/x86/kvm/vmx.c	2012-03-19 18:34:24.0 +0200
@@ -352,6 +352,7 @@ struct nested_vmx {
 	 * we must keep them pinned while L2 runs.
 	 */
 	struct page *apic_access_page;
+	u64 msr_ia32_feature_control;
 };
 
 struct vcpu_vmx {
@@ -1998,9 +1999,6 @@ static int vmx_get_vmx_msr(struct kvm_vc
 	}
 
 	switch (msr_index) {
-	case MSR_IA32_FEATURE_CONTROL:
-		*pdata = 0;
-		break;
 	case MSR_IA32_VMX_BASIC:
 		/*
 		 * This MSR reports some information about VMX support. We
@@ -2072,21 +2070,6 @@ static int vmx_get_vmx_msr(struct kvm_vc
 	return 1;
 }
 
-static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
-{
-	if (!nested_vmx_allowed(vcpu))
-		return 0;
-
-	if (msr_index == MSR_IA32_FEATURE_CONTROL)
-		/* TODO: the right thing. */
-		return 1;
-	/*
-	 * No need to treat VMX capability MSRs specially: If we don't handle
-	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
-	 */
-	return 0;
-}
-
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -2129,6 +2112,9 @@ static int vmx_get_msr(struct kvm_vcpu *
 	case MSR_IA32_SYSENTER_ESP:
 		data = vmcs_readl(GUEST_SYSENTER_ESP);
 		break;
+	case MSR_IA32_FEATURE_CONTROL:
+		data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+		break;
 	case MSR_TSC_AUX:
 		if (!to_vmx(vcpu)->rdtscp_enabled)
 			return 1;
@@ -2197,6 +2183,12 @@ static int vmx_set_msr(struct kvm_vcpu *
 		}
 		ret = kvm_set_msr_common(vcpu, msr_index, data);
 		break;
+	case MSR_IA32_FEATURE_CONTROL:
+		if (to_vmx(vcpu)->nested.msr_ia32_feature_control &
+				FEATURE_CONTROL_LOCKED)
+			return 1;
+		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
+		break;
 	case MSR_TSC_AUX:
 		if (!vmx->rdtscp_enabled)
 			return 1;
@@ -2205,8 +2197,6 @@ static int vmx_set_msr(struct kvm_vcpu *
 			return 1;
 		/* Otherwise falls through */
 	default:
-		if (vmx_set_vmx_msr(vcpu, msr_index, data))
-			break;
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {
 			msr->data = data;
@@ -3807,6 +3797,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	return 0;
 }
 
+static void free_nested(struct vcpu_vmx *vmx);
+
 static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -3920,6 +3912,9 @@ static int vmx_vcpu_reset(struct kvm_vcp
 	/* HACK: Don't enable emulation on guest boot/reset */
 	vmx->emulation_required = 0;
 
+	/* Reset nested-VMX settings: */
+	vmx->nested.msr_ia32_feature_control = 0;
+	free_nested(vmx);
 out:
 	return ret;
 }
@@ -5031,6 +5026,14 @@ static int handle_vmon(struct kvm_vcpu *
 		return 1;
 	}
 
+#define VMXON_NEEDED_FEATURES \
+	(FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
+	if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
+			!= VMXON_NEEDED_FEATURES) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
 	vmx->nested.vmcs02_num = 0;
 
--- .before/arch/x86/kvm/x86.c	2012-03-19 18:34:24.0 +0200
+++ .after/arch/x86/kvm/x86.c	2012-03-19 18:34:24.0 +0200
@@ -799,7 +799,8 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+	MSR_IA32_FEATURE_CONTROL
 };
 
 static unsigned num_msrs_to_save;
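The behaviour the patch describes (writes refused once the lock bit is set, VMXON refused unless both needed bits are set, everything cleared on vcpu reset) can be modelled outside the kernel. The following is only a sketch under assumptions: the toy_ names and struct are hypothetical, the return value 1 stands in for the #GP the real code injects, and the bit positions (lock in bit 0, VMXON outside SMX in bit 2) follow the architectural layout of IA32_FEATURE_CONTROL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED                    (1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1ULL << 2)
#define VMXON_NEEDED_FEATURES \
    (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)

/* Hypothetical stand-in for the per-vcpu nested state. */
struct toy_vcpu {
    uint64_t msr_ia32_feature_control;
};

/* Guest WRMSR: returns 0 on success, 1 if the write must be refused. */
static int toy_set_feature_control(struct toy_vcpu *v, uint64_t data)
{
    assert(v != NULL);
    if (v->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
        return 1; /* locked: value can no longer change until reset */
    v->msr_ia32_feature_control = data;
    return 0;
}

/* VMXON is only permitted when the MSR is locked with VMX enabled. */
static bool toy_vmxon_allowed(const struct toy_vcpu *v)
{
    assert(v != NULL);
    return (v->msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
            == VMXON_NEEDED_FEATURES;
}

/* vcpu reset clears the MSR, which also drops the lock. */
static void toy_vcpu_reset(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->msr_ia32_feature_control = 0;
}
```

In this model a guest BIOS that writes only the lock bit leaves toy_vmxon_allowed() false for the rest of the boot, which is the "BIOS prevents the guest OS from using VMX" case mentioned in the patch description.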
Re: [RFC 2/2] kvm: guest-side changes for tmem on KVM
> On Fri, Mar 16, 2012 at 10:30:35AM +0530, Akshay Karle wrote:
> > +/* kvm tmem foundation ops/hypercalls */
> > +
> > +static inline int kvm_tmem_op(u32 tmem_cmd, u32 tmem_pool, struct tmem_oid oid,
> > +	u32 index, u32 tmem_offset, u32 pfn_offset, unsigned long pfn, u32 len, uint16_t cli_id)
>
> That is rather long list of arguments. Could you pass in a structure
> instead?
>
> Are you actually using all of the arguments in every call?

For different functions different parameters are used. If we want to
reduce the number of arguments, the tmem_ops structure can be created in
the functions calling kvm_tmem_op instead of creating it here, and then
passed in. I will make these changes in the next patch.

> > +{
> > +	struct tmem_ops op;
> > +	int rc = 0;
> > +	op.cmd = tmem_cmd;
> > +	op.pool_id = tmem_pool;
> > +	op.u.gen.oid[0] = oid.oid[0];
> > +	op.u.gen.oid[1] = oid.oid[1];
> > +	op.u.gen.oid[2] = oid.oid[2];
> > +	op.u.gen.index = index;
> > +	op.u.gen.tmem_offset = tmem_offset;
> > +	op.u.gen.pfn_offset = pfn_offset;
> > +	op.u.gen.pfn = pfn;
> > +	op.u.gen.len = len;
> > +	op.u.gen.cli_id = cli_id;
> > +	rc = kvm_hypercall1(KVM_HC_TMEM, virt_to_phys(&op));
> > +	rc = rc + 1000;
>
> Why the addition?

If you notice the host patch I had subtracted 1000 while passing the
return value in the kvm_emulate_hypercall function. This was to avoid the
guest kernel panic due to the return of a non-negative value by the
kvm_hypercall. In order to get the original value back I added 1000.

Avi, is there a right way of doing this?
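As an illustration of the restructuring discussed above (callers build the op structure and pass a pointer, rather than passing nine scalars), here is a rough userspace sketch. Everything in it is an assumption made for the example: the toy_ names, the flattened struct layout, the command values, and the stubbed hypercall that merely records what it was given. It is not the layout or API of the actual tmem patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command numbers, for illustration only. */
enum toy_tmem_cmd { TOY_TMEM_NEW_POOL = 1, TOY_TMEM_PUT_PAGE = 2 };

/* Hypothetical op layout, modelled on the fields the patch assigns. */
struct toy_tmem_op {
    uint32_t cmd;
    uint32_t pool_id;
    uint64_t oid[3];
    uint32_t index;
    uint32_t tmem_offset;
    uint32_t pfn_offset;
    unsigned long pfn;
    uint32_t len;
    uint16_t cli_id;
};

/* Stand-in for the hypercall: records the op so it can be inspected. */
static struct toy_tmem_op toy_last_op;

static int toy_tmem_op(const struct toy_tmem_op *op)
{
    assert(op != NULL);
    toy_last_op = *op;
    return 0;
}

/* Each caller fills in only the fields its command needs;
 * designated initializers zero the rest. */
static int toy_tmem_new_pool(uint32_t pool_id, uint16_t cli_id)
{
    struct toy_tmem_op op = {
        .cmd = TOY_TMEM_NEW_POOL,
        .pool_id = pool_id,
        .cli_id = cli_id,
    };
    return toy_tmem_op(&op);
}

static int toy_tmem_put_page(uint32_t pool_id, const uint64_t oid[3],
                             uint32_t index, unsigned long pfn)
{
    struct toy_tmem_op op = {
        .cmd = TOY_TMEM_PUT_PAGE,
        .pool_id = pool_id,
        .oid = { oid[0], oid[1], oid[2] },
        .index = index,
        .pfn = pfn,
    };
    return toy_tmem_op(&op);
}
```

The design point is that the single-pointer entry point no longer changes when a command gains or loses a parameter; only the caller that builds the structure does.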
Re: [RFC 2/2] kvm: guest-side changes for tmem on KVM
On 03/19/2012 07:49 PM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 16, 2012 at 10:30:35AM +0530, Akshay Karle wrote:
> > > +/* kvm tmem foundation ops/hypercalls */
> > > +
> > > +static inline int kvm_tmem_op(u32 tmem_cmd, u32 tmem_pool, struct tmem_oid oid,
> > > +	u32 index, u32 tmem_offset, u32 pfn_offset, unsigned long pfn, u32 len, uint16_t cli_id)
> >
> > That is rather long list of arguments. Could you pass in a structure
> > instead?
> >
> > Are you actually using all of the arguments in every call?
>
> For different functions different parameters are used. If we want to
> reduce the number of arguments, the tmem_ops structure can be created in
> the functions calling kvm_tmem_op instead of creating it here, and then
> passed in. I will make these changes in the next patch.
>
> > > +{
> > > +	struct tmem_ops op;
> > > +	int rc = 0;
> > > +	op.cmd = tmem_cmd;
> > > +	op.pool_id = tmem_pool;
> > > +	op.u.gen.oid[0] = oid.oid[0];
> > > +	op.u.gen.oid[1] = oid.oid[1];
> > > +	op.u.gen.oid[2] = oid.oid[2];
> > > +	op.u.gen.index = index;
> > > +	op.u.gen.tmem_offset = tmem_offset;
> > > +	op.u.gen.pfn_offset = pfn_offset;
> > > +	op.u.gen.pfn = pfn;
> > > +	op.u.gen.len = len;
> > > +	op.u.gen.cli_id = cli_id;
> > > +	rc = kvm_hypercall1(KVM_HC_TMEM, virt_to_phys(&op));
> > > +	rc = rc + 1000;
> >
> > Why the addition?
>
> If you notice the host patch I had subtracted 1000 while passing the
> return value in the kvm_emulate_hypercall function. This was to avoid the
> guest kernel panic due to the return of a non-negative value by the
> kvm_hypercall. In order to get the original value back I added 1000.
>
> Avi, is there a right way of doing this?

Why would the guest kernel panic due to the return of a non-negative
value?

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 05:56 PM, Michael S. Tsirkin wrote:
> Currently virtio-pci is specified so that configuration of the
> device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
> However, Linux guests happen to use ioread/iowrite/iomap primitives
> for access, and these work uniformly across memory/io BARs.
>
> While PCI IO accesses are faster than MMIO on x86 kvm,
> MMIO might be helpful on other systems which don't
> implement PIO or where PIO is slower than MMIO.
>
> Add a property to make it possible to tweak the BAR type.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
> This is harmless by default but causes segfaults in memory.c
> when enabled. Thus an RFC until I figure out what's wrong.

Should be done via an extra BAR (with the same layout, perhaps extended)
so compatibility is preserved.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH RFC] virtio-pci: add MMIO property
On Mon, Mar 19, 2012 at 07:58:12PM +0200, Avi Kivity wrote:
> On 03/19/2012 05:56 PM, Michael S. Tsirkin wrote:
> > Currently virtio-pci is specified so that configuration of the
> > device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
> > However, Linux guests happen to use ioread/iowrite/iomap primitives
> > for access, and these work uniformly across memory/io BARs.
> >
> > While PCI IO accesses are faster than MMIO on x86 kvm,
> > MMIO might be helpful on other systems which don't
> > implement PIO or where PIO is slower than MMIO.
> >
> > Add a property to make it possible to tweak the BAR type.
> >
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >
> > This is harmless by default but causes segfaults in memory.c
> > when enabled. Thus an RFC until I figure out what's wrong.
>
> Should be done via an extra BAR (with the same layout, perhaps extended)
> so compatibility is preserved.

No, that would need guest changes to be of use. The point of this hack is
to make things work for Linux guests where PIO does not work.

> -- 
> error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
> Currently virtio-pci is specified so that configuration of the
> device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
> However, Linux guests happen to use ioread/iowrite/iomap primitives
> for access, and these work uniformly across memory/io BARs.
>
> While PCI IO accesses are faster than MMIO on x86 kvm,
> MMIO might be helpful on other systems which don't
> implement PIO or where PIO is slower than MMIO.
>
> Add a property to make it possible to tweak the BAR type.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
> This is harmless by default but causes segfaults in memory.c
> when enabled. Thus an RFC until I figure out what's wrong.

Doesn't this violate the virtio-pci spec?

Making the same vendor/device ID have different semantics depending on a
magic flag in QEMU seems like a pretty bad idea to me.

Regards,

Anthony Liguori
RE: KVM inside Oracle VM
-----Original Message-----
From: Sever Apostu
Sent: Sunday, March 18, 2012 10:27 PM
To: kvm@vger.kernel.org
Subject: KVM inside Oracle VM

Hi,

I'm planning on building an Oracle VM machine with KVM inside (three KVM
machines inside one Oracle VM machine). The trouble is I have yet to find
any reference on that :-) Also, although I am familiar with Xen and Oracle
VM, I know little about KVM.

What do you think, would this be a reasonable mix of the two
virtualization techniques? Any major drawbacks I should consider?

Thank you,
Sever

Any chance anyone has any feedback about KVM installed inside a Xen guest?

Thank you,
Sever
RE: KVM inside Oracle VM
-----Original Message-----
From: Paolo Bonzini [mailto:pbonz...@redhat.com]
Sent: Monday, March 19, 2012 10:01 PM
To: Sever Apostu
Cc: kvm@vger.kernel.org
Subject: Re: KVM inside Oracle VM

On 19/03/2012 20:29, Sever Apostu wrote:
> Any chance anyone has any feedback about KVM installed inside a Xen guest?

It's really a Xen question more than a KVM question. Performance and
stability will probably suffer.

Paolo

Thank you for the reply, Paolo!

Most likely both will suffer and I am prepared to live with that, but is
it conceptually possible?

I am asking you guys this instead of trying it myself because I am
building a project plan that would require someone else providing the
infrastructure, so I am wondering whether I should give this one day of
testing or simply dismiss it as absurd :-)

I will follow up on this on the Xen mailing list as well.

Thank you,
Sever
[PATCHv2] virtio-pci: add MMIO property
Currently virtio-pci is specified so that configuration of the
device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
However, Linux guests happen to use ioread/iowrite/iomap primitives
for access, and these work uniformly across memory/io BARs.

While PCI IO accesses are faster than MMIO on x86 kvm,
MMIO might be helpful on other systems: for example, on IBM pSeries
machines, not all firmware/hypervisor versions necessarily
support PCI PIO access on all domains.

Add a property to make it possible to tweak the BAR type.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---

OK I added old_mmio (BTW: would be nice if ops were checked
when region is inited) and now things work in userspace.
However, when I add ioeventfd=on I get an assert:

qemu/kvm-all.c:747: kvm_mem_ioeventfd_add: Assertion `match_data && section->size == 4' failed.

How to reproduce:
1. apply patch
2. create virtio device with flags mmio=on,ioeventfd=on

 hw/virtio-pci.c |   68 +-
 hw/virtio-pci.h |    5
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 28498ec..b061000 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -510,8 +510,58 @@ const MemoryRegionPortio virtio_portio[] = {
     PORTIO_END_OF_LIST()
 };
 
+static void virtio_pci_config_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    virtio_pci_config_writeb(opaque, addr & proxy->bar0_mask, val);
+}
+
+static void virtio_pci_config_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    virtio_pci_config_writew(opaque, addr & proxy->bar0_mask, val);
+}
+
+static void virtio_pci_config_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    virtio_pci_config_writel(opaque, addr & proxy->bar0_mask, val);
+}
+
+static uint32_t virtio_pci_config_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    return virtio_pci_config_readb(opaque, addr & proxy->bar0_mask);
+}
+
+static uint32_t virtio_pci_config_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    uint32_t val = virtio_pci_config_readw(opaque, addr & proxy->bar0_mask);
+    return val;
+}
+
+static uint32_t virtio_pci_config_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    uint32_t val = virtio_pci_config_readl(opaque, addr & proxy->bar0_mask);
+    return val;
+}
+
 static const MemoryRegionOps virtio_pci_config_ops = {
     .old_portio = virtio_portio,
+    .old_mmio = {
+        .read = {
+            virtio_pci_config_mmio_readb,
+            virtio_pci_config_mmio_readw,
+            virtio_pci_config_mmio_readl,
+        },
+        .write = {
+            virtio_pci_config_mmio_writeb,
+            virtio_pci_config_mmio_writew,
+            virtio_pci_config_mmio_writel,
+        },
+    },
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
@@ -655,6 +705,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 {
     uint8_t *config;
     uint32_t size;
+    uint8_t bar0_type;
 
     proxy->vdev = vdev;
 
@@ -682,10 +733,18 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     if (size & (size-1))
         size = 1 << qemu_fls(size);
 
+    proxy->bar0_mask = size - 1;
+
     memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                           "virtio-pci", size);
-    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
-                     &proxy->bar);
+
+    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    } else {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
+    }
+
+    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
 
     if (!kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
@@ -823,6 +882,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -856,6 +916,7 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer, TX_TIMER_INTERVAL),
     DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
     DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -888,6 +949,7 @@ static Property virtio_serial_properties[] = {
     DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
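The v2 patch rounds the BAR size up to a power of two and stores size - 1 in bar0_mask, then masks each MMIO address with it before calling the existing config accessors. A small standalone sketch of that arithmetic follows; the toy_ functions are hypothetical, and toy_fls() is a stand-in assumed to behave like qemu_fls() (1-based position of the highest set bit, 0 for an input of 0).

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the highest set bit; 0 when x is 0. */
static int toy_fls(uint32_t x)
{
    int n = 0;
    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Round a non-zero size up to the next power of two, as the patch does. */
static uint32_t toy_bar_size(uint32_t size)
{
    assert(size != 0 && size <= 0x80000000u);
    if (size & (size - 1)) {          /* more than one bit set */
        size = 1u << toy_fls(size);
    }
    return size;
}

/* Reduce an address to an offset inside a power-of-two sized BAR. */
static uint32_t toy_bar_offset(uint64_t addr, uint32_t bar_size)
{
    uint32_t mask = bar_size - 1;     /* corresponds to bar0_mask */
    assert(bar_size != 0 && (bar_size & mask) == 0);
    return (uint32_t)(addr & mask);
}
```

For example, a 24-byte config region becomes a 32-byte BAR with mask 0x1f, so an access at 0xfe001014 resolves to offset 0x14 within the region.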
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
> On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
> > Currently virtio-pci is specified so that configuration of the
> > device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
> > However, Linux guests happen to use ioread/iowrite/iomap primitives
> > for access, and these work uniformly across memory/io BARs.
> >
> > While PCI IO accesses are faster than MMIO on x86 kvm,
> > MMIO might be helpful on other systems which don't
> > implement PIO or where PIO is slower than MMIO.
> >
> > Add a property to make it possible to tweak the BAR type.
> >
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >
> > This is harmless by default but causes segfaults in memory.c
> > when enabled. Thus an RFC until I figure out what's wrong.
>
> Doesn't this violate the virtio-pci spec?

The point is to change the BAR type depending on the architecture.
IO is fastest on x86 but maybe not on other architectures.

> Making the same vendor/device ID have different semantics depending on a
> magic flag in QEMU seems like a pretty bad idea to me.
>
> Regards,
>
> Anthony Liguori

We do this with MSI-X so why not the BAR type?
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
> > On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
> > > Currently virtio-pci is specified so that configuration of the
> > > device is done through a PCI IO space (via BAR 0 of the virtual PCI device).
> > > However, Linux guests happen to use ioread/iowrite/iomap primitives
> > > for access, and these work uniformly across memory/io BARs.
> > >
> > > While PCI IO accesses are faster than MMIO on x86 kvm,
> > > MMIO might be helpful on other systems which don't
> > > implement PIO or where PIO is slower than MMIO.
> > >
> > > Add a property to make it possible to tweak the BAR type.
> > >
> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > >
> > > This is harmless by default but causes segfaults in memory.c
> > > when enabled. Thus an RFC until I figure out what's wrong.
> >
> > Doesn't this violate the virtio-pci spec?
>
> The point is to change the BAR type depending on the architecture.
> IO is fastest on x86 but maybe not on other architectures.

Are we going to document that the BAR is X on architecture Y in the spec?

I think the better way to do this is to use a separate device id range for
MMIO virtio-pci. You can make the same driver handle both ranges and that
way the device is presented consistently to the guest regardless of what
the architecture is.

> > Making the same vendor/device ID have different semantics depending on a
> > magic flag in QEMU seems like a pretty bad idea to me.
> >
> > Regards,
> >
> > Anthony Liguori
>
> We do this with MSI-X so why not the BAR type?

We extend the bar size with MSI-X and use a transport flag to indicate
that it's available, right?

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On Mon, Mar 19, 2012 at 04:07:45PM -0500, Anthony Liguori wrote:
> On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
> > > On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
> > > > Currently virtio-pci is specified so that configuration of the
> > > > device is done through a PCI IO space (via BAR 0 of the virtual
> > > > PCI device). However, Linux guests happen to use
> > > > ioread/iowrite/iomap primitives for access, and these work
> > > > uniformly across memory/io BARs.
> > > >
> > > > While PCI IO accesses are faster than MMIO on x86 kvm, MMIO
> > > > might be helpful on other systems which don't implement PIO or
> > > > where PIO is slower than MMIO.
> > > >
> > > > Add a property to make it possible to tweak the BAR type.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > > >
> > > > This is harmless by default but causes segfaults in memory.c
> > > > when enabled. Thus an RFC until I figure out what's wrong.
> > >
> > > Doesn't this violate the virtio-pci spec?
> >
> > The point is to change the BAR type depending on the architecture.
> > IO is fastest on x86 but maybe not on other architectures.
>
> Are we going to document that the BAR is X on architecture Y in the
> spec?
>
> I think the better way to do this is to use a separate device id
> range for MMIO virtio-pci. You can make the same driver handle both
> ranges and that way the device is presented consistently to the guest
> regardless of what the architecture is.

Yes there are endless ways to do this. This specific hack is good for
making existing linux drivers on ppc, arm etc work.

> > > Making the same vendor/device ID have different semantics
> > > depending on a magic flag in QEMU seems like a pretty bad idea to
> > > me.
> > >
> > > Regards,
> > >
> > > Anthony Liguori
> >
> > We do this with MSI-X so why not the BAR type?
>
> We extend the bar size with MSI-X and use a transport flag to indicate
> that it's available, right?

No, we use regular pci capability. Just like BAR type is a regular PCI
register :)

> Regards,
>
> Anthony Liguori
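The property proposed in this thread comes down to one bit in a flags word deciding whether BAR 0 is registered as an I/O BAR or a memory BAR. A minimal standalone sketch of that decision follows. It is not QEMU code: every SKETCH_* name, the bit index, and both helper functions are made up for illustration, and the patch's own definition of its MMIO bit is cut off in the quoted diff, so no value is taken from it. The two space constants follow the PCI convention that bit 0 of a BAR is 0 for memory space and 1 for I/O space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a PCI BAR: 0 = memory space, 1 = I/O space. */
#define SKETCH_BAR_SPACE_MEMORY 0x00u
#define SKETCH_BAR_SPACE_IO     0x01u

/* Hypothetical flag: a bit index plus the mask derived from it.
 * The index 2 is an assumption, not the value used by the patch. */
#define SKETCH_USE_MMIO_BIT 2
#define SKETCH_USE_MMIO     (1u << SKETCH_USE_MMIO_BIT)

/* Set or clear one bit in a flags word, as a boolean bit property would. */
static inline void sketch_set_flag_bit(uint32_t *flags, unsigned bit, bool on)
{
    assert(bit < 32); /* a 32-bit flags word only has bits 0..31 */
    if (on) {
        *flags |= (1u << bit);
    } else {
        *flags &= ~(1u << bit);
    }
}

/* Choose the type of BAR 0 from the flags: memory if the MMIO bit is
 * set, I/O otherwise (the default when the property is left off). */
static inline uint8_t sketch_bar0_type(uint32_t flags)
{
    return (flags & SKETCH_USE_MMIO) ? SKETCH_BAR_SPACE_MEMORY
                                     : SKETCH_BAR_SPACE_IO;
}
```

With the flag left at its default (clear), sketch_bar0_type() returns the I/O type, which matches the "harmless by default" behaviour described in the patch; only a device created with the bit set gets a memory BAR.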
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On Mon, Mar 19, 2012 at 04:07:45PM -0500, Anthony Liguori wrote:
> On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
> > > On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
> > > > Currently virtio-pci is specified so that configuration of the
> > > > device is done through a PCI IO space (via BAR 0 of the virtual
> > > > PCI device). However, Linux guests happen to use
> > > > ioread/iowrite/iomap primitives for access, and these work
> > > > uniformly across memory/io BARs.
> > > >
> > > > While PCI IO accesses are faster than MMIO on x86 kvm, MMIO
> > > > might be helpful on other systems which don't implement PIO or
> > > > where PIO is slower than MMIO.
> > > >
> > > > Add a property to make it possible to tweak the BAR type.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > > >
> > > > This is harmless by default but causes segfaults in memory.c
> > > > when enabled. Thus an RFC until I figure out what's wrong.
> > >
> > > Doesn't this violate the virtio-pci spec?
> >
> > The point is to change the BAR type depending on the architecture.
> > IO is fastest on x86 but maybe not on other architectures.
>
> Are we going to document that the BAR is X on architecture Y in the
> spec?
>
> I think the better way to do this is to use a separate device id
> range for MMIO virtio-pci. You can make the same driver handle both
> ranges and that way the device is presented consistently to the guest
> regardless of what the architecture is.

Maybe just make this a hidden option like x-miio?
This will ensure people don't turn it on by mistake on e.g. x86.

> > > Making the same vendor/device ID have different semantics
> > > depending on a magic flag in QEMU seems like a pretty bad idea to
> > > me.
> > >
> > > Regards,
> > >
> > > Anthony Liguori
> >
> > We do this with MSI-X so why not the BAR type?
>
> We extend the bar size with MSI-X and use a transport flag to indicate
> that it's available, right?
>
> Regards,
>
> Anthony Liguori
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 04:29 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2012 at 04:07:45PM -0500, Anthony Liguori wrote:
> > On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:
> > > On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
> > > > On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
> > > > > Currently virtio-pci is specified so that configuration of
> > > > > the device is done through a PCI IO space (via BAR 0 of the
> > > > > virtual PCI device). However, Linux guests happen to use
> > > > > ioread/iowrite/iomap primitives for access, and these work
> > > > > uniformly across memory/io BARs.
> > > > >
> > > > > While PCI IO accesses are faster than MMIO on x86 kvm, MMIO
> > > > > might be helpful on other systems which don't implement PIO
> > > > > or where PIO is slower than MMIO.
> > > > >
> > > > > Add a property to make it possible to tweak the BAR type.
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > > > >
> > > > > This is harmless by default but causes segfaults in memory.c
> > > > > when enabled. Thus an RFC until I figure out what's wrong.
> > > >
> > > > Doesn't this violate the virtio-pci spec?
> > >
> > > The point is to change the BAR type depending on the
> > > architecture. IO is fastest on x86 but maybe not on other
> > > architectures.
> >
> > Are we going to document that the BAR is X on architecture Y in the
> > spec?
> >
> > I think the better way to do this is to use a separate device id
> > range for MMIO virtio-pci. You can make the same driver handle both
> > ranges and that way the device is presented consistently to the
> > guest regardless of what the architecture is.
>
> Maybe just make this a hidden option like x-miio?

x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power?

Really, aren't we just being too clever here?

From a practical perspective, I doubt anyone is ever going to support a
driver that has *never* been tested on the platform just because it was
accidentally compiled and happens to be there.

If we just use a separate PCI device id range for this, it's a 1-line
patch that can be provided via an update to existing guests.

Regards,

Anthony Liguori

> This will ensure people don't turn it on by mistake on e.g. x86.
>
> > > > Making the same vendor/device ID have different semantics
> > > > depending on a magic flag in QEMU seems like a pretty bad idea
> > > > to me.
> > > >
> > > > Regards,
> > > >
> > > > Anthony Liguori
> > >
> > > We do this with MSI-X so why not the BAR type?
> >
> > We extend the bar size with MSI-X and use a transport flag to
> > indicate that it's available, right?
> >
> > Regards,
> >
> > Anthony Liguori
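The alternative suggested above, a separate device id range for MMIO virtio-pci with one driver handling both ranges, can be pictured as a driver-side ID table with two entries. The sketch below is a toy model, not the Linux virtio_pci driver: the struct, enum, and function names are invented here, and the second range (0x1400 to 0x143f) is a placeholder chosen only for illustration, not an allocated ID range. The first entry uses the range the virtio PCI binding already defines, vendor 0x1af4 with device IDs 0x1000 through 0x103f.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How the driver should access the device's configuration BAR. */
enum sketch_transport {
    SKETCH_TRANSPORT_NONE, /* not a device this driver handles */
    SKETCH_TRANSPORT_PIO,  /* BAR 0 is an I/O BAR */
    SKETCH_TRANSPORT_MMIO  /* BAR 0 is a memory BAR */
};

struct sketch_id_range {
    uint16_t vendor;
    uint16_t first; /* first device ID in the range, inclusive */
    uint16_t last;  /* last device ID in the range, inclusive */
    enum sketch_transport transport;
};

static const struct sketch_id_range sketch_id_ranges[] = {
    /* Existing virtio-pci range. */
    { 0x1af4, 0x1000, 0x103f, SKETCH_TRANSPORT_PIO },
    /* Placeholder range for an MMIO variant; purely an assumption. */
    { 0x1af4, 0x1400, 0x143f, SKETCH_TRANSPORT_MMIO },
};

/* Look up which transport, if any, a vendor/device pair maps to. */
static inline enum sketch_transport
sketch_match_transport(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(sketch_id_ranges) / sizeof(sketch_id_ranges[0]);

    for (size_t i = 0; i < n; i++) {
        const struct sketch_id_range *r = &sketch_id_ranges[i];

        assert(r->first <= r->last); /* table sanity check */
        if (vendor == r->vendor && device >= r->first && device <= r->last) {
            return r->transport;
        }
    }
    return SKETCH_TRANSPORT_NONE;
}
```

The point of this layout is that the guest learns the BAR type from the device ID it probed, so the same vendor/device pair never means two different things depending on a host-side flag.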
Re: [net-next PATCH v0 0/5] Series short description
From: John Fastabend john.r.fastab...@intel.com
Date: Sun, 18 Mar 2012 23:51:45 -0700

> This series is a follow up to this thread:
>
> http://www.spinics.net/lists/netdev/msg191360.html

Can the interested parties please review this series?

I'm willing to apply this right now if it looks OK, but if it needs
more revisions we'll have to defer.
Re: [net-next PATCH v0 5/5] ixgbe: allow RAR table to be updated in promisc mode
On Sun, 2012-03-18 at 23:52 -0700, Fastabend, John R wrote:
> This allows RAR table updates while in promiscuous mode. With SR-IOV
> enabled it is valuable to allow the RAR table to be updated even when
> in promisc mode to configure forwarding.
>
> Signed-off-by: John Fastabend john.r.fastab...@intel.com
> ---
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++--
>  1 files changed, 11 insertions(+), 10 deletions(-)

Acked-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
Re: [net-next PATCH v0 4/5] ixgbe: enable FDB netdevice ops
On Sun, 2012-03-18 at 23:52 -0700, Fastabend, John R wrote:
> Enable FDB ops on ixgbe when in SR-IOV mode.
>
> Signed-off-by: John Fastabend john.r.fastab...@intel.com
> ---
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   59 +
>  1 files changed, 59 insertions(+), 0 deletions(-)

Acked-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
Re: [net-next PATCH v0 0/5] Series short description
On Mon, 19 Mar 2012 18:38:08 -0400 (EDT)
David Miller da...@davemloft.net wrote:

> From: John Fastabend john.r.fastab...@intel.com
> Date: Sun, 18 Mar 2012 23:51:45 -0700
>
> > This series is a follow up to this thread:
> >
> > http://www.spinics.net/lists/netdev/msg191360.html
>
> Can the interested parties please review this series?
>
> I'm willing to apply this right now if it looks OK, but if it needs
> more revisions we'll have to defer.

Please don't rush this into this merge window.
It needs more than 1 full day of review.
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On Mon, 19 Mar 2012 17:13:06 -0500, Anthony Liguori anth...@codemonkey.ws wrote:
> > Maybe just make this a hidden option like x-miio?
>
> x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power?

  To configure the device, we use the first I/O region of the PCI
  device.

Meh, it does sound a little like we are specifying that it's a PCI I/O
BAR.

Let's resurrect the PCI-v2 idea, which is ready to implement now, and a
nice cleanup? Detach it from the change-of-ring-format idea which is
turning out to be a tarpit.

Thanks,
Rusty.
--
How could I marry someone with more hair than me? http://baldalex.org
Re: [net-next PATCH v0 0/5] Series short description
On 3/19/2012 3:55 PM, Stephen Hemminger wrote:
> On Mon, 19 Mar 2012 18:38:08 -0400 (EDT)
> David Miller da...@davemloft.net wrote:
>
> > From: John Fastabend john.r.fastab...@intel.com
> > Date: Sun, 18 Mar 2012 23:51:45 -0700
> >
> > > This series is a follow up to this thread:
> > >
> > > http://www.spinics.net/lists/netdev/msg191360.html
> >
> > Can the interested parties please review this series?
> >
> > I'm willing to apply this right now if it looks OK, but if it needs
> > more revisions we'll have to defer.
>
> Please don't rush this into this merge window.
> It needs more than 1 full day of review.

Dave, it's probably fine to push this to 3.5 then. I can resubmit after
you close the merge window if you want? This has been somewhat broken
for SR-IOV cards for multiple kernel releases now anyway, so one more
won't hurt too much.

I'll work with Roopa to get the macvlan driver plugged into the fdb ops
in the meantime and maybe get DSA as well.

Thanks,
John
Re: [net-next PATCH v0 0/5] Series short description
From: John Fastabend john.r.fastab...@intel.com
Date: Mon, 19 Mar 2012 17:27:00 -0700

> Dave, it's probably fine to push this to 3.5 then.

Fair enough.
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 06:52 PM, Rusty Russell wrote:
> On Mon, 19 Mar 2012 17:13:06 -0500, Anthony Liguori anth...@codemonkey.ws wrote:
> > > Maybe just make this a hidden option like x-miio?
> >
> > x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power?
>
>   To configure the device, we use the first I/O region of the PCI
>   device.
>
> Meh, it does sound a little like we are specifying that it's a PCI
> I/O BAR.
>
> Let's resurrect the PCI-v2 idea, which is ready to implement now, and
> a nice cleanup? Detach it from the change-of-ring-format idea which
> is turning out to be a tarpit.

I think that's a sensible approach.

Regards,

Anthony Liguori

> Thanks,
> Rusty.
Re: [net-next PATCH v0 0/5] Series short description
On 3/19/2012 5:35 PM, David Miller wrote:
> From: John Fastabend john.r.fastab...@intel.com
> Date: Mon, 19 Mar 2012 17:27:00 -0700
>
> > Dave, it's probably fine to push this to 3.5 then.
>
> Fair enough.

Stephen, please let me know if you see any issues though, because
without these we have no way to forward packets correctly in the
embedded switch. So we can't really use SR-IOV and virtual interfaces
together correctly. And the macvlan device in passthru mode is putting
the device in promiscuous mode, which isn't great either.

.John
Re: [net-next PATCH v0 0/5] Series short description
On Mon, 19 Mar 2012 19:49:50 -0700
John Fastabend john.r.fastab...@intel.com wrote:

> On 3/19/2012 5:35 PM, David Miller wrote:
> > From: John Fastabend john.r.fastab...@intel.com
> > Date: Mon, 19 Mar 2012 17:27:00 -0700
> >
> > > Dave, it's probably fine to push this to 3.5 then.
> >
> > Fair enough.
>
> Stephen, please let me know if you see any issues though, because
> without these we have no way to forward packets correctly in the
> embedded switch. So we can't really use SR-IOV and virtual interfaces
> together correctly. And the macvlan device in passthru mode is putting
> the device in promiscuous mode, which isn't great either.
>
> .John

I am more worried about evaluating ABI compatibility with older
utilities.