[PATCH v4 0/7] RTC: New logic to emulate RTC

2012-03-19 Thread Zhang, Yang Z
Changes in v4:
Rebase to latest head.
Change in patch 6:
When the AF bit is clear, set the timer to expire one second before the 
target alarm. In version 3, to keep UF, AF and UIP in sync, the timer kept 
running whenever UF or AF was clear. That is a little ugly, especially when 
a userspace program is using the alarm, because we cannot achieve any power 
saving. In this version, when the AF bit is cleared, we set the timer to 
expire one second before the alarm. With this change we avoid the 
unnecessary timer and still keep UF, AF and UIP in sync. Please help to 
review patch 6.

Changes in v3:
Rebase to latest head.
Remove the logic that updates the time format when the DM bit changes.
Allow migration from the old version.
Fix the inconsistency between UF and UIP when they are read.

Changes in v2:
Add UIP check logic.
Add logic so that the next second tick occurs exactly 500ms after the 
divider is reset.

The current RTC emulation uses periodic timers (2 timers per second) to update 
the RTC clock. This prevents the CPU from staying in a deep C-state for long 
periods. In our experiments, Pkg C6 residency dropped by 6% when running 64 
idle guests. The following patches stop the two periodic timers and update the 
RTC clock only when the guest tries to read it.
--- 
Yang Zhang (7):
RTC: Remove the logic to update time format when DM bit changed
RTC: Update the RTC clock only when reading it
RTC: Add UIP(update in progress) check logic
RTC: Set internal millisecond register to 500ms when reset divider
RTC: Add RTC update-ended interrupt support
RTC: Add alarm support
RTC: Allow to migrate from old version

hw/mc146818rtc.c |  617 
-
1 files changed, 465 insertions(+), 152 deletions(-)

best regards
yang
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/7] RTC: Remove the logic to update time format when DM bit changed

2012-03-19 Thread Zhang, Yang Z
Changing the DM (data mode) and 24/12 control bits does not affect the internal 
registers. The bits only indicate which format those registers use. So we don't 
need to update the time format when they are modified.
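
As an illustration of "the bits only indicate the format", here is a minimal 
sketch of how a register value could be encoded and decoded depending on the DM 
bit. The helper names to_guest_format() and from_guest_format() are 
hypothetical and are not part of this patch; the REG_B_DM value follows the 
MC146818 register B layout.

```c
#include <assert.h>
#include <stdint.h>

#define REG_B_DM 0x04   /* register B data mode bit: 1 = binary, 0 = BCD */

/* Hypothetical helper: encode a binary value (0..99) in the guest's format. */
static uint8_t to_guest_format(uint8_t reg_b, int value)
{
    assert(value >= 0 && value <= 99);
    if (reg_b & REG_B_DM) {
        return (uint8_t)value;                             /* binary mode */
    }
    return (uint8_t)(((value / 10) << 4) | (value % 10));  /* BCD mode */
}

/* Hypothetical helper: decode a raw register value back to binary. */
static int from_guest_format(uint8_t reg_b, uint8_t raw)
{
    if (reg_b & REG_B_DM) {
        return raw;
    }
    return (raw >> 4) * 10 + (raw & 0x0f);
}
```

Flipping DM changes only how a stored byte is interpreted; the stored byte 
itself is not converted, which is why no register rewrite is needed.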

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |   10 +-
 1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index a46fdfc..9b49cbc 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -252,15 +252,7 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                     rtc_set_time(s);
                 }
             }
-            if (((s->cmos_data[RTC_REG_B] ^ data) & (REG_B_DM | REG_B_24H)) &&
-                !(data & REG_B_SET)) {
-                /* If the time format has changed and not in set mode,
-                   update the registers immediately. */
-                s->cmos_data[RTC_REG_B] = data;
-                rtc_copy_date(s);
-            } else {
-                s->cmos_data[RTC_REG_B] = data;
-            }
+            s->cmos_data[RTC_REG_B] = data;
             rtc_timer_update(s, qemu_get_clock_ns(rtc_clock));
             break;
         case RTC_REG_C:
--
1.7.1


[PATCH v4 2/7] RTC: Update the RTC clock only when reading it

2012-03-19 Thread Zhang, Yang Z
There is no need to use two periodic timers to update the RTC time. With this 
patch, we update it only when the guest reads it.
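
The idea can be sketched as follows, assuming the guest time is modelled as 
the host clock plus a stored offset. The struct and function names 
(lazy_rtc, lazy_rtc_set, lazy_rtc_read_sec) are hypothetical and only 
illustrate the approach; they are not the functions added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/* Hypothetical stand-in for the RTC state: only the guest/host offset. */
struct lazy_rtc {
    int64_t offset_usec;   /* guest time minus host time, in microseconds */
};

/* Guest sets the clock: remember the offset, no timer is started. */
static void lazy_rtc_set(struct lazy_rtc *rtc, int64_t guest_sec,
                         int64_t host_usec)
{
    assert(rtc != NULL);
    rtc->offset_usec = guest_sec * USEC_PER_SEC - host_usec;
}

/* Guest reads the clock: derive the current second on demand. */
static int64_t lazy_rtc_read_sec(const struct lazy_rtc *rtc, int64_t host_usec)
{
    assert(rtc != NULL);
    return (host_usec + rtc->offset_usec) / USEC_PER_SEC;
}
```

Between a set and a read no work is done at all, which is what lets an idle 
guest leave the host CPU in a deep C-state.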

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |  207 +-
 1 files changed, 66 insertions(+), 141 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 9b49cbc..82a5b8a 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -44,6 +44,9 @@
 # define DPRINTF_C(format, ...)  do { } while (0)
 #endif

+#define USEC_PER_SEC    1000000L
+#define NS_PER_USEC     1000L
+
 #define RTC_REINJECT_ON_ACK_COUNT 20

 #define RTC_SECONDS 0
@@ -85,6 +88,8 @@ typedef struct RTCState {
     uint8_t cmos_data[128];
     uint8_t cmos_index;
     struct tm current_tm;
+    int64_t offset_sec;
+    int32_t offset_usec;
     int32_t base_year;
     qemu_irq irq;
     qemu_irq sqw_irq;
@@ -92,21 +97,29 @@ typedef struct RTCState {
     /* periodic timer */
     QEMUTimer *periodic_timer;
     int64_t next_periodic_time;
-    /* second update */
-    int64_t next_second_time;
     uint16_t irq_reinject_on_ack_count;
     uint32_t irq_coalesced;
     uint32_t period;
     QEMUTimer *coalesced_timer;
-    QEMUTimer *second_timer;
-    QEMUTimer *second_timer2;
     Notifier clock_reset_notifier;
     LostTickPolicy lost_tick_policy;
     Notifier suspend_notifier;
 } RTCState;

 static void rtc_set_time(RTCState *s);
-static void rtc_copy_date(RTCState *s);
+static void rtc_calibrate_time(RTCState *s);
+static void rtc_set_cmos(RTCState *s);
+
+static uint64_t get_guest_rtc_us(RTCState *s)
+{
+    int64_t host_usec, offset_usec, guest_usec;
+
+    host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+    offset_usec = s->offset_sec * USEC_PER_SEC + s->offset_usec;
+    guest_usec = host_usec + offset_usec;
+
+    return guest_usec;
+}

 #ifdef TARGET_I386
 static void rtc_coalesced_timer_update(RTCState *s)
@@ -207,6 +220,20 @@ static void rtc_periodic_timer(void *opaque)
     }
 }

+static void rtc_set_offset(RTCState *s)
+{
+    struct tm *tm = &s->current_tm;
+    int64_t host_usec, guest_sec, guest_usec;
+
+    host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+
+    guest_sec = mktimegm(tm);
+    guest_usec = guest_sec * USEC_PER_SEC;
+
+    s->offset_sec = (guest_usec - host_usec) / USEC_PER_SEC;
+    s->offset_usec = (guest_usec - host_usec) % USEC_PER_SEC;
+}
+
+
 static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
 {
     RTCState *s = opaque;
@@ -233,6 +260,7 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
             /* if in set mode, do not update the time */
             if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
                 rtc_set_time(s);
+                rtc_set_offset(s);
             }
             break;
         case RTC_REG_A:
@@ -243,6 +271,11 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
             break;
         case RTC_REG_B:
             if (data & REG_B_SET) {
+                /* update cmos to when the rtc was stopping */
+                if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+                    rtc_calibrate_time(s);
+                    rtc_set_cmos(s);
+                }
                 /* set mode: reset UIP mode */
                 s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
                 data &= ~REG_B_UIE;
@@ -250,6 +283,7 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                 /* if disabling set mode, update the time */
                 if (s->cmos_data[RTC_REG_B] & REG_B_SET) {
                     rtc_set_time(s);
+                    rtc_set_offset(s);
                 }
             }
             s->cmos_data[RTC_REG_B] = data;
@@ -305,7 +339,7 @@ static void rtc_set_time(RTCState *s)
     rtc_change_mon_event(tm);
 }

-static void rtc_copy_date(RTCState *s)
+static void rtc_set_cmos(RTCState *s)
 {
     const struct tm *tm = &s->current_tm;
     int year;
@@ -331,122 +365,16 @@ static void rtc_copy_date(RTCState *s)
     s->cmos_data[RTC_YEAR] = rtc_to_bcd(s, year);
 }

-/* month is between 0 and 11. */
-static int get_days_in_month(int month, int year)
-{
-    static const int days_tab[12] = {
-        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
-    };
-    int d;
-    if ((unsigned )month >= 12)
-        return 31;
-    d = days_tab[month];
-    if (month == 1) {
-        if ((year % 4) == 0 && ((year % 100) != 0 || (year % 400) == 0))
-            d++;
-    }
-    return d;
-}
-
-/* update 'tm' to the next second */
-static void rtc_next_second(struct tm *tm)
+static void rtc_calibrate_time(RTCState *s)
 {
-    int days_in_month;
-
-    tm->tm_sec++;
-    if 

[PATCH v4 3/7] RTC: Add UIP(update in progress) check logic

2012-03-19 Thread Zhang, Yang Z
The UIP (update in progress) bit is set while the RTC is updating. The update 
cycle begins 244us after UIP is set, and UIP is cleared when the update ends.
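
The check can be sketched as a pure function of the guest time in 
microseconds. This is a minimal illustration only; the name uip_is_set() and 
the UIP_WINDOW_USEC macro are hypothetical, and the set-mode check done by the 
patch is left out.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC    1000000LL
#define UIP_WINDOW_USEC 244LL   /* UIP is raised this long before the tick */

/* Returns 1 when guest_usec falls in the last 244us of its second. */
static int uip_is_set(int64_t guest_usec)
{
    assert(guest_usec >= 0);
    return (guest_usec % USEC_PER_SEC) >= (USEC_PER_SEC - UIP_WINDOW_USEC);
}
```

Because the result depends only on the current time, no timer is needed to 
flip the bit; it is computed when the guest reads register A.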

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 82a5b8a..6ebb8f6 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -377,6 +377,21 @@ static void rtc_calibrate_time(RTCState *s)
     s->current_tm = *ret;
 }

+static int update_in_progress(RTCState *s)
+{
+    int64_t guest_usec;
+
+    if (s->cmos_data[RTC_REG_B] & REG_B_SET) {
+        return 0;
+    }
+    guest_usec = get_guest_rtc_us(s);
+    /* UIP bit will be set at last 244us of every second. */
+    if ((guest_usec % USEC_PER_SEC) >= (USEC_PER_SEC - 244)) {
+        return 1;
+    }
+    return 0;
+}
+
 static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
 {
     RTCState *s = opaque;
@@ -402,6 +417,9 @@ static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
             break;
         case RTC_REG_A:
             ret = s->cmos_data[s->cmos_index];
+            if (update_in_progress(s)) {
+                ret |= REG_A_UIP;
+            }
             break;
         case RTC_REG_C:
             ret = s->cmos_data[s->cmos_index];
--
1.7.1


[PATCH v4 4/7] RTC: Set internal millisecond register to 500ms when reset divider

2012-03-19 Thread Zhang, Yang Z
The first update cycle begins one-half second after the divider reset is 
removed.
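
A minimal sketch of the arithmetic, assuming the guest time is kept as an 
offset from the host clock in microseconds. The function names below are 
hypothetical and only illustrate why starting the sub-second part at 500ms 
makes the next tick arrive half a second later.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC  1000000LL
#define HALF_SEC_USEC 500000LL

/* Offset (guest minus host) that places the guest at guest_sec + 500ms. */
static int64_t offset_after_divider_reset(int64_t guest_sec, int64_t host_usec)
{
    return guest_sec * USEC_PER_SEC + HALF_SEC_USEC - host_usec;
}

/* Microseconds left until the guest clock reaches its next full second. */
static int64_t usec_to_next_tick(int64_t offset_usec, int64_t host_usec)
{
    int64_t guest_usec = host_usec + offset_usec;

    assert(guest_usec >= 0);
    return USEC_PER_SEC - guest_usec % USEC_PER_SEC;
}
```

Right after the reset is removed the distance to the next tick is 500000us, 
and it shrinks as the host clock advances.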

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |   38 +-
 1 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 6ebb8f6..5e7fbb5 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -110,6 +110,8 @@ static void rtc_set_time(RTCState *s);
 static void rtc_calibrate_time(RTCState *s);
 static void rtc_set_cmos(RTCState *s);

+static int32_t divider_reset;
+
 static uint64_t get_guest_rtc_us(RTCState *s)
 {
     int64_t host_usec, offset_usec, guest_usec;
@@ -220,16 +222,24 @@ static void rtc_periodic_timer(void *opaque)
     }
 }

-static void rtc_set_offset(RTCState *s)
+static void rtc_set_offset(RTCState *s, int32_t start_usec)
 {
     struct tm *tm = &s->current_tm;
-    int64_t host_usec, guest_sec, guest_usec;
+    int64_t host_usec, guest_sec, guest_usec, offset_usec, old_guest_usec;

     host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+    offset_usec = s->offset_sec * USEC_PER_SEC + s->offset_usec;
+    old_guest_usec = (host_usec + offset_usec) % USEC_PER_SEC;

     guest_sec = mktimegm(tm);
-    guest_usec = guest_sec * USEC_PER_SEC;

+    /* start_usec equal 0 means rtc internal millisecond is
+     * same with before */
+    if (start_usec == 0) {
+        guest_usec = guest_sec * USEC_PER_SEC + old_guest_usec;
+    } else {
+        guest_usec = guest_sec * USEC_PER_SEC + start_usec;
+    }
     s->offset_sec = (guest_usec - host_usec) / USEC_PER_SEC;
     s->offset_usec = (guest_usec - host_usec) % USEC_PER_SEC;
 }
@@ -260,10 +270,22 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
             /* if in set mode, do not update the time */
             if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
                 rtc_set_time(s);
-                rtc_set_offset(s);
+                rtc_set_offset(s, 0);
             }
             break;
         case RTC_REG_A:
+            /* when the divider reset is removed, the first update cycle
+             * begins one-half second later*/
+            if (((s->cmos_data[RTC_REG_A] & 0x60) == 0x60) &&
+                ((data & 0x70) >> 4) <= 2) {
+                divider_reset = 1;
+                if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+                    rtc_calibrate_time(s);
+                    rtc_set_offset(s, 500000);
+                    s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+                    divider_reset = 0;
+                }
+            }
             /* UIP bit is read only */
             s->cmos_data[RTC_REG_A] = (data & ~REG_A_UIP) |
                 (s->cmos_data[RTC_REG_A] & REG_A_UIP);
@@ -283,7 +305,13 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                 /* if disabling set mode, update the time */
                 if (s->cmos_data[RTC_REG_B] & REG_B_SET) {
                     rtc_set_time(s);
-                    rtc_set_offset(s);
+                    if (divider_reset == 1) {
+                        rtc_set_offset(s, 500000);
+                        s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+                        divider_reset = 0;
+                    } else {
+                        rtc_set_offset(s, 0);
+                    }
                 }
             }
             s->cmos_data[RTC_REG_B] = data;
--
1.7.1


[PATCH v4 5/7] RTC:Add RTC update-ended interrupt support

2012-03-19 Thread Zhang, Yang Z
Use a timer to emulate the update cycle. When the update cycle ends and UIE is 
set, raise an interrupt. The timer runs only when UF or AF is clear.
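
The scheduling described above can be sketched as two delay computations and 
one flag update. This is a hedged illustration following the description, not 
the code of this patch: the function names are hypothetical, and the register 
bit values follow the MC146818 layout.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC    1000000LL
#define NS_PER_USEC     1000LL
#define UIP_WINDOW_USEC 244LL
#define REG_B_UIE  0x10   /* update-ended interrupt enable */
#define REG_C_UF   0x10   /* update-ended flag */
#define REG_C_IRQF 0x80   /* interrupt request flag */

/* ns until UIP must be raised; 0 if the update window has already begun. */
static int64_t ns_until_uip(int64_t guest_usec)
{
    int64_t sub = guest_usec % USEC_PER_SEC;

    assert(guest_usec >= 0);
    if (sub >= USEC_PER_SEC - UIP_WINDOW_USEC) {
        return 0;
    }
    return (USEC_PER_SEC - UIP_WINDOW_USEC - sub) * NS_PER_USEC;
}

/* ns until the update cycle ends, i.e. until the next full second. */
static int64_t ns_until_update_end(int64_t guest_usec)
{
    assert(guest_usec >= 0);
    return (USEC_PER_SEC - guest_usec % USEC_PER_SEC) * NS_PER_USEC;
}

/* New register C value at update end: UF always, IRQF only if UIE is set. */
static uint8_t on_update_end(uint8_t reg_b, uint8_t reg_c)
{
    reg_c |= REG_C_UF;
    if (reg_b & REG_B_UIE) {
        reg_c |= REG_C_IRQF;
    }
    return reg_c;
}
```

The first timer would be armed with ns_until_uip() and the second with 
ns_until_update_end(), so the two expiries are 244us apart.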

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |   86 ++
 1 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 5e7fbb5..fae049e 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -97,6 +97,11 @@ typedef struct RTCState {
     /* periodic timer */
     QEMUTimer *periodic_timer;
     int64_t next_periodic_time;
+    /* update-ended timer */
+    QEMUTimer *update_timer;
+    QEMUTimer *update_timer2;
+    uint64_t next_update_time;
+    uint32_t use_timer;
     uint16_t irq_reinject_on_ack_count;
     uint32_t irq_coalesced;
     uint32_t period;
@@ -157,7 +162,8 @@ static void rtc_coalesced_timer(void *opaque)
 }
 #endif

-static void rtc_timer_update(RTCState *s, int64_t current_time)
+/* handle periodic timer */
+static void periodic_timer_update(RTCState *s, int64_t current_time)
 {
     int period_code, period;
     int64_t cur_clock, next_irq_clock;
@@ -195,7 +201,7 @@ static void rtc_periodic_timer(void *opaque)
 {
     RTCState *s = opaque;

-    rtc_timer_update(s, s->next_periodic_time);
+    periodic_timer_update(s, s->next_periodic_time);
     s->cmos_data[RTC_REG_C] |= REG_C_PF;
     if (s->cmos_data[RTC_REG_B] & REG_B_PIE) {
         s->cmos_data[RTC_REG_C] |= REG_C_IRQF;
@@ -222,6 +228,58 @@ static void rtc_periodic_timer(void *opaque)
     }
 }

+/* handle update-ended timer */
+static void check_update_timer(RTCState *s)
+{
+    uint64_t next_update_time, expire_time;
+    uint64_t guest_usec;
+    qemu_del_timer(s->update_timer);
+    qemu_del_timer(s->update_timer2);
+
+    if (!((s->cmos_data[RTC_REG_C] & (REG_C_UF | REG_C_AF)) ==
+            (REG_C_UF | REG_C_AF)) && !(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+        s->use_timer = 1;
+        guest_usec = get_guest_rtc_us(s) % USEC_PER_SEC;
+        if (guest_usec >= (USEC_PER_SEC - 244)) {
+            /* RTC is in update cycle when enabling UIE */
+            s->cmos_data[RTC_REG_A] |= REG_A_UIP;
+            next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
+            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+            qemu_mod_timer(s->update_timer2, expire_time);
+        } else {
+            next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
+            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+            s->next_update_time = expire_time;
+            qemu_mod_timer(s->update_timer, expire_time);
+        }
+    } else {
+        s->use_timer = 0;
+    }
+}
+
+static void rtc_update_timer(void *opaque)
+{
+    RTCState *s = opaque;
+
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+        s->cmos_data[RTC_REG_A] |= REG_A_UIP;
+        qemu_mod_timer(s->update_timer2, s->next_update_time + 244000UL);
+    }
+}
+
+static void rtc_update_timer2(void *opaque)
+{
+    RTCState *s = opaque;
+
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
+        s->cmos_data[RTC_REG_C] |= REG_C_UF;
+        s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+        s->cmos_data[RTC_REG_C] |= REG_C_IRQF;
+        qemu_irq_raise(s->irq);
+    }
+    check_update_timer(s);
+}
+
 static void rtc_set_offset(RTCState *s, int32_t start_usec)
 {
     struct tm *tm = &s->current_tm;
@@ -283,13 +341,14 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                     rtc_calibrate_time(s);
                     rtc_set_offset(s, 500000);
                     s->cmos_data[RTC_REG_A] &= ~REG_A_UIP;
+                    check_update_timer(s);
                     divider_reset = 0;
                 }
             }
             /* UIP bit is read only */
             s->cmos_data[RTC_REG_A] = (data & ~REG_A_UIP) |
                 (s->cmos_data[RTC_REG_A] & REG_A_UIP);
-            rtc_timer_update(s, qemu_get_clock_ns(rtc_clock));
+            periodic_timer_update(s, qemu_get_clock_ns(rtc_clock));
             break;
         case RTC_REG_B:
             if (data & REG_B_SET) {
@@ -315,7 +374,8 @@ static void cmos_ioport_write(void *opaque, uint32_t addr, uint32_t data)
                 }
             }
             s->cmos_data[RTC_REG_B] = data;
-            rtc_timer_update(s, qemu_get_clock_ns(rtc_clock));
+            periodic_timer_update(s, qemu_get_clock_ns(rtc_clock));
+            check_update_timer(s);
             break;
         case RTC_REG_C:
         case RTC_REG_D:
@@ -445,7 +505,7 @@ static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)
             break;
         case RTC_REG_A:
             ret = s->cmos_data[s->cmos_index];
-            if (update_in_progress(s)) {
+            if ((s->use_timer == 0) && update_in_progress(s)) {
                 ret |= REG_A_UIP;
             }
             break;
@@ -453,6 +513,12 @@ static uint32_t cmos_ioport_read(void *opaque, uint32_t addr)

[PATCH v4 6/7] RTC:Add alarm support

2012-03-19 Thread Zhang, Yang Z
Change in this patch:
When the AF bit is clear, set the timer to expire one second before the 
target alarm. In version 3, to keep UF, AF and UIP in sync, the timer kept 
running whenever UF or AF was clear. That is a little ugly, especially when 
a userspace program is using the alarm, because we cannot achieve any power 
saving. In this version, when the AF bit is cleared, we set the timer to 
expire one second before the alarm. With this change we avoid the 
unnecessary timer and still keep UF, AF and UIP in sync.

Set the timer to expire one second before the target alarm when the AF bit is 
clear.
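
The expiry arithmetic can be sketched as below. The sketch mirrors the formula 
used in check_update_timer() in this patch: the time left in the current 
second plus (next_alarm_sec - 1) whole seconds. The function name is 
hypothetical, and the exact meaning of next_alarm_sec is whatever 
get_next_alarm() returns, which is taken here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL
#define NS_PER_USEC  1000LL
#define NS_PER_SEC   1000000000LL

/*
 * Delay in ns before the single alarm timer should fire.
 * guest_usec:     current guest time in microseconds
 * next_alarm_sec: seconds to the next alarm, assumed to be >= 1
 */
static int64_t ns_until_alarm_timer(int64_t guest_usec, int64_t next_alarm_sec)
{
    int64_t sub = guest_usec % USEC_PER_SEC;

    assert(guest_usec >= 0);
    assert(next_alarm_sec >= 1);
    return (USEC_PER_SEC - sub) * NS_PER_USEC
           + (next_alarm_sec - 1) * NS_PER_SEC;
}
```

With one expiry programmed for the whole wait, the host is not woken every 
second while the guest waits for an alarm that is minutes or hours away.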
 
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |  274 ++
 1 files changed, 255 insertions(+), 19 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index fae049e..c03606f 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -46,6 +46,11 @@

 #define USEC_PER_SEC    1000000L
 #define NS_PER_USEC     1000L
+#define NS_PER_SEC      1000000000ULL
+#define SEC_PER_MIN     60
+#define SEC_PER_HOUR    3600
+#define MIN_PER_HOUR    60
+#define HOUR_PER_DAY    24

 #define RTC_REINJECT_ON_ACK_COUNT 20

@@ -114,6 +119,8 @@ typedef struct RTCState {
 static void rtc_set_time(RTCState *s);
 static void rtc_calibrate_time(RTCState *s);
 static void rtc_set_cmos(RTCState *s);
+static inline int rtc_from_bcd(RTCState *s, int a);
+static uint64_t get_next_alarm(RTCState *s);

 static int32_t divider_reset;

@@ -232,29 +239,47 @@ static void rtc_periodic_timer(void *opaque)
 static void check_update_timer(RTCState *s)
 {
     uint64_t next_update_time, expire_time;
-    uint64_t guest_usec;
+    uint64_t guest_usec, next_alarm_sec;
+
     qemu_del_timer(s->update_timer);
     qemu_del_timer(s->update_timer2);

-    if (!((s->cmos_data[RTC_REG_C] & (REG_C_UF | REG_C_AF)) ==
-            (REG_C_UF | REG_C_AF)) && !(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
-        s->use_timer = 1;
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_SET)) {
         guest_usec = get_guest_rtc_us(s) % USEC_PER_SEC;
-        if (guest_usec >= (USEC_PER_SEC - 244)) {
-            /* RTC is in update cycle when enabling UIE */
-            s->cmos_data[RTC_REG_A] |= REG_A_UIP;
-            next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
-            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
-            qemu_mod_timer(s->update_timer2, expire_time);
-        } else {
-            next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
-            expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
-            s->next_update_time = expire_time;
-            qemu_mod_timer(s->update_timer, expire_time);
+        /* if UF is clear, reprogram to next second */
+        if (!(s->cmos_data[RTC_REG_C] & REG_C_UF)) {
+program_next_second:
+            s->use_timer = 1;
+            if (guest_usec >= (USEC_PER_SEC - 244)) {
+                /* RTC is in update cycle when enabling UIE */
+                s->cmos_data[RTC_REG_A] |= REG_A_UIP;
+                next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
+                expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+                qemu_mod_timer(s->update_timer2, expire_time);
+            } else {
+                next_update_time = (USEC_PER_SEC - guest_usec - 244)
+                    * NS_PER_USEC;
+                expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+                s->next_update_time = expire_time;
+                qemu_mod_timer(s->update_timer, expire_time);
+            }
+            return ;
+        } else if (!(s->cmos_data[RTC_REG_C] & REG_C_AF)) {
+            /* UF is set, but AF is clear. Program to one second
+             * earlier before target alarm*/
+            next_alarm_sec = get_next_alarm(s);
+            if (next_alarm_sec == 1) {
+                goto program_next_second;
+            } else {
+                next_update_time = (USEC_PER_SEC - guest_usec) * NS_PER_USEC;
+                next_update_time += (next_alarm_sec - 1) * NS_PER_SEC;
+                expire_time = qemu_get_clock_ns(rtc_clock) + next_update_time;
+                s->next_update_time = expire_time;
+                qemu_mod_timer(s->update_timer2, expire_time);
+            }
         }
-    } else {
-        s->use_timer = 0;
     }
+    s->use_timer = 0;
 }

 static void rtc_update_timer(void *opaque)
@@ -267,15 +292,215 @@ static void rtc_update_timer(void *opaque)
     }
 }

+static inline uint8_t convert_hour(RTCState *s, uint8_t hour)
+{
+    if (!(s->cmos_data[RTC_REG_B] & REG_B_24H)) {
+        hour %= 12;
+        if (s->cmos_data[RTC_HOURS] & 0x80) {
+            hour += 12;
+        }
+    }
+    return hour;
+}
+
+static uint64_t get_next_alarm(RTCState *s)
+{
+    int32_t alarm_sec, alarm_min, alarm_hour, cur_hour, cur_min, cur_sec;
+    int32_t hour, min;
+    uint64_t 

[PATCH v4 7/7] RTC:Allow to migrate from old version

2012-03-19 Thread Zhang, Yang Z
The new logic is compatible with the old one, so it should not block migration 
from the old version. The new version, however, cannot migrate to the old one.
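
As a rough illustration of the version rule, the sketch below picks a load 
path from the incoming stream version and the three version fields set in this 
patch (version_id, minimum_version_id, minimum_version_id_old). It reflects my 
understanding of how those fields are meant to be used and is an assumption, 
not a copy of the migration core; the enum and function names are hypothetical.

```c
#include <assert.h>

enum load_path { LOAD_REJECT, LOAD_OLD, LOAD_CURRENT };

/*
 * incoming: version found in the migration stream
 * current:  version_id of the receiving side
 * min_ver:  minimum_version_id (oldest version the field list can load)
 * min_old:  minimum_version_id_old (oldest version the old loader accepts)
 */
static enum load_path pick_load_path(int incoming, int current,
                                     int min_ver, int min_old)
{
    assert(min_old <= min_ver && min_ver <= current);
    if (incoming > current || incoming < min_old) {
        return LOAD_REJECT;
    }
    if (incoming < min_ver) {
        return LOAD_OLD;       /* handled by the compatibility loader */
    }
    return LOAD_CURRENT;
}
```

With the values from this patch (3, 3, 2), a version 2 stream goes through the 
compatibility loader, while a version 3 stream sent to an old receiver whose 
version_id is 2 is rejected.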

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hw/mc146818rtc.c |   48 
 1 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index c03606f..61ac3c3 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -827,11 +827,51 @@ static int rtc_post_load(void *opaque, int version_id)
     return 0;
 }

+static int rtc_load_old(QEMUFile *f, void *opaque, int version_id)
+{
+    RTCState *s = opaque;
+
+    if (version_id  2) {
+        return -EINVAL;
+    }
+
+    qemu_get_buffer(f, s->cmos_data, sizeof(s->cmos_data));
+    /* dummy load for compatibility */
+    qemu_get_byte(f); /* cmos_index */
+    qemu_get_be32(f); /* tm_sec */
+    qemu_get_be32(f); /* tm_min */
+    qemu_get_be32(f); /* tm_hour */
+    qemu_get_be32(f); /* tm_wday */
+    qemu_get_be32(f); /* tm_mday */
+    qemu_get_be32(f); /* tm_mon */
+    qemu_get_be32(f); /* tm_year */
+    qemu_get_be64(f); /* periodic_timer */
+    qemu_get_be64(f); /* next_periodic_time */
+    qemu_get_be64(f); /* next_second_time */
+    qemu_get_be64(f); /* second_timer */
+    qemu_get_be64(f); /* second_timer2 */
+    qemu_get_be32(f); /* irq_coalesced */
+    qemu_get_be32(f); /* period */
+
+
+    rtc_set_date_from_host(&s->dev);
+    periodic_timer_update(s, qemu_get_clock_ns(rtc_clock));
+    check_update_timer(s);
+
+#ifdef TARGET_I386
+    if (s->lost_tick_policy == LOST_TICK_SLEW) {
+        rtc_coalesced_timer_update(s);
+    }
+#endif
+    return 0;
+}
+
 static const VMStateDescription vmstate_rtc = {
     .name = "mc146818rtc",
-    .version_id = 2,
-    .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
+    .version_id = 3,
+    .minimum_version_id = 3,
+    .minimum_version_id_old = 2,
+    .load_state_old = rtc_load_old,
     .post_load = rtc_post_load,
     .fields      = (VMStateField []) {
         VMSTATE_BUFFER(cmos_data, RTCState),
@@ -969,7 +1009,7 @@ static int rtc_initfn(ISADevice *dev)
     memory_region_init_io(&s->io, &cmos_ops, s, "rtc", 2);
     isa_register_ioport(dev, &s->io, base);

-    qdev_set_legacy_instance_id(&dev->qdev, base, 2);
+    qdev_set_legacy_instance_id(&dev->qdev, base, 3);
     qemu_register_reset(rtc_reset, s);

     object_property_add(OBJECT(s), "date", "struct tm",
--
1.7.1


Re: Questing regarding KVM Guest PMU

2012-03-19 Thread shashank rachamalla
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
 On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:
  I guess things are working fine with perf. But why not with oprofile ?
 
  Looks like it. I never tried oprofile. Will try to reproduce your
  problem and see what oprofile is doing.

 I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and
 oprofile 0.9.6.
 Also, I have tried to capture kvm-events ( perf patch ) in host while
 running oprofile and perf in guest.
 Please see the attachment. I have run the tests in three cases for the
 around 5 secs.

 There are more number of MSR reads and writes in case of perf which I
 think is normal. However, there are very few MSR reads and writes with
 oprofile. Also, the number of NMI exceptions are too high in case of
 oprofile.

 Which host kernel are you using? Try latest kvm.git and check if you see
 something unusual in dmesg.

Currently running 3.3.0-rc5. Will try with the latest source from kvm
git and let you know.



 --
                        Gleb.


[net-next PATCH v0 0/5] Series short description

2012-03-19 Thread John Fastabend
This series is a follow up to this thread:

http://www.spinics.net/lists/netdev/msg191360.html

This adds two NTF_XXX bits to signal if the PF_BRIDGE
netlink command should be parsed by the embedded bridge
or the SW bridge. The insight here is the SW bridge is
always the master device (NTF_MASTER) and the embedded
bridge is the lower device (NTF_LOWERDEV). Without either
flag set the command is parsed by the SW bridge to support
existing tooling.
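
A minimal sketch of the dispatch rule described above. The flag values and 
all names here are hypothetical placeholders chosen for illustration; the real 
NTF_MASTER and NTF_LOWERDEV definitions come from the patches in this series.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define SKETCH_NTF_LOWERDEV 0x02u
#define SKETCH_NTF_MASTER   0x04u

#define FDB_TO_SW_BRIDGE 0x1u   /* hand the command to the master device */
#define FDB_TO_EMBEDDED  0x2u   /* hand the command to the lower device  */

/* Decide which bridge(s) should parse a PF_BRIDGE FDB command. */
static unsigned fdb_targets(unsigned ndm_flags)
{
    unsigned targets = 0;

    if (ndm_flags & SKETCH_NTF_MASTER) {
        targets |= FDB_TO_SW_BRIDGE;
    }
    if (ndm_flags & SKETCH_NTF_LOWERDEV) {
        targets |= FDB_TO_EMBEDDED;
    }
    if (targets == 0) {
        /* neither flag set: keep the behaviour existing tools expect */
        targets = FDB_TO_SW_BRIDGE;
    }
    return targets;
}
```

The default branch is what keeps existing tooling working: a command that sets 
neither flag still reaches the SW bridge.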

To make this work correctly I added three new ndo ops

ndo_fdb_add
ndo_fdb_del
ndo_fdb_dump

to add, delete, and dump FDB entries. These operations
can be used by drivers to program embedded nics or by
software bridges. We have at least three SW bridges now:
net/bridge, openvswitch, and macvlan. And three variants
of embedded bridges: SR-IOV devices, multi-function devices
and Distributed Switch Architecture (DSA).

I think at least in this case adding netdevice ops is
the cleanest way to implement this. I thought about
notifier hooks and other methods but this seems to be
the simplest.

I've tested these three scenarios, embedded bridge only,
sw bridge only, and embedded bridge and SW bridge. These
are working on the Intel 82599 devices with this patch
series. I am also working on a patch for the macvlan
drivers. I'll submit that as an RFC shortly; so far I
only have the passthru mode wired up.

Thanks to Stephen, Ben, and Jamal for bearing with me
and the feedback on the last round of patches.

As always any comments/feedback appreciated!

---

John Fastabend (5):
  ixgbe: allow RAR table to be updated in promisc mode
  ixgbe: enable FDB netdevice ops
  net: add fdb generic dump routine
  net: addr_list: add exclusive dev_uc_add
  net: add generic PF_BRIDGE:RTM_XXX FDB hooks


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   80 +-
 include/linux/neighbour.h |3 
 include/linux/netdevice.h |   27 +++
 include/linux/rtnetlink.h |4 +
 net/bridge/br_device.c|3 
 net/bridge/br_fdb.c   |  128 
 net/bridge/br_netlink.c   |   12 --
 net/bridge/br_private.h   |   15 ++
 net/core/dev_addr_lists.c |   19 ++
 net/core/rtnetlink.c  |  194 +
 10 files changed, 363 insertions(+), 122 deletions(-)

-- 
Signature


[net-next PATCH v0 2/5] net: addr_list: add exclusive dev_uc_add

2012-03-19 Thread John Fastabend
This adds a dev_uc_add_excl() call similar to the original
dev_uc_add() except it sets the global bit. With this
change the reference count will not be bumped and -EEXIST
will be returned if a duplicate address exists.
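
The difference between the two add flavours can be sketched with a small 
stand-alone address list. Everything below (struct names, list size, the 
addr_list_add() helper) is hypothetical and only models the behaviour 
described above; it is not the kernel's __hw_addr_add_ex().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ADDR_LEN  6
#define MAX_ADDRS 8

struct addr_entry {
    unsigned char addr[ADDR_LEN];
    int refcount;
};

struct addr_list {
    struct addr_entry e[MAX_ADDRS];
    int count;
};

/* Add addr; when exclusive is set, a duplicate is an error, not a ref bump. */
static int addr_list_add(struct addr_list *l, const unsigned char *addr,
                         int exclusive)
{
    int i;

    assert(l != NULL && addr != NULL);
    for (i = 0; i < l->count; i++) {
        if (memcmp(l->e[i].addr, addr, ADDR_LEN) == 0) {
            if (exclusive) {
                return -EEXIST;
            }
            l->e[i].refcount++;
            return 0;
        }
    }
    if (l->count == MAX_ADDRS) {
        return -ENOMEM;
    }
    memcpy(l->e[l->count].addr, addr, ADDR_LEN);
    l->e[l->count].refcount = 1;
    l->count++;
    return 0;
}
```

A driver that owns the list can use the exclusive form to learn that an 
address is already programmed, instead of silently sharing the entry.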

This is useful for drivers that support SR-IOV and want
to manage the unicast lists.

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 include/linux/netdevice.h |1 +
 net/core/dev_addr_lists.c |   19 +++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4208901..5e43cec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2571,6 +2571,7 @@ extern int dev_addr_init(struct net_device *dev);
 
 /* Functions used for unicast addresses handling */
 extern int dev_uc_add(struct net_device *dev, unsigned char *addr);
+extern int dev_uc_add_excl(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_del(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_sync(struct net_device *to, struct net_device *from);
 extern void dev_uc_unsync(struct net_device *to, struct net_device *from);
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 29c07fe..c7d27ad 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -377,6 +377,25 @@ EXPORT_SYMBOL(dev_addr_del_multiple);
  */
 
 /**
+ * dev_uc_add_excl - Add a global secondary unicast address
+ * @dev: device
+ * @addr: address to add
+ */
+int dev_uc_add_excl(struct net_device *dev, unsigned char *addr)
+{
+   int err;
+
+   netif_addr_lock_bh(dev);
+   err = __hw_addr_add_ex(&dev->uc, addr, dev->addr_len,
+                          NETDEV_HW_ADDR_T_UNICAST, true);
+   if (!err)
+           __dev_set_rx_mode(dev);
+   netif_addr_unlock_bh(dev);
+   return err;
+}
+EXPORT_SYMBOL(dev_uc_add_excl);
+
+/**
  * dev_uc_add - Add a secondary unicast address
  * @dev: device
  * @addr: address to add



[net-next PATCH v0 3/5] net: add fdb generic dump routine

2012-03-19 Thread John Fastabend
This adds a generic dump routine drivers can call. It
should be sufficient to handle any bridging model that
uses the unicast address list. This should be most SR-IOV
enabled NICs.
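
Netlink dumps are resumable: each pass skips the entries already sent and 
returns the index to continue from. The sketch below shows that pattern on a 
plain integer table. The dump_from() helper and its parameters are 
hypothetical and stand in for the idx / cb->args[0] bookkeeping in the patch.

```c
#include <assert.h>

/*
 * Copy entries of table[] into out[], skipping the first `start` entries.
 * Stops when out[] has `room` entries. Returns the index to resume from.
 */
static int dump_from(const int *table, int n, int start,
                     int *out, int room, int *written)
{
    int idx;

    assert(start >= 0 && room >= 0);
    *written = 0;
    for (idx = 0; idx < n; idx++) {
        if (idx < start) {
            continue;            /* already sent in an earlier pass */
        }
        if (*written == room) {
            break;               /* buffer full: resume from idx next time */
        }
        out[(*written)++] = table[idx];
    }
    return idx;
}
```

The caller stores the returned index and passes it back as `start` on the next 
pass, so a table larger than one buffer is dumped in several rounds.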

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 net/core/rtnetlink.c |   56 ++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8c3278a..35ee2d6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2082,6 +2082,62 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
return err;
 }
 
+/**
+ * ndo_dflt_fdb_dump: default netdevice operation to dump an FDB table.
+ * @nlh: netlink message header
+ * @dev: netdevice
+ *
+ * Default netdevice operation to dump the existing unicast address list.
+ * Returns zero on success.
+ */
+int ndo_dflt_fdb_dump(struct sk_buff *skb,
+                      struct netlink_callback *cb,
+                      struct net_device *dev,
+                      int idx)
+{
+   struct netdev_hw_addr *ha;
+   struct nlmsghdr *nlh;
+   struct ndmsg *ndm;
+   u32 pid, seq;
+
+   pid = NETLINK_CB(cb->skb).pid;
+   seq = cb->nlh->nlmsg_seq;
+
+   netif_addr_lock_bh(dev);
+   list_for_each_entry(ha, &dev->uc.list, list) {
+           if (idx < cb->args[0])
+                   goto skip;
+
+           nlh = nlmsg_put(skb, pid, seq,
+                           RTM_NEWNEIGH, sizeof(*ndm), NLM_F_MULTI);
+           if (!nlh)
+                   break;
+
+           ndm = nlmsg_data(nlh);
+           ndm->ndm_family  = AF_BRIDGE;
+           ndm->ndm_pad1    = 0;
+           ndm->ndm_pad2    = 0;
+           ndm->ndm_flags   = NTF_LOWERDEV;
+           ndm->ndm_type    = 0;
+           ndm->ndm_ifindex = dev->ifindex;
+           ndm->ndm_state   = NUD_PERMANENT;
+
+           NLA_PUT(skb, NDA_LLADDR, ETH_ALEN, ha->addr);
+
+           nlmsg_end(skb, nlh);
+skip:
+           ++idx;
+   }
+   netif_addr_unlock_bh(dev);
+
+   return idx;
+nla_put_failure:
+   netif_addr_unlock_bh(dev);
+   nlmsg_cancel(skb, nlh);
+   return idx;
+}
+EXPORT_SYMBOL(ndo_dflt_fdb_dump);
+
 static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
int idx = 0;



[net-next PATCH v0 4/5] ixgbe: enable FDB netdevice ops

2012-03-19 Thread John Fastabend
Enable FDB ops on ixgbe when in SR-IOV mode.
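
The error handling in the add path can be summarised with a small 
stand-alone sketch: the operation is unsupported unless SR-IOV is enabled, and 
a duplicate address is reported only when the caller asked for exclusivity. 
The function name and the SKETCH_NLM_F_EXCL macro are hypothetical stand-ins 
so the example compiles without kernel headers.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NLM_F_EXCL 0x200u   /* stand-in for the netlink NLM_F_EXCL flag */

/*
 * sriov_enabled: non-zero when the adapter runs in SR-IOV mode
 * list_add_err:  result of the exclusive unicast add (0 or -EEXIST, ...)
 * nl_flags:      flags from the netlink request
 */
static int fdb_add_result(int sriov_enabled, int list_add_err,
                          unsigned nl_flags)
{
    int err = -EOPNOTSUPP;

    assert(list_add_err <= 0);
    if (sriov_enabled) {
        err = list_add_err;
    }
    /* only report a duplicate when the caller asked for exclusivity */
    if (err == -EEXIST && !(nl_flags & SKETCH_NLM_F_EXCL)) {
        err = 0;
    }
    return err;
}
```

This keeps a plain "add" idempotent while still letting a caller that sets the 
exclusive flag detect that the address was already present.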

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   59 +
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1d8f9f8..32adb4f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7586,6 +7586,62 @@ static int ixgbe_set_features(struct net_device *netdev,
 
 }
 
+static int ixgbe_ndo_fdb_add(struct ndmsg *ndm,
+                             struct net_device *dev,
+                             unsigned char *addr,
+                             u16 flags)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+   int err = -EOPNOTSUPP;
+
+   if (ndm->ndm_state & NUD_PERMANENT) {
+           pr_info("%s: FDB only supports static addresses\n",
+                   ixgbe_driver_name);
+           return -EINVAL;
+   }
+
+   if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+           err = dev_uc_add_excl(dev, addr);
+
+   /* Only return duplicate errors if NLM_F_EXCL is set */
+   if (err == -EEXIST && !(flags & NLM_F_EXCL))
+           err = 0;
+
+   return err;
+}
+
+static int ixgbe_ndo_fdb_del(struct ndmsg *ndm,
+                             struct net_device *dev,
+                             unsigned char *addr)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+   int err = -EOPNOTSUPP;
+
+   if (ndm->ndm_state & NUD_PERMANENT) {
+           pr_info("%s: FDB only supports static addresses\n",
+                   ixgbe_driver_name);
+           return -EINVAL;
+   }
+
+   if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+           err = dev_uc_del(dev, addr);
+
+   return err;
+}
+
+static int ixgbe_ndo_fdb_dump(struct sk_buff *skb,
+                              struct netlink_callback *cb,
+                              struct net_device *dev,
+                              int idx)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+   if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+           idx = ndo_dflt_fdb_dump(skb, cb, dev, idx);
+
+   return idx;
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_open   = ixgbe_open,
.ndo_stop   = ixgbe_close,
@@ -7620,6 +7676,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #endif /* IXGBE_FCOE */
.ndo_set_features = ixgbe_set_features,
.ndo_fix_features = ixgbe_fix_features,
+   .ndo_fdb_add= ixgbe_ndo_fdb_add,
+   .ndo_fdb_del= ixgbe_ndo_fdb_del,
+   .ndo_fdb_dump   = ixgbe_ndo_fdb_dump,
 };
 
 static void __devinit ixgbe_probe_vf(struct ixgbe_adapter *adapter,



[net-next PATCH v0 5/5] ixgbe: allow RAR table to be updated in promisc mode

2012-03-19 Thread John Fastabend
This allows RAR table updates while in promiscuous mode. With
SR-IOV enabled it is valuable to allow the RAR table to
be updated even when in promisc mode to configure forwarding.

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 32adb4f..d1925b5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3400,16 +3400,17 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 		}
 		ixgbe_vlan_filter_enable(adapter);
 		hw->addr_ctrl.user_set_promisc = false;
-		/*
-		 * Write addresses to available RAR registers, if there is not
-		 * sufficient space to store all the addresses then enable
-		 * unicast promiscuous mode
-		 */
-		count = ixgbe_write_uc_addr_list(netdev);
-		if (count < 0) {
-			fctrl |= IXGBE_FCTRL_UPE;
-			vmolr |= IXGBE_VMOLR_ROPE;
-		}
+	}
+
+	/*
+	 * Write addresses to available RAR registers, if there is not
+	 * sufficient space to store all the addresses then enable
+	 * unicast promiscuous mode
+	 */
+	count = ixgbe_write_uc_addr_list(netdev);
+	if (count < 0) {
+		fctrl |= IXGBE_FCTRL_UPE;
+		vmolr |= IXGBE_VMOLR_ROPE;
 	}
 
 	if (adapter->num_vfs) {



Re: Questing regarding KVM Guest PMU

2012-03-19 Thread Gleb Natapov
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
 On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
  On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:
   I guess things are working fine with perf. But why not with oprofile ?
  
   Looks like it. I never tried oprofile. Will try to reproduce your
   problem and see what oprofile is doing.
 
  I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and
  oprofile 0.9.6.
  Also, I have tried to capture kvm-events ( perf patch ) in host while
  running oprofile and perf in guest.
  Please see the attachment. I have run the tests in three cases for the
  around 5 secs.
 
  There are more number of MSR reads and writes in case of perf which I
  think is normal. However, there are very few MSR reads and writes with
  oprofile. Also, the number of NMI exceptions are too high in case of
  oprofile.
 
  Which host kernel are you using? Try latest kvm.git and check if you see
  something unusual in dmesg.
 
 Currenly running 3.3.0-rc5. will try with the latest source from kvm
 git and let you know.
 
 
Thanks, there were some fixes that didn't make it into 3.3. The rdpmc
instruction emulation fix is one of them. If oprofile uses rdpmc, this
could explain the problem.

--
Gleb.


[net-next PATCH v0 1/5] net: add generic PF_BRIDGE:RTM_ FDB hooks

2012-03-19 Thread John Fastabend
Forgot to change the title resending with a title that won't
be dropped by netdev and kvm mailing lists. And updated my
local repo so it won't happen again.

---

This adds two new flags NTF_MASTER and NTF_LOWERDEV that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
macvlan, or open-vswitch devices. Also without any flags set
the command will be handled by the master device as well so
that current user space tools continue to work as expected.

The NTF_LOWERDEV flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note if both NTF_LOWERDEV and NTF_MASTER bits are set then the
command will be sent to both 'dev->master' and 'dev'. This allows
user space to easily keep the embedded bridge and software bridge
in sync.

To support this new net device ops were added to call into
the device and the existing bridging code was refactored
to use these. There should be no change from user space.
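
The routing rules described above can be sketched roughly as follows. This is a simplified user-space model, not the kernel code: the enum and the function name fdb_targets() are made up for illustration, and only the two flag values are taken from this patch.

```c
#include <stdint.h>

/* Flag values as added by this patch to include/linux/neighbour.h. */
#define NTF_LOWERDEV 0x02
#define NTF_MASTER   0x04

/* Hypothetical bitmask naming who receives a PF_BRIDGE command. */
enum fdb_target {
	FDB_TO_MASTER = 1 << 0, /* 'dev->master', e.g. the software bridge */
	FDB_TO_DEV    = 1 << 1, /* the device itself, e.g. the embedded bridge */
};

/* Return the set of receivers for a command carrying ndm_flags. */
static unsigned int fdb_targets(uint8_t ndm_flags)
{
	unsigned int targets = 0;

	/* No flags at all keeps the legacy behaviour: the master handles it. */
	if (ndm_flags == 0 || (ndm_flags & NTF_MASTER))
		targets |= FDB_TO_MASTER;

	/* NTF_LOWERDEV pushes the command down to the device. */
	if (ndm_flags & NTF_LOWERDEV)
		targets |= FDB_TO_DEV;

	return targets;
}
```

With both bits set the function returns both targets, which is the case that keeps the embedded bridge and the software bridge in sync.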

A basic setup with a SR-IOV enabled NIC looks like this,

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
          ethx.y      ethx
            VF         PF
             \         \          <---- propagate FDB entries to HW
             \         \
          --------------------
          |  Embedded Bridge |    <---- hardware offloaded switching
          --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present drivers managing
the embedded bridge either send frames onto the network which
then get dropped by the switch OR the embedded bridge will flood
these frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Example session using the 'br'[1] tool to add, dump and then
delete a mac address with a new embedded option and enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
# br fdb add 22:35:19:ac:60:59 embedded dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
# br fdb del 22:35:19:ac:60:59 embedded dev eth3

All I added to 'br' was a couple of lines to set the flags correctly.
In my opinion, the merit of this patch is that embedded and SW
bridges can now both be modeled correctly in user space using very
nearly the same message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

Signed-off-by: John Fastabend john.r.fastab...@intel.com
---

 include/linux/neighbour.h |3 +
 include/linux/netdevice.h |   26 
 include/linux/rtnetlink.h |4 +
 net/bridge/br_device.c|3 +
 net/bridge/br_fdb.c   |  128 ++
 net/bridge/br_netlink.c   |   12 
 net/bridge/br_private.h   |   15 -
 net/core/rtnetlink.c  |  138 +
 8 files changed, 217 insertions(+), 112 deletions(-)

diff --git a/include/linux/neighbour.h b/include/linux/neighbour.h
index b188f68..3a94409 100644
--- a/include/linux/neighbour.h
+++ b/include/linux/neighbour.h
@@ -33,6 +33,9 @@ enum {
 #define NTF_PROXY  0x08/* == ATF_PUBL */
 #define NTF_ROUTER 0x80
 
+#define NTF_LOWERDEV   0x02
+#define NTF_MASTER 0x04
+
 /*
  * Neighbor Cache Entry States.
  */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4535a4e..4208901 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -54,6 +54,7 @@
 #include <net/netprio_cgroup.h>
 
 #include <linux/netdev_features.h>
+#include <linux/neighbour.h>
 
 struct netpoll_info;
 struct phy_device;
@@ -904,6 +905,19 @@ struct netdev_fcoe_hbainfo {
  * feature set might be less than what was returned by ndo_fix_features()).
  * Must return >0 or -errno if it changed dev->features itself.
  *
+ * int (*ndo_fdb_add)(struct ndmsg *ndm, struct net_device *dev,
+ *		      unsigned char *addr, u16 flags)
+ *	Adds an FDB entry to dev for addr. The ndmsg contains flags to indicate
+ *	if the dev->master FDB 

Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-19 Thread Wen Congyang
At 03/08/2012 03:57 PM, Wen Congyang Wrote:
 We can know the guest is paniced when the guest runs on xen.
 But we do not have such feature on kvm.
 
 Another purpose of this feature is: management app(for example:
 libvirt) can do auto dump when the guest is crashed. If management
 app does not do auto dump, the guest's user can do dump by hand if
 he sees the guest is paniced.
 
 I touch the hypervisor instead of using virtio-serial, because
 1. it is simple
 2. the virtio-serial is an optional device, and the guest may
not have such device.
 
 Changes from v2 to v3:
 1. correct spelling
 
 Changes from v1 to v2:
 1. split up host and guest-side changes
 2. introduce new request flag to avoid changing return values.
 


Hi all:

We need this feature, but we have not decided how to implement it.
We have two options:
1. use vmcall
2. use virtio-serial.

I will not change this patch set before we decide how to do it.
Can we make a decision in the next few days?

Thanks
Wen Congyang


Re: [PTACH] libcacard: fix VCARD_ATTR_PREFIX macro

2012-03-19 Thread Marcel Heinz

Hi,

Alon Levy al...@redhat.com wrote:

Thanks for the patch, but I think you are using a not up to date tree,
it's fixed by:


commit 0202181245297a9e847c05f4a18623219d95e93e
Author: Hans de Goede hdego...@redhat.com
Date:   Fri Mar 2 16:49:44 2012 +0100

libcacard: Fix compilation with gcc-4.7

(same fix).


My fault, sorry for the noise.

Regards,
Marcel




Re: [PATCH/RFC] kvm/powerpc: Add new ioctl to retreive support page sizes and encodings

2012-03-19 Thread Avi Kivity
On 03/18/2012 10:47 PM, Benjamin Herrenschmidt wrote:
 On Sun, 2012-03-18 at 12:23 +0200, Avi Kivity wrote:
  -ENODOCS

 What kind of docs do you expect ? Where ? 

Documentation/virtual/kvm/api.txt.

 I don't see any of the other
 private ioctls we use on ppc documented either...


Please send patches.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()

2012-03-19 Thread Xiao Guangrong
On 03/16/2012 05:44 PM, Takuya Yoshikawa wrote:

 On Fri, 16 Mar 2012 16:28:56 +0800
 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
 
 Thanks for your explanation, maybe you are right, i do not know migration
 much.

 What i worried about is, you have changed the behaviour of GET_DIRTY_LOG,
 in the current one, it can get all the dirty pages when it is called; after
 your change, GET_DIRTY_LOG can get a empty dirty bitmap but dirty page 
 exists.
 
 The current code also see the same situation because nothing prevents the
 guest from writing to pages before GET_DIRTY_LOG returns and the userspace
 checks the bitmap.  Everything is running.
 


The current code is under the protection of s-rcu:
IIRC, it always holds s-rcu when writing a guest page and setting the dirty bit,
which means the dirty page is logged either in the old dirty_bitmap or in the
current memslot->dirty_bitmap. Yes?



Re: [PATCH v5 2/3] virtio-scsi: add error handling

2012-03-19 Thread Hu Tao
On Sun, Feb 05, 2012 at 12:16:01PM +0100, Paolo Bonzini wrote:
 This commit adds basic error handling to the virtio-scsi
 HBA device.  Task management functions are sent synchronously
 via the control virtqueue.
 
 Cc: linux-scsi linux-s...@vger.kernel.org
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: kvm@vger.kernel.org
 Acked-by: Pekka Enberg penb...@kernel.org 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
   v3-v4: fixed 32-bit compilation; adjusted call to virtscsi_kick_cmd
 
   v2-v3: added mempool, used GFP_NOIO instead of GFP_ATOMIC,
   formatting fixes
 
   v1-v2: use scmd_printk
 
  drivers/scsi/virtio_scsi.c |   73 
 +++-
  1 files changed, 72 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
 index 3f87ae0..68104cd 100644
 --- a/drivers/scsi/virtio_scsi.c
 +++ b/drivers/scsi/virtio_scsi.c
 @@ -29,6 +29,7 @@
  /* Command queue element */
  struct virtio_scsi_cmd {
   struct scsi_cmnd *sc;
 + struct completion *comp;
   union {
   struct virtio_scsi_cmd_req   cmd;
   struct virtio_scsi_ctrl_tmf_req  tmf;
 @@ -168,11 +169,12 @@ static void virtscsi_req_done(struct virtqueue *vq)
   virtscsi_vq_done(vq, virtscsi_complete_cmd);
  };
  
 -/* These are still stubs.  */
  static void virtscsi_complete_free(void *buf)
  {
   struct virtio_scsi_cmd *cmd = buf;
  
 +	if (cmd->comp)
 +		complete_all(cmd->comp);
   mempool_free(cmd, virtscsi_cmd_pool);
  }
  
 @@ -306,12 +308,81 @@ out:
   return ret;
  }
  
 +static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
 +{
 +	DECLARE_COMPLETION_ONSTACK(comp);
 +	int ret;
 +
 +	cmd->comp = &comp;
 +	ret = virtscsi_kick_cmd(vscsi, vscsi->ctrl_vq, cmd,
 +			       sizeof cmd->req.tmf, sizeof cmd->resp.tmf,
 +			       GFP_NOIO);
 +	if (ret < 0)
 +		return FAILED;
 +
 +	wait_for_completion(&comp);
 +	if (cmd->resp.tmf.response != VIRTIO_SCSI_S_OK &&
 +	    cmd->resp.tmf.response != VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
 +		return FAILED;
 +
 +	return SUCCESS;
 +}

Hi Paolo,

This is against v5.

From 34ef5e64fc205044e4326fcc5dcf2aa6b219763a Mon Sep 17 00:00:00 2001
From: Hu Tao hu...@cn.fujitsu.com
Date: Mon, 19 Mar 2012 15:58:22 +0800
Subject: [PATCH] fix two problems in tmf

This patch fixes two problems in tmf:

  1. race in virtscsi_tmf that the cmd may have been already freed
 when waking up from the completion

  2. cmd leak if virtscsi_kick_cmd fails.


Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 drivers/scsi/virtio_scsi.c |   17 ++---
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index efccd72..3b8a6e6 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -175,7 +175,8 @@ static void virtscsi_complete_free(void *buf)
 
 	if (cmd->comp)
 		complete_all(cmd->comp);
-	mempool_free(cmd, virtscsi_cmd_pool);
+	else
+		mempool_free(cmd, virtscsi_cmd_pool);
 }
 
 static void virtscsi_ctrl_done(struct virtqueue *vq)
@@ -311,21 +312,23 @@ out:
 static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
 {
 	DECLARE_COMPLETION_ONSTACK(comp);
-	int ret;
+	int ret = FAILED;
 
 	cmd->comp = &comp;
 	ret = virtscsi_kick_cmd(vscsi, vscsi->ctrl_vq, cmd,
 			       sizeof cmd->req.tmf, sizeof cmd->resp.tmf,
 			       GFP_NOIO);
 	if (ret < 0)
-		return FAILED;
+		goto failed;
 
 	wait_for_completion(&comp);
-	if (cmd->resp.tmf.response != VIRTIO_SCSI_S_OK &&
-	    cmd->resp.tmf.response != VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
-		return FAILED;
+	if (cmd->resp.tmf.response == VIRTIO_SCSI_S_OK ||
+	    cmd->resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
+		ret = SUCCESS;
 
-	return SUCCESS;
+failed:
+	mempool_free(cmd, virtscsi_cmd_pool);
+	return ret;
 }
 
 static int virtscsi_device_reset(struct scsi_cmnd *sc)
-- 
1.7.1



[PATCH] pci-assign: Fall back to host-side MSI if INTx sharing fails

2012-03-19 Thread Jan Kiszka
If the host or the device does not support INTx sharing, retry the IRQ
assignment with host-side MSI support enabled but warn about potential
consequences. This preserves the previous behavior where we
defaulted to MSI and did not support INTx sharing at all.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Detecting if the user actually specified prefer_msi=off as property of
pci-assign is non-trivial. So I decided to go for the retry approach,
ignoring potential user requests. The warning should attract the
attention.
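
The retry approach can be sketched in isolation roughly like this. It is a simplified model under stated assumptions, not the pci-assign code: assign_fn, assign_irq_with_fallback() and the intx_unshareable() stub are hypothetical names, and the stub merely pretends that INTx sharing always fails with -EIO.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the call that assigns the host IRQ.
 * Returns 0 on success or a negative errno value. */
typedef int (*assign_fn)(bool host_msi);

/* Example backend: INTx cannot be shared, host-side MSI works. */
static int intx_unshareable(bool host_msi)
{
	return host_msi ? 0 : -EIO;
}

/* Try the configured mode first; on -EIO retry once with host-side MSI. */
static int assign_irq_with_fallback(assign_fn assign, bool prefer_msi,
				    bool msi_capable, bool *used_msi)
{
	bool host_msi = prefer_msi && msi_capable;
	int r = assign(host_msi);

	if (r == -EIO && !host_msi && msi_capable) {
		/* INTx sharing failed: fall back to host-side MSI. */
		host_msi = true;
		r = assign(host_msi);
	}

	*used_msi = (r == 0) && host_msi;
	return r;
}
```

A device without MSI capability still gets the original error back, which matches the intent that the fallback only applies where host-side MSI is available.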

 hw/device-assignment.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 89823f1..c953713 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -835,6 +835,7 @@ static int assign_irq(AssignedDevice *dev)
         dev->irq_requested_type = 0;
     }
 
+retry:
     assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
         dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
@@ -844,6 +845,17 @@ static int assign_irq(AssignedDevice *dev)
 
     r = kvm_assign_irq(kvm_state, &assigned_irq_data);
     if (r < 0) {
+        if (r == -EIO && !(dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK) &&
+            dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+            /* Retry with host-side MSI. There might be an IRQ conflict and
+             * either the kernel or the device doesn't support sharing. */
+            fprintf(stderr,
+                    "Host-side INTx sharing not supported, "
+                    "using MSI instead.\n"
+                    "Some devices do not to work properly in this mode.\n");
+            dev->features |= ASSIGNED_DEVICE_PREFER_MSI_MASK;
+            goto retry;
+        }
         fprintf(stderr, "Failed to assign irq for \"%s\": %s\n",
                 dev->dev.qdev.id, strerror(-r));
         fprintf(stderr, "Perhaps you are assigning a device "
-- 
1.7.3.4


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-19 Thread Stefan Hajnoczi
On Fri, Mar 16, 2012 at 04:59:35PM +0800, Amos Kong wrote:
 On 14/03/12 19:46, Stefan Hajnoczi wrote:
 On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivitya...@redhat.com  wrote:
 On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
 On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivitya...@redhat.com  wrote:
 On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
 On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivitya...@redhat.com  wrote:
 On 03/13/2012 12:42 PM, Amos Kong wrote:
 Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
 allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
 and check if available ioeventfd exists. If not, virtio-pci will
 fallback to userspace, and don't use ioeventfd for io notification.
 
 How about an alternative way of solving this, within the memory core:
 trap those writes in qemu and write to the ioeventfd yourself.  This way
 ioeventfds work even without kvm:
 
 
   core: create eventfd
   core: install handler for memory address that writes to ioeventfd
   kvm (optional): install kernel handler for ioeventfd
 
 Can you give some detail about this? I'm not familiar with Memory API.
 
 
 btw, can we fix this problem by replacing abort() by a error note?
 virtio-pci will auto fallback to userspace.
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 3c6b4f0..cf23dbf 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -749,7 +749,8 @@ static void
 kvm_mem_ioeventfd_add(MemoryRegionSection *section,
      r = kvm_set_ioeventfd_mmio_long(fd, section->offset_within_address_space,
                                      data, true);
      if (r < 0) {
 -        abort();
 +        fprintf(stderr, "%s: unable to map ioeventfd: %s.\nFallback to "
 +                "userspace (slower).\n", __func__, strerror(-r));

The challenge is propagating the error code.  If virtio-pci.c doesn't
know that ioeventfd has failed, then it's not possible to fall back to a
userspace handler.

I believe Avi's suggestion is to put the fallback code into the KVM
memory API implementation so that virtio-pci.c doesn't need to know that
ioeventfd failed at all.
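
A toy model of that idea, as I understand it, might look like the following. Everything here is hypothetical (struct notifier, kernel_register(), notifier_add() and notifier_guest_write() are illustration names, not QEMU functions); the point is only that adding a notifier never fails from the caller's point of view.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical notifier: 'kicks' stands in for the eventfd counter. */
struct notifier {
	bool in_kernel;      /* true if the kernel signals it directly */
	unsigned int kicks;
};

/* Hypothetical kernel registration; fails once the slots run out. */
static int kernel_register(unsigned int *free_slots)
{
	if (*free_slots == 0)
		return -ENOSPC;
	(*free_slots)--;
	return 0;
}

/* Never fails: fall back to a userspace handler if the kernel refuses. */
static void notifier_add(struct notifier *n, unsigned int *free_slots)
{
	n->kicks = 0;
	n->in_kernel = (kernel_register(free_slots) == 0);
}

/* Userspace write handler, used only for notifiers without kernel help. */
static void notifier_guest_write(struct notifier *n)
{
	if (!n->in_kernel)
		n->kicks++;  /* write to the eventfd ourselves */
}
```

The caller (virtio-pci.c in this thread) would only ever see a working notifier and would not need an error path for the out-of-ioeventfds case.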

Stefan


Re: [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log()

2012-03-19 Thread Takuya Yoshikawa
On Mon, 19 Mar 2012 17:34:49 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 The current code is under the protection of s-rcu:
 IIRC, it always holds s-rcu when write guest page and set dirty bit,
 that mean the dirty page is logged either in the old dirty_bitmap or in the
 current memslot-dirty_bitmap. Yes?


Yes.

I just wanted to explain that getting a clear dirty bitmap from GET_DIRTY_LOG
does not mean there is no dirty page: it just means that there was nothing
logged when we updated the bitmap in get_dirty_log().

We cannot know if anything happened between the bitmap update and the result
check in userspace.

So even when we get a clear bitmap, we need to stop the VCPU threads and
then do GET_DIRTY_LOG once more for live migration.


The important thing is that every bit set by mark_page_dirty() can be
found at some time in the future, including the final GET_DIRTY_LOG.
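
As a toy illustration of that last point (a sketch with made-up names, not KVM or QEMU code), assume a fake guest whose writes are logged only while it is running:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guest model: 'logged' counts pages marked dirty so far. */
struct fake_guest {
	bool running;
	size_t logged;
};

/* A running guest may dirty a page at any time. */
static void guest_write(struct fake_guest *g)
{
	if (g->running)
		g->logged++;
}

/* Models GET_DIRTY_LOG: report what was logged, then clear the log. */
static size_t get_dirty_log(struct fake_guest *g)
{
	size_t n = g->logged;

	g->logged = 0;
	return n;
}

/* Final pass for live migration: stop the VCPUs, then read once more. */
static size_t final_pass(struct fake_guest *g)
{
	g->running = false;
	return get_dirty_log(g);
}
```

A call to get_dirty_log() that returns 0 says nothing about a guest_write() that happens right after it; only the result of final_pass() is complete, because nothing can be dirtied once the guest is stopped.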

Takuya


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-19 Thread Avi Kivity
On 03/16/2012 10:59 AM, Amos Kong wrote:

 Can you give some detail about this? I'm not familiar with Memory API.

Well there's a huge amount of detail needed here.  The idea is that
memory_region_add_eventfd() will always work, with or without kvm, and
even if kvm is enabled but we run out of ioeventfds.

One way to do this is to implement core_eventfd_add() in exec.c.  This
is unlikely to be easy however.


 btw, can we fix this problem by replacing abort() by a error note?
 virtio-pci will auto fallback to userspace.

But other users will silently break, need to audit all other users of
ioeventfd, for example ivshmem.


-- 
error compiling committee.c: too many arguments to function



Re: performance trouble

2012-03-19 Thread Gleb Natapov
On Fri, Mar 16, 2012 at 11:13:31AM +0100, David Cure wrote:
   hello,
 
   sorry for the delay,
 
 Le Thu, Feb 23, 2012 at 10:38:07AM +0200, Gleb Natapov ecrivait :
  
  Ah, I guess the reason is that it records events only of IO thread. You
  need to trace all vcpu threads too. Not sure trace-cmd allows more then
  one -P option though.
 
   I manage to have the physical server with only one VM with the
 slowly function and take trace during the slowly function.
   I upload trace in http://www.roullier.net/Report/ with :
   o report.txt.3.1.gz : with kernel 3.1
   o report.txt.3.2.gz : with kernel 3.2
   o report.txt.vhost-net-3.1.gz : with kernel 3.1 and vhost-net
   o report.txt.vhost-net.3.2.gz : with kernel 3.2 and vhost-net
 
   With 3.2 + vhost-net we have 10.5s (to remember 8s with
 vmware esxi 4).
 
Can you run the same test on Linux guest and on Windows vm with 1 cpu?
I see a lot of IPIs between vcpus.

--
Gleb.


Re: [PATCH v4 7/7] RTC:Allow to migrate from old version

2012-03-19 Thread Avi Kivity
On 03/19/2012 08:14 AM, Zhang, Yang Z wrote:
 The new logic is compatible with old. So it should not block migrate from old 
 version. But new version cannot migrate to old.

 +static int rtc_load_old(QEMUFile *f, void *opaque, int version_id)
 +{
 +RTCState *s = opaque;
 +
 +if (version_id  2) {
 +return -EINVAL;
 +}
 +
 +qemu_get_buffer(f, s->cmos_data, sizeof(s->cmos_data));
 +/* dummy load for compatibility */
 +qemu_get_byte(f); /* cmos_index */
 +qemu_get_be32(f); /* tm_sec */
 +qemu_get_be32(f); /* tm_min */
 +qemu_get_be32(f); /* tm_hour */
 +qemu_get_be32(f); /* tm_wday */
 +qemu_get_be32(f); /* tm_mday */
 +qemu_get_be32(f); /* tm_mon */
 +qemu_get_be32(f); /* tm_year */
 +qemu_get_be64(f); /* periodic_timer */
 +qemu_get_be64(f); /* next_periodic_time */
 +qemu_get_be64(f); /* next_second_time */
 +qemu_get_be64(f); /* second_timer */
 +qemu_get_be64(f); /* second_timer2 */
 +qemu_get_be32(f); /* irq_coalesced */
 +qemu_get_be32(f); /* period */
 +

Why don't you just convert the data to the new in-memory format?  Then
you don't need a version update.

 +
 +rtc_set_date_from_host(&s->dev);

If the guest is intentionally running with an incorrect date, this breaks.


-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v5 2/3] virtio-scsi: add error handling

2012-03-19 Thread Paolo Bonzini
Il 19/03/2012 10:55, Hu Tao ha scritto:
 + int ret = FAILED;
  
   cmd-comp = comp;
   ret = virtscsi_kick_cmd(vscsi, vscsi-ctrl_vq, cmd,
  sizeof cmd-req.tmf, sizeof cmd-resp.tmf,
  GFP_NOIO);
   if (ret  0)
 - return FAILED;
 + goto failed;

This will return the errno, not FAILED.

I have already fixed this up locally, though I've been lazy on actually
sending out the fix.  I'll do this today.

Paolo

   wait_for_completion(comp);
 - if (cmd-resp.tmf.response != VIRTIO_SCSI_S_OK 
 - cmd-resp.tmf.response != VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
 - return FAILED;
 + if (cmd-resp.tmf.response == VIRTIO_SCSI_S_OK ||
 + cmd-resp.tmf.response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED)
 + ret = SUCCESS;
  
 - return SUCCESS;
 +failed:
 + mempool_free(cmd, virtscsi_cmd_pool);
 + return ret;



Re: kvm-ppc while I'm on vacation

2012-03-19 Thread Paul Mackerras
On Sun, Mar 18, 2012 at 11:10:43PM +0100, Alexander Graf wrote:

 Hence I asked Paul to take on temporary maintainership of the
 kvm-ppc tree for the next 3 weeks. During that time, he'll be
 allowed to send pull requests to Avi and Marcelo and is obliged to
 fix the build whenever it breaks :).

Thanks, Alex.

Avi, Marcelo, what do you plan to do with Alex's existing pull
request?  Are you about to do that pull?

Thanks,
Paul.


Re: kvm-ppc while I'm on vacation

2012-03-19 Thread Avi Kivity
On 03/19/2012 01:56 PM, Paul Mackerras wrote:
 On Sun, Mar 18, 2012 at 11:10:43PM +0100, Alexander Graf wrote:

  Hence I asked Paul to take on temporary maintainership of the
  kvm-ppc tree for the next 3 weeks. During that time, he'll be
  allowed to send pull requests to Avi and Marcelo and is obliged to
  fix the build whenever it breaks :).

 Thanks, Alex.

 Avi, Marcelo, what do you plan to do with Alex's existing pull
 request?  Are you about to do that pull?


It looks good to me, I'll pull it if Marcelo doesn't beat me to it.

-- 
error compiling committee.c: too many arguments to function



Re: question regarding intel_idle inside kvm

2012-03-19 Thread Avi Kivity
On 03/16/2012 12:19 AM, Daniel Lezcano wrote:
 Hi all,

 I recently did some modification in the cpuidle core and the patches
 were merge to linux-next.
 Someone reported a problem with the intel_idle cpuidle driver.

 I tried to reproduce the problem with kvm but the kernel fails to
 intialize the driver because of intel_intel_init function fails in the
 processor probe.

 After digging a bit, I found it fails at:

 drivers/idle/intel_idle.c

 static int intel_idle_probe(void)
 {

 ...


  if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
 return -ENODEV;

 ^

  ...

 }

 I assumed the virtualized processor does not support this, so I
 specified the -cpu host because the host was running the intel_idle
 driver. But the driver still fails in kvm.

 I was wondering why that happens ? Does anyone have an idea of this
 problem ?

intel_idle() uses mwait, which kvm does not virtualize (it's very
expensive to do so and brings no benefits).

-- 
error compiling committee.c: too many arguments to function



Re: question regarding intel_idle inside kvm

2012-03-19 Thread Daniel Lezcano

On 03/19/2012 01:31 PM, Avi Kivity wrote:

On 03/16/2012 12:19 AM, Daniel Lezcano wrote:

Hi all,

I recently did some modification in the cpuidle core and the patches
were merge to linux-next.
Someone reported a problem with the intel_idle cpuidle driver.

I tried to reproduce the problem with kvm but the kernel fails to
intialize the driver because of intel_intel_init function fails in the
processor probe.

After digging a bit, I found it fails at:

drivers/idle/intel_idle.c

static int intel_idle_probe(void)
{

...


 if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
 return -ENODEV;

 ^

  ...

}

I assumed the virtualized processor does not support this, so I
specified the -cpu host because the host was running the intel_idle
driver. But the driver still fails in kvm.

I was wondering why that happens ? Does anyone have an idea of this
problem ?

intel_idle() uses mwait, which kvm does not virtualize (it's very
expensive to do so and brings no benefits).


Ok, thanks for the information. I was afraid of that :/
I will go for a real host then :)

Thanks !
  -- Daniel



[PATCH v4 0/2] support to migrate with IPv6 address

2012-03-19 Thread Amos Kong
These patches make tcp migration use the helper functions in
qemu-sockets.c to support IPv6 migration.

Changes from v1:
- split the changes into small patches so they are easier to review
- fixed some problems according to Kevin's comments

Changes from v2:
- fix the issue of returning the real error
- set s->fd to -1 when parsing fails, so migrate_fd_error() is not called

Changes from v3:
- try to use the helper functions in qemu-sockets.c

---

Amos Kong (2):
  qemu-socket: change inet_connect() to support nonblock socket
  use inet_listen()/inet_connect() to support ipv6 migration


 migration-tcp.c |   75 +++
 nbd.c   |2 +
 qemu-char.c |2 +
 qemu-sockets.c  |   73 ++
 qemu_socket.h   |4 +--
 ui/vnc.c|2 +
 6 files changed, 82 insertions(+), 76 deletions(-)

-- 
Amos Kong


[PATCH v4 1/2] qemu-socket: change inet_connect() to support nonblock socket

2012-03-19 Thread Amos Kong
Change inet_connect(const char *str, int socktype) to
inet_connect(const char *str, bool block, int *sock_err).
socktype was unused; block selects whether the socket is set to
blocking or non-blocking mode; sock_err is used to store the socket error.

For a non-blocking socket, connect is treated as successful when one of
the following errors is returned:
  -EINPROGRESS or -EWOULDBLOCK
  -WSAEALREADY or -WSAEINVAL (win32)

Also change the wrapper function inet_connect_opts(QemuOpts *opts)
to inet_connect_opts(QemuOpts *opts, int *sock_err).

Add a bool entry (block) to dummy_opts to tag the block type.
Change nbd, vnc to use the new interface.
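
The success rule described above can be sketched with plain POSIX calls. This is only an illustration under stated assumptions: connect_maybe_nonblock() is a hypothetical helper, not QEMU's inet_connect(), it covers only the POSIX error codes, and it leaves out the EINTR retry loop that the patch uses.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: connect fd to addr.
 * Returns 0 when the connection is established or, for a non-blocking
 * socket, still in progress; otherwise returns a negative errno value.
 * *in_progress tells the caller whether it still has to wait for the
 * socket to become writable.
 */
static int connect_maybe_nonblock(int fd, const struct sockaddr *addr,
                                  socklen_t len, bool block,
                                  bool *in_progress)
{
    int err = 0;

    *in_progress = false;
    if (!block) {
        int flags = fcntl(fd, F_GETFL, 0);

        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            return -errno;
        }
    }

    if (connect(fd, addr, len) < 0) {
        err = -errno;
    }

    if (!block && (err == -EINPROGRESS || err == -EWOULDBLOCK)) {
        *in_progress = true;   /* completion is reported via poll/select */
        return 0;
    }
    return err;
}
```

A caller that sees *in_progress set would register a write handler on the descriptor, much as the migration code does with tcp_wait_for_connect.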

Signed-off-by: Amos Kong ak...@redhat.com
---
 nbd.c  |2 +-
 qemu-char.c|2 +-
 qemu-sockets.c |   73 
 qemu_socket.h  |4 ++-
 ui/vnc.c   |2 +-
 5 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/nbd.c b/nbd.c
index 567e94e..ad4de06 100644
--- a/nbd.c
+++ b/nbd.c
@@ -146,7 +146,7 @@ int tcp_socket_outgoing(const char *address, uint16_t port)
 
 int tcp_socket_outgoing_spec(const char *address_and_port)
 {
-return inet_connect(address_and_port, SOCK_STREAM);
+return inet_connect(address_and_port, true, NULL);
 }
 
 int tcp_socket_incoming(const char *address, uint16_t port)
diff --git a/qemu-char.c b/qemu-char.c
index bb9e3f5..d3543ea 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2443,7 +2443,7 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts *opts)
 if (is_listen) {
 fd = inet_listen_opts(opts, 0);
 } else {
-fd = inet_connect_opts(opts);
+fd = inet_connect_opts(opts, NULL);
 }
 }
 if (fd < 0) {
diff --git a/qemu-sockets.c b/qemu-sockets.c
index 6bcb8e3..8ed45f8 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -51,6 +51,9 @@ static QemuOptsList dummy_opts = {
 },{
 .name = "ipv6",
 .type = QEMU_OPT_BOOL,
+},{
+.name = "block",
+.type = QEMU_OPT_BOOL,
 },
 { /* end if list */ }
 },
@@ -194,14 +197,15 @@ listen:
 return slisten;
 }
 
-int inet_connect_opts(QemuOpts *opts)
+int inet_connect_opts(QemuOpts *opts, int *sock_err)
 {
 struct addrinfo ai,*res,*e;
 const char *addr;
 const char *port;
 char uaddr[INET6_ADDRSTRLEN+1];
 char uport[33];
-int sock,rc;
+int sock, rc, err;
+bool block;
 
 memset(&ai,0, sizeof(ai));
 ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
@@ -210,9 +214,11 @@ int inet_connect_opts(QemuOpts *opts)
 
 addr = qemu_opt_get(opts, "host");
 port = qemu_opt_get(opts, "port");
+block = qemu_opt_get_bool(opts, "block", 0);
 if (addr == NULL || port == NULL) {
 fprintf(stderr, "inet_connect: host and/or port not specified\n");
-return -1;
+err = -EINVAL;
+goto err;
 }
 
 if (qemu_opt_get_bool(opts, "ipv4", 0))
@@ -224,7 +230,8 @@ int inet_connect_opts(QemuOpts *opts)
 if (0 != (rc = getaddrinfo(addr, port, &ai, &res))) {
 fprintf(stderr,"getaddrinfo(%s,%s): %s\n", addr, port,
 gai_strerror(rc));
-   return -1;
+err = -EINVAL;
+goto err;
 }
 
 for (e = res; e != NULL; e = e->ai_next) {
@@ -241,21 +248,52 @@ int inet_connect_opts(QemuOpts *opts)
 continue;
 }
 setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(void*)&on,sizeof(on));
-
+if (!block) {
+socket_set_nonblock(sock);
+}
 /* connect to peer */
-if (connect(sock,e->ai_addr,e->ai_addrlen) < 0) {
-if (NULL == e->ai_next)
-fprintf(stderr, "%s: connect(%s,%s,%s,%s): %s\n", __FUNCTION__,
-inet_strfamily(e->ai_family),
-e->ai_canonname, uaddr, uport, strerror(errno));
-closesocket(sock);
-continue;
+do {
+err = 0;
+if (connect(sock, e->ai_addr, e->ai_addrlen) < 0) {
+err = -socket_error();
+if (block) {
+break;
+}
+}
+} while (err == -EINTR);
+
+if (err >= 0) {
+goto success;
+} else if (!block && (err == -EINPROGRESS || err == -EWOULDBLOCK)) {
+goto success;
+#ifdef _WIN32
+} else if (!block && (sock_err == -WSAEALREADY ||
+  sock_err == -WSAEINVAL)) {
+goto success;
+#endif
 }
-freeaddrinfo(res);
-return sock;
+
+if (NULL == e->ai_next) {
+fprintf(stderr, "%s: connect(%s,%s,%s,%s): %s\n", __func__,
+inet_strfamily(e->ai_family),
+e->ai_canonname, uaddr, uport, strerror(errno));
+}
+closesocket(sock);
 }
 freeaddrinfo(res);
+
+err:
+if (sock_err) {
+*sock_err = err;
+}
 return -1;
+
+success:
+freeaddrinfo(res);
+if (sock_err) {
+*sock_err = err;
+}
+return sock;
 }
 
 

[PATCH v4 2/2] use inet_listen()/inet_connect() to support ipv6 migration

2012-03-19 Thread Amos Kong
Use the helper functions in qemu-sockets.c for TCP migration;
they already support IPv6 addresses.

For IPv6, brackets are mandatory if a port is specified.
Per RFC 5952, the recommended format is:

 [2312::8274]:5200

test status: succeeded
listen side: qemu-kvm  -incoming tcp:[2312::8274]:5200
client side: qemu-kvm ...
 (qemu) migrate -d tcp:[2312::8274]:5200
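
To make the bracket rule concrete, here is a small sketch that splits such an address string into host and port. split_host_port() is a hypothetical helper written for illustration, not the parser in qemu-sockets.c; it rejects an unbracketed IPv6 address because the port would be ambiguous, and it also rejects an empty host, which is a choice made for this sketch only.

```c
#include <string.h>

/*
 * Hypothetical parser: split "host:port" or "[ipv6]:port" into separate
 * host and port strings. Returns 0 on success, -1 on malformed input.
 */
static int split_host_port(const char *str, char *host, size_t host_len,
                           char *port, size_t port_len)
{
    const char *host_start, *host_end, *port_start;
    size_t n;

    if (str[0] == '[') {
        host_start = str + 1;
        host_end = strchr(host_start, ']');
        if (host_end == NULL || host_end[1] != ':') {
            return -1;          /* missing "]" or missing ":port" */
        }
        port_start = host_end + 2;
    } else {
        host_start = str;
        host_end = strchr(str, ':');
        if (host_end == NULL || strchr(host_end + 1, ':') != NULL) {
            return -1;          /* no port, or bare IPv6 without brackets */
        }
        port_start = host_end + 1;
    }

    n = (size_t)(host_end - host_start);
    if (n == 0 || n >= host_len || *port_start == '\0' ||
        strlen(port_start) >= port_len) {
        return -1;
    }
    memcpy(host, host_start, n);
    host[n] = '\0';
    strcpy(port, port_start);   /* length was checked above */
    return 0;
}
```

With the address used in the test above, "[2312::8274]:5200" yields host "2312::8274" and port "5200", while "2312::8274:5200" is rejected.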

Signed-off-by: Amos Kong ak...@redhat.com
---
 migration-tcp.c |   75 +++
 1 files changed, 20 insertions(+), 55 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 35a5781..6c66c7a 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -81,43 +81,31 @@ static void tcp_wait_for_connect(void *opaque)
 
 int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
 {
-struct sockaddr_in addr;
-int ret;
-
-ret = parse_host_port(&addr, host_port);
-if (ret < 0) {
-return ret;
-}
+int sock_err;
 
 s->get_error = socket_errno;
 s->write = socket_write;
 s->close = tcp_close;
 
-s->fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
-if (s->fd == -1) {
-DPRINTF("Unable to open socket");
-return -socket_error();
-}
-
-socket_set_nonblock(s->fd);
+s->fd = inet_connect(host_port, false, &sock_err);
 
-do {
-ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
-if (ret == -1) {
-ret = -socket_error();
-}
-if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
-qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
-return 0;
+if (sock_err == -EINPROGRESS || sock_err == -EWOULDBLOCK) {
+DPRINTF("connect in progress");
+qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
+#ifdef _WIN32
+} else if (sock_err == -WSAEALREADY || sock_err == -WSAEINVAL) {
+DPRINTF("connect in progress");
+qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
+#endif
+} else if (sock_err < 0) {
+DPRINTF("connect failed: %s\n", strerror(-sock_err));
+if (s->fd != -1) {
+migrate_fd_error(s);
 }
-} while (ret == -EINTR);
-
-if (ret < 0) {
-DPRINTF("connect failed\n");
-migrate_fd_error(s);
-return ret;
+return sock_err;
+} else {
+migrate_fd_connect(s);
 }
-migrate_fd_connect(s);
 return 0;
 }
 
@@ -157,38 +145,15 @@ out2:
 
 int tcp_start_incoming_migration(const char *host_port)
 {
-struct sockaddr_in addr;
-int val;
 int s;
 
-DPRINTF("Attempting to start an incoming migration\n");
-
-if (parse_host_port(&addr, host_port) < 0) {
-fprintf(stderr, "invalid host/port combination: %s\n", host_port);
-return -EINVAL;
-}
-
-s = qemu_socket(PF_INET, SOCK_STREAM, 0);
-if (s == -1) {
-return -socket_error();
-}
-
-val = 1;
-setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
-
-if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
-goto err;
-}
-if (listen(s, 1) == -1) {
-goto err;
+s = inet_listen(host_port, NULL, 256, SOCK_STREAM, 0);
+if (s < 0) {
+return s;
 }
 
 qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL,
  (void *)(intptr_t)s);
 
 return 0;
-
-err:
-close(s);
-return -socket_error();
 }



[PATCH 1/3][Autotest][virt] autotest.base_utils: Move virt.utils.Thread -> base_utils.InterruptedThread

2012-03-19 Thread Jiří Župka
This is necessary for adding the syncdata class.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/common_lib/base_barrier.py  |2 +-
 client/common_lib/base_utils.py|   65 ++
 .../kvm/tests/migration_with_file_transfer.py  |6 +-
 client/tests/kvm/tests/migration_with_reboot.py|4 +-
 client/tests/kvm/tests/nic_bonding.py  |9 ++-
 client/tests/kvm/tests/vmstop.py   |6 +-
 client/virt/tests/nic_promisc.py   |5 +-
 client/virt/tests/nicdriver_unload.py  |4 +-
 client/virt/tests/ntttcp.py|2 +-
 client/virt/virt_test_utils.py |2 +-
 client/virt/virt_utils.py  |   69 +---
 11 files changed, 88 insertions(+), 86 deletions(-)

diff --git a/client/common_lib/base_barrier.py 
b/client/common_lib/base_barrier.py
index d20916a..df4da49 100644
--- a/client/common_lib/base_barrier.py
+++ b/client/common_lib/base_barrier.py
@@ -50,7 +50,7 @@ class listen_server(object):
 sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
 sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
 sock.bind((self.address, self.port))
-sock.listen(10)
+sock.listen(100)
 
 return sock
 
diff --git a/client/common_lib/base_utils.py b/client/common_lib/base_utils.py
index 972d18a..c40e5dc 100644
--- a/client/common_lib/base_utils.py
+++ b/client/common_lib/base_utils.py
@@ -817,6 +817,71 @@ def run_parallel(commands, timeout=None, 
ignore_status=False,
 return [bg_job.result for bg_job in bg_jobs]
 
 
+class InterruptedThread(Thread):
+    """
+    Run a function in a background thread.
+    """
+    def __init__(self, target, args=(), kwargs={}):
+        """
+        Initialize the instance.
+
+        @param target: Function to run in the thread.
+        @param args: Arguments to pass to target.
+        @param kwargs: Keyword arguments to pass to target.
+        """
+        Thread.__init__(self)
+        self._target = target
+        self._args = args
+        self._kwargs = kwargs
+
+
+    def run(self):
+        """
+        Run target (passed to the constructor).  No point in calling this
+        function directly.  Call start() to make this function run in a new
+        thread.
+        """
+        self._e = None
+        self._retval = None
+        try:
+            try:
+                self._retval = self._target(*self._args, **self._kwargs)
+            except Exception:
+                self._e = sys.exc_info()
+                raise
+        finally:
+            # Avoid circular references (start() may be called only once so
+            # it's OK to delete these)
+            del self._target, self._args, self._kwargs
+
+
+    def join(self, timeout=None, suppress_exception=False):
+        """
+        Join the thread.  If target raised an exception, re-raise it.
+        Otherwise, return the value returned by target.
+
+        @param timeout: Timeout value to pass to threading.Thread.join().
+        @param suppress_exception: If True, don't re-raise the exception.
+        """
+        Thread.join(self, timeout)
+        try:
+            if self._e:
+                if not suppress_exception:
+                    # Because the exception was raised in another thread, we
+                    # need to explicitly insert the current context into it
+                    s = error.exception_context(self._e[1])
+                    s = error.join_contexts(error.get_context(), s)
+                    error.set_exception_context(self._e[1], s)
+                    raise self._e[0], self._e[1], self._e[2]
+            else:
+                return self._retval
+        finally:
+            # Avoid circular references (join() may be called multiple times
+            # so we can't delete these)
+            self._e = None
+            self._retval = None
+
+
 @deprecated
 def run_bg(command):
 Function deprecated. Please use BgJob class instead.
diff --git a/client/tests/kvm/tests/migration_with_file_transfer.py 
b/client/tests/kvm/tests/migration_with_file_transfer.py
index 075148d..073b87e 100644
--- a/client/tests/kvm/tests/migration_with_file_transfer.py
+++ b/client/tests/kvm/tests/migration_with_file_transfer.py
@@ -56,13 +56,13 @@ def run_migration_with_file_transfer(test, params, env):
 
 error.context("transferring file to guest while migrating",
   logging.info)
-bg = virt_utils.Thread(vm.copy_files_to, (host_path, guest_path),
-  dict(verbose=True, timeout=transfer_timeout))
+bg = utils.InterruptedThread(vm.copy_files_to, (host_path, guest_path),
+ dict(verbose=True, timeout=transfer_timeout))
 run_and_migrate(bg)
 
 error.context("transferring file back to host while migrating",
   logging.info)
-bg 

[PATCH 3/3][Autotest][virt] virt.virt_utils: Add framework for multihost migration.

2012-03-19 Thread Jiří Župka
The multihost migration framework makes it easy to migrate a guest
under load between multiple hosts. This patch also replaces the old
multihost migration tests with versions that use the framework.

The multihost migration framework takes care of:
  - preparing the environment before migration
  - preparing the guest for migration on the source and destination hosts
  - starting the guest
  - starting work on the guest
  - migrating between hosts
  - checking the work on the destination host
  - closing the guest
  - postprocessing the environment after migration

The framework also allows starting multiple migrations independently
at the same time, with multiple hosts and different work on the guests.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/tests/kvm/multi_host.srv|   94 +++--
 client/tests/kvm/tests/cpuflags.py |  203 ---
 client/tests/kvm/tests/migration_multi_host.py |  105 +-
 client/virt/base.cfg.sample|3 +
 client/virt/subtests.cfg.sample|   16 +-
 client/virt/virt_env_process.py|   15 +-
 client/virt/virt_utils.py  |  506 
 client/virt/virt_vm.py |   51 +++
 8 files changed, 726 insertions(+), 267 deletions(-)

diff --git a/client/tests/kvm/multi_host.srv b/client/tests/kvm/multi_host.srv
index a4bb20f..c661253 100644
--- a/client/tests/kvm/multi_host.srv
+++ b/client/tests/kvm/multi_host.srv
@@ -37,14 +37,22 @@ def generate_mac_address():
 return mac
 
 
-def run(pair):
-logging.info(KVM test running on source host [%s] and destination 
- host [%s]\n, pair[0], pair[1])
-
-source = hosts.create_host(pair[0])
-dest = hosts.create_host(pair[1])
-source_at = autotest_remote.Autotest(source)
-dest_at = autotest_remote.Autotest(dest)
+def run(machines):
+logging.info(KVM test running on hosts %s\n, machines)
+class Machines(object):
+def __init__(self, host):
+self.host = host
+self.at = None
+self.params = None
+self.control = None
+
+_hosts = {}
+for machine in machines:
+_hosts[machine] = Machines(hosts.create_host(machine))
+
+ats = []
+for host in _hosts.itervalues():
+host.at = autotest_remote.Autotest(host.host)
 
 cfg_file = os.path.join(KVM_DIR, multi-host-tests.cfg)
 
@@ -56,7 +64,9 @@ def run(pair):
 parser.parse_file(cfg_file)
 test_dicts = parser.get_dicts()
 
-source_control_file = dest_control_file = 
+ips = []
+for host in _hosts.itervalues():
+host.control = 
 testname = kvm
 bindir = os.path.join(job.testdir, testname)
 job.install_pkg(testname, 'test', bindir)
@@ -64,21 +74,29 @@ job.install_pkg(testname, 'test', bindir)
 kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests', 'kvm')
 sys.path.append(kvm_test_dir)
 
+ips.append(host.host.ip)
 import sys
 
 for params in test_dicts:
-params['srchost'] = source.ip
-params['dsthost'] = dest.ip
+params['hosts'] = ips
+
+params['not_preprocess'] = yes
+for vm in params.get(vms).split():
+for nic in params.get('nics',).split():
+params['nic_mac_%s_%s' % (nic, vm)] = generate_mac_address()
 
-for nic in params.get('nics',).split():
-params['nic_mac_%s' % nic] = generate_mac_address()
+params['mater_images_clone'] = image1
+params['kill_vm'] = yes
 
-source_params = params.copy()
-source_params['role'] = source
+s_host = _hosts[machines[0]]
+s_host.params = params.copy()
+s_host.params['clone_master'] = yes
+s_host.params['hostid'] = machines[0]
 
-dest_params = params.copy()
-dest_params['role'] = destination
-dest_params['migration_mode'] = tcp
+for machine, host in _hosts.items()[1:]:
+host.params = params.copy()
+host.params['clone_master'] = no
+host.params['hostid'] = machine
 
 # Report the parameters we've received
 print Test parameters:
@@ -87,27 +105,31 @@ sys.path.append(kvm_test_dir)
 for key in keys:
 logging.debug(%s = %s, key, params[key])
 
-source_control_file += (job.run_test('kvm', tag='%s', params=%s) %
-(source_params['shortname'], source_params))
-dest_control_file += (job.run_test('kvm', tag='%s', params=%s) %
-  (dest_params['shortname'], dest_params))
+for host in _hosts.itervalues():
+host.control += (job.run_test('kvm', tag='%s', params=%s) %
+ (host.params['shortname'], host.params))
 
-logging.info('Source control file:\n%s', source_control_file)
-logging.info('Destination control file:\n%s', dest_control_file)
-dest_command = subcommand(dest_at.run,
-  [dest_control_file, dest.hostname])
+

[PATCH 5/5] qemu-kvm: i8254: Reorganize i8254-kvm code

2012-03-19 Thread Jan Kiszka
Include i8254-kvm.c instead of building it as a separate module. This
reduces the diff to upstream and will help with merging the
latter.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target |1 -
 hw/i8254-kvm.c  |   12 
 hw/i8254.c  |   43 +--
 hw/i8254.h  |   49 ++---
 4 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 5f6b963..24386a4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -251,7 +251,6 @@ obj-i386-y += testdev.o
 obj-i386-y += acpi.o acpi_piix4.o
 
 obj-i386-y += i8254.o
-obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
 # shared objects
diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 7316111..e2aa4d6 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -21,14 +21,8 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
-#include "hw.h"
-#include "pc.h"
-#include "isa.h"
-#include "qemu-timer.h"
-#include "i8254.h"
-#include "qemu-kvm.h"
 
-extern VMStateDescription vmstate_pit;
+#ifdef CONFIG_KVM_PIT
 
 static void kvm_pit_pre_save(void *opaque)
 {
@@ -103,7 +97,7 @@ static void dummy_timer(void *opaque)
 {
 }
 
-void kvm_pit_init(PITState *pit)
+static void qemu_kvm_pit_init(PITState *pit)
 {
 PITChannelState *s;
 
@@ -116,3 +110,5 @@ void kvm_pit_init(PITState *pit)
 vmstate_pit.post_load = kvm_pit_post_load;
 return;
 }
+
+#endif
diff --git a/hw/i8254.c b/hw/i8254.c
index ca24ab9..a8e20cb 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -30,8 +30,45 @@
 
 //#define DEBUG_PIT
 
+#define RW_STATE_LSB 1
+#define RW_STATE_MSB 2
+#define RW_STATE_WORD0 3
+#define RW_STATE_WORD1 4
+
+typedef struct PITChannelState {
+int count; /* can be 65536 */
+uint16_t latched_count;
+uint8_t count_latched;
+uint8_t status_latched;
+uint8_t status;
+uint8_t read_state;
+uint8_t write_state;
+uint8_t write_latch;
+uint8_t rw_mode;
+uint8_t mode;
+uint8_t bcd; /* not supported */
+uint8_t gate; /* timer start */
+int64_t count_load_time;
+/* irq handling */
+int64_t next_transition_time;
+QEMUTimer *irq_timer;
+qemu_irq irq;
+uint32_t irq_disabled;
+} PITChannelState;
+
+typedef struct PITState {
+ISADevice dev;
+MemoryRegion ioports;
+uint32_t iobase;
+PITChannelState channels[3];
+} PITState;
+
 static void pit_irq_timer_update(PITChannelState *s, int64_t current_time);
 
+#ifdef CONFIG_KVM_PIT
+static void qemu_kvm_pit_init(PITState *pit);
+#endif
+
 static int pit_get_count(PITChannelState *s)
 {
 uint64_t d;
@@ -412,7 +449,7 @@ static int pit_load_old(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
-VMStateDescription vmstate_pit = {
+static VMStateDescription vmstate_pit = {
 .name = "i8254",
 .version_id = 3,
 .minimum_version_id = 2,
@@ -482,7 +519,7 @@ static int pit_initfn(ISADevice *dev)
 
 #ifdef CONFIG_KVM_PIT
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_pit_init(pit);
+qemu_kvm_pit_init(pit);
 return 0;
 }
 #endif
@@ -531,3 +568,5 @@ static void pit_register_types(void)
 }
 
 type_init(pit_register_types)
+
+#include "i8254-kvm.c"
diff --git a/hw/i8254.h b/hw/i8254.h
index 3313662..47b2570 100644
--- a/hw/i8254.h
+++ b/hw/i8254.h
@@ -25,55 +25,10 @@
 #ifndef HW_I8254_H
 #define HW_I8254_H
 
+#include "hw.h"
+#include "isa.h"
 #include "kvm.h"
 
-#define PIT_SAVEVM_NAME "i8254"
-#define PIT_SAVEVM_VERSION 2
-
-#define RW_STATE_LSB 1
-#define RW_STATE_MSB 2
-#define RW_STATE_WORD0 3
-#define RW_STATE_WORD1 4
-
-#define PIT_FLAGS_HPET_LEGACY  1
-
-typedef struct PITChannelState {
-int count; /* can be 65536 */
-uint16_t latched_count;
-uint8_t count_latched;
-uint8_t status_latched;
-uint8_t status;
-uint8_t read_state;
-uint8_t write_state;
-uint8_t write_latch;
-uint8_t rw_mode;
-uint8_t mode;
-uint8_t bcd; /* not supported */
-uint8_t gate; /* timer start */
-int64_t count_load_time;
-/* irq handling */
-int64_t next_transition_time;
-QEMUTimer *irq_timer;
-qemu_irq irq;
-uint32_t irq_disabled;
-} PITChannelState;
-
-struct PITState {
-ISADevice dev;
-MemoryRegion ioports;
-uint32_t iobase;
-PITChannelState channels[3];
-};
-
-void pit_save(QEMUFile *f, void *opaque);
-
-int pit_load(QEMUFile *f, void *opaque, int version_id);
-
-typedef struct PITState PITState;
-
-/* i8254-kvm.c */
-void kvm_pit_init(PITState *pit);
-
 #define PIT_FREQ 1193182
 
 typedef struct PITChannelInfo {
-- 
1.7.3.4



[PATCH 2/5] qemu-kvm: i8254: Reset broken pit_load_old to upstream version

2012-03-19 Thread Jan Kiszka
pit_load_old is only called with version_id == 1, but
PIT_SAVEVM_VERSION is 2. So this function has been broken in qemu-kvm for
ages, and the dummy qemu_get_be32 is pointless as well. Revert to upstream.
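
The mismatch is easy to see in isolation. The sketch below uses hypothetical function names and only models the version check, not the actual state loading.

```c
#include <errno.h>

#define SAVEVM_VERSION     2  /* PIT_SAVEVM_VERSION, as stated above */
#define OLD_FORMAT_VERSION 1  /* the only version the old loader is given */

/* The broken check: compares against the current version. */
static int load_old_broken(int version_id)
{
    if (version_id != SAVEVM_VERSION) {
        return -EINVAL;
    }
    return 0;
}

/* The check after the revert: accepts exactly the old format version. */
static int load_old_fixed(int version_id)
{
    if (version_id != OLD_FORMAT_VERSION) {
        return -EINVAL;
    }
    return 0;
}
```

Since the old loader is only ever invoked with version 1, load_old_broken() rejects every call it can receive, while load_old_fixed() accepts it.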

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/hw/i8254.c b/hw/i8254.c
index 8925139..7089832 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -390,10 +390,9 @@ static int pit_load_old(QEMUFile *f, void *opaque, int 
version_id)
 PITChannelState *s;
 int i;
 
-if (version_id != PIT_SAVEVM_VERSION)
+if (version_id != 1)
 return -EINVAL;
 
-(void)qemu_get_be32(f);
 for(i = 0; i < 3; i++) {
 s = &pit->channels[i];
 s->count=qemu_get_be32(f);
-- 
1.7.3.4



[PATCH 3/5] qemu-kvm: i8254: Revert pit_load_count to upstream version

2012-03-19 Thread Jan Kiszka
pit_irq_timer_update now checks generically if a channel IRQ is
disabled, so we can drop the hacks from qemu-kvm.
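
The design point, moving the disabled check into the timer update so the load path needs no per-channel special case, can be sketched as follows. The struct and function names are hypothetical stand-ins, and the timer_updates counter exists only to make the behaviour observable; the rule that a programmed count of 0 means 65536 is standard 8254 behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down channel state; field names follow the patch. */
typedef struct Channel {
    int count;
    int64_t count_load_time;
    bool irq_disabled;
    int timer_updates;   /* illustration only: counts timer re-arms */
} Channel;

/* Stand-in for pit_irq_timer_update(): the disabled check lives here. */
static void channel_timer_update(Channel *s, int64_t now)
{
    (void)now;
    if (s->irq_disabled) {
        return;
    }
    s->timer_updates++;
}

/* Stand-in for the simplified pit_load_count(). */
static void channel_load_count(Channel *s, int val, int64_t now)
{
    if (val == 0) {
        val = 0x10000;   /* a programmed count of 0 means 65536 */
    }
    s->count_load_time = now;
    s->count = val;
    channel_timer_update(s, s->count_load_time);
}
```

The count is stored for every channel, and only the timer re-arm is skipped when the IRQ is disabled.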

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254.c |   19 +++
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/hw/i8254.c b/hw/i8254.c
index 7089832..befad05 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -187,18 +187,13 @@ void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info)
 info->out = pit_get_out(s, qemu_get_clock_ns(vm_clock));
 }
 
-static inline void pit_load_count(PITState *s, int val, int chan)
+static inline void pit_load_count(PITChannelState *s, int val)
 {
 if (val == 0)
 val = 0x10000;
-s->channels[chan].count_load_time = qemu_get_clock_ns(vm_clock);
-s->channels[chan].count = val;
-#ifdef TARGET_I386
-if (chan == 0 && s->channels[0].irq_disabled) {
-return;
-}
-#endif
-pit_irq_timer_update(&s->channels[chan], s->channels[chan].count_load_time);
+s->count_load_time = qemu_get_clock_ns(vm_clock);
+s->count = val;
+pit_irq_timer_update(s, s->count_load_time);
 }
 }
 
 /* if already latched, do not latch again */
@@ -260,17 +255,17 @@ static void pit_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 switch(s->write_state) {
 default:
 case RW_STATE_LSB:
-pit_load_count(pit, val, addr);
+pit_load_count(s, val);
 break;
 case RW_STATE_MSB:
-pit_load_count(pit, val << 8, addr);
+pit_load_count(s, val << 8);
 break;
 case RW_STATE_WORD0:
 s->write_latch = val;
 s->write_state = RW_STATE_WORD1;
 break;
 case RW_STATE_WORD1:
-pit_load_count(pit, s->write_latch | (val << 8), addr);
+pit_load_count(s, s->write_latch | (val << 8));
 s->write_state = RW_STATE_WORD0;
 break;
 }
-- 
1.7.3.4



[PATCH 4/5] qemu-kvm: i8254: Drop bogus irq_disabled clearing in pit_reset

2012-03-19 Thread Jan Kiszka
The IRQ output line is reset along with the HPET (via signaling a new
state on the corresponding GPIO line), not the PIT itself.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/i8254.c b/hw/i8254.c
index befad05..ca24ab9 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -432,9 +432,6 @@ static void pit_reset(DeviceState *dev)
 PITChannelState *s;
 int i;
 
-#ifdef TARGET_I386
-pit->channels[0].irq_disabled = 0;
-#endif
 for(i = 0;i < 3; i++) {
 s = &pit->channels[i];
 s->mode = 3;
-- 
1.7.3.4



[PATCH 1/5] qemu-kvm: i8254/pcspk: Remove unused bits to make PC speaker kvm-aware

2012-03-19 Thread Jan Kiszka
Due to the old-style-only creation of the in-kernel PIT (which became
broken long ago during refactorings), the kernel has handled the speaker
port in qemu-kvm for a long while. Thus all bits that try to make the
user space speaker emulation kvm-aware are actually unused. Upstream will
come with fully working speaker emulation even while the in-kernel PIT is
enabled, so let's drop the dead bits from qemu-kvm to ease merging with upstream.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.objs   |2 +-
 Makefile.target |4 ++--
 hw/i8254.c  |   44 
 hw/pcspk.c  |6 --
 4 files changed, 3 insertions(+), 53 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index c33c0f2..39791ac 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -212,7 +212,7 @@ hw-obj-$(CONFIG_SERIAL) += serial.o
 hw-obj-$(CONFIG_PARALLEL) += parallel.o
 # Moved back to Makefile.target due to #include qemu-kvm.h:
 #hw-obj-$(CONFIG_I8254) += i8254.o
-#hw-obj-$(CONFIG_PCSPK) += pcspk.o
+hw-obj-$(CONFIG_PCSPK) += pcspk.o
 hw-obj-$(CONFIG_PCKBD) += pckbd.o
 hw-obj-$(CONFIG_USB_UHCI) += usb-uhci.o
 hw-obj-$(CONFIG_USB_OHCI) += usb-ohci.o
diff --git a/Makefile.target b/Makefile.target
index ae04331..5f6b963 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -250,7 +250,7 @@ obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 obj-i386-y += testdev.o
 obj-i386-y += acpi.o acpi_piix4.o
 
-obj-i386-y += pcspk.o i8254.o
+obj-i386-y += i8254.o
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
@@ -308,7 +308,7 @@ obj-lm32-y += milkymist-vgafb.o
 obj-lm32-y += framebuffer.o
 
 obj-mips-y = mips_r4k.o mips_jazz.o mips_malta.o mips_mipssim.o
-obj-mips-y += pcspk.o i8254.o
+obj-mips-y += i8254.o
 obj-mips-y += acpi.o acpi_piix4.o
 obj-mips-y += mips_addr.o mips_timer.o mips_int.o
 obj-mips-y += gt64xxx.o mc146818rtc.o
diff --git a/hw/i8254.c b/hw/i8254.c
index 33d94e1..8925139 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -176,55 +176,11 @@ void pit_set_gate(ISADevice *dev, int channel, int val)
 s-gate = val;
 }
 
-#ifdef CONFIG_KVM_PIT
-static void kvm_get_pit_ch2(ISADevice *dev,
-struct kvm_pit_state *inkernel_state)
-{
-struct PITState *pit = DO_UPCAST(struct PITState, dev, dev);
-struct kvm_pit_state pit_state;
-
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_get_pit(kvm_state, &pit_state);
-pit->channels[2].mode = pit_state.channels[2].mode;
-pit->channels[2].count = pit_state.channels[2].count;
-pit->channels[2].count_load_time = pit_state.channels[2].count_load_time;
-pit->channels[2].gate = pit_state.channels[2].gate;
-if (inkernel_state) {
-memcpy(inkernel_state, &pit_state, sizeof(*inkernel_state));
-}
-}
-}
-
-#if 0
-static void kvm_set_pit_ch2(ISADevice *dev,
-struct kvm_pit_state *inkernel_state)
-{
-struct PITState *pit = DO_UPCAST(struct PITState, dev, dev);
-
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-inkernel_state->channels[2].mode = pit->channels[2].mode;
-inkernel_state->channels[2].count = pit->channels[2].count;
-inkernel_state->channels[2].count_load_time =
-pit->channels[2].count_load_time;
-inkernel_state->channels[2].gate = pit->channels[2].gate;
-kvm_set_pit(kvm_state, inkernel_state);
-}
-}
-#endif
-#else
-static inline void kvm_get_pit_ch2(ISADevice *dev,
-   struct kvm_pit_state *inkernel_state) { }
-static inline void kvm_set_pit_ch2(ISADevice *dev,
-   struct kvm_pit_state *inkernel_state) { }
-#endif
-
 void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info)
 {
 PITState *pit = DO_UPCAST(PITState, dev, dev);
 PITChannelState *s = &pit->channels[channel];
 
-kvm_get_pit_ch2(dev, NULL);
-
 info->gate = s->gate;
 info->mode = s->mode;
 info->initial_count = s->count;
diff --git a/hw/pcspk.c b/hw/pcspk.c
index bb25ffb..e430324 100644
--- a/hw/pcspk.c
+++ b/hw/pcspk.c
@@ -29,7 +29,6 @@
 #include "qemu-timer.h"
 #include "i8254.h"
 #include "pcspk.h"
-#include "qemu-kvm.h"
 
 #define PCSPK_BUF_LEN 1792
 #define PCSPK_SAMPLE_RATE 32000
@@ -141,9 +140,6 @@ static void pcspk_io_write(void *opaque, target_phys_addr_t 
addr, uint64_t val,
 {
 PCSpkState *s = opaque;
 const int gate = val & 1;
-PITChannelInfo ch;
-
-pit_get_channel_info(s->pit, 2, &ch);
 
 s->data_on = (val >> 1) & 1;
 pit_set_gate(s->pit, 2, gate);
@@ -152,8 +148,6 @@ static void pcspk_io_write(void *opaque, target_phys_addr_t addr, uint64_t val,
 s->play_pos = 0;
 AUD_set_active_out(s->voice, gate & s->data_on);
 }
-
-/*  kvm_set_pit_ch2(s-pit, inkernel_state); ?? */
 }
 
 static const MemoryRegionOps pcspk_io_ops = {
-- 
1.7.3.4


[PATCH 0/5] qemu-kvm: Prepare kvm PIT for upstream merge

2012-03-19 Thread Jan Kiszka
Some preparation patches to arrange qemu-kvm for merging in latest qemu
with its own in-kernel PIT support. Later on, we can switch to that
version without losing features on the way, even just temporarily.

Jan Kiszka (5):
  qemu-kvm: i8254/pcspk: Remove unused bits to make PC speaker kvm-aware
  qemu-kvm: i8254: Reset broken pit_load_old to upstream version
  qemu-kvm: i8254: Revert pit_load_count to upstream version
  qemu-kvm: i8254: Drop bogus irq_disabled clearing in pit_reset
  qemu-kvm: i8254: Reorganize i8254-kvm code

 Makefile.objs   |2 +-
 Makefile.target |5 +-
 hw/i8254-kvm.c  |   12 ++
 hw/i8254.c  |  112 ---
 hw/i8254.h  |   49 +---
 hw/pcspk.c  |6 ---
 6 files changed, 58 insertions(+), 128 deletions(-)

-- 
1.7.3.4



Re: [PATCH] pci-assign: Fall back to host-side MSI if INTx sharing fails

2012-03-19 Thread Alex Williamson
On Mon, 2012-03-19 at 10:56 +0100, Jan Kiszka wrote:
 If the host or the device does not support INTx sharing, retry the IRQ
 assignment with host-side MSI support enabled but warn about potential
 consequences. This preserves the previous behavior where we
 defaulted to MSI and did not support INTx sharing at all.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 Detecting if the user actually specified prefer_msi=off as property of
 pci-assign is non-trivial. So I decided to go for the retry approach,
 ignoring potential user requests. The warning should attract the
 attention.
 
  hw/device-assignment.c |   12 
  1 files changed, 12 insertions(+), 0 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 89823f1..c953713 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -835,6 +835,7 @@ static int assign_irq(AssignedDevice *dev)
  dev->irq_requested_type = 0;
  }
  
 +retry:
  assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
  if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
  dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
 @@ -844,6 +845,17 @@ static int assign_irq(AssignedDevice *dev)
  
  r = kvm_assign_irq(kvm_state, &assigned_irq_data);
  if (r < 0) {
 +if (r == -EIO && !(dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK) &&
 +dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
 +/* Retry with host-side MSI. There might be an IRQ conflict and
 + * either the kernel or the device doesn't support sharing. */
 +fprintf(stderr,
 +"Host-side INTx sharing not supported, "
 +"using MSI instead.\n"
 +"Some devices do not to work properly in this mode.\n");
 +dev->features |= ASSIGNED_DEVICE_PREFER_MSI_MASK;
 +goto retry;
 +}
  fprintf(stderr, "Failed to assign irq for \"%s\": %s\n",
  dev->dev.qdev.id, strerror(-r));
  fprintf(stderr, "Perhaps you are assigning a device "

Acked-by: Alex Williamson <alex.william...@redhat.com>
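
For readers following the quoted patch, its retry flow can be sketched in
isolation. This is only an illustration under stated assumptions, not the
device-assignment code: assign_fn, assign_with_msi_fallback and
fake_assign_intx_fails are hypothetical names, and the callback stands in
for kvm_assign_irq().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback standing in for kvm_assign_irq(). */
typedef int (*assign_fn)(bool use_msi, void *opaque);

/*
 * Try the assignment once; if it fails with -EIO while MSI was not
 * preferred but the device is MSI capable, switch to MSI and retry.
 * The second pass cannot loop again because prefer_msi is now set.
 */
static int assign_with_msi_fallback(assign_fn assign, void *opaque,
                                    bool prefer_msi, bool msi_capable,
                                    bool *used_msi)
{
    int r;

retry:
    *used_msi = prefer_msi && msi_capable;
    r = assign(*used_msi, opaque);
    if (r == -EIO && !prefer_msi && msi_capable) {
        prefer_msi = true;
        goto retry;
    }
    return r;
}

/* Hypothetical stub: host-side INTx fails with -EIO, MSI succeeds. */
static int fake_assign_intx_fails(bool use_msi, void *opaque)
{
    (void)opaque;
    return use_msi ? 0 : -EIO;
}
```

With the stub above, a device that is MSI capable ends up on MSI after one
retry, while a device without MSI support gets the original -EIO back.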



[PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Michael S. Tsirkin
Currently virtio-pci is specified so that configuration of the device is
done through a PCI IO space (via BAR 0 of the virtual PCI device).
However, Linux guests happen to use ioread/iowrite/iomap primitives
for access, and these work uniformly across memory/io BARs.

While PCI IO accesses are faster than MMIO on x86 kvm,
MMIO might be helpful on other systems which don't
implement PIO or where PIO is slower than MMIO.

Add a property to make it possible to tweak the BAR type.
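
The selection this property drives can be sketched as a standalone
function. This is a minimal illustration, not the patch itself:
select_bar0_type is a hypothetical helper name, the flag bit follows the
definition added to hw/virtio-pci.h in the patch, and the two BAR type
values are the ones from the PCI register headers.

```c
#include <stdint.h>

/* BAR space indicator values from the PCI register headers. */
#define PCI_BASE_ADDRESS_SPACE_IO     0x01
#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00

/* Flag layout as added by the patch. */
#define VIRTIO_PCI_FLAG_USE_MMIO_BIT 2
#define VIRTIO_PCI_FLAG_USE_MMIO     (1u << VIRTIO_PCI_FLAG_USE_MMIO_BIT)

/* Hypothetical helper: pick the BAR 0 type from the proxy flags. */
static uint8_t select_bar0_type(uint32_t flags)
{
    if (flags & VIRTIO_PCI_FLAG_USE_MMIO) {
        return PCI_BASE_ADDRESS_SPACE_MEMORY;
    }
    return PCI_BASE_ADDRESS_SPACE_IO;
}
```

Because the property defaults to false, a device created without mmio=on
keeps the I/O BAR it had before.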

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

This is harmless by default but causes segfaults in memory.c
when enabled. Thus an RFC until I figure out what's wrong.

---
 hw/virtio-pci.c |   16 ++++++++++++++--
 hw/virtio-pci.h |    4 ++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 28498ec..6f338d2 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -655,6 +655,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 {
     uint8_t *config;
     uint32_t size;
+    uint8_t bar0_type;
 
     proxy->vdev = vdev;
 
@@ -684,8 +685,14 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 
     memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                           "virtio-pci", size);
-    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
-                     &proxy->bar);
+
+    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    } else {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
+    }
+
+    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
 
     if (!kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
@@ -823,6 +830,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -856,6 +864,7 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer, TX_TIMER_INTERVAL),
     DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
     DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -888,6 +897,7 @@ static Property virtio_serial_properties[] = {
     DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
     DEFINE_PROP_UINT32("max_ports", VirtIOPCIProxy, serial.max_virtserial_ports, 31),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -915,6 +925,7 @@ static TypeInfo virtio_serial_info = {
 
 static Property virtio_balloon_properties[] = {
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -969,6 +980,7 @@ static int virtio_scsi_exit_pci(PCIDevice *pci_dev)
 static Property virtio_scsi_properties[] = {
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index e560428..e6a8861 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -24,6 +24,10 @@
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
 
+/* Some guests don't support port IO. Use MMIO instead. */
+#define VIRTIO_PCI_FLAG_USE_MMIO_BIT 2
+#define VIRTIO_PCI_FLAG_USE_MMIO   (1 << VIRTIO_PCI_FLAG_USE_MMIO_BIT)
+
 typedef struct {
     PCIDevice pci_dev;
     VirtIODevice *vdev;
-- 
1.7.9.111.gf3fb0


Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling

2012-03-19 Thread Nadav Har'El
Hi, in a minute I'll send a new version of the MSR_IA32_FEATURE_CONTROL
patch for nested VMX; I just wanted to reply first to your comments so
you'll know what to expect:

On Wed, Mar 07, 2012, Avi Kivity wrote about "Re: PATCH: nVMX: Better 
MSR_IA32_FEATURE_CONTROL handling":
> On 03/07/2012 05:58 PM, Nadav Har'El wrote:
> > +	u64 msr_ia32_feature_control;
> >  };
> 
> Need to add to the list of exported MSRs so it can be live migrated
> (msrs_to_save).

Did this.

> The variable itself should live in vcpu->arch, even if
> some bits are vendor specific.

But not this. I understand what you explained about vmx.c being for
Intel *hosts*, not about emulating Intel *guests*, but I do think that
since none of the bits in this MSR are relevant on AMD hosts (which
don't do nested VMX), it isn't useful to support this MSR outside vmx.c.

So I left this variable in vmx->nested. As I noted earlier, svm.c
did exactly the same thing (nested.vm_cr_msr), so at least there's
symmetry here.

> > @@ -1999,7 +2000,7 @@ static int vmx_get_vmx_msr(struct kvm_vc
> >  
> >  	switch (msr_index) {
> >  	case MSR_IA32_FEATURE_CONTROL:
> > -		*pdata = 0;
> > +		*pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
> >  		break;
> 
> In a separate patch, please move this outside vmx_get_vmx_msr().  It's
> not a vmx msr.

Done, but not split into two patches: The patch removes the old case in
vmx_get_vmx_msr() (and also removes vmx_set_vmx_msr() entirely) and
instead adds the case in vmx_get_msr() and vmx_set_msr().

> > +#define VMXON_NEEDED_FEATURES \
> > +	  (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
> 
> Use const u64 instead of #define please, it jars my eyes.

I would, if Linux coding style allowed declaring variables in the
middle of blocks. Unfortunately it doesn't, so I left this #define.
I don't think it's that bad.

-- 
Nadav Har'El|Monday, Mar 19 2012, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |A conscience does not prevent sin. It
http://nadav.harel.org.il   |only prevents you from enjoying it.


[PATCH] qemu-kvm: Drop installation of self-built optionroms

2012-03-19 Thread Jan Kiszka
All corresponding binaries are now in pc-bios, so we can remove this
diff to upstream.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 908954c..1bc3cb0 100644
--- a/Makefile
+++ b/Makefile
@@ -294,12 +294,7 @@ endif
 ifneq ($(BLOBS),)
 	$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
 	set -e; for x in $(BLOBS); do \
-	    if [ -f $(SRC_PATH)/pc-bios/$$x ];then \
 		$(INSTALL_DATA) $(SRC_PATH)/pc-bios/$$x "$(DESTDIR)$(datadir)"; \
-	    fi \
-	    ; if [ -f pc-bios/optionrom/$$x ];then \
-		$(INSTALL_DATA) pc-bios/optionrom/$$x "$(DESTDIR)$(datadir)"; \
-	    fi \
 	done
 endif
 	$(INSTALL_DIR) "$(DESTDIR)$(datadir)/keymaps"
-- 
1.7.3.4


Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling

2012-03-19 Thread Nadav Har'El
The existing code emulated the guest's use of the IA32_FEATURE_CONTROL MSR
in a way that was enough to run nested VMX guests, but did not fully
conform to the VMX specification, and in particular did not allow a guest
BIOS to prevent the guest OS from using VMX by setting the lock bit on this
MSR.

This patch emulates this MSR better, allowing the guest to lock it, and
verifying its setting on VMXON. Also make sure that this MSR (and of course,
VMXON state) is reset on guest vcpu reset (via SIPI).
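
The intended behaviour can be modelled outside the kernel as a small state
machine. This is only a sketch under stated assumptions: the struct and the
model_* function names are hypothetical, and the bit positions are those of
the FEATURE_CONTROL_* definitions in the kernel's MSR headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED                    (1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1ULL << 2)
#define VMXON_NEEDED_FEATURES \
    (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)

/* Hypothetical stand-in for the per-vcpu nested state. */
struct feature_control_model {
    uint64_t msr;
};

/* Returns 0 on success, 1 if the write must be refused (#GP). */
static int model_wrmsr(struct feature_control_model *m, uint64_t data)
{
    if (m->msr & FEATURE_CONTROL_LOCKED) {
        return 1;
    }
    m->msr = data;
    return 0;
}

/* VMXON is allowed only when both the lock and the enable bit are set. */
static bool model_vmxon_allowed(const struct feature_control_model *m)
{
    return (m->msr & VMXON_NEEDED_FEATURES) == VMXON_NEEDED_FEATURES;
}

/* A vcpu reset clears the MSR, which also drops the lock. */
static void model_reset(struct feature_control_model *m)
{
    m->msr = 0;
}
```

In this model a guest BIOS that writes only the lock bit leaves VMXON
disabled until the next reset, which is the case the old emulation could
not express.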

Signed-off-by: Nadav Har'El <n...@il.ibm.com>
Reported-by: Julian Stecklina <j...@alien8.de>
---
 arch/x86/kvm/vmx.c |   43 +++++++++++++++++++++++--------------------
 arch/x86/kvm/x86.c |    3 ++-
 2 files changed, 25 insertions(+), 21 deletions(-)

--- .before/arch/x86/kvm/vmx.c  2012-03-19 18:34:24.0 +0200
+++ .after/arch/x86/kvm/vmx.c   2012-03-19 18:34:24.0 +0200
@@ -352,6 +352,7 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   u64 msr_ia32_feature_control;
 };
 
 struct vcpu_vmx {
@@ -1998,9 +1999,6 @@ static int vmx_get_vmx_msr(struct kvm_vc
}
 
switch (msr_index) {
-   case MSR_IA32_FEATURE_CONTROL:
-   *pdata = 0;
-   break;
case MSR_IA32_VMX_BASIC:
/*
 * This MSR reports some information about VMX support. We
@@ -2072,21 +2070,6 @@ static int vmx_get_vmx_msr(struct kvm_vc
return 1;
 }
 
-static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
-{
-   if (!nested_vmx_allowed(vcpu))
-   return 0;
-
-   if (msr_index == MSR_IA32_FEATURE_CONTROL)
-   /* TODO: the right thing. */
-   return 1;
-   /*
-* No need to treat VMX capability MSRs specially: If we don't handle
-* them, handle_wrmsr will #GP(0), which is correct (they are readonly)
-*/
-   return 0;
-}
-
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -2129,6 +2112,9 @@ static int vmx_get_msr(struct kvm_vcpu *
case MSR_IA32_SYSENTER_ESP:
data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
+	case MSR_IA32_FEATURE_CONTROL:
+		data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+		break;
 	case MSR_TSC_AUX:
 		if (!to_vmx(vcpu)->rdtscp_enabled)
 			return 1;
@@ -2197,6 +2183,12 @@ static int vmx_set_msr(struct kvm_vcpu *
}
ret = kvm_set_msr_common(vcpu, msr_index, data);
break;
+	case MSR_IA32_FEATURE_CONTROL:
+		if (to_vmx(vcpu)->nested.msr_ia32_feature_control
+				& FEATURE_CONTROL_LOCKED)
+			return 1;
+		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
+		break;
 	case MSR_TSC_AUX:
 		if (!vmx->rdtscp_enabled)
 			return 1;
@@ -2205,8 +2197,6 @@ static int vmx_set_msr(struct kvm_vcpu *
return 1;
/* Otherwise falls through */
default:
-   if (vmx_set_vmx_msr(vcpu, msr_index, data))
-   break;
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {
 			msr->data = data;
@@ -3807,6 +3797,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
return 0;
 }
 
+static void free_nested(struct vcpu_vmx *vmx);
+
 static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -3920,6 +3912,9 @@ static int vmx_vcpu_reset(struct kvm_vcp
 	/* HACK: Don't enable emulation on guest boot/reset */
 	vmx->emulation_required = 0;
 
+	/* Reset nested-VMX settings: */
+	vmx->nested.msr_ia32_feature_control = 0;
+	free_nested(vmx);
 out:
return ret;
 }
@@ -5031,6 +5026,14 @@ static int handle_vmon(struct kvm_vcpu *
return 1;
}
 
+#define VMXON_NEEDED_FEATURES \
+	  (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
+	if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
+			!= VMXON_NEEDED_FEATURES) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
 	vmx->nested.vmcs02_num = 0;
 
--- .before/arch/x86/kvm/x86.c  2012-03-19 18:34:24.0 +0200
+++ .after/arch/x86/kvm/x86.c   2012-03-19 18:34:24.0 +0200
@@ -799,7 +799,8 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+   MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+   MSR_IA32_FEATURE_CONTROL
 };
 
 static unsigned num_msrs_to_save;

Re: [RFC 2/2] kvm: guest-side changes for tmem on KVM

2012-03-19 Thread Konrad Rzeszutek Wilk
On Fri, Mar 16, 2012 at 10:30:35AM +0530, Akshay Karle wrote:
> > +/* kvm tmem foundation ops/hypercalls */
> > +
> > +static inline int kvm_tmem_op(u32 tmem_cmd, u32 tmem_pool, struct tmem_oid oid,
> > +	u32 index, u32 tmem_offset, u32 pfn_offset, unsigned long pfn, u32 len, uint16_t cli_id)
> 
> > That is rather long list of arguments. Could you pass in a structure
> > instead?
> 
> > Are you actually using all of the arguments in every call?
> 
> For different functions different parameters are used. If we want to reduce
> the number of arguments,
> the tmem_ops structure can be created in the functions calling kvm_tmem_op
> instead of creating it here
> and that can be passed, will make these changes in the next patch.
> 
> > +{
> > +	struct tmem_ops op;
> > +	int rc = 0;
> > +	op.cmd = tmem_cmd;
> > +	op.pool_id = tmem_pool;
> > +	op.u.gen.oid[0] = oid.oid[0];
> > +	op.u.gen.oid[1] = oid.oid[1];
> > +	op.u.gen.oid[2] = oid.oid[2];
> > +	op.u.gen.index = index;
> > +	op.u.gen.tmem_offset = tmem_offset;
> > +	op.u.gen.pfn_offset = pfn_offset;
> > +	op.u.gen.pfn = pfn;
> > +	op.u.gen.len = len;
> > +	op.u.gen.cli_id = cli_id;
> > +	rc = kvm_hypercall1(KVM_HC_TMEM, virt_to_phys(&op));
> > +	rc = rc + 1000;
> 
> > Why the addition?
> 
> If you notice the host patch I had subtracted 1000 while passing the return
> value
> in the kvm_emulate_hypercall function. This was to avoid the guest kernel
> panic due to
> the return of a non-negative value by the kvm_hypercall. In order to get the
> original value
> back I added 1000.

Avi, is there a right way of doing this?
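
For reference, the offset scheme described above amounts to the following
round trip. This is only a sketch of the arithmetic: host_encode_rc and
guest_decode_rc are hypothetical names, not functions from either patch,
and 1000 is the constant used in the quoted code.

```c
/* Bias taken from the quoted patch: the host subtracts it and the
 * guest adds it back. */
#define TMEM_RC_BIAS 1000

/* Hypothetical host-side helper: shift the result before returning it
 * from the hypercall. Any rc below 1000 becomes negative. */
static long host_encode_rc(int rc)
{
    return (long)rc - TMEM_RC_BIAS;
}

/* Hypothetical guest-side helper: undo the shift. */
static int guest_decode_rc(long raw)
{
    return (int)(raw + TMEM_RC_BIAS);
}
```

The round trip is lossless, but the raw hypercall result no longer
distinguishes a tmem success from an ordinary negative error code, which
is part of why a cleaner convention would be preferable.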


Re: [RFC 2/2] kvm: guest-side changes for tmem on KVM

2012-03-19 Thread Avi Kivity
On 03/19/2012 07:49 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 16, 2012 at 10:30:35AM +0530, Akshay Karle wrote:
> > > > +/* kvm tmem foundation ops/hypercalls */
> > > > +
> > > > +static inline int kvm_tmem_op(u32 tmem_cmd, u32 tmem_pool, struct tmem_oid oid,
> > > > +	u32 index, u32 tmem_offset, u32 pfn_offset, unsigned long pfn, u32 len, uint16_t cli_id)
> > >
> > > That is rather long list of arguments. Could you pass in a structure
> > > instead?
> > >
> > > Are you actually using all of the arguments in every call?
> >
> > For different functions different parameters are used. If we want to reduce
> > the number of arguments,
> > the tmem_ops structure can be created in the functions calling kvm_tmem_op
> > instead of creating it here
> > and that can be passed, will make these changes in the next patch.
> >
> > > > +{
> > > > +	struct tmem_ops op;
> > > > +	int rc = 0;
> > > > +	op.cmd = tmem_cmd;
> > > > +	op.pool_id = tmem_pool;
> > > > +	op.u.gen.oid[0] = oid.oid[0];
> > > > +	op.u.gen.oid[1] = oid.oid[1];
> > > > +	op.u.gen.oid[2] = oid.oid[2];
> > > > +	op.u.gen.index = index;
> > > > +	op.u.gen.tmem_offset = tmem_offset;
> > > > +	op.u.gen.pfn_offset = pfn_offset;
> > > > +	op.u.gen.pfn = pfn;
> > > > +	op.u.gen.len = len;
> > > > +	op.u.gen.cli_id = cli_id;
> > > > +	rc = kvm_hypercall1(KVM_HC_TMEM, virt_to_phys(&op));
> > > > +	rc = rc + 1000;
> > >
> > > Why the addition?
> >
> > If you notice the host patch I had subtracted 1000 while passing the return
> > value
> > in the kvm_emulate_hypercall function. This was to avoid the guest kernel
> > panic due to
> > the return of a non-negative value by the kvm_hypercall. In order to get
> > the original value
> > back I added 1000.
>
> Avi, is there a right way of doing this?

Why would the guest kernel panic due to the return of a non-negative value?

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Avi Kivity
On 03/19/2012 05:56 PM, Michael S. Tsirkin wrote:
> Currently virtio-pci is specified so that configuration of the device is
> done through a PCI IO space (via BAR 0 of the virtual PCI device).
> However, Linux guests happen to use ioread/iowrite/iomap primitives
> for access, and these work uniformly across memory/io BARs.
>
> While PCI IO accesses are faster than MMIO on x86 kvm,
> MMIO might be helpful on other systems which don't
> implement PIO or where PIO is slower than MMIO.
>
> Add a property to make it possible to tweak the BAR type.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
> This is harmless by default but causes segfaults in memory.c
> when enabled. Thus an RFC until I figure out what's wrong.


Should be done via an extra BAR (with the same layout, perhaps extended)
so compatibility is preserved.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Michael S. Tsirkin
On Mon, Mar 19, 2012 at 07:58:12PM +0200, Avi Kivity wrote:
> On 03/19/2012 05:56 PM, Michael S. Tsirkin wrote:
> > Currently virtio-pci is specified so that configuration of the device is
> > done through a PCI IO space (via BAR 0 of the virtual PCI device).
> > However, Linux guests happen to use ioread/iowrite/iomap primitives
> > for access, and these work uniformly across memory/io BARs.
> >
> > While PCI IO accesses are faster than MMIO on x86 kvm,
> > MMIO might be helpful on other systems which don't
> > implement PIO or where PIO is slower than MMIO.
> >
> > Add a property to make it possible to tweak the BAR type.
> >
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >
> > This is harmless by default but causes segfaults in memory.c
> > when enabled. Thus an RFC until I figure out what's wrong.
> >
>
> Should be done via an extra BAR (with the same layout, perhaps extended)
> so compatibility is preserved.

No, that would need guest changes to be of use.  The point of this hack
is to make things work for Linux guests where PIO does not work.

> -- 
> error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Anthony Liguori

On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:

> Currently virtio-pci is specified so that configuration of the device is
> done through a PCI IO space (via BAR 0 of the virtual PCI device).
> However, Linux guests happen to use ioread/iowrite/iomap primitives
> for access, and these work uniformly across memory/io BARs.
>
> While PCI IO accesses are faster than MMIO on x86 kvm,
> MMIO might be helpful on other systems which don't
> implement PIO or where PIO is slower than MMIO.
>
> Add a property to make it possible to tweak the BAR type.
>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
> This is harmless by default but causes segfaults in memory.c
> when enabled. Thus an RFC until I figure out what's wrong.


Doesn't this violate the virtio-pci spec?

Making the same vendor/device ID have different semantics depending on a magic 
flag in QEMU seems like a pretty bad idea to me.


Regards,

Anthony Liguori



> ---
>  hw/virtio-pci.c |   16 ++++++++++++++--
>  hw/virtio-pci.h |    4 ++++
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 28498ec..6f338d2 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -655,6 +655,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>  {
>      uint8_t *config;
>      uint32_t size;
> +    uint8_t bar0_type;
>
>      proxy->vdev = vdev;
>
> @@ -684,8 +685,14 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>
>      memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
>                            "virtio-pci", size);
> -    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
> -                     &proxy->bar);
> +
> +    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
> +        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
> +    } else {
> +        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
> +    }
> +
> +    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
>
>      if (!kvm_has_many_ioeventfds()) {
>          proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
> @@ -823,6 +830,7 @@ static Property virtio_blk_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
> +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> @@ -856,6 +864,7 @@ static Property virtio_net_properties[] = {
>      DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer, TX_TIMER_INTERVAL),
>      DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
>      DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
> +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> @@ -888,6 +897,7 @@ static Property virtio_serial_properties[] = {
>      DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
>      DEFINE_PROP_UINT32("max_ports", VirtIOPCIProxy, serial.max_virtserial_ports, 31),
> +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> @@ -915,6 +925,7 @@ static TypeInfo virtio_serial_info = {
>
>  static Property virtio_balloon_properties[] = {
>      DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
> +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> @@ -969,6 +980,7 @@ static int virtio_scsi_exit_pci(PCIDevice *pci_dev)
>  static Property virtio_scsi_properties[] = {
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
> +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
> index e560428..e6a8861 100644
> --- a/hw/virtio-pci.h
> +++ b/hw/virtio-pci.h
> @@ -24,6 +24,10 @@
>  #define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
>  #define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
>
> +/* Some guests don't support port IO. Use MMIO instead. */
> +#define VIRTIO_PCI_FLAG_USE_MMIO_BIT 2
> +#define VIRTIO_PCI_FLAG_USE_MMIO   (1 << VIRTIO_PCI_FLAG_USE_MMIO_BIT)
> +
>  typedef struct {
>      PCIDevice pci_dev;
>      VirtIODevice *vdev;




RE: KVM inside Oracle VM

2012-03-19 Thread Sever Apostu
-----Original Message-----
From: Sever Apostu 
Sent: Sunday, March 18, 2012 10:27 PM
To: kvm@vger.kernel.org
Subject: KVM inside Oracle VM

Hi,

I'm planning on building an Oracle VM machine with KVM inside (three KVM 
machines inside one Oracle VM machine). The trouble is I have yet to find any 
reference on that :-) Also, although I am familiar with Xen and Oracle VM, I 
know little about KVM.

What do you think, would this be a reasonable mix of the two virtualization 
techniques ? Any major drawbacks I should consider ?

Thank you,
Sever



Any chance anyone has any feedback about KVM installed inside a Xen guest ?

Thank you,
Sever


RE: KVM inside Oracle VM

2012-03-19 Thread Sever Apostu
-----Original Message-----
From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
Sent: Monday, March 19, 2012 10:01 PM
To: Sever Apostu
Cc: kvm@vger.kernel.org
Subject: Re: KVM inside Oracle VM

On 19/03/2012 20:29, Sever Apostu wrote:
> Any chance anyone has any feedback about KVM installed inside a Xen guest ?

It's really a Xen question more than a KVM question.

Performance and stability will probably suffer.

Paolo


Thank you for the reply, Paolo!

Most likely both will suffer and I am prepared to live with that, but is it 
conceptually possible?

I am asking you guys this instead of trying it myself because I am building a 
project plan that would require someone else providing the infrastructure, so I 
am wondering whether I should give this one day of testing or simply dismiss it 
as absurd :-)

I will follow-up on this on the Xen mailing list as well.

Thank you,
Sever


[PATCHv2] virtio-pci: add MMIO property

2012-03-19 Thread Michael S. Tsirkin
Currently virtio-pci is specified so that configuration of the device is
done through a PCI IO space (via BAR 0 of the virtual PCI device).
However, Linux guests happen to use ioread/iowrite/iomap primitives
for access, and these work uniformly across memory/io BARs.

While PCI IO accesses are faster than MMIO on x86 kvm,
MMIO might be helpful on other systems: for example, on IBM pSeries
machines not all firmware/hypervisor versions necessarily support
PCI PIO access on all domains.

Add a property to make it possible to tweak the BAR type.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

---

OK I added old_mmio (BTW: would be nice if ops were checked when
region is inited) and now things work in userspace.
However, when I add ioeventfd=on I get an assert:

qemu/kvm-all.c:747: kvm_mem_ioeventfd_add: Assertion `match_data &&
section->size == 4' failed.

How to reproduce:
1. apply patch
2. create virtio device with flags mmio=on,ioeventfd=on
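
The MMIO wrappers added by this patch reduce each address to an offset
inside BAR 0 by masking it with bar0_mask. That step can be sketched on its
own as follows. This is a standalone illustration, not QEMU code:
round_up_pow2 and bar_offset are hypothetical helper names, and the loop
stands in for the qemu_fls-based rounding in virtio_init_pci.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round size up to the next power of two, as
 * virtio_init_pci does before registering BAR 0. */
static uint32_t round_up_pow2(uint32_t size)
{
    uint32_t p = 1;

    assert(size > 0 && size <= 0x80000000u);
    while (p < size) {
        p <<= 1;
    }
    return p;
}

/* Hypothetical helper: reduce an address to an offset inside the BAR.
 * The mask trick is only valid when bar_size is a power of two. */
static uint32_t bar_offset(uint64_t addr, uint32_t bar_size)
{
    uint32_t mask = bar_size - 1;

    assert((bar_size & mask) == 0);
    return (uint32_t)(addr & mask);
}
```

For a 24-byte config area the BAR is rounded to 32 bytes, so the mask is
0x1f and an access at 0xfebf1012 maps to offset 0x12.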


 hw/virtio-pci.c |   68 +-
 hw/virtio-pci.h |5 
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 28498ec..b061000 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -510,8 +510,58 @@ const MemoryRegionPortio virtio_portio[] = {
     PORTIO_END_OF_LIST()
 };
 
+static void virtio_pci_config_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    virtio_pci_config_writeb(opaque, addr & proxy->bar0_mask, val);
+}
+
+static void virtio_pci_config_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    virtio_pci_config_writew(opaque, addr & proxy->bar0_mask, val);
+}
+
+static void virtio_pci_config_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    virtio_pci_config_writel(opaque, addr & proxy->bar0_mask, val);
+}
+
+static uint32_t virtio_pci_config_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    return virtio_pci_config_readb(opaque, addr & proxy->bar0_mask);
+}
+
+static uint32_t virtio_pci_config_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    uint32_t val = virtio_pci_config_readw(opaque, addr & proxy->bar0_mask);
+    return val;
+}
+
+static uint32_t virtio_pci_config_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    uint32_t val = virtio_pci_config_readl(opaque, addr & proxy->bar0_mask);
+    return val;
+}
+
 static const MemoryRegionOps virtio_pci_config_ops = {
     .old_portio = virtio_portio,
+    .old_mmio = {
+        .read = {
+            virtio_pci_config_mmio_readb,
+            virtio_pci_config_mmio_readw,
+            virtio_pci_config_mmio_readl,
+        },
+        .write = {
+            virtio_pci_config_mmio_writeb,
+            virtio_pci_config_mmio_writew,
+            virtio_pci_config_mmio_writel,
+        },
+    },
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
@@ -655,6 +705,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 {
     uint8_t *config;
     uint32_t size;
+    uint8_t bar0_type;
 
     proxy->vdev = vdev;
 
@@ -682,10 +733,18 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     if (size & (size-1))
         size = 1 << qemu_fls(size);
 
+    proxy->bar0_mask = size - 1;
+
     memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                           "virtio-pci", size);
-    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
-                     &proxy->bar);
+
+    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    } else {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
+    }
+
+    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
 
     if (!kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
@@ -823,6 +882,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -856,6 +916,7 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer, TX_TIMER_INTERVAL),
     DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
     DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -888,6 +949,7 @@ static Property virtio_serial_properties[] = {
     DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
 

Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Michael S. Tsirkin
On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
 On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
 Currently virtio-pci is specified so that configuration of the device is
 done through a PCI IO space (via BAR 0 of the virtual PCI device).
 However, Linux guests happen to use ioread/iowrite/iomap primitives
 for access, and these work uniformly across memory/io BARs.
 
 While PCI IO accesses are faster than MMIO on x86 kvm,
 MMIO might be helpful on other systems which don't
 implement PIO or where PIO is slower than MMIO.
 
 Add a property to make it possible to tweak the BAR type.
 
 Signed-off-by: Michael S. Tsirkinm...@redhat.com
 
 This is harmless by default but causes segfaults in memory.c
 when enabled. Thus an RFC until I figure out what's wrong.
 
 Doesn't this violate the virtio-pci spec?
 

The point is to change the BAR type depending on the architecture.
IO is fastest on x86 but maybe not on other architectures.

 Making the same vendor/device ID have different semantics depending
 on a magic flag in QEMU seems like a pretty bad idea to me.
 
 Regards,
 
 Anthony Liguori

We do this with MSI-X so why not the BAR type?

 
 ---
  hw/virtio-pci.c |   16 ++++++++++++++--
  hw/virtio-pci.h |    4 ++++
  2 files changed, 18 insertions(+), 2 deletions(-)
 
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 28498ec..6f338d2 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -655,6 +655,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
  {
      uint8_t *config;
      uint32_t size;
 +    uint8_t bar0_type;
 
      proxy->vdev = vdev;
 
 @@ -684,8 +685,14 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 
      memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                            "virtio-pci", size);
 -    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
 -                     &proxy->bar);
 +
 +    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
 +        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
 +    } else {
 +        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
 +    }
 +
 +    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
 
      if (!kvm_has_many_ioeventfds()) {
          proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
 @@ -823,6 +830,7 @@ static Property virtio_blk_properties[] = {
      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
      DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
 +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
      DEFINE_PROP_END_OF_LIST(),
  };
 
 @@ -856,6 +864,7 @@ static Property virtio_net_properties[] = {
      DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer,
                         TX_TIMER_INTERVAL),
      DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
      DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
 +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
      DEFINE_PROP_END_OF_LIST(),
  };
 
 @@ -888,6 +897,7 @@ static Property virtio_serial_properties[] = {
      DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
      DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
      DEFINE_PROP_UINT32("max_ports", VirtIOPCIProxy,
                         serial.max_virtserial_ports, 31),
 +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
      DEFINE_PROP_END_OF_LIST(),
  };
 
 @@ -915,6 +925,7 @@ static TypeInfo virtio_serial_info = {
 
  static Property virtio_balloon_properties[] = {
      DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
 +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
      DEFINE_PROP_END_OF_LIST(),
  };
 
 @@ -969,6 +980,7 @@ static int virtio_scsi_exit_pci(PCIDevice *pci_dev)
  static Property virtio_scsi_properties[] = {
      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
      DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
 +    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
      DEFINE_PROP_END_OF_LIST(),
  };
 
 diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
 index e560428..e6a8861 100644
 --- a/hw/virtio-pci.h
 +++ b/hw/virtio-pci.h
 @@ -24,6 +24,10 @@
  #define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
  #define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
 
 +/* Some guests don't support port IO. Use MMIO instead. */
 +#define VIRTIO_PCI_FLAG_USE_MMIO_BIT 2
 +#define VIRTIO_PCI_FLAG_USE_MMIO   (1 << VIRTIO_PCI_FLAG_USE_MMIO_BIT)
 +
  typedef struct {
      PCIDevice pci_dev;
      VirtIODevice *vdev;


Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Anthony Liguori

On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:

On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:

On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:

Currently virtio-pci is specified so that configuration of the device is
done through a PCI IO space (via BAR 0 of the virtual PCI device).
However, Linux guests happen to use ioread/iowrite/iomap primitives
for access, and these work uniformly across memory/io BARs.

While PCI IO accesses are faster than MMIO on x86 kvm,
MMIO might be helpful on other systems which don't
implement PIO or where PIO is slower than MMIO.

Add a property to make it possible to tweak the BAR type.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

This is harmless by default but causes segfaults in memory.c
when enabled. Thus an RFC until I figure out what's wrong.


Doesn't this violate the virtio-pci spec?



The point is to change the BAR type depending on the architecture.
IO is fastest on x86 but maybe not on other architectures.


Are we going to document that the BAR is X on architecture Y in the spec?

I think the better way to do this is to use a separate device ID range for MMIO 
virtio-pci.  You can make the same driver handle both ranges, and that way the 
device is presented consistently to the guest regardless of what the 
architecture is.
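
A rough sketch in C of what "the same driver handles both ranges" could look like. The vendor ID 0x1af4 and the 0x1000-0x103f device ID range are the ones virtio-pci already uses; the second range (0x1100-0x113f) and all function and enum names here are purely hypothetical, since no MMIO range has been allocated in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4

/* Device ID range already assigned to virtio-pci devices. */
#define VIRTIO_PCI_DEV_ID_FIRST 0x1000
#define VIRTIO_PCI_DEV_ID_LAST  0x103f

/* HYPOTHETICAL range for an MMIO flavour; not allocated anywhere. */
#define VIRTIO_PCI_MMIO_DEV_ID_FIRST 0x1100
#define VIRTIO_PCI_MMIO_DEV_ID_LAST  0x113f

enum virtio_pci_access {
    VIRTIO_PCI_NOT_VIRTIO,
    VIRTIO_PCI_ACCESS_PIO,
    VIRTIO_PCI_ACCESS_MMIO
};

/* Decide how BAR 0 is accessed purely from the IDs the guest probes. */
enum virtio_pci_access virtio_pci_classify(uint16_t vendor, uint16_t device)
{
    if (vendor != PCI_VENDOR_ID_REDHAT_QUMRANET) {
        return VIRTIO_PCI_NOT_VIRTIO;
    }
    if (device >= VIRTIO_PCI_DEV_ID_FIRST && device <= VIRTIO_PCI_DEV_ID_LAST) {
        return VIRTIO_PCI_ACCESS_PIO;
    }
    if (device >= VIRTIO_PCI_MMIO_DEV_ID_FIRST &&
        device <= VIRTIO_PCI_MMIO_DEV_ID_LAST) {
        return VIRTIO_PCI_ACCESS_MMIO;
    }
    return VIRTIO_PCI_NOT_VIRTIO;
}

/* Usage: one ID from each range, plus two that must not match. */
void virtio_pci_classify_example(void)
{
    assert(virtio_pci_classify(0x1af4, 0x1000) == VIRTIO_PCI_ACCESS_PIO);
    assert(virtio_pci_classify(0x1af4, 0x1100) == VIRTIO_PCI_ACCESS_MMIO);
    assert(virtio_pci_classify(0x1af4, 0x2000) == VIRTIO_PCI_NOT_VIRTIO);
    assert(virtio_pci_classify(0x8086, 0x1000) == VIRTIO_PCI_NOT_VIRTIO);
}
```

With this shape the access method follows from the device ID alone, so a given ID means the same thing on every architecture.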



Making the same vendor/device ID have different semantics depending
on a magic flag in QEMU seems like a pretty bad idea to me.

Regards,

Anthony Liguori


We do this with MSI-X so why not the BAR type?


We extend the bar size with MSI-X and use a transport flag to indicate that it's 
available, right?


Regards,

Anthony Liguori






Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Michael S. Tsirkin
On Mon, Mar 19, 2012 at 04:07:45PM -0500, Anthony Liguori wrote:
 On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:
 On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
 On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
 Currently virtio-pci is specified so that configuration of the device is
 done through a PCI IO space (via BAR 0 of the virtual PCI device).
 However, Linux guests happen to use ioread/iowrite/iomap primitives
 for access, and these work uniformly across memory/io BARs.
 
 While PCI IO accesses are faster than MMIO on x86 kvm,
 MMIO might be helpful on other systems which don't
 implement PIO or where PIO is slower than MMIO.
 
 Add a property to make it possible to tweak the BAR type.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 This is harmless by default but causes segfaults in memory.c
 when enabled. Thus an RFC until I figure out what's wrong.
 
 Doesn't this violate the virtio-pci spec?
 
 
 The point is to change the BAR type depending on the architecture.
 IO is fastest on x86 but maybe not on other architectures.
 
 Are we going to document that the BAR is X on architecture Y in the spec?
 
 I think the better way to do this is to use a separate device id
 range for MMIO virtio-pci.  You can make the same driver handle both
 ranges and that way the device is presented consistently to the
 guest regardless of what the architecture is.

Yes, there are endless ways to do this.
This specific hack is good for making existing Linux drivers
on ppc, arm, etc. work.

 Making the same vendor/device ID have different semantics depending
 on a magic flag in QEMU seems like a pretty bad idea to me.
 
 Regards,
 
 Anthony Liguori
 
 We do this with MSI-X so why not the BAR type?
 
 We extend the bar size with MSI-X and use a transport flag to
 indicate that it's available, right?

No, we use a regular PCI capability, just like the BAR type is
a regular PCI register :)
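
To make the "regular PCI capability" point concrete, here is a small sketch in C of how a guest discovers MSI-X: it walks the standard capability list in configuration space and looks for capability ID 0x11. The register offsets and IDs are the standard PCI ones; the function names pci_find_cap() and pci_cap_example(), and the flat 256-byte array standing in for config space, are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_STATUS           0x06  /* status register (low byte) */
#define PCI_STATUS_CAP_LIST  0x10  /* device has a capability list */
#define PCI_CAPABILITY_LIST  0x34  /* offset of first capability */
#define PCI_CAP_LIST_ID      0     /* capability ID byte */
#define PCI_CAP_LIST_NEXT    1     /* pointer to next capability */
#define PCI_CAP_ID_MSI       0x05
#define PCI_CAP_ID_MSIX      0x11

/* Return the config-space offset of capability cap_id, or 0 if absent. */
uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    int ttl = 48;  /* bound the walk in case the list is malformed */
    uint8_t pos;

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        return 0;
    }
    pos = (uint8_t)(cfg[PCI_CAPABILITY_LIST] & 0xfc);
    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos + PCI_CAP_LIST_ID] == cap_id) {
            return pos;
        }
        pos = (uint8_t)(cfg[pos + PCI_CAP_LIST_NEXT] & 0xfc);
    }
    return 0;
}

/* Usage: a fake device with an MSI capability followed by MSI-X. */
void pci_cap_example(void)
{
    uint8_t cfg[256] = {0};

    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST;
    cfg[PCI_CAPABILITY_LIST] = 0x40;
    cfg[0x40] = PCI_CAP_ID_MSI;
    cfg[0x41] = 0x50;               /* next capability */
    cfg[0x50] = PCI_CAP_ID_MSIX;
    cfg[0x51] = 0x00;               /* end of list */

    assert(pci_find_cap(cfg, PCI_CAP_ID_MSIX) == 0x50);
    assert(pci_find_cap(cfg, PCI_CAP_ID_MSI) == 0x40);
    assert(pci_find_cap(cfg, 0x10) == 0);
}
```

The BAR type is discoverable in the same generic way, from bit 0 of the BAR register, which is the parallel being drawn above.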

 Regards,
 
 Anthony Liguori
 
 
 

Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Michael S. Tsirkin
On Mon, Mar 19, 2012 at 04:07:45PM -0500, Anthony Liguori wrote:
 On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:
 On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:
 On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:
 Currently virtio-pci is specified so that configuration of the device is
 done through a PCI IO space (via BAR 0 of the virtual PCI device).
 However, Linux guests happen to use ioread/iowrite/iomap primitives
 for access, and these work uniformly across memory/io BARs.
 
 While PCI IO accesses are faster than MMIO on x86 kvm,
 MMIO might be helpful on other systems which don't
 implement PIO or where PIO is slower than MMIO.
 
 Add a property to make it possible to tweak the BAR type.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 This is harmless by default but causes segfaults in memory.c
 when enabled. Thus an RFC until I figure out what's wrong.
 
 Doesn't this violate the virtio-pci spec?
 
 
 The point is to change the BAR type depending on the architecture.
 IO is fastest on x86 but maybe not on other architectures.
 
 Are we going to document that the BAR is X on architecture Y in the spec?
 
 I think the better way to do this is to use a separate device id
 range for MMIO virtio-pci.  You can make the same driver handle both
 ranges and that way the device is presented consistently to the
 guest regardless of what the architecture is.

Maybe just make this a hidden option like x-mmio?
This will ensure people don't turn it on by mistake on e.g. x86.

 Making the same vendor/device ID have different semantics depending
 on a magic flag in QEMU seems like a pretty bad idea to me.
 
 Regards,
 
 Anthony Liguori
 
 We do this with MSI-X so why not the BAR type?
 
 We extend the bar size with MSI-X and use a transport flag to
 indicate that it's available, right?
 
 Regards,
 
 Anthony Liguori
 
 
 

Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Anthony Liguori

On 03/19/2012 04:29 PM, Michael S. Tsirkin wrote:

On Mon, Mar 19, 2012 at 04:07:45PM -0500, Anthony Liguori wrote:

On 03/19/2012 03:49 PM, Michael S. Tsirkin wrote:

On Mon, Mar 19, 2012 at 02:19:33PM -0500, Anthony Liguori wrote:

On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote:

Currently virtio-pci is specified so that configuration of the device is
done through a PCI IO space (via BAR 0 of the virtual PCI device).
However, Linux guests happen to use ioread/iowrite/iomap primitives
for access, and these work uniformly across memory/io BARs.

While PCI IO accesses are faster than MMIO on x86 kvm,
MMIO might be helpful on other systems which don't
implement PIO or where PIO is slower than MMIO.

Add a property to make it possible to tweak the BAR type.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

This is harmless by default but causes segfaults in memory.c
when enabled. Thus an RFC until I figure out what's wrong.


Doesn't this violate the virtio-pci spec?



The point is to change the BAR type depending on the architecture.
IO is fastest on x86 but maybe not on other architectures.


Are we going to document that the BAR is X on architecture Y in the spec?

I think the better way to do this is to use a separate device id
range for MMIO virtio-pci.  You can make the same driver handle both
ranges and that way the device is presented consistently to the
guest regardless of what the architecture is.


Maybe just make this a hidden option like x-mmio?


x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power?

Really, aren't we just being too clever here?  From a practical perspective, I 
doubt anyone is ever going to support a driver that has *never* been tested on 
the platform just because it was accidentally compiled and happens to be there.


If we just use a separate PCI device ID range for this, it's a one-line patch 
that can be provided via an update to existing guests.


Regards,

Anthony Liguori


This will ensure people don't turn it on by mistake on e.g. x86.


Making the same vendor/device ID have different semantics depending
on a magic flag in QEMU seems like a pretty bad idea to me.

Regards,

Anthony Liguori


We do this with MSI-X so why not the BAR type?


We extend the bar size with MSI-X and use a transport flag to
indicate that it's available, right?

Regards,

Anthony Liguori






Re: [net-next PATCH v0 0/5] Series short description

2012-03-19 Thread David Miller
From: John Fastabend john.r.fastab...@intel.com
Date: Sun, 18 Mar 2012 23:51:45 -0700

 This series is a follow up to this thread:
 
 http://www.spinics.net/lists/netdev/msg191360.html

Can the interested parties please review this series?

I'm willing to apply this right now if it looks OK, but if
it needs more revisions we'll have to defer.


Re: [net-next PATCH v0 5/5] ixgbe: allow RAR table to be updated in promisc mode

2012-03-19 Thread Jeff Kirsher
On Sun, 2012-03-18 at 23:52 -0700, Fastabend, John R wrote:
 This allows RAR table updates while in promiscuous mode. With
 SR-IOV enabled it is valuable to allow the RAR table to
 be updated even when in promisc mode to configure forwarding.
 
 Signed-off-by: John Fastabend john.r.fastab...@intel.com
 ---
 
  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++--
  1 files changed, 11 insertions(+), 10 deletions(-)

Acked-by: Jeff Kirsher jeffrey.t.kirs...@intel.com




Re: [net-next PATCH v0 4/5] ixgbe: enable FDB netdevice ops

2012-03-19 Thread Jeff Kirsher
On Sun, 2012-03-18 at 23:52 -0700, Fastabend, John R wrote:
 Enable FDB ops on ixgbe when in SR-IOV mode.
 
 Signed-off-by: John Fastabend john.r.fastab...@intel.com
 ---
 
  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   59 +
  1 files changed, 59 insertions(+), 0 deletions(-)

Acked-by: Jeff Kirsher jeffrey.t.kirs...@intel.com




Re: [net-next PATCH v0 0/5] Series short description

2012-03-19 Thread Stephen Hemminger
On Mon, 19 Mar 2012 18:38:08 -0400 (EDT)
David Miller da...@davemloft.net wrote:

 From: John Fastabend john.r.fastab...@intel.com
 Date: Sun, 18 Mar 2012 23:51:45 -0700
 
  This series is a follow up to this thread:
  
  http://www.spinics.net/lists/netdev/msg191360.html
 
 Can the interested parties please review this series?
 
 I'm willing to apply this right now if it looks OK, but if
 it needs more revisions we'll have to defer.

Please don't rush this into this merge window. It needs more than
1 full day of review.


Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Rusty Russell
On Mon, 19 Mar 2012 17:13:06 -0500, Anthony Liguori anth...@codemonkey.ws 
wrote:
  Maybe just make this a hidden option like x-mmio?
 
 x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power?

"To configure the device, we use the first I/O region of the PCI
device."

Meh, it does sound a little like we are specifying that it's a PCI I/O
BAR.

Let's resurrect the PCI-v2 idea, which is ready to implement now, and a
nice cleanup?  Detach it from the change-of-ring-format idea which is
turning out to be a tarpit.

Thanks,
Rusty.
-- 
  How could I marry someone with more hair than me?  http://baldalex.org


Re: [net-next PATCH v0 0/5] Series short description

2012-03-19 Thread John Fastabend
On 3/19/2012 3:55 PM, Stephen Hemminger wrote:
 On Mon, 19 Mar 2012 18:38:08 -0400 (EDT)
 David Miller da...@davemloft.net wrote:
 
 From: John Fastabend john.r.fastab...@intel.com
 Date: Sun, 18 Mar 2012 23:51:45 -0700

 This series is a follow up to this thread:

 http://www.spinics.net/lists/netdev/msg191360.html

 Can the interested parties please review this series?

 I'm willing to apply this right now if it looks OK, but if
 it needs more revisions we'll have to defer.
 
 Please don't rush this into this merge window. It needs more than
 1 full day of review.

Dave, it's probably fine to push this to 3.5 then. I can
resubmit after you close the merge window if you want. This
has been somewhat broken for SR-IOV cards for multiple
kernel releases now anyway, so one more won't hurt too much.

I'll work with Roopa to get the macvlan driver plugged into
the fdb ops in the meantime and maybe get DSA as well.

Thanks,
John


Re: [net-next PATCH v0 0/5] Series short description

2012-03-19 Thread David Miller
From: John Fastabend john.r.fastab...@intel.com
Date: Mon, 19 Mar 2012 17:27:00 -0700

 Dave, it's probably fine to push this to 3.5 then.

Fair enough.


Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property

2012-03-19 Thread Anthony Liguori

On 03/19/2012 06:52 PM, Rusty Russell wrote:

On Mon, 19 Mar 2012 17:13:06 -0500, Anthony Liguori anth...@codemonkey.ws 
wrote:

Maybe just make this a hidden option like x-mmio?


x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power?


"To configure the device, we use the first I/O region of the PCI
device."

Meh, it does sound a little like we are specifying that it's a PCI I/O
BAR.

Let's resurrect the PCI-v2 idea, which is ready to implement now, and a
nice cleanup?  Detach it from the change-of-ring-format idea which is
turning out to be a tarpit.


I think that's a sensible approach.

Regards,

Anthony Liguori


Thanks,
Rusty.




Re: [net-next PATCH v0 0/5] Series short description

2012-03-19 Thread John Fastabend
On 3/19/2012 5:35 PM, David Miller wrote:
 From: John Fastabend john.r.fastab...@intel.com
 Date: Mon, 19 Mar 2012 17:27:00 -0700
 
 Dave, it's probably fine to push this to 3.5 then.
 
 Fair enough.

Stephen, please let me know if you see any issues though
because without these we have no way to forward packets
correctly in the embedded switch. So we can't really
use SR-IOV and virtual interfaces together correctly. And
the macvlan device in passthru mode is putting the device
in promiscuous mode which isn't great either.

.John


Re: [net-next PATCH v0 0/5] Series short description

2012-03-19 Thread Stephen Hemminger
On Mon, 19 Mar 2012 19:49:50 -0700
John Fastabend john.r.fastab...@intel.com wrote:

 On 3/19/2012 5:35 PM, David Miller wrote:
  From: John Fastabend john.r.fastab...@intel.com
  Date: Mon, 19 Mar 2012 17:27:00 -0700
  
  Dave, it's probably fine to push this to 3.5 then.
  
  Fair enough.
 
 Stephen, please let me know if you see any issues though
 because without these we have no way to forward packets
 correctly in the embedded switch. So we can't really
 use SR-IOV and virtual interfaces together correctly. And
 the macvlan device in passthru mode is putting the device
 in promiscuous mode which isn't great either.
 
 .John

I am more worried about evaluating ABI compatibility with older
utilities.