Re: [PATCH] drm/radeon/kms: don't require up to 64k allocations. (v2)

2009-09-24 Thread Michel Dänzer
On Thu, 2009-09-24 at 15:25 +1000, Dave Airlie wrote: 
 On Thu, Sep 24, 2009 at 3:16 PM, Dave Airlie <airl...@gmail.com> wrote:
  From: Dave Airlie <airl...@redhat.com>
 
  This avoids needing to do a kmalloc > PAGE_SIZE for the main
  indirect buffer chunk; it adds an accessor for all reads from
  the chunk and caches a single page at a time for subsequent
  reads.
 
  changes since v1:
  Use a two-page pool, which should cover the most common case
  where a single packet spanning > PAGE_SIZE is hit, though I'm
  having trouble seeing anywhere we currently generate anything like that.
  r600: untested conversion
  hopefully proper short page copying at end
  added a parser_error flag to record deep errors instead of having to test
  every ib value fetch.
 
 I've left one bug in here, see if you can see it :-)
 
 the extra pages copy starts one page too soon, but it's probably not
 a greatly hit path as it also relies on a packet > 4096 bytes.

FWIW, I've been working on merging several Composite operations into a
single draw packet in RadeonCompositeTile in the X driver, to decrease
the CS checking overhead. That should quite easily generate a single
packet > 4096 bytes (e.g. when rendering more than about 40 characters
in one go, if my math is right).
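
A packet that large will straddle the cached page. As a minimal userspace sketch (hypothetical helper name, assuming 4 KiB pages and a chunk indexed in 32-bit dwords, not code from the patch), the boundary condition involved is:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u

/* Hypothetical helper, not from the patch: a packet of `ndw` dwords
 * starting at dword index `idx` needs data from a second page only
 * when its first and last dwords fall on different PAGE_SIZE pages. */
static bool packet_crosses_page(unsigned idx, unsigned ndw)
{
	unsigned first = (idx * 4) / PAGE_SIZE;
	unsigned last = ((idx + ndw - 1) * 4) / PAGE_SIZE;

	return first != last;
}
```

This is also where the hinted off-by-one bites: the extra-page copy should only kick in for packets whose last dword actually lands past the cached page.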


-- 
Earthling Michel Dänzer   |   http://www.vmware.com
Libre software enthusiast |   Debian, X and DRI developer

--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[PATCH] drm/radeon/kms: don't require up to 64k allocations. (v2)

2009-09-23 Thread Dave Airlie
From: Dave Airlie <airl...@redhat.com>

This avoids needing to do a kmalloc > PAGE_SIZE for the main
indirect buffer chunk; it adds an accessor for all reads from
the chunk and caches a single page at a time for subsequent
reads.
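
The caching idea can be sketched in userspace (illustrative struct and names, not the driver's layout; in the kernel the refill would be a copy_from_user() and would have to handle a short final page):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Illustrative sketch of the accessor idea: keep one PAGE_SIZE copy
 * of the user chunk cached and refill it only when a read falls
 * outside the currently cached page. */
struct ib_chunk_sketch {
	const uint32_t *user_data;    /* full chunk, whole pages here */
	uint32_t page[PAGE_SIZE / 4]; /* single cached page */
	unsigned kpage_idx;           /* which page is cached, ~0u = none */
};

static uint32_t get_ib_value(struct ib_chunk_sketch *c, unsigned idx)
{
	unsigned pg = (idx * 4) / PAGE_SIZE;

	if (pg != c->kpage_idx) {
		/* in the kernel this would be a copy_from_user() */
		memcpy(c->page, c->user_data + pg * (PAGE_SIZE / 4), PAGE_SIZE);
		c->kpage_idx = pg;
	}
	return c->page[idx % (PAGE_SIZE / 4)];
}
```

Sequential parsing mostly re-reads the cached page, so the per-dword cost stays a bounds check plus an array load.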

changes since v1:
Use a two-page pool, which should cover the most common case
where a single packet spanning > PAGE_SIZE is hit, though I'm
having trouble seeing anywhere we currently generate anything like that.
r600: untested conversion
hopefully proper short page copying at end
added a parser_error flag to record deep errors instead of having to test
every ib value fetch.

can someone confirm my last_page_index math is right?
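
One way to sanity-check that math (an illustrative standalone check, assuming the chunk holds length_dw > 0 dwords and 4 KiB pages; not the patch's code):

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/* Illustrative check: with length_dw dwords in the chunk, the
 * zero-based index of the page holding the last dword is the byte
 * offset of that dword divided by PAGE_SIZE. Assumes length_dw > 0. */
static unsigned last_page_index(unsigned length_dw)
{
	return ((length_dw * 4) - 1) / PAGE_SIZE;
}
```

The "- 1" is what keeps an exactly-page-sized chunk from claiming one page too many.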

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 drivers/gpu/drm/radeon/r100.c   |  188 ++-
 drivers/gpu/drm/radeon/r100_track.h |   69 -
 drivers/gpu/drm/radeon/r200.c   |   79 +++
 drivers/gpu/drm/radeon/r300.c   |  137 +
 drivers/gpu/drm/radeon/r600_cs.c|   26 +++---
 drivers/gpu/drm/radeon/radeon.h |   37 +++-
 drivers/gpu/drm/radeon/radeon_cs.c  |  105 ++--
 7 files changed, 370 insertions(+), 271 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 737970b..9ab976d 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -863,13 +863,11 @@ int r100_cs_parse_packet0(struct radeon_cs_parser *p,
 void r100_cs_dump_packet(struct radeon_cs_parser *p,
 struct radeon_cs_packet *pkt)
 {
-   struct radeon_cs_chunk *ib_chunk;
volatile uint32_t *ib;
unsigned i;
unsigned idx;
 
ib = p->ib->ptr;
-   ib_chunk = p->chunks[p->chunk_ib_idx];
idx = pkt->idx;
for (i = 0; i <= (pkt->count + 1); i++, idx++) {
DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]);
@@ -896,7 +894,7 @@ int r100_cs_packet_parse(struct radeon_cs_parser *p,
  idx, ib_chunk->length_dw);
return -EINVAL;
}
-   header = ib_chunk->kdata[idx];
+   header = radeon_get_ib_value(p, idx);
pkt->idx = idx;
pkt->type = CP_PACKET_GET_TYPE(header);
pkt->count = CP_PACKET_GET_COUNT(header);
@@ -939,7 +937,6 @@ int r100_cs_packet_parse(struct radeon_cs_parser *p,
  */
 int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
 {
-   struct radeon_cs_chunk *ib_chunk;
struct drm_mode_object *obj;
struct drm_crtc *crtc;
struct radeon_crtc *radeon_crtc;
@@ -947,8 +944,9 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
int crtc_id;
int r;
uint32_t header, h_idx, reg;
+   volatile uint32_t *ib;
 
-   ib_chunk = p->chunks[p->chunk_ib_idx];
+   ib = p->ib->ptr;
 
/* parse the wait until */
r = r100_cs_packet_parse(p, &waitreloc, p->idx);
@@ -963,7 +961,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
return r;
}
 
-   if (ib_chunk->kdata[waitreloc.idx + 1] != RADEON_WAIT_CRTC_VLINE) {
+   if (radeon_get_ib_value(p, waitreloc.idx + 1) != RADEON_WAIT_CRTC_VLINE) {
DRM_ERROR("vline wait had illegal wait until\n");
r = -EINVAL;
return r;
@@ -978,9 +976,9 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
p->idx += waitreloc.count;
p->idx += p3reloc.count;
 
-   header = ib_chunk->kdata[h_idx];
-   crtc_id = ib_chunk->kdata[h_idx + 5];
-   reg = ib_chunk->kdata[h_idx] >> 2;
+   header = radeon_get_ib_value(p, h_idx);
+   crtc_id = radeon_get_ib_value(p, h_idx + 5);
+   reg = header >> 2;
mutex_lock(&p->rdev->ddev->mode_config.mutex);
obj = drm_mode_object_find(p->rdev->ddev, crtc_id, DRM_MODE_OBJECT_CRTC);
if (!obj) {
@@ -994,8 +992,9 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
 
if (!crtc->enabled) {
/* if the CRTC isn't enabled - we need to nop out the wait until */
-   ib_chunk->kdata[h_idx + 2] = PACKET2(0);
-   ib_chunk->kdata[h_idx + 3] = PACKET2(0);
+   ib[h_idx + 2] = PACKET2(0);
+   ib[h_idx + 3] = PACKET2(0);
} else if (crtc_id == 1) {
switch (reg) {
case AVIVO_D1MODE_VLINE_START_END:
@@ -1011,8 +1010,8 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
r = -EINVAL;
goto out;
}
-   ib_chunk->kdata[h_idx] = header;
-   ib_chunk->kdata[h_idx + 3] |= RADEON_ENG_DISPLAY_SELECT_CRTC1;
+   ib[h_idx] = header;
+   ib[h_idx + 3] |= RADEON_ENG_DISPLAY_SELECT_CRTC1;
}
 out:
mutex_unlock(&p->rdev->ddev->mode_config.mutex);
@@ -1033,7 +1032,6 @@ out:
 int r100_cs_packet_next_reloc(struct radeon_cs_parser *p,
  struct radeon_cs_reloc **cs_reloc)
 {
-   struct