On 2008-08-22, Timothy Normand Miller wrote:
> On Fri, Aug 22, 2008 at 9:44 AM, Petter Urkedal <[EMAIL PROTECTED]> wrote:
> 
> >> The former could be implemented in microcode.  What you could do is
> >> have an entry point in the microcode whose job is to initialize.  It
> >> traps all engine and memory access and responds to certain commands
> >> that arrive in the form of engine access, except that in this mode it
> >> doesn't pass anything through.  It just effectively maps the engine
> >> space to the scratch buffer by grabbing addresses and data and
> >> servicing the reads and writes out of the whole scratch space.  Then
> >> when some particular address outside of that range is accessed, it
> >> bails out into the "real" main loop that can poll PCI properly.  If
> >> the size of the code to handle this is smaller than the size of the
> >> code that would be necessary to fill it from program immediates, then
> >> it's a win, especially if you're tight on program space.
> >
> > Yes, that's another option; I'm not sure which is best.  My suggestion
> > of mapping scratch space to an IO port range has the advantage that it
> > can also be used after initialisation to update parameters.  But, if the
> > whole IO address space is dictated by the VGA specs, my solution won't
> > work.
> 
> There's some confusion.  When I refer to I/O space, I mean the
> non-scratch memory space as seen by HQ, like how we access fifos.

Indeed.  I should have seen that coming.  Maybe we should refer to "HQ
ports" or "HQIO" versus "host IO", "PCI IO", or something.
 
> It occurs to me that, as long as HQ is either off or not accessing
> scratch space at a given moment, we could at least write to it from
> the host.  What we need is a fifo that hooks into the MEM stage.  The
> MEM stage would look at the instruction word and the fifo.  If the
> instruction has to access the memory, then that gets serviced.
> Otherwise, it looks at the fifo.
> 
> But that's only if we're using both ports on the RAM.  If we're using
> only one, then this is trivial... just hook to the other port, using
> the same mechanism that we use to write to the program file from the
> host.

Okay, that'll work.  Maybe it's not a big issue, but for ASIC it may
incur more logic since we could have gotten away with single-ported
memory.
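
To check that I read the fifo proposal correctly, here is a rough Python
model of the arbitration.  It is only a sketch: mem_stage_cycle and the
(addr, value) fifo entries are names I made up for illustration, and the
real thing would of course be RTL.

```python
from collections import deque

def mem_stage_cycle(scratch, host_fifo, instr_access=None):
    """Simulate one MEM-stage cycle on a single scratch-memory port.

    instr_access is None when the current instruction does not touch
    scratch memory, otherwise ("read", addr) or ("write", addr, value).
    Returns the value read by the instruction, or None.
    """
    if instr_access is not None:
        # The instruction always wins the port.
        if instr_access[0] == "write":
            _, addr, value = instr_access
            scratch[addr] = value
            return None
        _, addr = instr_access
        return scratch[addr]
    if host_fifo:
        # The port is idle this cycle: retire one pending host write.
        addr, value = host_fifo.popleft()
        scratch[addr] = value
    return None

scratch = [0] * 8
host_fifo = deque([(2, 0xAA), (3, 0xBB)])
mem_stage_cycle(scratch, host_fifo, ("write", 0, 7))  # instruction wins
mem_stage_cycle(scratch, host_fifo)                   # host write lands at 2
mem_stage_cycle(scratch, host_fifo, ("read", 2))      # instruction reads it
mem_stage_cycle(scratch, host_fifo)                   # host write lands at 3
```

In this model host writes are only delayed, never dropped, as long as HQ
leaves the port idle now and then.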

> > I think either solution will save us program memory, considering how
> > expensive it is to initialise using immediates:
> >
> >        move HIGH_BITS, r0 ; Adjust HIGH_BITS if LOW_BITS are negative!
> >        shift r0, 16, r0
> >        add r0, LOW_BITS, r0
> >        move r0, [GLOBAL]
> 
> I agree.  Let's hook up the scratch memory to PCI.

Well, at least we agree to receive the initialisation data from the host.
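
For reference, the HIGH_BITS adjustment in the four-instruction sequence
quoted above can be sketched in Python as follows.  I am assuming here
that the immediate of the add is a sign-extended 16-bit field and that
registers are 32 bits wide; split_immediate and load_constant are
hypothetical helpers, not part of the assembler.

```python
def split_immediate(value, bits=16):
    """Split a 32-bit constant into (HIGH_BITS, LOW_BITS) for move/shift/add."""
    value &= 0xFFFFFFFF
    low = value & ((1 << bits) - 1)
    if low >= 1 << (bits - 1):
        low -= 1 << bits            # the add sees the low half as negative
    # Compensate in the high half for the borrow a negative low half causes.
    high = ((value - low) >> bits) & ((1 << (32 - bits)) - 1)
    return high, low

def load_constant(high, low, bits=16):
    """Replay the three instructions that build the constant in r0."""
    r0 = high                        # move HIGH_BITS, r0
    r0 = (r0 << bits) & 0xFFFFFFFF   # shift r0, 16, r0
    r0 = (r0 + low) & 0xFFFFFFFF     # add r0, LOW_BITS, r0
    return r0

high, low = split_immediate(0x1234FFFF)
assert (high, low) == (0x1235, -1)
assert load_constant(high, low) == 0x1234FFFF
```

So each initialised word costs four program words, against a single
scratch write when the host supplies the data.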

> >>  What we could do there is run the pci-program-microcode logic at
> >> engine speed, then we can have it listen to a fifo (how PCI fills the
> >> program file) and also to part of the I/O space.
> >
> > I think this would require its own hardware logic, since the two BRAMs
> > are accessed from different parts of the RTL.
> 
> The RAMs are dual-ported.  One port is accessed in the fetch stage,
> while the other is accessed in the MEM stage.

Yes, but this does not have to be efficient at all, right?  Why not use
a HQIO handler?  That is, no changes to the RTL.

> >> Yeah.  But we're going to end up with a lot more ports anyhow.  If
> >> there's no use in having the combined port, then ditch it.  If we find
> >> a use for it, we can put it back later.
> >
> > Good, I have prepared to commit this to the port decode:
> >
> >        PCI_T_CMD_TYPE:
> >            hqio_inport = pci2hq_cmd_type & {32{pci2hq_cmd_valid}};
> >        PCI_T_CMD_FLAGS:
> >            hqio_inport = pci2hq_cmd_flags;
> >
> > Then we also have the bit to avoid checking PCI_T_CMD_COUNT, as
> > discussed.  I'm assuming pci2hq_cmd_valid is the same as
> > pci2hq_cmd_count != 0.
> 
> Yes.  valid means count != 0.  But the problem is that now you can't
> dequeue a null command.  Have we decided not to do the null command?

I didn't think of the dequeuing issue.  But yes, I don't see a use for a
null command if its only purpose is to terminate writes.  Isn't it so
that a PCI target write of N words followed by one of M words in a
contiguous range should be considered equivalent to a single write of
N + M words?  If those are the PCI semantics, then I don't think a
termination command carries any meaning.
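
To spell out the dequeuing issue, here is a small Python model.  It
assumes that a null command would have been encoded as type 0, which is
my guess at why the mask gets in the way; the queue layout and the
function names are made up for illustration.

```python
from collections import deque

CMDTYPE_NULL = 0  # assumed encoding of the null command

def read_cmd_type(queue):
    """Model of PCI_T_CMD_TYPE with the valid mask applied."""
    if not queue:          # pci2hq_cmd_valid is low
        return 0           # cmd_type & {32{valid}} reads as all zeros
    return queue[0][0]     # type field of the command at the head

def poll_once(queue):
    """Mimic the first test in poll_pci: return early on a zero type."""
    cmd_type = read_cmd_type(queue)
    if cmd_type == 0:
        return "idle"      # nothing is dequeued on this path
    queue.popleft()
    return cmd_type

queue = deque([(CMDTYPE_NULL, 0), (2, 0x100)])
assert poll_once(queue) == "idle"
assert len(queue) == 2     # the null command still blocks the head
```

With the mask in place an empty pipe and a null command look the same to
the microcode, so dropping the null command avoids the ambiguity.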

> >> True, but that bothers me.  If we change granularity, then we'll have
> >> to do a shift anyhow.
> >
> > The granularity of a jump table is fixed by the constraints of the HQ
> > architecture, but if we switch from a jump table to a table lookup,
> > then yes.  So, we should make up our mind which of the two is best.
> > The if-then-else-if variant doesn't care.
> 
> The jump table approach appeals to me because it's so much more
> flexible.  I think we should definitely have host access to scratch
> (at least for writing).

Well in this particular case, I'm now down to 3 jump instructions, as
apparent from the attached code, but in general I agree.
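
The if-then-else-if variant boils down to the following, written as
Python for readability.  The numeric command type values are placeholders
I picked for the example; only the xor chaining is the point.

```python
# Placeholder encodings; the real values come from the hqio include.
PCI_CMDTYPE_ADDRESS = 1
PCI_CMDTYPE_WDATA = 2
PCI_CMDTYPE_RCOUNT = 3

def dispatch(cmd_type):
    """Classify a command type with three zero tests on one register."""
    r0 = cmd_type
    if r0 == 0:                                   # jzero r0, p0
        return "idle"
    r0 ^= PCI_CMDTYPE_WDATA                       # xor in the delay slot
    if r0 == 0:                                   # jzero r0, poll_pci_wdata
        return "wdata"
    # Undo the previous xor and apply the next constant in one step.
    r0 ^= PCI_CMDTYPE_RCOUNT ^ PCI_CMDTYPE_WDATA
    if r0 == 0:                                   # jzero r0, poll_pci_rcount
        return "rcount"
    return "address"                              # fall through

assert dispatch(PCI_CMDTYPE_RCOUNT) == "rcount"
```

Each further case costs one jump plus one xor in the preceding delay
slot, which is why a jump table wins once there are many command types.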

> Also, we do have more block RAMs available for program and scratch.
> It just means more MUXing after the registered outputs.

That's good to know.  We're down to 85 words for poll_pci() mostly due
to the 1 or 16 specialisation of the reads, but the VGA parts could
become much bigger.

> >> Don't you have an instruction bit that specifies whether or not a
> >> write-back is allowed?  Can you turn that off in the noop?
> >
> > The instruction word wasn't wide enough for a dedicated disable bit, but
> > write-back is disabled for stores, and branch instructions have a
> > write-back disable bit.  Branch instructions are not suitable for noop,
> > but we can use a store:
> >
> >        move r0, [-1]
> >
> > That is, we reserve port -1 for this purpose.
> 
> As long as it never corresponds to anything, that's fine.  Hey, how
> about a branch instruction that always comes out false?  How would
> that work?  It would be odd to have a branch in a delay slot... but
> this is one that would always do nothing.  What would happen?

Good idea.  I think that's safe.

I have so far played it safe with the delay slot, but we can in fact do
some clever tricks with it like executing a single instruction somewhere
followed by an immediate jump to some other location.  E.g. the
following code executes an arithmetic operation which is encoded in r0,
applies it to r1 and r2 and stores the result in r1:

        add r0, apply_operator, r0
        jump r0
          jump cont
cont:
        ...

apply_operator:
        add r1, r2, r1  ; If r0 = 0, then add
        sub r1, r2, r1  ; If r0 = 1, then subtract
        and r1, r2, r1  ; etc
        or r1, r2, r1
        xor r1, r2, r1
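
Here is a toy Python interpreter that shows why this works.  It assumes a
single delay slot with the usual pc/npc behaviour, where a jump only
replaces the address after the next instruction; I have not verified that
the HQ RTL treats a jump in a delay slot this way, so take it as a sketch.
The tuple encoding of instructions is invented for the example.

```python
import operator

ALU = {"add": operator.add, "sub": operator.sub, "and": operator.and_,
       "or": operator.or_, "xor": operator.xor}

def run(program, regs, max_steps=50):
    """Interpret a toy program with one branch delay slot (pc/npc model)."""
    pc, npc = 0, 1
    for _ in range(max_steps):
        op, *args = program[pc]
        next_npc = npc + 1                  # default: fall through
        if op == "halt":
            return regs
        if op == "jump":                    # jump to a fixed address
            next_npc = args[0]
        elif op == "jump_reg":              # jump to the address in a register
            next_npc = regs[args[0]]
        elif op == "addi":                  # add register and immediate
            src, imm, dst = args
            regs[dst] = regs[src] + imm
        else:                               # three-register ALU operation
            a, b, dst = args
            regs[dst] = ALU[op](regs[a], regs[b])
        pc, npc = npc, next_npc             # the delay slot always executes
    raise RuntimeError("step limit reached")

CONT, APPLY_OPERATOR = 3, 4
PROGRAM = [
    ("addi", "r0", APPLY_OPERATOR, "r0"),   # add r0, apply_operator, r0
    ("jump_reg", "r0"),                     # jump r0
    ("jump", CONT),                         #   jump cont (delay slot)
    ("halt",),                              # cont:
    ("add", "r1", "r2", "r1"),              # apply_operator:
    ("sub", "r1", "r2", "r1"),
    ("and", "r1", "r2", "r1"),
    ("or", "r1", "r2", "r1"),
    ("xor", "r1", "r2", "r1"),
]

def apply_operator(selector, a, b):
    """Run the program with r0 = selector, r1 = a, r2 = b; return r1."""
    return run(PROGRAM, {"r0": selector, "r1": a, "r2": b})["r1"]

assert apply_operator(1, 10, 3) == 7        # selector 1 picks the sub
```

In the model exactly one instruction of the table executes, as the delay
slot of the second jump, before control arrives at cont.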

> >> I didn't notice a check to make sure the bridge isn't busy.  The
> >> situations where it matters are kinda rare, but you could have some
> >> long video read going on that clogs up the memory system, so the queue
> >> in the S3 fills, making the bridge busy, and then you can't issue
> >> writes or read requests. (I'm not sure if it's okay to issue
> >> addresses.)
> >
> > But we are handling all access to memory, right?  So if there is a long
> > video read, HQ will be stuck in a loop serving it.
> 
> I meant a long video read being done by the video controller in the
> S3.  We're only intercepting PCI access to memory.

In the current bridge wrapper, HQ intercepts all or nothing.  Will we
change that, or will the driver switch modes at will even while the
bridge is actively reading?

> >> Yes, although in that case you would have to wait until the queue had
> >> exactly 16 entries in it or perhaps 8 and repeat.  This is definitely
> >> going to be somewhat painful.
> >
> > Or, we adjust the current code to round down the count to an even
> > number.
> 
> Yes, unless you get a count of 1, which you need to handle separately.

Yes, the current version has separate code for that.
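
The rounding rule we are discussing could look like this in Python.  I am
assuming "count" means the number of words currently available in the
read queue, capped by what is still outstanding; words_this_pass is a
hypothetical helper, not something in the attached code.

```python
def words_this_pass(available, remaining):
    """How many words to transfer now, in pairs except for a lone word."""
    if remaining == 1:
        # A single outstanding word cannot be paired; handle it separately.
        return 1 if available >= 1 else 0
    # Otherwise round down to an even number of words.
    return min(available, remaining) & ~1

assert words_this_pass(5, 16) == 4   # the odd word waits for a partner
assert words_this_pass(1, 1) == 1    # the separate single-word case
```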
;;; Copyright (c) 2008 Traversal Technology
;;; 
;;; Permission is hereby granted, free of charge, to any person obtaining a
;;; copy of this software and associated documentation files (the "Software"),
;;; to deal in the Software without restriction, including without limitation
;;; the rights to use, copy, modify, merge, publish, distribute, sublicense,
;;; and/or sell copies of the Software, and to permit persons to whom the
;;; Software is furnished to do so, subject to the following conditions:
;;; 
;;; The above copyright notice and this permission notice shall be included in
;;; all copies or substantial portions of the Software.
;;; 
;;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
;;; IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
;;; FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
;;; AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
;;; LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
;;; FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
;;; DEALINGS IN THE SOFTWARE.
;;;
;;; Author: Petter Urkedal

include hqio

let G_POLL_BASE         = 0  ; FIXME

;; Global Parameters
;; -----------------

;; This is a 5-entry lookup table where relevant indices are 1..4, or
;; PCI_TARGET_ENG, PCI_TARGET_MEM, PCI_TARGET_VMEM, PCI_TARGET_IO.
let G_POLL_ADDRESS_CORRECTIONS = G_POLL_BASE - 1 ; OBS:
                                ; First entry at G_POLL_BASE + 0
                                ; Last  entry at G_POLL_BASE + 3

;; Global State
;; ------------

let G_POLL_TARGET       = G_POLL_BASE + 4
let G_POLL_ADDR         = G_POLL_BASE + 5

;;; ------------------------------------------------------------------------
;;; poll_pci(r4: continuation)

    frame
        alias q0..q1 = r3..r4
        alias p0 = r5
        protect r6..r31

        ;; Switch between Command Types
        ;; ----------------------------

poll_pci:
        move [PCI_T_CMD_TYPE], r0
        ;; Return if we have an idle or an empty pipe.  Keep this case first to
        ;; ensure a quick exit when there is nothing to do.
        jzero r0, p0
          ;; Check for write data command.
          xor r0, PCI_CMDTYPE_WDATA, r0
        jzero r0, poll_pci_wdata
          ;; Check for read count command.
          xor r0, PCI_CMDTYPE_RCOUNT ^ PCI_CMDTYPE_WDATA, r0
        jzero r0, poll_pci_rcount
          ;; Otherwise, we have an address command; fall through.
          move [PCI_T_CMD_FLAGS], r0


        ;; Address Command
        ;; ---------------

        ;; We have r0 = [PCI_T_CMD_FLAGS] from delay slot above.
        ;; Will set r0 to the address correction and r1 to the adjusted address.
        move r0, [G_POLL_TARGET]

        ;; Determine the address adjustment from a lookup table.
        add r0, G_POLL_ADDRESS_CORRECTIONS, r0
        move [r0], r0

        ;; Fetch address, and save the adjusted address.
        move [PCI_T_CMD_DATA], r1               ; dequeue address
        add r1, r0, r1
        move r1, [G_POLL_ADDR]

        ;; Tail call in case there are more PCI target commands.
        jump poll_pci
          noop


        ;; Read Command
        ;; ------------

poll_pci_rcount:
        ;; Check if the read count is 1 or 16, and jump to the corresponding
        ;; code.
        move [PCI_T_CMD_DATA], r0
        xor r0, 1, r0
        jnzero r0, poll_pci_read_16
          move [G_POLL_ADDR], r0


        ;; Single Word Read
        ;; ----------------

        ;; Get the address and split it into a granule-aligned address and an
        ;; offset from that address.
        ;move [G_POLL_ADDR], r0  ; from delay slot
        and r0, MEM_GRANULE_SIZE - 1, r1
        and r0, ~(MEM_GRANULE_SIZE - 1), r0

        ;; Issue the request to either engine or memory.
        move [G_POLL_TARGET], r2
        xor r2, PCI_TARGET_ENG, r2
        jnzero r2, poll_pci_read_1_not_eng
          move r0, [MEM_SEND_ADDR_MEM]
        move r0, [MEM_SEND_ADDR_ENG]
poll_pci_read_1_not_eng:

        ;; Skip initial words before the requested one.
        jump mem_skip_max_7, r2
          sub MEM_GRANULE_SIZE - 1, r2, q0 ; q0 is the final words to skip

        ;; Transfer the requested word.
        move [MEM_READQ_DATA], r0
        move r0, [PCI_TR_DATA]

        ;; Skip the remaining words.
        jump mem_skip_max_7, r2
          move q0, r1

        ;; Tail call.
        jump poll_pci
          noop


        ;; 16 Words Read
        ;; -------------

poll_pci_read_16:
        ;; Send an address request.  We assume that engine reads are never
        ;; cached, and thus will never be served by this code.
        ;move [G_POLL_ADDR], r0  ; from delay slot
        move r0, [MEM_SEND_ADDR_MEM]

        ;; Request 16 words.
        move 16, r2
        move r2, [MEM_SEND_READ_COUNT]

        ;; Wait for at least 4 words to become available.
poll_pci_read_16_next:
        move [MEM_READQ_AVAIL], r0
poll_pci_read_16_wait:
        sub r0, 4, r0
        jneg r0, poll_pci_read_16_wait
          move [MEM_READQ_AVAIL], r0

        ;; Transfer 4 words.
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r1
        move r0, [PCI_TR_DATA]
        move r1, [PCI_TR_DATA]
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r1
        move r0, [PCI_TR_DATA]
        move r1, [PCI_TR_DATA]

        ;; Repeat the above 4 times.
        jpos r2, poll_pci_read_16_next
          ;; Does this look weird?  We could also use 7 if it looks better...
          sub r2, 6, r2
        jump poll_pci
          noop


        ;; Write Commands
        ;; --------------

poll_pci_wdata:
        ;; Send address to bridge and adjust [G_POLL_ADDR].
        move [G_POLL_ADDR], r2
        move r2, [MEM_SEND_ADDR_MEM]

        ;; Prepare for the first transfer.
        move [PCI_T_CMD_FLAGS], r1

poll_pci_wdata_next:
        ;; Do the transfer and increment the address.
        move [PCI_T_CMD_DATA], r0
        move r0, [add MEM_SEND_DATA_0000, r1]
        add r2, 1, r2

        ;; Repeat as long as we receive write commands.
        move [PCI_T_CMD_TYPE], r1
        xor r1, PCI_CMDTYPE_WDATA, r1
        jzero r1, poll_pci_wdata_next
          move [PCI_T_CMD_FLAGS], r1

        ;; Save the address in case there are consecutive write commands which
        ;; have just not entered the pipe yet.
        move r2, [G_POLL_ADDR]

        ;; Tail call in case there are more commands.
        jump poll_pci
          noop
    endframe


;;; ------------------------------------------------------------------------
;;; mem_skip_max_7(r1: count, r2: cont)
;;;
;;; Drops count words from MEM_READQ_DATA, where 0 ≤ count ≤ 7.

    frame
        alias p0..p1 = r1..r2
        protect r3..r31
mem_skip_max_7_next: ; Not the entry point!
        sub p0, r0, p0 ; decrement counter by available words
        jnneg p0, mem_skip_max_7_no_trunc
          noop
        add p0, r0, r0
        move 0, p0
mem_skip_max_7_no_trunc:
        sub mem_skip_max_7, r0, r0
        jump r0
          noop
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r0
        move [MEM_READQ_DATA], r0
mem_skip_max_7:
        jnzero p0, mem_skip_max_7_next
          move [MEM_READQ_AVAIL], r0
        jump p1
          noop
    endframe