On Wed, Aug 20, 2014 at 8:30 PM, Matt Thomas <[email protected]> wrote:

> Recently, I started making NetBSD support OpenRisc.

Great!

> I'm using binutils from the top of the tree and GCC 4.9 for my toolchain.
>
> I looked at using llvm-openrisc but NetBSD's LLVM is 3.6 while llvm-openrisc
> is 3.1. Since my expertise with toolchains is more gcc centric, I went that
> way.

https://github.com/openrisc/llvm-or1k is actually 3.5, and these guys have
even more recent versions:
http://compilergroup-srv.elet.polimi.it/pulp/git/pulp-public
(Their git server seems a bit unreliable though; I have never been able to
pull anything from it.)

> So I'm wondering what ISA features I can count on. Are ORBIS32 II
> instructions widely implemented? Floating point?

Strictly, you can't count on any of the ORBIS32 II instructions being
implemented, but in practice 'all' implementations support mul, div and
ff1/fl1.
The FPU is not that commonly implemented or used, but I doubt there is code in
the kernel that depends on the FPU? I also doubt there is much user-space FPU
code written in asm?

> I was deciding whether to just support the no-delay version of the ISA.
> I found that PIC code and -mno-delay seem to be incompatible at the moment.

Delay-slot implementations are still the most dominant, especially if you
want an MMU.
I have a long-term plan to do a delay-slot-less version of
mor1kx-cappuccino (https://github.com/openrisc/mor1kx), and there's this:
https://github.com/pgavin/carpe
I would suggest working on making the code delay-slot agnostic instead of
choosing one path.

> The problem is that computing the GOT pointer doesn't take into account
> -mno-delay or -mcompat-delay. It's always emitted as:
>
>         l.jal   8
>         l.movhi r16,gotpchi(_GLOBAL_OFFSET_TABLE_-4)
>         l.ori   r16,r16,gotpclo(_GLOBAL_OFFSET_TABLE_+0)
>         l.add   r16,r16,r9
>
> The problem is that for no-delay the l.jal should have an argument of 4, or
> the l.movhi will never be executed since it was branched over. I think for
> -mno-delay or -mcompat-delay it should be:
>
>         l.jal   4
>         l.movhi r16,gotpchi(_GLOBAL_OFFSET_TABLE_+0)
>         l.ori   r16,r16,gotpclo(_GLOBAL_OFFSET_TABLE_+4)
>         l.add   r16,r16,r9

Yes, this is a deficiency in the gcc implementation (llvm actually handles
this correctly).
That said, you will never be able to make the GOT pointer acquisition work
with -mcompat-delay, since the l.jal 4 will be wrong for implementations that
feature a delay slot.
IOW, PIC code will always need to choose to be either delay or no-delay.

> I notice that r16 is being used as the GOT pointer and r10 as the thread
> pointer, though they aren't documented as such in the OpenRISC 1.1
> Architecture.

Yes, we should update the arch spec ABI section with this...

> I was surprised to see that patterns for ffssi2, ctzsi2, and clzsi2 aren't
> present for gcc given the l.ff1 and l.fl1 instructions.

True, l.ff1 and l.fl1 are optional, so I guess that's why they haven't been
implemented.
I might take a look at adding support for that at some point, if someone
doesn't beat me to it.
llvm (can) make use of these though.

> Looking at the emitted gcc code, I see:
>
>         l.addi  r1,r1,16
>         l.lwz   r9,-4(r1)        # SI load
>         l.lwz   r1,-16(r1)       # SI load
>
> The load of r1 after the l.addi serves no useful purpose.

It's a known issue, and it's in my todo-pipeline to fix that as soon as
I'm done with what I'm currently working on.
The reason that code is emitted is to work around a dwarf2 issue that
Christian Svensson noticed.
If you want to take a look at it yourself, this is the code that makes it
happen:
https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/config/or1k/or1k.c#L132-L136

Stefan
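
To make the GOT pointer arithmetic discussed above a bit more concrete, here
is a rough C model of the four-instruction sequence. It is only a sketch, and
it rests on assumptions that are not spelled out in the thread: the l.jal
operand is treated as a byte offset from the l.jal itself, gotpchi()/gotpclo()
are assumed to resolve to the high/low 16 bits of (symbol + addend - address
of the instruction), and l.jal is assumed to set r9 to PC+8 on a delay-slot
core and PC+4 on a no-delay core. The names pcrel, step, run_got_sequence and
got_selfcheck are made up for illustration and do not come from gcc or
binutils.

```c
#include <assert.h>
#include <stdint.h>

/* Only the two registers the sequence touches. */
struct regs { uint32_t r9, r16; };

/* Assumed PC-relative relocation value: symbol + addend - place. */
uint32_t pcrel(uint32_t sym, int32_t addend, uint32_t place)
{
    return sym + (uint32_t)addend - place;
}

/* Execute the instruction at byte offset `off` from the l.jal. */
static void step(struct regs *r, uint32_t off, uint32_t base, uint32_t got,
                 int32_t hi_add, int32_t lo_add)
{
    switch (off) {
    case 4:  /* l.movhi r16,gotpchi(_GLOBAL_OFFSET_TABLE_ + hi_add) */
        r->r16 = pcrel(got, hi_add, base + 4) & 0xffff0000u;
        break;
    case 8:  /* l.ori r16,r16,gotpclo(_GLOBAL_OFFSET_TABLE_ + lo_add) */
        r->r16 |= pcrel(got, lo_add, base + 8) & 0x0000ffffu;
        break;
    case 12: /* l.add r16,r16,r9 */
        r->r16 += r->r9;
        break;
    default:
        break;
    }
}

/*
 * Run the sequence placed at address `base` and return the final r16.
 * jal_off is the l.jal operand; hi_add/lo_add are the relocation addends.
 */
uint32_t run_got_sequence(uint32_t base, uint32_t got, uint32_t jal_off,
                          int32_t hi_add, int32_t lo_add, int has_delay_slot)
{
    struct regs r = { 0u, 0xdeadbeefu };  /* r16 starts with stale junk */
    uint32_t off;

    if (has_delay_slot) {
        r.r9 = base + 8;                        /* link = insn after the slot */
        step(&r, 4, base, got, hi_add, lo_add); /* delay slot runs first */
    } else {
        r.r9 = base + 4;                        /* link = next insn */
    }
    for (off = jal_off; off <= 12; off += 4)    /* continue at jump target */
        step(&r, off, base, got, hi_add, lo_add);
    return r.r16;
}

/* The four combinations of emitted sequence and hardware behaviour. */
void got_selfcheck(void)
{
    const uint32_t base = 0x00001000u, got = 0x00123458u;

    /* "l.jal 8", -4/+0 on a delay-slot core: r16 ends up at the GOT. */
    assert(run_got_sequence(base, got, 8, -4, 0, 1) == got);
    /* "l.jal 4", +0/+4 on a no-delay core: r16 ends up at the GOT. */
    assert(run_got_sequence(base, got, 4, 0, 4, 0) == got);
    /* "l.jal 8" on a no-delay core: l.movhi is skipped, r16 is wrong. */
    assert(run_got_sequence(base, got, 8, -4, 0, 0) != got);
    /* "l.jal 4" on a delay-slot core: r16 is off by 4. */
    assert(run_got_sequence(base, got, 4, 0, 4, 1) == got + 4);
}
```

Under these assumptions, each form of the sequence yields the GOT address
only on the kind of core it was written for, and the l.jal 4 form run on a
delay-slot core leaves r16 four bytes past the GOT. That is consistent with
the point above that PIC code has to pick either delay or no-delay.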
_______________________________________________
OpenRISC mailing list
[email protected]
http://lists.openrisc.net/listinfo/openrisc
