Robert Burrell Donkin wrote:
> Jochen Wiedmann wrote:
>> On Tue, May 19, 2009 at 10:09 AM, Robert Burrell Donkin
>> <[email protected]> wrote:
>>> as everyone who's subscribed to the commit records can probably tell,
>>> i've started to try to tidy up and simplify the rat code base.
>>>
>>> i know that the old code base is difficult to understand. it's crufty
>>> with quite a lot of half baked features in there. there are also too
>>> many interfaces. so, i'm going to start by taking a look at simplifying
>>> the code base by cleaning out all the unnecessary complexity i can find.
>>>
>>> once this is done, it would probably be a good idea to talk about a new
>>> architecture which will make the code more accessible and easier to work on.
>> Candidate #1 for me is the XML serialization. I don't see any point in
>> manually crafted XML serialization when we have Transformer or
>> XMLStreamWriter around. Sole reason for not changing this in the past
>> is the multitude of test cases in that area.
>
> (this decision was made during a period when rat was trying to be
> self-contained)
>
> if switching to a standard mechanism means losing some test cases that
> are hard to port, then that's ok by me.

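for illustration only, here is a minimal sketch of what output through the
standard XMLStreamWriter (StAX, javax.xml.stream) might look like in place of
hand-crafted serialization. the class name ReportSketch, the render method and
the report/resource element and attribute names are all made up for this
example; they are assumptions, not rat's actual report schema or code.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ReportSketch {

    // Hypothetical report shape: one <resource> element per file, carrying
    // the file name and a license code as attributes. Not rat's real schema.
    public static String render(Map<String, String> licenseByFile) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("report");
            for (Map.Entry<String, String> e : licenseByFile.entrySet()) {
                xml.writeStartElement("resource");
                // The writer escapes attribute values, so no manual
                // handling of '&' or '<' is needed here.
                xml.writeAttribute("name", e.getKey());
                xml.writeAttribute("license", e.getValue());
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException ex) {
            // Writing to an in-memory StringWriter should not normally fail.
            throw new IllegalStateException(ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("src/A&B.java", "AL");
        files.put("README", "unknown");
        System.out.println(render(files));
    }
}
```

the point of the sketch is only that escaping and well-formedness are handled
by the writer, which is the part that manually crafted serialization has to
get right (and test) by hand.
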
i'm going to stop the radical surgery now. please feel free to jump in
with code or feedback.

my next topic will be improved mime typing using tika. that should be
reasonably contained.

i suspect that the ant lib is currently broken but running the tests
causes JRE issues on gentoo. i should be able to fix them without major
disruption to the rest of the code (once i have set up a suitable
environment).

i have some ideas to improve the header recognition code, and the
(currently faulty) logic in that area. the current code doesn't scale
well as the number of headers recognised rises. this will involve a major
reworking of the code in that area. i'll raise that as a separate topic
when i'm ready to take that on.

- robert
