Thanks very much for your reply, Henok. I would be very interested in
your thesis and any code that you can provide. Is your thesis published
online? Is it in English? Your approach sounds promising, and I would
like to look at the details. Some ideas I had were
us
I have completed a thesis on legal documents. XML IR systems are very
susceptible to producing duplicate or near-duplicate content
(not in concept, but in textual content).
Here is what I did.
I tagged the content of each article in the legal documents with its status, and th