2009/9/17 Emad Nawfal (عماد نوفل) <emadnaw...@gmail.com>:
> Hi Tutors,
> I want to color-code the different parts of the word in a morphologically
> complex natural language. The file I have looks like this, where the first
> column is the word, and the second is the composite part of speech tag. For
> example, Al is a DETERMINER, wlAy is a NOUN and At is a PLURAL NOUN SUFFIX
>
> Al+wlAy+At DET+NOUN+NSUFF_FEM_PL
> Al+mtHd+p DET+ADJ+NSUFF_FEM_SG
>
> The output I want is one in which the word has no plus signs, and each
> segment is color-coded with a grammatical category. For example, the noun is
> red, the det is green, and the suffix is orange. Like on this page here:
> http://docs.google.com/View?id=df7jv9p9_3582pt63cc4

Here is an example that duplicates your Google doc and generates fairly
clean, idiomatic HTML. It requires the HTML generation package from
http://pypi.python.org/pypi/html/1.4

from html import HTML

lines = '''
Al+wlAy+At DET+NOUN+NSUFF_FEM_PL
Al+mtHd+p DET+ADJ+NSUFF_FEM_SG
'''.splitlines()

# Define colors in a CSS stylesheet
styles = '''
.NOUN {color: red }
.ADJ {color: brown }
.DET {color: green}
.NSUFF_FEM_PL, .NSUFF_FEM_SG {color: blue}
'''

h = HTML()

with h.html:
    with h.head:
        h.title("Example")
        h.style(styles)

    with h.body(newlines=True):
        for line in lines:
            line = line.split()
            # Skip blank or malformed lines
            if len(line) != 2:
                continue
            word = line[0]
            pos = line[1]
            # Pair each word segment with its part-of-speech tag
            zipped = zip(word.split("+"), pos.split("+"))

            for part, kind in zipped:
                # The tag becomes the CSS class of the span
                h.span(part, klass=kind)

            h.br

print(h)

Kent
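
If installing the html package is not an option, the same spans can be built with the standard library alone. This is only a minimal sketch, assuming the same two-column input as above. The name colorize is a hypothetical helper, and the check that the word and the tag have the same number of segments is an added assumption (zip would otherwise silently drop the extras).

```python
from xml.sax.saxutils import escape, quoteattr


def colorize(line):
    """Return HTML spans for one 'word tags' line, or None if it is malformed.

    colorize is a hypothetical helper name, not part of any library.
    """
    fields = line.split()
    if len(fields) != 2:
        return None
    word, pos = fields
    parts = word.split("+")
    kinds = pos.split("+")
    # Assumption: a line whose segment and tag counts differ is rejected
    if len(parts) != len(kinds):
        return None
    # Each tag becomes the CSS class of the span wrapping its segment
    return "".join(
        "<span class=%s>%s</span>" % (quoteattr(kind), escape(part))
        for part, kind in zip(parts, kinds)
    )


print(colorize("Al+wlAy+At DET+NOUN+NSUFF_FEM_PL"))
```

The returned string still has to be placed in a page whose stylesheet defines a rule for each tag, as in the styles string above.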