On 12/01/2011 16:24, Giuseppe Penone wrote:
> Yes I also was thinking that, being the first two chars not valid (\0xff and
> \0xfe)
That would be the BOM (Byte Order Mark)...
> , the problem is that I cannot find a reference to understand what is
> the encoding according to those chars.
... for little-endian UTF-16 (UTF-16LE). Python's 'utf_16' codec reads
the BOM to pick the byte order. You'll also want to be careful about
NUL characters.
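As a rough sketch of just the decoding step, separate from any GTK code: the helper below checks for either UTF-16 BOM and strips NUL characters. The name decode_clipboard_html and the parameter raw are made up for illustration, and falling back to little-endian when no BOM is present is an assumption on my part, not something the clipboard guarantees.

```python
import codecs


def decode_clipboard_html(raw):
    """Decode raw clipboard bytes to text (hypothetical helper)."""
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The generic 'utf_16' codec uses the BOM to choose the byte
        # order and drops the BOM from the result.
        text = raw.decode('utf_16')
    else:
        # Assumption: data without a BOM is little-endian UTF-16.
        text = raw.decode('utf_16_le')
    # Remove NUL characters, e.g. a trailing terminator.
    return text.replace(u'\x00', u'')


if __name__ == '__main__':
    sample = codecs.BOM_UTF16_LE + u'<b>\u0633\u0644\u0627\u0645</b>\x00'.encode('utf_16_le')
    print(repr(decode_clipboard_html(sample)))
```

Checking for the BOM explicitly avoids relying on the platform's native byte order, which is what 'utf_16' falls back to when the BOM is missing.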
The attached fragment accepts "html" pastes from Firefox/Thunderbird
and correctly shows the Arabic fragment from your original message
when copied from Thunderbird.
Hey, it even honors RTL, which is kinda neat :)
mvg,
Dieter
import gtk


def on_paste(textview, clipboard):
    # Suppress the default plain-text paste handler.
    textview.stop_emission("paste-clipboard")
    # wait_for_targets() returns None when the clipboard offers nothing.
    targets = clipboard.wait_for_targets()
    if targets and 'text/html' in targets:
        clipboard.request_contents('text/html', paste_html,
                                   textview.get_buffer())
    return True


def paste_html(clipboard, selectiondata, textbuffer):
    # The 'utf_16' codec consumes the BOM; drop any stray NUL characters.
    selection_data = selectiondata.data.decode('utf_16').replace('\x00', '')
    textbuffer.insert_at_cursor(selection_data)
    return True


if __name__ == '__main__':
    clipboard = gtk.clipboard_get()
    window = gtk.Window()
    window.connect('delete-event', gtk.main_quit)
    textbuffer = gtk.TextBuffer()
    textview = gtk.TextView(textbuffer)
    textview.connect('paste-clipboard', on_paste, clipboard)
    window.add(textview)
    window.show_all()
    gtk.main()