Re: [Xournal-devel] Xournal

Denis Auroux Sat, 01 Jan 2011 11:23:59 -0800

Dear Andreas,

Thanks for the quick reply!


> The only thing which is new (and was really complicated;-)) is that the
> eraser doesn't only look at the ends of the line (start and end point) but
> also at the line itself. I compute the perpendicular from the eraser to
> the line, so I can also delete long lines: vector geometry,
> you know;-)

Good :-) (Actually, my code had a hack to subdivide strokes so that no 
line segment would be too long, but I agree that actually computing the 
distance to each segment is better...)
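A minimal sketch of the perpendicular-distance test described above, in C. The Point type and the eraser_hits_stroke() helper are hypothetical names, not Xournal's actual data structures; the eraser is assumed to be a disc of a given radius and a stroke a plain array of points. Comparing squared distances avoids calling sqrt().

```c
#include <stddef.h>

typedef struct { double x, y; } Point;

/* Squared distance from p to the closest point of segment [a, b].
   Working with squared lengths means no sqrt() is needed. */
double point_segment_dist2(Point p, Point a, Point b)
{
    double dx = b.x - a.x, dy = b.y - a.y;
    double len2 = dx * dx + dy * dy;
    double t = 0.0;                 /* parameter of the foot of the perpendicular */
    if (len2 > 0.0) {
        t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2;
        if (t < 0.0) t = 0.0;       /* foot falls before a: nearest point is a */
        if (t > 1.0) t = 1.0;       /* foot falls after b: nearest point is b */
    }
    double ex = p.x - (a.x + t * dx);
    double ey = p.y - (a.y + t * dy);
    return ex * ex + ey * ey;
}

/* Hypothetical helper: nonzero if an eraser disc of the given radius,
   centred at e, touches the polyline pts[0..n-1]. */
int eraser_hits_stroke(Point e, double radius, const Point *pts, size_t n)
{
    double r2 = radius * radius;
    if (n == 1)                     /* a single dot: distance to that point */
        return point_segment_dist2(e, pts[0], pts[0]) <= r2;
    for (size_t i = 0; i + 1 < n; i++)
        if (point_segment_dist2(e, pts[i], pts[i + 1]) <= r2)
            return 1;
    return 0;                       /* also covers n == 0 */
}
```

Clamping t to [0, 1] is what makes the test fall back to the endpoints when the foot of the perpendicular lies outside the segment, while an eraser near the middle of a long straight line (far from both endpoints) still registers a hit.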

> I have tried your version on Windows 7, but there is only the cursor,
> not "extended input".

I am not familiar with Windows 7, but at least on Windows XP, to get 
"extended input" support one needs to install the pen driver from 
Wacom's web site. (http://www.wacom.com/tabletpc/driver.cfm). This 
probably only works with Wacom hardware, not with other brands of 
digitizers. (More generally, what is missing is a layer of library code 
that makes GTK+-Win32 aware of pen events in Windows; I'm not sure how 
the Wacom driver works.)

> I haven't tried to port it to Windows yet, but it
> shouldn't be a big problem, because I don't use more libs than you...

Agreed.

> The next semester starts in February; if it's more or less finished
> by February (as I hope), I will use it for my day-to-day work, so I will
> "test" it...
>
> I'll be studying for at least 1.5 years, so until then it will certainly
> be maintained; later we'll see... (I don't know what I will do in the
> future, but I've spent too much time on this project "to throw it away"
> after that;-))

Sounds good! And don't worry, no pressure. I'm just really excited to 
hear about your work, but take your time -- there's no major hurry. (One 
of the nice things about working on what's still a relatively 
small-scale program rather than a much bigger project).

>> (searching in handwritten annotations, handling alternative file
>> formats, etc.)
> I think other file formats are not so important, because you can export
> the other files to a PDF file and open it with Xournal.

Agreed. But I didn't mean other formats to annotate (even though the 
case for djvu is almost convincing), rather formats to use for storing 
journals. Namely:

- some people suggested Xournal should work with SVG files instead of 
its own special file format, as some other tools (Jarnal for instance) 
use SVG-based formats, so supporting a subset of SVG as a native format 
would allow better compatibility with these + perfect export to Inkscape.

- others argued that the annotations can be stored inside a PDF file by 
embedding the XOJ contents as a PDF object, so that there's no need to 
export anymore, and any PDF viewer can view a xournal file (as 
read-only). This is also interesting (+ another cool PDF hack that I'm 
kind of eager to try).

- and of course, the holy grail would be to hack into MS Journal's JNT 
format and read/write it, but that seems very hard. It didn't look 
obvious to me at least.
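To make the SVG idea a bit more concrete, here is a rough sketch of how a single stroke might be written as an SVG <path> element. It assumes a stroke is stored as an array of x,y coordinates with a single width and a color packed as 0xRRGGBBAA, and that page coordinates have y pointing down as in SVG (so no flip is needed). The function names are hypothetical; variable-width strokes and a full mapping of the .xoj structure to SVG are not addressed.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append printf-style text at buf[*len] without writing past size bytes;
   *len keeps counting the full length, like snprintf's return value. */
static void appendf(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    size_t room = (*len < size) ? size - *len : 0;
    va_start(ap, fmt);
    int n = vsnprintf(room ? buf + *len : NULL, room, fmt, ap);
    va_end(ap);
    if (n > 0)
        *len += (size_t)n;
}

/* Hypothetical: write one stroke (xy = x0,y0,x1,y1,...) as an SVG <path>.
   Returns the length the full output needs (excluding the NUL). */
size_t stroke_to_svg(char *buf, size_t size, const double *xy, int npts,
                     double width, unsigned int rgba)
{
    size_t len = 0;
    if (size > 0)
        buf[0] = '\0';
    appendf(buf, size, &len, "<path d=\"");
    for (int i = 0; i < npts; i++)      /* M = moveto, L = lineto */
        appendf(buf, size, &len, "%s%g %g", i == 0 ? "M " : " L ",
                xy[2 * i], xy[2 * i + 1]);
    appendf(buf, size, &len,
            "\" fill=\"none\" stroke=\"#%06x\" stroke-opacity=\"%g\""
            " stroke-width=\"%g\" stroke-linecap=\"round\""
            " stroke-linejoin=\"round\"/>",
            rgba >> 8, (rgba & 0xffu) / 255.0, width);
    return len;
}
```

For example, the points (0,0), (10,0), (10,10) in opaque red with width 1.41 come out as a path with d="M 0 0 L 10 0 L 10 10" and stroke="#ff0000".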

Anyway, definitely not things for you to worry about. But I kind of like 
these crazy challenges. I'm definitely more a hacker type than a 
typical, responsible software developer.

> But search in handwritten annotations would be a good feature;-) I
> thought about this, but I think this is extremely complicated...

Yes, I agree it's very hard. My idea is that handwriting recognition is 
way too difficult to implement (requires a lot of knowledge, 
dictionaries, etc., and some of the algorithms are patented); but if the 
goal is to search within your own notes and your handwriting is 
consistent, then it might be easier to match a handwritten word to 
search for with notes taken in the same handwriting! I have some naive 
ideas about how to approach this, but they may well not work. So I won't 
promise anything. Just another challenge to keep me interested :)
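The naive ideas mentioned above are not spelled out here, so the following is only an illustration of one well-known technique for comparing two pen trajectories: dynamic time warping (DTW), which aligns two point sequences even when they were written at different speeds. It assumes each word has already been flattened into a single point sequence and normalized for position and size beforehand (not shown); the type and function names are hypothetical.

```c
#include <float.h>
#include <stdlib.h>

typedef struct { double x, y; } Pt;

static double sq_dist(Pt a, Pt b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* DTW cost between two point sequences, using squared Euclidean distance
   as the per-pair cost. Lower means more similar; lengths may differ.
   Returns -1.0 on empty input or allocation failure. */
double dtw_cost(const Pt *a, size_t n, const Pt *b, size_t m)
{
    if (n == 0 || m == 0)
        return -1.0;
    size_t w = m + 1;                      /* row width of the cost table */
    double *D = malloc((n + 1) * w * sizeof *D);
    if (D == NULL)
        return -1.0;
    for (size_t k = 0; k < (n + 1) * w; k++)
        D[k] = DBL_MAX;                    /* "unreachable" */
    D[0] = 0.0;
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            double best = D[(i - 1) * w + j];              /* advance in a only */
            if (D[i * w + j - 1] < best)                   /* advance in b only */
                best = D[i * w + j - 1];
            if (D[(i - 1) * w + j - 1] < best)             /* advance in both */
                best = D[(i - 1) * w + j - 1];
            D[i * w + j] = sq_dist(a[i - 1], b[j - 1]) + best;
        }
    }
    double cost = D[n * w + m];
    free(D);
    return cost;
}
```

Searching would then mean sliding the query word over candidate groups of strokes in the notes and keeping the lowest costs; choosing those candidates is the hard part and is not attempted here.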

> Windows Journal has this feature. I tested it, as well as the Windows
> handwriting recognition. But I think they use a word list, and if the
> word is not in the list they have problems recognizing it...

This is indeed a very difficult problem algorithmically, and without a 
dictionary it is almost hopeless. One thing that is easier to implement 
is recognition of text where each character is written within a 
delimited box -- try for example CellWriter. It is still a hard 
algorithmic problem, but it has been studied and reasonably good 
solutions exist. Unfortunately this is not very useful for freehand 
note-taking. Anyway, I think Xournal has proved its usefulness even 
without handwriting recognition, but it's something I want to keep in mind.

> We will see; I don't know exactly how long it will take. One of the
> biggest problems at the moment is PDF export: I used the Cairo PDF
> library, and it works, but it takes about 3 minutes to create a 10-page
> PDF... So I have to see what I will do with the PDF export; I thought I
> could use the parsed objects from Poppler...

In case you didn't notice, xo-print.c actually contains two essentially 
independent solutions to this problem, and I think you can probably use 
them "as is". Neither is perfect -- they both have advantages and 
drawbacks. (Currently one is used for "export to PDF" and the other for 
"print -> print to file -> PDF"). The second solution is probably quite 
close to what you are doing.

1) the first "half" (lines 1-1408) of xo-print.c deals with manually 
parsing a PDF file (assuming it's neither encrypted nor in some other 
weird flavor of PDF) and combining the existing PDF code with PDF code 
corresponding to the objects of the journal.  This is all done 
completely by hand, and it's the code that's connected to the "File -> 
Export to PDF" command.

Advantages: very fast, memory-efficient, doesn't rely on any libraries. 
Drawbacks: doesn't work on all PDF files (encrypted or otherwise 
protected files being the main exception); can't combine multiple PDF 
files; font handling is not perfect (because the PDF generator is 
custom-written and its font extraction code is a bit quirky); hard to 
maintain (I'm probably the only one who understands how it's supposed 
to work).

More precisely:

- until line 506 is a rudimentary PDF parser I wrote in my spare time, 
with pdf_parse_info() the main function that loads the structure into 
memory. It doesn't extract any actual contents of the PDF file, just the 
directory of objects and what is needed to generate an annotated PDF by 
adding extra drawing commands on top of the existing pages.

- lines 506-678: helper functions for adding things to PDF data;
- lines 678-774: draw plain or bitmap backgrounds into a PDF;
- lines 774-1010: code to embed a font into a PDF file, inspired by
the Sun Font Tools and using some of their code; see the calls to
OpenTTFont, CreateTTFromTTGlyphs, CloseTTFont within embed_pdffont()
(This is very convoluted and not ideal. The Sun Font Tools code should 
probably be replaced at some point by code from Cairo for PDF font 
embedding.)

- then pdf_draw_page() draws the objects on a page into a PDF;
- and finally print_to_pdf() is the main function that produces a PDF 
file from the journal structures in memory (journal.pages) +
background PDF file (bgpdf.file_contents).

2) the second "half" of xo-print.c prints via GtkPrint / Cairo. This is 
often a more reasonable way of doing things, and the code is much 
shorter (lines 1409-1558) and conceptually a lot simpler. (It's probably 
very similar to what you are trying to do). Advantages: easy to 
maintain, standard code without any quirky hacks; should work almost all 
the time; also handles actual printing to a printer, and exporting to 
other GtkPrint formats such as SVG. Drawbacks: there are a good number of 
PDF files that somehow get completely mangled in the PDF -> Cairo -> PDF 
conversion and produce much larger / much slower output.

Currently this is only connected to the File -> Print command; selecting 
"Print to File" and "PDF" in the print dialog box makes GtkPrint produce 
a printing context that generates a PDF file. There should be a way to 
ask GtkPrint to make such a print job without bringing up the printing 
dialog box, and then it's a PDF export function.
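To give a flavor of the "by hand" approach in 1), here is a minimal sketch of turning one stroke into raw PDF content-stream operators: RG sets the stroke color, w the line width, 1 J and 1 j select round caps and joins, and m, l and S build and stroke the path. This is not the code in pdf_draw_page(); it assumes a stroke is an array of x,y coordinates with y pointing down, a color packed as 0xRRGGBBAA, and a known page height used to flip into PDF's y-up coordinate system. Transparency and the surrounding object, stream and xref bookkeeping are left out.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append printf-style text at buf[*len] without writing past size bytes;
   *len keeps counting the full length, like snprintf's return value. */
static void appendf(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    size_t room = (*len < size) ? size - *len : 0;
    va_start(ap, fmt);
    int n = vsnprintf(room ? buf + *len : NULL, room, fmt, ap);
    va_end(ap);
    if (n > 0)
        *len += (size_t)n;
}

/* Hypothetical sketch: PDF operators that stroke one polyline.
   Alpha is ignored (PDF opacity needs an ExtGState with /CA, not shown).
   Returns the length the full output needs (excluding the NUL). */
size_t stroke_to_pdf_ops(char *buf, size_t size, const double *xy, int npts,
                         double width, unsigned int rgba, double page_height)
{
    size_t len = 0;
    if (size > 0)
        buf[0] = '\0';
    if (npts < 1)
        return 0;
    /* RG = stroking RGB color (0..1), w = line width, round caps/joins */
    appendf(buf, size, &len, "%.4g %.4g %.4g RG %g w 1 J 1 j\n",
            ((rgba >> 24) & 0xffu) / 255.0, ((rgba >> 16) & 0xffu) / 255.0,
            ((rgba >> 8) & 0xffu) / 255.0, width);
    for (int i = 0; i < npts; i++)       /* m = moveto, l = lineto; flip y */
        appendf(buf, size, &len, "%g %g %s\n", xy[2 * i],
                page_height - xy[2 * i + 1], i == 0 ? "m" : "l");
    appendf(buf, size, &len, "S\n");     /* S = stroke the current path */
    return len;
}
```

On a 792-point-high page, the red stroke (0,0), (10,0), (10,10) of width 1.41 becomes the operators "1 0 0 RG 1.41 w 1 J 1 j", then "0 792 m", "10 792 l", "10 782 l" and "S", one per line. Emitting a handful of operators per stroke like this is part of why the hand-written path can be fast.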


My vote would be: assuming the internal data structures and types of 
objects that can be embedded haven't changed too much, you might as well 
keep the "export to PDF by hand" code somewhere in there (perhaps in a 
separate file so it doesn't pollute your nicer code). As long as the 
Journal/Page/Layer/Item structures haven't changed too much and you 
still have at most one PDF background file referenced by the xoj, it'll 
work fine without any adaptation!

I'll be happy to make the updates needed for supporting image objects 
(as in the "Insert Image" patch; for now those probably just don't get 
exported) or anything else that comes from extending the capabilities. 
I'm guessing that at some point you might want to support a .xoj that 
annotates multiple PDF files at once; this will be much more challenging 
and I can't promise that the code will adapt to that setting, but I'm 
willing to give it a try at some point.

But this is really up to you -- it does make the code significantly less 
clean and you might want to start with a cleaner code base.

If you do keep it, the sensible thing to do might be to have a config 
option switch to decide which of the two paths ("everything by hand" vs. 
"print-to-PDF via GtkPrint/Cairo") is used when the user exports to PDF.
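Such a switch could be as simple as the sketch below, which honors the configured choice but falls back to the GtkPrint/Cairo path in the two situations described above where the hand-written code doesn't apply (a protected background PDF, or more than one background PDF). All names here are hypothetical; detecting encryption and counting background files are assumed to happen elsewhere.

```c
/* Hypothetical names: neither the enum nor the struct exist in Xournal. */
typedef enum { EXPORT_BY_HAND, EXPORT_VIA_CAIRO } ExportBackend;

typedef struct {
    int n_bg_pdfs;      /* distinct background PDF files referenced by the .xoj */
    int bg_encrypted;   /* nonzero if the background PDF is encrypted/protected */
} ExportInputInfo;

/* Honor the configured backend, but fall back to GtkPrint/Cairo in the
   cases the "everything by hand" path is described as not handling. */
ExportBackend choose_export_backend(ExportBackend configured,
                                    const ExportInputInfo *in)
{
    if (configured == EXPORT_BY_HAND &&
        (in->n_bg_pdfs > 1 || in->bg_encrypted))
        return EXPORT_VIA_CAIRO;
    return configured;
}
```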

>> or anything else that you feel I'd be best qualified for
> You wrote you are a professor; in which "topic" (or whatever you call
> it, I don't know;-))? Computer science? Mathematics?

I'm a math professor, but I don't think the kind of math I do has any 
relevance here. I'm also a hacker at heart. Unfortunately I don't have 
much formal training in computer science (it probably shows in my coding 
style). I'm by now fairly experienced at writing C code, and have a bit 
of understanding of the internal workings of GTK+. Of course the main 
asset is that I understand the existing xournal code well -- so things 
like "how did you deal with this issue?" or "how does this thing work?" 
are easy for me to answer. And I like challenges :)

> I use Ubuntu for development, I didn't test it on other distributions,
> but it shouldn't be a problem...

Don't even worry about it. At some point I'd be happy to give it a try 
on CentOS -- one of the most outdated distributions out there (GTK+ 
2.10.4, Cairo 1.2.4, and so on), but the one forced on me by the 
Berkeley sysadmins, so I happen to care a bit about backwards 
compatibility with old versions of libraries.

> I didn't understand your stroke-recognizer, but I simply use your code,
> and it worked, so this is not a problem at the moment;-)

Sure! (For your info: the idea is to think of a stroke as a piece of 
wire, and compute its center of mass, moment of inertia, etc. -- as in 
solid mechanics -- to decide how straight it is. Then repeat after 
subdividing. This detects triangles and other polygons quite well.
How well the various straight line pieces fit together is then used to 
decide whether a polygon is a rectangle, etc.).

There are a couple of bugs in that code that I need to look into and fix 
someday, but nothing urgent -- it'll wait.

I'll be happy to provide more details upon request if you ever need to 
know more.

> Ok, you will hear from me when I have a complete "working" release, at
> the moment there are some parts which are not working.

Sure! Sounds great.
Best wishes,

Denis

-- 
Denis Auroux                             aur...@math.berkeley.edu
University of California, Berkeley       Tel: 510-642-4367
Department of Mathematics                Fax: 510-642-8204
817 Evans Hall # 3840
Berkeley, CA 94720-3840

_______________________________________________
Xournal-devel mailing list
Xournal-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xournal-devel
