On 11/08/13 09:24, Ralph Böhme wrote:
Hi
Hello Ralph,
over the last days I've been testing Tracker 0.16.2 with a large set of real-world test data, mostly PDFs from various sources. From the start I ran into an issue where tracker-extract repeatedly got stuck on one PDF or another. Looking at the Tracker debug logs, process stack backtraces and other debug data, it seemed the tracker-extract parent and child processes were deadlocked somewhere while sending data from the child to the parent via the pipe IPC fd.
Um... parent (with a glib mainloop), child, glib, fork() ?
<https://developer.gnome.org/glib/2.37/glib-The-Main-Event-Loop.html>
"On Unix, the GLib mainloop is incompatible with fork(). Any program using the mainloop must either exec() or exit() from the child without returning to the mainloop."
Hmm...
As I'm only really at the beginning of an in-depth analysis, I can't say for sure that the hangs I see are caused by this, but knowing there seems to be a fundamental design flaw in tracker-extract-pdf, I'm asking for thoughts on this.
So we use the parent/child setup because some PDFs take a REALLY long time to process and we have a 10 second window for them to be indexed. After that we kill the child process and return. We did this because we didn't want to kill the tracker-extract process all the time. In reality, this is actually what tracker-extract was built to do, so ...
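For reference, the arrangement described above (fork a child, let it write its result to a pipe, kill it once the time window expires) can be sketched in plain POSIX C roughly as follows. This is only an illustration under assumptions, not Tracker's actual code: run_with_timeout(), worker_fn, fast_worker() and slow_worker() are hypothetical names, the timeout is a parameter rather than the 10 seconds mentioned above, and the parent uses poll() directly instead of a GLib mainloop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical worker type: runs in the child, writes its result to out_fd. */
typedef void (*worker_fn)(int out_fd);

/* Two stand-in workers: one finishes at once, one mimics a very slow PDF. */
void fast_worker(int out_fd) { if (write(out_fd, "hello", 5) < 0) _exit(1); }
void slow_worker(int out_fd) { sleep(5); if (write(out_fd, "late", 4) < 0) _exit(1); }

/* Returns the number of bytes read, -1 on error, -2 if the child was
 * killed because it exceeded timeout_ms. buf is always NUL-terminated. */
ssize_t run_with_timeout(worker_fn worker, int timeout_ms, char *buf, size_t buflen)
{
    int fds[2];
    if (worker == NULL || buf == NULL || buflen == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {
        /* Child: do the work, then _exit() without returning to the caller. */
        close(fds[0]);
        worker(fds[1]);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: read until EOF, but never wait longer than timeout_ms overall. */
    close(fds[1]);
    size_t total = 0;
    int outcome = -1, eof = 0;
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed = (now.tv_sec - start.tv_sec) * 1000L
                     + (now.tv_nsec - start.tv_nsec) / 1000000L;
        long remaining = timeout_ms - elapsed;
        if (remaining <= 0) { outcome = -2; break; }

        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        int rc = poll(&pfd, 1, (int)remaining);
        if (rc < 0 && errno == EINTR) continue;
        if (rc < 0) break;                               /* poll error */
        if (rc == 0) { outcome = -2; break; }            /* timed out */
        if (total == buflen - 1) { outcome = 0; break; } /* buffer full: truncate */

        ssize_t n = read(fds[0], buf + total, buflen - 1 - total);
        if (n < 0 && errno == EINTR) continue;
        if (n < 0) break;                                /* read error */
        if (n == 0) { outcome = 0; eof = 1; break; }     /* child closed the pipe */
        total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);

    if (!eof) kill(pid, SIGKILL);   /* child still running or stuck: stop it */
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR) { }  /* reap, no zombie */
    return outcome == 0 ? (ssize_t)total : outcome;
}
```

Calling run_with_timeout(slow_worker, 200, buf, sizeof buf) kills the child after roughly 200 ms, whereas fast_worker returns its output normally. The parent keeps draining the pipe while it waits, so a child that writes more than the pipe buffer holds does not block forever on write().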
Afaict, the right design would involve an exec() in the child and using some other IPC channel. I'll happily volunteer.
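To make the proposal concrete, a fork()+exec() variant might look something like the sketch below. It is only an assumption of what such a design could be: run_helper() is a hypothetical name, no such helper binary exists in Tracker as far as this thread says, and since the "other IPC channel" is not specified here, the helper's stdout over a pipe stands in for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] as a separate program and captures its stdout into buf
 * (always NUL-terminated, truncated if too long). Returns the helper's
 * exit status, 127 if it could not be executed, -1 on any other failure. */
int run_helper(char *const argv[], char *buf, size_t buflen)
{
    int fds[2];
    if (argv == NULL || argv[0] == NULL || buf == NULL || buflen == 0)
        return -1;
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {
        /* Child: route stdout into the pipe, then replace the process image.
         * Nothing from the parent's main loop state is used after exec. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0) _exit(126);
            close(fds[1]);
        }
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }

    /* Parent: drain the pipe until EOF so the helper never blocks on write. */
    close(fds[1]);
    size_t total = 0;
    char scratch[256];              /* overflow is read and discarded */
    for (;;) {
        size_t room = buflen - 1 - total;
        char *dst = room > 0 ? buf + total : scratch;
        size_t cap = room > 0 ? room : sizeof scratch;
        ssize_t n = read(fds[0], dst, cap);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) break;          /* EOF or read error */
        if (room > 0) total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv set to { "echo", "hello", NULL } this fills buf with "hello\n" and returns 0. A time limit like the one discussed above would still have to be layered on top, for example by polling the pipe with a deadline and killing the helper when it expires.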
Yea, so we are actually calling exit() in the child. See: extract_content_child_process()
Thoughts?
Are you sure it isn't a difficult PDF taking too long?
--
Regards,
Martyn
Founder & Director @ Lanedo GmbH.
http://www.linkedin.com/in/martynrussell