On Wed, Sep 22, 2010 at 02:11:31PM +0200, carlosgc wrote:
> Excerpts from suzuki toshiya's message of Wed Sep 15 12:16:22 +0200 2010:
> > Hi,
>
> Hi,
>
> > Attached patches are the introduction of new API to access raw text.
> > I wish some maintainer of poppler-glib can review it.
>
> Yes, sorry for the delay.
>
> > poppler-0.15.0_glib-lib.diff
> >     patch to declare new function and its implementation
> >
>
> I prefer poppler_page_get_raw_text(), rather than
> poppler_page_get_selected_raw_text(), and always return the text of
> the whole page. I don't see why you might want the selected text in
> raw order.

I've made that function. Here's the patch.

From 389d49e3413ce09601b574308bd6bbd46044e6b3 Mon Sep 17 00:00:00 2001
From: danigm <[email protected]>
Date: Wed, 5 Jan 2011 14:07:59 +0100
Subject: [PATCH] [glib] Added poppler_page_get_raw_text function

---
 glib/poppler-page.cc |   54 +++++++++++++++++++++++++++++++++++++++++++++++++-
 glib/poppler-page.h  |    1 +
 2 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/glib/poppler-page.cc b/glib/poppler-page.cc
index a8e6b2d..8966f7e 100644
--- a/glib/poppler-page.cc
+++ b/glib/poppler-page.cc
@@ -2117,7 +2117,7 @@ poppler_page_get_crop_box (PopplerPage *page, PopplerRectangle *rect)
  * This array must be freed with g_free () when done.
  *
  * The position in the array represents an offset in the text returned by
- * poppler_page_get_text()
+ * poppler_page_get_raw_text()
  *
  * Return value: %TRUE if the page contains text, %FALSE otherwise
  *
@@ -2200,3 +2200,55 @@ poppler_page_get_text_layout (PopplerPage *page,
 
   return TRUE;
 }
+
+/**
+ * poppler_page_get_raw_text:
+ * @page: A #PopplerPage
+ *
+ * Return value: a pointer to the text page in raw order
+ * as a string
+ *
+ **/
+char *
+poppler_page_get_raw_text (PopplerPage *page)
+{
+  TextPage *text;
+  TextWordList *wordlist;
+  TextWord *word, *nextword;
+  char *craw_text;
+  GooString *raw_text;
+  int i = 0;
+
+  raw_text = new GooString();
+
+  g_return_val_if_fail (POPPLER_IS_PAGE (page), FALSE);
+
+  text = poppler_page_get_text_page (page);
+  wordlist = text->makeWordList (gFalse);
+
+  if (wordlist->getLength () <= 0)
+    return NULL;
+
+  for (i = 0; i < wordlist->getLength (); i++)
+    {
+      word = wordlist->get (i);
+      raw_text->append (word->getText ());
+
+      nextword = word->getNext ();
+      if (nextword)
+        {
+          raw_text->append (' ');
+        }
+      else
+        {
+          raw_text->append ('\n');
+        }
+    }
+
+  craw_text = g_strdup (raw_text->getCString ());
+
+  delete wordlist;
+  delete raw_text;
+
+  return craw_text;
+}
diff --git a/glib/poppler-page.h b/glib/poppler-page.h
index d40c0ee..333cb23 100644
--- a/glib/poppler-page.h
+++ b/glib/poppler-page.h
@@ -128,6 +128,7 @@ void poppler_page_get_crop_box (PopplerPage *page,
 gboolean poppler_page_get_text_layout (PopplerPage *page,
                                        PopplerRectangle **rectangles,
                                        guint *n_rectangles);
+char *poppler_page_get_raw_text (PopplerPage *page);
 
 /* A rectangle on a page, with coordinates in PDF points. */
 #define POPPLER_TYPE_RECTANGLE (poppler_rectangle_get_type ())
-- 
1.7.3.4.742.g987cd

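
For anyone who wants to look at the joining loop apart from poppler, here is a minimal standalone sketch of the same idea: append each word, then a space if the word has a successor and a newline if it does not. RawWord and join_raw_words are hypothetical names made up for this illustration, not poppler API, and the has_next flag only stands in for "getNext () returned non-NULL"; what that means for a given page depends on poppler internals not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for poppler's TextWord (an assumption for this
// sketch): just the word text and whether getNext () would be non-NULL.
struct RawWord {
  std::string text;
  bool has_next;
};

// Concatenates the words in the order given, mirroring the loop in the
// patch: a space after a word that has a successor, a newline otherwise.
// An empty list yields an empty string (the patch returns NULL there).
std::string join_raw_words(const std::vector<RawWord> &words) {
  std::string out;
  for (const RawWord &w : words) {
    out += w.text;
    out += w.has_next ? ' ' : '\n';
  }
  return out;
}
```

Since the sketch works on std::string values, nothing has to be deleted by hand on any return path. In the patch, raw_text is allocated before both early returns, so those paths would probably want the same cleanup as the normal one.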
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
