Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

simanchal maharana Thu, 30 Jan 2014 23:44:53 -0800

Hi David,

Thanks a lot for your suggestion.
This is actually for translating PPTX files, so I have to replace each whole
paragraph with its translation.
But a paragraph is a combination of several <a:r> elements (i.e.
CTRegularTextRun), and each <a:r> is in turn the parent of an <a:t>.


1. I looked at the XML of each paragraph and never found more than one
<a:t> in one <a:r> (CTRegularTextRun), but in code it gives an array of T,
so I wrote the code this way.

2. I have to replace the whole paragraph's content with its translation.
It is not possible to split the translated text across the individual
<a:r> or <a:t> elements. So I remove all the <a:r> siblings (i.e.
CTRegularTextRun) except the first one, and replace the <a:t> content of
that first CTRegularTextRun with the translation.

3. For blank paragraphs the number of CTRegularTextRun elements is zero, I
guess. So setting the content with
    ctRegularTextRun[0].setT("Some Translated Text");
throws an ArrayIndexOutOfBoundsException. That is why I checked the length.
I have now modified it:

String originalText = replaceUnwantedChar(xslfParagraph.getText());
if (!originalText.isEmpty()) {
    // all the operations on the runs happen inside this check
}

So now I don't need to check the length of CTRegularTextRun[] for that
paragraph.
Thanks a lot for this suggestion.

4. Now I have figured out the difference between

for(int index = 1; index <= ctRegularTextRun.length-1; index++) and
for(int index = ctRegularTextRun.length-1; index > 0 ; index--).

Traversing forward (the first one) and calling removeR(index) throws an
IndexOutOfBoundsException, because every removal shortens the run list
while the index keeps growing. Traversing backward (the second one) works
fine. So I was deleting the CTRegularTextRun elements backward and leaving
only the first one, at index 0.

But now I can use either of these:

CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
    xslfParagraph.getXmlObject().removeR(index);
}

or

for (int index = 1; index <= ctRegularTextRun.length - 1; index++) {
    // always remove position 1: the remaining runs shift down after each removal
    xslfParagraph.getXmlObject().removeR(1);
}
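
To make point 4 concrete, here is a minimal sketch in plain Java. It does not
use POI at all: an ArrayList stands in for the paragraph's live run list, and
the names RunRemovalDemo, newRuns, removeBackward, removeForwardAtOne,
removeForwardByIndex and replaceText are hypothetical, made up only for this
illustration. It assumes that removeR(index) shifts the later runs down the
same way List.remove(int) does, while the length taken from getRArray() stays
fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a paragraph's run list; no POI classes are used here.
class RunRemovalDemo {

    // Hypothetical stand-in for the live <a:r> list of one paragraph.
    static List<String> newRuns(int count) {
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            runs.add("run" + i);
        }
        return runs;
    }

    // Backward removal: each index is still valid at the moment it is removed.
    static List<String> removeBackward(List<String> runs) {
        int snapshotLength = runs.size(); // plays the role of getRArray().length
        for (int index = snapshotLength - 1; index > 0; index--) {
            runs.remove(index);
        }
        return runs;
    }

    // Forward loop that always removes position 1: the next run slides into
    // position 1 after every removal, so the index never runs past the end.
    static List<String> removeForwardAtOne(List<String> runs) {
        int snapshotLength = runs.size();
        for (int index = 1; index <= snapshotLength - 1; index++) {
            runs.remove(1);
        }
        return runs;
    }

    // Forward loop that removes by the loop index: the list shrinks while the
    // index grows, so with three or more runs the index overtakes the size.
    static List<String> removeForwardByIndex(List<String> runs) {
        int snapshotLength = runs.size();
        for (int index = 1; index <= snapshotLength - 1; index++) {
            runs.remove(index);
        }
        return runs;
    }

    // Collapse the paragraph to its first run and put the translation there.
    // An empty paragraph (zero runs) is left untouched.
    static List<String> replaceText(List<String> runs, String translation) {
        if (runs.isEmpty()) {
            return runs;
        }
        removeBackward(runs);
        runs.set(0, translation);
        return runs;
    }

    public static void main(String[] args) {
        System.out.println("backward:       " + removeBackward(newRuns(4)));
        System.out.println("forward at 1:   " + removeForwardAtOne(newRuns(4)));
        System.out.println("replaced:       " + replaceText(newRuns(4), "Some Translated Text"));
        try {
            removeForwardByIndex(newRuns(4));
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("forward by idx: " + ex.getClass().getSimpleName());
        }
    }
}
```

Both working variants leave only the run at index 0, which is the one that
then receives the translated text.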

Thanks a lot for your suggestion.
Simanchal



On Fri, Jan 31, 2014 at 4:17 AM, David Law-2 [via Apache POI]
<[email protected]> wrote:
> Simanchal,
>
> may I ask a couple of stupid questions?
>
> I've removed some dead code & what's left
> in the heart of all those nested if's & for's is this:
>
> CTRegularTextRun[] ctRegularTextRun =
> xslfParagraph.getXmlObject().getRArray();
>
> for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
>      xslfParagraph.getXmlObject().removeR(index);
> }
> if (ctRegularTextRun.length > 0) {
>      ctRegularTextRun[0].setT("");
> }
>
> First you get an Array of all CTRegularTextRuns contained in the XmlObject.
> Then you remove them all from the XmlObject.
> (they now only exist in the Array you just got)
> Finally you set the T Element of (only!) the 1st CTRegularTextRun (if
> present) to "".
>
> Q1) Now I wonder why you need to iterate backwards through the Array?
> Q2) Setting the T Element will have no effect (because you have just
> deleted all R's from the XmlObject)?!
>
> All the best,
> DaveLaw
>
>
> On 30.01.2014 04:44, simanchal maharana wrote:
>
>> Hi Andreas,
>>
>> PFA PPTX file for your review.
>>
>> Thanks,
>> Simanchal
>>
>> On Thu, Jan 30, 2014 at 2:44 AM, Andreas Beeker [via Apache POI]
>> <[hidden email]> wrote:
>>> Hi,
>>>
>>> is there a chance to get your .pptx-files?
>>>
>>> - link it to your stackoverflow post [1]
>>> - or open a bugzilla entry [2]
>>> - or send it to my email address (least preferred ...)
>>>
>>> Andi.
>>>
>>>
>>> [1]
>>>
>>> http://stackoverflow.com/questions/21386211/retrieving-content-of-hyperlinked-slides-in-powerpoint-files-pptx-through-apac
>>> [2] http://issues.apache.org/bugzilla/buglist.cgi?product=POI
>>>
>>> On 29.01.2014 07:21, simanchal maharana wrote:
>>>
>>>> I am trying to get the text content of PowerPoint files and replace it
>>>> with some other text. I have a PowerPoint file of 20 slides, where
>>>> slides 13, 14, 15 and 16 have hyperlinks to slides 17, 18, 19 and 20.
>>>> I am using XMLSlideShow to traverse the slides, but it gives only 16
>>>> slides. It does not give the last 4 hyperlinked slides.
>>>>
>>>> Any idea how I can get the content of all the hyperlinked slides and
>>>> replace it with some other text would be much appreciated.
>>>>
>>>> here is my code.
>>>>
>>>> import java.io.FileInputStream;
>>>> import java.io.FileOutputStream;
>>>> import org.apache.poi.xslf.usermodel.XMLSlideShow;
>>>> import org.apache.poi.xslf.usermodel.XSLFShape;
>>>> import org.apache.poi.xslf.usermodel.XSLFSlide;
>>>> import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
>>>> import org.apache.poi.xslf.usermodel.XSLFTextShape;
>>>> import org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun;
>>>> public class Testing {
>>>> static String inputFile =
>>>> "C:\\Users\\SM78882\\Desktop\\Testing\\IE_Basics_English.pptx";
>>>> static String outputFile =
>>>> "C:\\Users\\SM78882\\Desktop\\Testing\\result.pptx";
>>>>
>>>> public static String replaceUnwantedChar(String originalString) {
>>>> if (null != originalString)
>>>> return "" + originalString.replaceAll("(\n+)|(\t+)|(\\s{2,})", " ")
>>>> .trim();
>>>> else
>>>> return "";
>>>> }
>>>> public static void main(String[] args) {
>>>> FileInputStream fis = null;
>>>> FileOutputStream fos = null;
>>>> XMLSlideShow ppt = null;
>>>> try {
>>>> fis = new FileInputStream(inputFile);
>>>> fos = new FileOutputStream(outputFile);
>>>> ppt = new XMLSlideShow(fis);
>>>> System.out.println("No of slides:" + ppt.getSlides().length); // gives 16 slides.
>>>> for (XSLFSlide slide : ppt.getSlides()) {
>>>> for (XSLFShape shape : slide) {
>>>> if (shape instanceof XSLFTextShape) {
>>>> XSLFTextShape txShape = (XSLFTextShape) shape;
>>>> for (XSLFTextParagraph xslfParagraph : txShape .getTextParagraphs()) {
>>>> String originalText = replaceUnwantedChar(xslfParagraph .getText());
>>>> if (!originalText.isEmpty()) {
>>>> String translation = "";
>>>> if (translation != null) {
>>>> CTRegularTextRun[] ctRegularTextRun = xslfParagraph
>>>> .getXmlObject().getRArray();
>>>> for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
>>>> xslfParagraph.getXmlObject().removeR( index);
>>>> }
>>>> if (ctRegularTextRun.length > 0)
>>>> ctRegularTextRun[0].setT(translation);
>>>> }
>>>> }
>>>> }
>>>> }
>>>> }
>>>> }
>>>> ppt.write(fos);
>>>> fos.close();
>>>> fis.close();
>>>> } catch (Exception ex) {
>>>> ex.printStackTrace();
>>>> }
>>>> }
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>>
>>>> http://apache-poi.1045710.n5.nabble.com/Retrieving-content-of-hyperlinked-slides-in-powerpoint-files-PPTX-through-apache-POI-tp5714766.html
>>>> Sent from the POI - User mailing list archive at Nabble.com.
>>>
>>
>> Final_2.7z (16M)
>> <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/0/Final_2.7z>
>> IE_Basics_English.pptx (4M)
>> <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/1/IE_Basics_English.pptx>
>> PPTXParser_Code.java (4K)
>> <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/2/PPTXParser_Code.java>
>>




--
View this message in context: 
http://apache-poi.1045710.n5.nabble.com/Retrieving-content-of-hyperlinked-slides-in-powerpoint-files-PPTX-through-apache-POI-tp5714766p5714788.html
Sent from the POI - User mailing list archive at Nabble.com.
