Here is the code that should work for you:

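import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.regex.Pattern;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import org.apache.poi.xslf.usermodel.XMLSlideShow;

public class CompactPackage {

    public static void main(String[] args) throws Exception {
        XMLSlideShow pptx = new XMLSlideShow(new FileInputStream(args[0]));

        OPCPackage pkg = pptx.getPackage();
        for (PackagePart mediaPart : pkg.getPartsByName(Pattern.compile("/ppt/media/.*?"))) {
            if (!isReferenced(mediaPart, pkg)) {
                System.out.println(mediaPart.getPartName() + " is not referenced. removing.... ");
                pkg.removePart(mediaPart);
            }
        }

        // Write the compacted file; without this step the removals are never saved.
        // Taking the output path from args[1] is an assumption, adjust as needed.
        FileOutputStream out = new FileOutputStream(args[1]);
        pptx.write(out);
        out.close();
    }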

    /**
     * Check if a package part is referenced by any other part in the OPC package
     *
     * @param mediaPart     the part to check for references
     * @param pkg           the package this part belongs to
     * @return              whether mediaPart is referenced or not
     */
    public static boolean isReferenced(PackagePart mediaPart, OPCPackage pkg) throws Exception {
        for (PackagePart part : pkg.getParts()) {
            // .rels parts only describe relationships; they have none of their own
            if (part.isRelationshipPart()) continue;

            for (PackageRelationship rel : part.getRelationships()) {
                if (mediaPart.getPartName().getURI().equals(rel.getTargetURI())) {
                    //System.out.println("mediaPart[" + mediaPart.getPartName() + "] is referenced by " + part.getPartName());
                    return true;
                }
            }
        }
        return false;
    }
}

P.S. You may want to clean up "/ppt/embeddings/.*?" too.
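
If you do extend the cleanup to embeddings, one way to organize it is to collect all relationship targets once and then test every candidate part name against a list of patterns. Below is a rough sketch of just that selection logic in plain Java, without POI. The class name UnreferencedParts, its methods, and the sample part names are made up for illustration; in the real program the names would come from pkg.getParts() and the referenced set from rel.getTargetURI(). It also assumes part names are matched in full against each pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: picks the part names that are cleanup candidates.
final class UnreferencedParts {

    // Part-name patterns to clean up: media plus the embeddings mentioned in the P.S.
    static final List<Pattern> CLEANUP_PATTERNS = Arrays.asList(
            Pattern.compile("/ppt/media/.*?"),
            Pattern.compile("/ppt/embeddings/.*?"));

    // True if the whole part name matches at least one cleanup pattern.
    static boolean matchesAny(String partName) {
        for (Pattern p : CLEANUP_PATTERNS) {
            if (p.matcher(partName).matches()) {
                return true;
            }
        }
        return false;
    }

    // Returns, in input order, the names that match a cleanup pattern
    // and do not appear in the set of referenced relationship targets.
    static List<String> findUnreferenced(List<String> partNames, Set<String> referencedTargets) {
        List<String> result = new ArrayList<>();
        for (String name : partNames) {
            if (matchesAny(name) && !referencedTargets.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data only; a real package would supply these values.
        List<String> parts = Arrays.asList(
                "/ppt/slides/slide2.xml",
                "/ppt/media/image1.jpeg",
                "/ppt/media/image2.png",
                "/ppt/embeddings/oleObject1.bin");
        Set<String> referenced = new HashSet<>(Arrays.asList("/ppt/media/image1.jpeg"));

        for (String name : findUnreferenced(parts, referenced)) {
            System.out.println(name + " is not referenced");
        }
    }
}
```

Building the referenced set once also avoids rescanning every relationship in the package for each media part, which may matter for large presentations.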

Cheers,
Yegor

On Thu, May 3, 2012 at 11:07 AM, sp0065 wrote:
> Yegor,
>
> Thanks! Almost done. I generated the list of files in the /ppt/media/
> as you suggested.
>
> Now I am trying to generate a list of files referenced in my
> one-slide-presentation-file. Then I am going to delete those files
> that are not referenced. So far I was able to generate a list of
> relationships from the only slide I have. Relationships look like:
> <Relationship Id="rId2"
> Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
> Target="../media/image1.jpeg"/>
>
> =========
> XSLFSlide slide0 = pptx.getSlides()[0];
> List<POIXMLDocumentPart> rels = slide0.getRelations();
>
> for(POIXMLDocumentPart r : rels){
>  PackageRelationship r1 = r.getPackageRelationship();
>  String name = r1.toString();
>  console.printf("%s%n", name);
> }
> =========
>
> Result of this code is:
> id=rId2 - container=org.apache.poi.openxml4j.opc.ZipPackage@4e79f1 -
> relationshipType=http://schemas.openxmlformats.org/officeDocument/2006/relationships/image
> - source=/ppt/slides/slide2.xml -
> target=/ppt/media/image1.jpeg,targetMode=INTERNAL
>
> I can match the filename from /ppt/media/ to the filename in the
> relationship using a regex, but it would be nice to get only the target
> part of the relationship.
>
> Any advice on whether this is the right direction would help.
>
>
>
>
> On Wed, May 2, 2012 at 2:25 AM, Yegor Kozlov-4 [via Apache POI] wrote:
>> POI does not remove unused parts when removing slides. A media part can be
>> referenced by multiple slides, so such compaction should be done when
>> writing the slideshow, not when removing slides.
>>
>>
>> The code to compact pptx files and remove unreferenced media parts can
>> look as follows:
>>
>>         List<PackagePart> mediaParts = pptx.getPackage().getPartsByName(Pattern.compile("/ppt/media/.*?"));
>>         for(PackagePart part : mediaParts){
>>             // TODO: check if this media part is referenced by other slides and remove if it is not
>>             if(unused) {
>>                 pptx.getPackage().removePart(part);
>>             }
>>         }
>>
>>
>> Yegor
>>
>> On Tue, May 1, 2012 at 8:05 PM, sp0065 <[hidden email]> wrote:
>>
>>> I am looking for a way to compact/shrink pptx files that are generated as
>>> a result of removing all slides but one with slide.removeSlide(X). As I
>>> mentioned, the size of the resulting files is almost the same as the size
>>> of the source presentation file because unnecessary images used in the
>>> removed slides were not removed from the file (ppt\media).
>>>
>>> When I opened the one-slide file in PowerPoint and did Save As, the file
>>> was compacted and unnecessary images were removed. Can I do "Save As"
>>> programmatically with Apache POI by adding some operation to my program?
>>>
>>> Thank you.
>>>
>>> --
>>> View this message in context:
>>> http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5678348.html
>>
>>> Sent from the POI - User mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>
>
>
> --
> View this message in context: 
> http://apache-poi.1045710.n5.nabble.com/How-to-split-input-pptx-file-into-a-set-of-single-slide-files-tp5631543p5682458.html
> Sent from the POI - User mailing list archive at Nabble.com.
