I use POI 3.8 in an application that interfaces with another (third-party) 
application to write custom properties to Office documents. We have a client 
with an unusually large dataset (it can reach many thousands of properties) 
that needs to be written to the document, and I ran into two issues:

With a .DOC file, the code writes the properties extremely fast. However, if 
the size of the property collection exceeds 512 KB (based on the observed file 
size on disk), MS Word chokes when loading the custom properties collection. 
This appears to be a Microsoft limitation.

With a .DOCX file the problem is different: the issue is the performance of 
writing each new property once the collection reaches into the thousands. The 
first thousand or so properties aren't too bad, but it gets much worse as the 
collection grows. In my time trials, timing each batch of 25 property writes, 
the first batch took 0.028 seconds, the batch at the 1k mark took 1.1 seconds, 
the batch at 2k took 4.3 seconds, and the batch at 3k took 10.1 seconds. I have 
reason to believe the dataset may grow well beyond 3k.
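Taking the 1k batch as the baseline, the reported timings grow roughly with the 
square of the collection size over this range (this is arithmetic on the 
figures above only, not a measurement of its own):

```latex
\frac{t_{2k}}{t_{1k}} = \frac{4.3}{1.1} \approx 3.9 \approx 2^2,
\qquad
\frac{t_{3k}}{t_{1k}} = \frac{10.1}{1.1} \approx 9.2 \approx 3^2
```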

For various administrative and legal reasons, producing a derivative work from 
POI is unpalatable; it would be far preferable to have a standard version of 
POI that addresses our issue.

The feature I am requesting is a method to add a custom property without first 
checking whether a property of the same name already exists. My application 
will take responsibility for ensuring there are no duplicates. That seems the 
least disruptive way to get the performance I need.
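Since the application would be responsible for uniqueness, one cheap way to do 
that is to keep its own set of names already written. This is a minimal 
sketch, assuming Java 8+. `UniqueNameWriter` and `seed` are hypothetical names, 
and `sink` stands in for whichever add call is used (for example the existing 
`addProperty(name, value)`, or a future unchecked variant).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical application-side guard (not part of POI): remembers which
// property names have already been written so duplicates are never passed on,
// which is what an unchecked add would require of the caller.
final class UniqueNameWriter {
    private final Set<String> written = new HashSet<>();
    private final BiConsumer<String, String> sink; // e.g. custProps::addProperty

    UniqueNameWriter(BiConsumer<String, String> sink) {
        this.sink = sink;
    }

    // Registers names that already exist in the document (the set starts empty).
    void seed(Iterable<String> existingNames) {
        for (String n : existingNames) {
            written.add(n);
        }
    }

    // Writes the property and returns true, or returns false without writing
    // if the name was seen before. HashSet lookups are O(1) on average, so this
    // check does not slow down as the collection grows.
    boolean write(String name, String value) {
        if (!written.add(name)) {
            return false;
        }
        sink.accept(name, value);
        return true;
    }
}
```

If the document already contains custom properties, their names would need to 
be passed to `seed` before writing, otherwise the set cannot catch clashes with 
them.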

I tracked the performance issue to the code that checks the name of the new 
property against the existing list before allowing the add to continue. 
Perhaps an overload with a boolean flag controlling the duplicate check, or 
else a new method name such as addPropertyNoCheck(), would work.
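To make the two suggested shapes concrete, here is a minimal sketch of what 
they might look like. It is not POI's API: `CustomPropertiesSketch` and its 
nested `Prop` are hypothetical stand-ins that keep properties in a plain 
`ArrayList` rather than the XMLBeans `CTProperty` list, and the counter-based 
pid assignment is an assumption of the sketch that may differ from POI's 
`nextPid()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for POIXMLProperties.CustomProperties, showing the two
// API shapes suggested above. Illustrates the API only, not POI internals.
final class CustomPropertiesSketch {
    // Minimal stand-in for CTProperty: id, name and string value.
    static final class Prop {
        final int pid;
        final String name;
        final String value;

        Prop(int pid, String name, String value) {
            this.pid = pid;
            this.name = name;
            this.value = value;
        }
    }

    private final List<Prop> props = new ArrayList<>();
    // Sketch assumption: pids 0 and 1 are reserved in OLE property sets,
    // so the first custom property gets pid 2.
    private int lastPid = 1;

    // Existing behaviour: duplicate names are rejected.
    void addProperty(String name, String value) {
        addProperty(name, value, true);
    }

    // Suggested overload: the caller may turn the duplicate check off.
    void addProperty(String name, String value, boolean checkDuplicates) {
        if (checkDuplicates && contains(name)) {
            throw new IllegalArgumentException(
                    "A property with this name already exists in the custom properties");
        }
        props.add(new Prop(++lastPid, name, value));
    }

    // Suggested alternative: a separately named unchecked add.
    void addPropertyNoCheck(String name, String value) {
        addProperty(name, value, false);
    }

    // Linear scan over all names, like POI's contains().
    boolean contains(String name) {
        for (Prop p : props) {
            if (p.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return props.size();
    }
}
```

Keeping the existing two-argument method as a delegate to the checked overload 
would leave current callers unaffected; only callers that opt out skip the scan.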

My application has the following code:

    import org.apache.poi.POIXMLProperties;
    ...
    protected POIXMLProperties.CustomProperties _custProps = null;
    ...
    this._custProps = this._xDoc.getProperties().getCustomProperties();
    ...
    this._custProps.addProperty(id, val);
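The batch time trials described earlier can be reproduced with a small harness 
around this call. This is a minimal sketch, assuming Java 8+; `BatchTimer` and 
`timeBatches` are hypothetical names, and the `add` argument would be something 
like `this._custProps::addProperty`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical timing harness: performs `total` adds and records the elapsed
// wall-clock time, in seconds, of each batch of `batchSize` adds.
final class BatchTimer {
    static List<Double> timeBatches(BiConsumer<String, String> add, int total, int batchSize) {
        List<Double> seconds = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            int end = Math.min(start + batchSize, total); // last batch may be partial
            long t0 = System.nanoTime();
            for (int i = start; i < end; i++) {
                // Names are unique, so a duplicate check never throws here.
                add.accept("prop" + i, "value" + i);
            }
            seconds.add((System.nanoTime() - t0) / 1e9);
        }
        return seconds;
    }
}
```

For example, `BatchTimer.timeBatches(this._custProps::addProperty, 3000, 25)` 
returns 120 batch durations. Timings from `System.nanoTime()` include JIT 
warm-up and GC pauses, so individual batches are rough figures.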


In the profiler, nearly all the extra time comes from the call to contains() 
within add(), shown here from POI's CustomProperties:
    public void addProperty(String name, String value){
        CTProperty p = add(name);
        p.setLpwstr(value);
    }

    private CTProperty add(String name) {
        if(contains(name)) {
            throw new IllegalArgumentException("A property with this name " +
                    "already exists in the custom properties");
        }

        CTProperty p = props.getProperties().addNewProperty();
        int pid = nextPid();
        p.setPid(pid);
        p.setFmtid(FORMAT_ID);
        p.setName(name);
        return p;
    }

    public boolean contains(String name){
        for(CTProperty p : props.getProperties().getPropertyList()){
            if(p.getName().equals(name)) return true;
        }
        return false;
    }
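Because a new name is not yet present, contains() has to walk the entire list 
before add() can proceed, so the k-th add makes k - 1 name comparisons. The 
sketch below only counts those comparisons; `ScanCost` is a hypothetical name, 
and a plain `ArrayList` stands in for the XMLBeans property list.

```java
import java.util.ArrayList;
import java.util.List;

// Counts the name comparisons a linear-scan contains() makes when n properties
// with distinct names are added one at a time. Models the duplicate check only.
final class ScanCost {
    static long comparisonsForAdds(int n) {
        List<String> names = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            String name = "prop" + i;
            for (String existing : names) { // same loop shape as contains()
                comparisons++;
                if (existing.equals(name)) {
                    break; // never taken here: all names are distinct
                }
            }
            names.add(name);
        }
        return comparisons;
    }
}
```

For n distinct adds the total is n(n - 1)/2, so `comparisonsForAdds(3000)` 
returns 4498500. The batch timings reported above grew faster than a linear 
per-add cost would predict, so the cost of obtaining the property list on each 
call may also contribute; this sketch does not model that.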
