I use POI 3.8 in an application that interfaces with another (third-party)
application to write custom properties to Office documents. We have a
client with an unusually large dataset (it can reach many thousands of
properties) to write to the document, and I ran into two issues:
With a .DOC file, the code writes the properties extremely fast. However, if
the size of the property collection exceeds 512k (judging by the observed file
size on disk), MS Word chokes when loading the custom properties collection.
This seems to be a Microsoft limitation.
With a .DOCX file the problem is different: the time to write each new
property climbs once the collection reaches into the thousands. The first
thousand or so properties aren't that bad, but it gets much worse as the
collection grows. In my time trials, timing each batch of 25 property writes,
the first batch of 25 entries took 0.028 seconds, the batch at the 1k mark
took 1.1 seconds, the batch at 2k took 4.3 seconds, and the batch at 3k took
10.1 seconds. I have reason to believe the dataset may grow much larger than
3k.
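As a rough check on how those timings scale, the sketch below divides each
batch time by the 1k batch time and compares it with the square of the size
ratio. The class and method names are made up for illustration; only the
timings come from the trial above.

```java
// Illustrative only: names are hypothetical, timings are from the trial above.
class BatchTimingRatios {
    // Collection sizes and the time (seconds) for one batch of 25 writes.
    static final int[] SIZES = {1000, 2000, 3000};
    static final double[] SECONDS = {1.1, 4.3, 10.1};

    // Ratio of the batch time at index i to the batch time at the 1k mark.
    static double timeRatio(int i) {
        return SECONDS[i] / SECONDS[0];
    }

    // The ratio we would expect if batch time grew with the square of the size.
    static double quadraticRatio(int i) {
        double r = (double) SIZES[i] / SIZES[0];
        return r * r;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("%d entries: measured x%.2f, quadratic x%.2f%n",
                    SIZES[i], timeRatio(i), quadraticRatio(i));
        }
    }
}
```

The measured ratios (about 3.9 at 2k and 9.2 at 3k) sit close to 4 and 9, so
the cost of a batch appears to grow roughly with the square of the collection
size, at least over this range.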
For various administrative and legal reasons, producing a derivative work from
POI is unpalatable, and it would be far preferable to have a standard
version of POI that addresses our issue.
The feature I am requesting is a function to add a custom property that does
not check for the existence of a property of the same name before adding. My
application will accept responsibility for ensuring no duplicates. That seems
the least disruptive way to gain the performance I need.
I tracked the performance issue to the code that checks the name of the new
property against what is already in the list before allowing the add to
continue. Perhaps an overload with a boolean to control the duplicate check,
or else a new method name such as addPropertyNoCheck(), would work.
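To make the request concrete, here is a minimal sketch of the overload idea on
a stand-in class. It is not POI code: PropertyStore and its members are
hypothetical, and a plain ArrayList stands in for the underlying property
list. The two-argument method keeps the checking behaviour, and the extra flag
lets the caller skip the scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the custom properties collection; not a POI class.
class PropertyStore {
    // Each entry is a {name, value} pair.
    private final List<String[]> props = new ArrayList<String[]>();

    // Same contract as today: always check for a duplicate name first.
    public void addProperty(String name, String value) {
        addProperty(name, value, true);
    }

    // Sketch of the requested overload: the caller may skip the duplicate check
    // and takes responsibility for uniqueness when checkDuplicates is false.
    public void addProperty(String name, String value, boolean checkDuplicates) {
        if (checkDuplicates && contains(name)) {
            throw new IllegalArgumentException(
                    "A property with this name already exists in the custom properties");
        }
        props.add(new String[] {name, value});
    }

    // Linear scan over all existing entries.
    public boolean contains(String name) {
        for (String[] p : props) {
            if (p[0].equals(name)) return true;
        }
        return false;
    }

    public int size() {
        return props.size();
    }
}
```

With the flag set to false, no scan happens on add, so a caller that passes a
repeated name will silently end up with a duplicate entry.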
My application has the following code:
import org.apache.poi.POIXMLProperties;
...
protected POIXMLProperties.CustomProperties _custProps = null;
....
this._custProps = this._xDoc.getProperties().getCustomProperties();
...
this._custProps.addProperty(id, val);
In the profiler, nearly all the extra time comes from the call to contains()
within add(). The relevant POI code is:
public void addProperty(String name, String value){
    CTProperty p = add(name);
    p.setLpwstr(value);
}

private CTProperty add(String name) {
    if(contains(name)) {
        throw new IllegalArgumentException("A property with this name " +
                "already exists in the custom properties");
    }
    CTProperty p = props.getProperties().addNewProperty();
    int pid = nextPid();
    p.setPid(pid);
    p.setFmtid(FORMAT_ID);
    p.setName(name);
    return p;
}

public boolean contains(String name){
    for(CTProperty p : props.getProperties().getPropertyList()){
        if(p.getName().equals(name)) return true;
    }
    return false;
}
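
If the check were skipped, the calling application would have to guarantee
unique names itself. A minimal sketch of how that could be done with a
HashSet, which gives constant-time lookups on average instead of a scan of the
whole list. UniqueNameTracker is a hypothetical helper on the application
side, not part of POI.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical application-side helper; not part of POI.
class UniqueNameTracker {
    private final Set<String> seen = new HashSet<String>();

    // Returns true the first time a name is offered, false on any repeat.
    public boolean claim(String name) {
        return seen.add(name);
    }

    // Number of distinct names claimed so far.
    public int size() {
        return seen.size();
    }
}
```

The application would call claim(id) before each add and only write the
property when it returns true.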