The only file format that I can really talk about here is the newer SpreadsheetML format in the .xlsx files. These files are simply zipped xml and so it is very easy to look into the internal workings and see just where the information is stored. To discover how to drill down to the actual path to an external file, I simply created two workbooks and linked from a cell in one workbook to a cell in another workbook using the familar =[Book1.xlsx]'Sheet1'!A1 syntax. After saving the second workbook away to disc, I unzipped it and had a good dig around inside it's contents.
Lookinf into the xml for the workbbok, the contents of cell A1 were just as you described them, the formula I entered had been modified by Excel to look like this - [1]'Sheet1'!A1. The integer value 1 had replaced the name of and path to the external workbook and the question then was, how to resolve this back to the actual path. The answer is really quite straightforward becuase, when the file was saved originally, Excel created a folder called externalLinks and in this folder there were two entries, another folder called _rels and a file called externalLink1.xml. The externalLink1.xml file contained an entry that allows a link to be formed between an identifier - in this case rID1 and the specific cell in question. That identifier then makes it possible to drill down into the single file that the _rels folder hods which goes by the name of externalLink1.xml.rels. This file contains xml markup that links the identifier to an actual file through the Target attribute of the Relationship element. To summarise then, the process would be; 1. Get the formula from the cell and if it contains a link to an external file; I guess searching for embedded '[' and ']' characters would be a good place to start. 2. If the formula does contain a link to an external file, then look into the externalLink1.xml file, search for the cells reference and locate the associated relationship id for the external workbook. 3. Look into the externalLink1.xml.rels file and locate the relationship element with the matching identifier. Read the associated Target attribute and that should be the address of the external file. It should be possible to do all of this from within POI as it is possible to recover the various relationships that exist between elements but I have never tried to do this myself. If you fancy having a go, then post to the list and see if anyone can help you accomplish this task and, if you do manage to find out how to do it, please post to the list so that others can benefit from your work. It may also be worth posting onto the dev list to ask for specific help with this problem. The people who frequent that list have a far better knowledge of the internal workings of POI than do I. Finally, I created a workbook with just a single cell linked to an external file and so have not tested out this theory; I would bet that more than one externalLink1.xml and externalLin1.xml.rels file will be created if there are multiple liks to different external workbooks. I will through levae this for you to test out as I guess you have a few files on hand that you could unzip. With this information, it may be possible to refine the algorithm slightly and make the search for the names of and paths to external workbooks more efficient. Now, with regard to the older binary file format, life is a little harder as it is much more difficult to look into the file and see just where specific items of data are stored. You may want to begin by taking a look at the ForkedEvaluator class that does offer some support for linked wokrbooks but I do not know how much use this will be to you. Aside from that, I do not know how to suggest you proceed with the older file format other than using the BiffViewer and reviewing Microsoft's file format specification to locate whereabouts in the file the external workbooks ID is stored. Yours Mark B -- View this message in context: http://apache-poi.1045710.n5.nabble.com/get-Path-of-cells-referenced-to-cells-of-another-file-tp3362200p3362651.html Sent from the POI - User mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
