Hello all,
did anyone try this? It does not work for me. I tried running it on a sample PPT found on Google (http://www.kcmetro.cc.mo.us/longview/ctac/powerpoint/ct.ppt) as well as on some PPT files I created myself in OpenOffice. The results are a number of empty text files and error messages like this (copy of stdout, with empty lines removed; there were three before each line starting with a slash):
<snip>
\Current User org.apache.poi.hpsf.NoPropertySetStreamException
\PowerPoint Document org.apache.poi.hpsf.NoPropertySetStreamException
\PersistentStorage Directory org.apache.poi.hpsf.NoPropertySetStreamException
\DocumentSummaryInformation org.apache.poi.hpsf.NoPropertySetStreamException
\SummaryInformation org.apache.poi.hpsf.NoPropertySetStreamException
\Text_Content org.apache.poi.hpsf.NoPropertySetStreamException
\Object4CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object4Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object4Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object6CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object6Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object6Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object8CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object8Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object8Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object9CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object9OlePres000 org.apache.poi.hpsf.NoPropertySetStreamException
\Object9Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object9Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object10CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object10Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object10Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object7CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object7Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object7Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object5CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object5Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object5Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object2CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object2OlePres000 org.apache.poi.hpsf.NoPropertySetStreamException
\Object2Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object2Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object3CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object3Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object3Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Object1CompObj org.apache.poi.hpsf.NoPropertySetStreamException
\Object1OlePres000 org.apache.poi.hpsf.NoPropertySetStreamException
\Object1Ole10Native org.apache.poi.hpsf.NoPropertySetStreamException
\Object1Ole org.apache.poi.hpsf.NoPropertySetStreamException
\Header org.apache.poi.hpsf.NoPropertySetStreamException
<snap>
Any idea what is going wrong?
Thanks, Peter
Koundinya (Sudhakar Chavali) wrote:
Hi All,
We have done the initial groundwork for extracting text from PowerPoint (PowerPoint to text). We would like to say thanks to the POI group. Though the base work is rough, we are able to extract the text from PowerPoint.
Sorry for the bad programming, but we hope this will be helpful to efficient developers
who want to build a good program from this scratch version.
Here is the sample. Whenever there are modifications, we will post the information.
import java.io.*;
import java.util.*;
import org.apache.poi.hpsf.*;
import org.apache.poi.poifs.eventfilesystem.*;
import org.apache.poi.util.HexDump;
import org.apache.poi.util.LittleEndian;

public class PPT2Text {
    public static void main(String[] args) throws IOException {
        final String filename = args[0];
        POIFSReader r = new POIFSReader();

        /* Register a listener for *all* documents. */
        r.registerListener(new MyPOIFSReaderListener());
        r.read(new FileInputStream(filename));
    }

    static class MyPOIFSReaderListener implements POIFSReaderListener {

        /* Counter used to name the output files 1.txt, 2.txt, ... */
        static int filename = 1;

        public void processPOIFSReaderEvent(POIFSReaderEvent event) {
            PropertySet ps = null;
            FileOutputStream fos = null;

            try {
                org.apache.poi.poifs.filesystem.DocumentInputStream dis = null;

                System.out.println("\n\n");
                System.out.println(event.getPath() + event.getName());
                dis = event.getStream();
                /*
                byte btoWrite[] = new byte[12];
                dis.read(btoWrite);
                System.out.println("Version :" + LittleEndian.getUnsignedByte(btoWrite, 0));
                System.out.println("Instance :" + LittleEndian.getUShort(btoWrite, 0));
                System.out.println("Type :" + LittleEndian.getUShort(btoWrite, 2));
                System.out.println("Len :" + LittleEndian.getLong(btoWrite, 4));
                */

                /* One output file per stream in the document. */
                fos = new FileOutputStream("" + filename + ".txt");

                /* Read the whole stream into memory. */
                byte btoWrite[] = new byte[dis.available()];
                dis.read(btoWrite, 0, btoWrite.length);

                /* Treat every offset as a possible record header and look for type 4008. */
                for (int i = 0; i < btoWrite.length - 20; i++) {
                    //System.out.println("Version :" + LittleEndian.getUnsignedByte(btoWrite, i + 0));
                    //System.out.println("Instance :" + LittleEndian.getUShort(btoWrite, i + 0));
                    //System.out.println("Type :" + LittleEndian.getUShort(btoWrite, i + 2));
                    //System.out.println("Len :" + LittleEndian.getUInt(btoWrite, i + 4));

                    long type = LittleEndian.getUShort(btoWrite, i + 2);
                    long size = LittleEndian.getUInt(btoWrite, i + 4);
                    /*
                     * The header is 8 bytes (2 version/instance, 2 type, 4 length),
                     * so the text starts at i + 8. The first draft wrote from i + 5,
                     * which also copied three bytes of the length field.
                     * The bounds check skips false matches with an impossible length.
                     */
                    if (type == 4008 && size <= btoWrite.length - (i + 8)) {
                        fos.write(btoWrite, i + 8, (int) size);
                    }
                }

                filename++;
                //System.out.println(event.getStream().toString());
                //ps = PropertySetFactory.create(event.getStream());
            } catch (Exception ex) {
                //System.out.println("No property set stream: \"" + event.getPath() +
                //    event.getName() + "\"");
                System.out.println(ex);
                return;
            } finally {
                /* Close the output file even if reading or writing failed. */
                if (fos != null) {
                    try { fos.close(); } catch (IOException ignored) { }
                }
            }
        }
    }
}
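
The core of the listing above is the scan for records of type 4008 inside each stream. As a minimal sketch that can be tried without POI or a real PPT file, the scan can be isolated as below. The class and method names (TextBytesAtomScan, extractText, readUShort, readUInt) are made up for illustration and are not part of POI. It assumes the 8-byte record header (2 bytes version/instance, 2 bytes type, 4 bytes length, all little-endian) and assumes the payload of a type 4008 record is single-byte text, decoded here as ISO-8859-1. Like the original, it is a byte-by-byte heuristic and not a proper walk of the record tree.

```java
import java.nio.charset.StandardCharsets;

public class TextBytesAtomScan {
    // Assumed header layout: 2 bytes version/instance, 2 bytes type, 4 bytes length.
    static final int HEADER_SIZE = 8;
    // Record type the original listing looks for.
    static final int TEXT_BYTES_ATOM = 4008;

    // Little-endian unsigned 16-bit read.
    static int readUShort(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Little-endian unsigned 32-bit read, returned as long.
    static long readUInt(byte[] b, int off) {
        return (b[off] & 0xFFL)
             | ((b[off + 1] & 0xFFL) << 8)
             | ((b[off + 2] & 0xFFL) << 16)
             | ((b[off + 3] & 0xFFL) << 24);
    }

    // Returns the payload of every plausible type 4008 record, one per line.
    static String extractText(byte[] data) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i + HEADER_SIZE <= data.length) {
            int type = readUShort(data, i + 2);
            long size = readUInt(data, i + 4);
            // Accept the match only if the declared length fits in the buffer.
            if (type == TEXT_BYTES_ATOM && size <= data.length - i - HEADER_SIZE) {
                out.append(new String(data, i + HEADER_SIZE, (int) size,
                                      StandardCharsets.ISO_8859_1));
                out.append('\n');
                i += HEADER_SIZE + (int) size; // continue after this record
            } else {
                i++; // no match here, try the next offset
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built record: type 4008 (0x0FA8), length 5, payload "Hello".
        byte[] text = "Hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] record = new byte[HEADER_SIZE + text.length];
        record[2] = (byte) 0xA8;
        record[3] = 0x0F;
        record[4] = (byte) text.length;
        System.arraycopy(text, 0, record, HEADER_SIZE, text.length);
        System.out.print(extractText(record));
    }
}
```

The length check is the main design choice here: a byte-by-byte scan will hit false matches, and rejecting any match whose declared length runs past the end of the buffer avoids an out-of-bounds write for those.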
thanks, Sudhakar
