From the JDK docs: FileWriter is meant for writing streams of characters. For writing streams of raw bytes, consider using a FileOutputStream.
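A minimal sketch of the byte-oriented approach the JDK docs point to, assuming Java 7 or later (try-with-resources, java.nio.file). The class name `BinaryDownload` and the helpers `copyBytes` and `download` are hypothetical names chosen for illustration; nothing in the thread prescribes them. The sample bytes in `main` are the first eight bytes of the "good" stream quoted further down, used here only as a local round-trip check that needs no network access.

```java
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BinaryDownload {

    // Copies every byte from 'in' to 'target' unchanged.
    // FileOutputStream writes raw bytes, so no character encoding is applied.
    public static long copyBytes(InputStream in, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (OutputStream out = new FileOutputStream(target.toFile())) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Hypothetical convenience wrapper: opens the URL and streams it to 'target'.
    public static long download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return copyBytes(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // 0x94 is the byte that turned into 0x3F ('?') in the corrupted file.
        byte[] sample = {0x78, (byte) 0xDA, (byte) 0x94, 0x55, 0x0B, 0x4C, 0x53, 0x57};
        Path tmp = Files.createTempFile("sample", ".bin");
        copyBytes(new ByteArrayInputStream(sample), tmp);
        // Compare what was written with what was read.
        System.out.println(Arrays.equals(sample, Files.readAllBytes(tmp)));
        Files.delete(tmp);
    }
}
```

For a real download, something like `BinaryDownload.download(pdfUrl, targetPath)` would replace the FileWriter loop, where `pdfUrl` and `targetPath` stand for your own URL string and output path. Comparing the MD5 of the result against the browser download, as suggested in the thread, is a reasonable way to confirm the copy is byte-identical.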
Depending on your platform's default character encoding, FileWriter replaces any value it cannot encode as a character. In the hex dump below, 0x94 became 0x3F ('?'). You must ensure you're writing bytes, not characters!

Michael

On 5. Nov 2010, at 18:14, Grant Overby wrote:

> First difference (on second line, first line is for reference point):
>
> bad:
> <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> 1676/V 1757>>stream
> xÚ?U LSW >?Û O)]Wä!Ô>?"CATl?4PkADy ? ?RjgÊ??< õ A
>
> Start of second line in hex: 78 DA 3F 55 0B 4C 53 57
>
> good:
> <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> 1676/V 1757>>stream
> xÚ”U LSW >—Û O)]Wä!Ô>˜"CATl”4PkADy ‹ –Rjgʈˆ< õ A
>
> Start of second line in hex: 78 DA 94 55 0B 4C 53 57
>
> Isolated incorrect single characters are throughout the document.
> Downloading it multiple times shows consistent errors.
>
> I'll keep thinking on it, but nothing is apparent to me. This shouldn't
> happen afaik.
>
> Anyone?
>
> --
> Grant Overby
> Senior Developer
> FloorSoft, Inc.
>
> Often people, especially computer engineers, focus on the machines. They
> think, "By doing this, the machine will run faster. By doing this, the
> machine will run more effectively. By doing this, the machine will something
> something something." They are focusing on machines. But in fact we need to
> focus on humans, on how humans care about doing programming or operating the
> application of the machines. We are the masters. They are the slaves. --
> Yukihiro Matsumoto
>
> On Fri, Nov 5, 2010 at 6:58 PM, Yogesh <[email protected]> wrote:
>
>> Thanks Grant.
>> But I have thousands of PDF URLs like this. I have tried around 12 so far.
>> Can all of them be corrupt?
>>
>> What can I do about this?
>>
>> - Yogesh
>>
>> On 5 November 2010 18:53, Grant Overby <[email protected]> wrote:
>>
>>> I ran the code [2]. The pdf is corrupted by the code as MD5s are
>>> different.
>>> File sizes are identical [1];
>>>
>>> 1:
>>> 11/05/2010 06:47 PM 2,371,050 msb201055.pdf
>>> 11/05/2010 06:46 PM 2,371,050 My.pdf
>>>
>>> 2:
>>> package s;
>>>
>>> import java.io.FileWriter;
>>> import java.io.InputStream;
>>> import java.io.IOException;
>>> import java.net.URL;
>>> import java.net.URLConnection;
>>> import java.net.MalformedURLException;
>>>
>>> public class Main
>>> {
>>>     public static void main(String[] args) throws IOException
>>>     {
>>>         URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
>>>
>>>         URLConnection con = url.openConnection();
>>>
>>>         InputStream in = con.getInputStream();
>>>
>>>         FileWriter out = new FileWriter("C:/My.pdf");
>>>
>>>         int next = 0;
>>>         while ( ( next = in.read() ) != -1 ) {
>>>             out.write(next);
>>>         }
>>>         out.flush();
>>>         out.close();
>>>         in.close();
>>>     }
>>> }
>>>
>>> On Fri, Nov 5, 2010 at 6:45 PM, <[email protected]> wrote:
>>>
>>>> Yogesh,
>>>>
>>>> Compare the file size and hash (SHA1, MD5, etc.) of the file you download
>>>> from your browser with the file that Java downloads. The end of the file
>>>> may be missing when you download it via Java. I know you said the file
>>>> size is correct, but is it the *exact* same number of bytes?
>>>> If so, then the content must be different, and it should just be a matter
>>>> of running `diff` on the files to see what's going wrong.
>>>>
>>>> ----
>>>> Thanks,
>>>> Adam
>>>>
>>>> From: Yogesh <[email protected]>
>>>> To: [email protected]
>>>> Cc: [email protected]
>>>> Date: 11/05/2010 15:29
>>>> Subject: Re: Save URLs to PDFs?
>>>>
>>>> Yes. I can download the file through the browser. It works perfectly fine.
>>>>
>>>> - Yogesh
>>>>
>>>> On 5 November 2010 18:25, Grant Overby <[email protected]> wrote:
>>>>
>>>>> If you download the file through a browser? Does it work then?
>>>>>
>>>>> On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <[email protected]> wrote:
>>>>>
>>>>>> I tried with that, it writes a blank PDF. Though, the file size and the
>>>>>> number of pages is correct (for the new written file)
>>>>>>
>>>>>> - Yogesh
>>>>>>
>>>>>> On 5 November 2010 18:09, Grant Overby <[email protected]> wrote:
>>>>>>
>>>>>>> You don't need pdfBox to do this. Below is some rough code that allows
>>>>>>> you to download a file and save it.
>>>>>>>
>>>>>>> URLConnection urlConnection = new URL("http://...");
>>>>>>> InputStream in = urlConnection.getInputStream();
>>>>>>> FileWriter out = new FileWriter("my.pdf");
>>>>>>> int next = 0;
>>>>>>> while ( ( next = in.read() ) != -1 ) out.write(next);
>>>>>>> //close everything
>>>>>>>
>>>>>>> On Fri, Nov 5, 2010 at 5:56 PM, Yogesh <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have PDFs which I can access through URLs. I want to download and
>>>>>>>> save it to files. How can I go about it?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> -Yogesh

