Thanks Grant. But I have thousands of PDF URLs like this. I have tried around 12 so far. Can all of them be corrupt?
What can I do about this?

- Yogesh

On 5 November 2010 18:53, Grant Overby <[email protected]> wrote:

> I ran the code [2]. The pdf is corrupted by the code as MD5s are different.
> File sizes are identical [1];
>
> 1:
> 11/05/2010  06:47 PM  2,371,050 msb201055.pdf
> 11/05/2010  06:46 PM  2,371,050 My.pdf
>
> 2:
> package s;
>
> import java.io.FileWriter;
> import java.io.InputStream;
> import java.io.IOException;
> import java.net.URL;
> import java.net.URLConnection;
> import java.net.MalformedURLException;
>
> public class Main
> {
>     public static void main(String[] args) throws IOException
>     {
>         URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
>         URLConnection con = url.openConnection();
>         InputStream in = con.getInputStream();
>         FileWriter out = new FileWriter("C:/My.pdf");
>         int next = 0;
>         while ( ( next = in.read() ) != -1 ) {
>             out.write(next);
>         }
>         out.flush();
>         out.close();
>         in.close();
>     }
> }
>
> --
> Grant Overby
> Senior Developer
> FloorSoft, Inc.
>
> Often people, especially computer engineers, focus on the machines. They
> think, "By doing this, the machine will run faster. By doing this, the
> machine will run more effectively. By doing this, the machine will
> something something something." They are focusing on machines. But in fact
> we need to focus on humans, on how humans care about doing programming or
> operating the application of the machines. We are the masters. They are the
> slaves. -- Yukihiro Matsumoto
>
> On Fri, Nov 5, 2010 at 6:45 PM, <[email protected]> wrote:
>
> > Yogesh,
> >
> > Compare the file size and hash (SHA1, MD5, etc.) of the file you download
> > from your browser with the file that Java downloads. The end of the file
> > may be missing when you download it via Java. I know you said the file
> > size is correct, but is it the *exact* same number of bytes?
> > If so, then the content must be different, and it should just be a matter
> > of running `diff` on the files to see what's going wrong.
> >
> > ----
> > Thanks,
> > Adam
> >
> > From: Yogesh <[email protected]>
> > To: [email protected]
> > Cc: [email protected]
> > Date: 11/05/2010 15:29
> > Subject: Re: Save URLs to PDFs?
> >
> > Yes. I can download the file through the browser. It works perfectly fine.
> >
> > - Yogesh
> >
> > On 5 November 2010 18:25, Grant Overby <[email protected]> wrote:
> >
> > > If you download the file through a browser? Does it work then?
> > >
> > > --
> > > Grant Overby
> > > Senior Developer
> > > FloorSoft, Inc.
> > >
> > > On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <[email protected]> wrote:
> > >
> > >> I tried with that, it writes a blank PDF. Though, the file size and the
> > >> number of pages is correct (for the new written file)
> > >>
> > >> - Yogesh
> > >>
> > >> On 5 November 2010 18:09, Grant Overby <[email protected]> wrote:
> > >>
> > >>> You don't need pdfBox to do this. Below is some rough code that allows
> > >>> you to download a file and save it.
> > >>>
> > >>> URLConnection urlConnection = new URL("http://...");
> > >>> InputStream in = urlConnection.getInputStream();
> > >>> FileWriter out = new FileWriter("my.pdf");
> > >>> int next = 0;
> > >>> while ( ( next = in.read() ) != -1 ) out.write(next);
> > >>> //close everything
> > >>>
> > >>> --
> > >>> Grant Overby
> > >>> Senior Developer
> > >>> FloorSoft, Inc.
> > >>>
> > >>> On Fri, Nov 5, 2010 at 5:56 PM, Yogesh <[email protected]> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> > I have PDFs which I can access through URLs. I want to download and
> > >>> > save it to files. How can I go about it?
> > >>> >
> > >>> > Thanks
> > >>> >
> > >>> > -Yogesh
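
A note on the code quoted in this thread: `FileWriter` is a character stream, so each value returned by `in.read()` is encoded through the platform's default charset instead of being written as a raw byte. That is a likely reason the saved PDF can have the right size but a different MD5, though the thread itself does not confirm the diagnosis. Below is a minimal sketch of a byte-for-byte download, assuming Java 7 or later. The class name `BinaryDownload`, the helpers `copy` and `md5Hex`, and the two command-line arguments are hypothetical names chosen for illustration; none of them come from the thread.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name; a sketch, not the code used in the thread.
public class BinaryDownload {

    /** Copies every byte from in to out unchanged and returns the byte count. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Returns the MD5 digest of data as a lowercase hex string. */
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java BinaryDownload <url> <output-file>");
            return;
        }
        // Byte streams on both sides: no charset conversion takes place.
        try (InputStream in = new BufferedInputStream(new URL(args[0]).openStream());
             OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]))) {
            long bytes = copy(in, out);
            System.out.println("Wrote " + bytes + " bytes to " + args[1]);
        }
        // Reads the whole file into memory; acceptable for a file of a few MB.
        System.out.println("MD5: " + md5Hex(Files.readAllBytes(Paths.get(args[1]))));
    }
}
```

Run it as `java BinaryDownload <url> <output-file>`. The printed MD5 can then be compared with the hash of the copy downloaded through the browser, as Adam suggested. For thousands of URLs the same `copy` call could be placed in a loop over the list.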

