Michael:
You're right. Copy / paste for the loss; I read right over it. :(
corrected code:
package s;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class Main
{
    public static void main(String[] args) throws IOException
    {
        URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
        URLConnection con = url.openConnection();
        InputStream in = con.getInputStream();
        // FileOutputStream writes raw bytes; FileWriter would re-encode them as characters.
        FileOutputStream out = new FileOutputStream("C:/My.pdf");
        int next = 0;
        while ( ( next = in.read() ) != -1 ) {
            out.write(next);
        }
        out.flush();
        out.close();
        in.close();
    }
}
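
The loop above copies one byte per call, which works but is slow for a 2 MB PDF. A rough sketch of the same copy through a byte buffer, with try-with-resources doing the closing, might look like this. The class name BufferedDownload, the copy helper, and the 8192-byte buffer size are my own picks for illustration, not anything from the earlier messages.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class BufferedDownload {

    // Copies every byte from in to out through a fixed-size buffer and
    // returns the number of bytes copied. No character decoding is involved.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BufferedDownload <url> <target-file>");
            return;
        }
        // try-with-resources closes both streams even if the copy fails
        try (InputStream in = new URL(args[0]).openStream();
             OutputStream out = new FileOutputStream(args[1])) {
            System.out.println(copy(in, out) + " bytes written");
        }
    }
}
```

Run it with the PDF URL and a target path as the two arguments. Since Yogesh has thousands of URLs, the copy helper could be called in a loop over them.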
--
Grant Overby
Senior Developer
FloorSoft, Inc.
Often people, especially computer engineers, focus on the machines. They
think, "By doing this, the machine will run faster. By doing this, the
machine will run more effectively. By doing this, the machine will something
something something." They are focusing on machines. But in fact we need to
focus on humans, on how humans care about doing programming or operating the
application of the machines. We are the masters. They are the slaves. --
Yukihiro Matsumoto
On Fri, Nov 5, 2010 at 7:18 PM, Michael Baehr <[email protected]> wrote:
> From the JDK docs:
>
> FileWriter is meant for writing streams of characters. For writing streams
> of raw bytes, consider using a FileOutputStream.
>
> You get characters replaced depending on your platform's character encoding.
> You must ensure you're writing bytes and not characters!
>
> Michael
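
To see the replacement Michael describes in isolation, here is a small sketch that pushes four bytes through a Writer the way FileWriter.write(int) does. It assumes the platform default encoding on my machine was windows-1252, which I have not verified; the class and method names are made up for the demo. In that charset the char U+0094 has no byte mapping, so the encoder substitutes '?' (0x3F), while U+00DA maps straight back to 0xDA.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterCorruptionDemo {

    // Pushes each byte through a Writer as if it were a char, the way
    // FileWriter.write(int) does, and returns the bytes that come out.
    public static byte[] throughWriter(byte[] input, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            for (byte b : input) {
                out.write(b & 0xFF); // same value InputStream.read() returns
            }
        }
        return sink.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] good = {0x78, (byte) 0xDA, (byte) 0x94, 0x55};
        // Assumption: the default encoding was windows-1252.
        byte[] bad = throughWriter(good, Charset.forName("windows-1252"));
        System.out.println("in:  " + hex(good));
        System.out.println("out: " + hex(bad));
    }
}
```

This prints 78 DA 94 55 going in and 78 DA 3F 55 coming out, matching the hex dump of the bad file, and the byte count stays the same, which would explain the identical file sizes. With ISO-8859-1 the bytes would pass through unchanged, so the damage depends on the platform.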
>
> On 5. Nov 2010, at 18:14, Grant Overby wrote:
>
> > First difference (on second line, first line is for reference point):
> >
> > bad:
> > <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> > 1676/V 1757>>stream
> > xÚ?U LSW >?Û O)]Wä!Ô>?"CATl?4PkADy ? ?RjgÊ??< õ A
> >
> > Start of second line in hex: 78 DA 3F 55 0B 4C 53 57
> >
> > good:
> > <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> > 1676/V 1757>>stream
> > xÚ”U LSW >—Û O)]Wä!Ô>˜"CATl”4PkADy ‹ –Rjgʈˆ< õ A
> >
> > Start of second line in hex: 78 DA 94 55 0B 4C 53 57
> >
> >
> >
> >
> > Isolated incorrect single characters appear throughout the document.
> > Downloading it multiple times shows consistent errors.
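
For anyone who wants to locate such a difference without a hex editor, a minimal sketch follows. It reports the offset of the first byte at which two files disagree. The class and method names are hypothetical, and this is not how I found the difference quoted above.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FirstDifference {

    // Returns the zero-based offset of the first differing byte, or -1 if
    // the streams are identical. If one stream is a prefix of the other,
    // the offset where the shorter one ends is returned.
    public static long firstDifference(InputStream a, InputStream b) throws IOException {
        long offset = 0;
        while (true) {
            int x = a.read();
            int y = b.read();
            if (x != y) return offset;
            if (x == -1) return -1; // both ended at the same point
            offset++;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FirstDifference <file1> <file2>");
            return;
        }
        // BufferedInputStream keeps the byte-at-a-time reads cheap
        try (InputStream a = new BufferedInputStream(new FileInputStream(args[0]));
             InputStream b = new BufferedInputStream(new FileInputStream(args[1]))) {
            System.out.println("first difference at offset " + firstDifference(a, b));
        }
    }
}
```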
> >
> >
> > I'll keep thinking on it, but nothing is apparent to me. This shouldn't
> > happen afaik.
> >
> >
> > Anyone?
> >
> >
> >
> >
> >
> > On Fri, Nov 5, 2010 at 6:58 PM, Yogesh <[email protected]> wrote:
> >
> >> Thanks Grant.
> >> But I have thousands of PDF URLs like this. I have tried around 12 so far.
> >> Can all of them be corrupt?
> >>
> >> What can I do about this?
> >>
> >>
> >> - Yogesh
> >>
> >>
> >>
> >>
> >> On 5 November 2010 18:53, Grant Overby <[email protected]> wrote:
> >>
> >>> I ran the code [2]. The code corrupts the PDF: the MD5s are different,
> >>> although the file sizes are identical [1].
> >>>
> >>> 1:
> >>> 11/05/2010 06:47 PM 2,371,050 msb201055.pdf
> >>> 11/05/2010 06:46 PM 2,371,050 My.pdf
> >>>
> >>>
> >>>
> >>> 2:
> >>> package s;
> >>>
> >>> import java.io.FileWriter;
> >>> import java.io.InputStream;
> >>> import java.io.IOException;
> >>> import java.net.URL;
> >>> import java.net.URLConnection;
> >>> import java.net.MalformedURLException;
> >>>
> >>> public class Main
> >>> {
> >>> public static void main(String[] args) throws IOException
> >>> {
> >>> URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
> >>>
> >>> URLConnection con = url.openConnection();
> >>>
> >>> InputStream in = con.getInputStream();
> >>>
> >>> FileWriter out = new FileWriter("C:/My.pdf");
> >>>
> >>> int next = 0;
> >>> while ( ( next = in.read() ) != -1 ) {
> >>> out.write(next);
> >>> }
> >>> out.flush();
> >>> out.close();
> >>> in.close();
> >>> }
> >>> }
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Nov 5, 2010 at 6:45 PM, <[email protected]> wrote:
> >>>
> >>>> Yogesh,
> >>>>
> >>>> Compare the file size and hash (SHA1, MD5, etc.) of the file you download
> >>>> from your browser with the file that Java downloads. The end of the file
> >>>> may be missing when you download it via Java. I know you said the file
> >>>> size is correct, but is it the *exact* same number of bytes? If so, then
> >>>> the content must be different, and it should just be a matter of running
> >>>> `diff` on the files to see what's going wrong.
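
If no hashing tool is at hand, the size and hash check Adam suggests can be approximated in Java itself. The following is only a sketch: the class name FileFingerprint and the md5Hex helper are invented for this example, and MD5 is used here purely as a change detector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileFingerprint {

    // Returns the MD5 digest of everything readable from in, as lowercase hex.
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Pass the browser download and the Java download as arguments.
        for (String name : args) {
            Path p = Paths.get(name);
            try (InputStream in = Files.newInputStream(p)) {
                System.out.println(Files.size(p) + " bytes  " + md5Hex(in) + "  " + name);
            }
        }
    }
}
```

Equal sizes with different digests point to changed bytes rather than a truncated download.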
> >>>>
> >>>> ----
> >>>> Thanks,
> >>>> Adam
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> From: Yogesh <[email protected]>
> >>>> To: [email protected]
> >>>> Cc: [email protected]
> >>>> Date: 11/05/2010 15:29
> >>>> Subject: Re: Save URLs to PDFs?
> >>>>
> >>>>
> >>>>
> >>>> Yes. I can download the file through the browser. It works perfectly fine.
> >>>>
> >>>> - Yogesh
> >>>>
> >>>>
> >>>>
> >>>> On 5 November 2010 18:25, Grant Overby <[email protected]> wrote:
> >>>>
> >>>>> If you download the file through a browser, does it work then?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <[email protected]> wrote:
> >>>>>
> >>>>>> I tried that; it writes a blank PDF, though the file size and the
> >>>>>> number of pages are correct (for the newly written file).
> >>>>>>
> >>>>>> - Yogesh
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 5 November 2010 18:09, Grant Overby <[email protected]> wrote:
> >>>>>>
> >>>>>>> You don't need pdfBox to do this. Below is some rough code that allows
> >>>>>>> you to download a file and save it.
> >>>>>>>
> >>>>>>> URLConnection urlConnection = new URL("http://...").openConnection();
> >>>>>>> InputStream in = urlConnection.getInputStream();
> >>>>>>> FileWriter out = new FileWriter("my.pdf");
> >>>>>>> int next = 0;
> >>>>>>> while ( ( next = in.read() ) != -1 ) out.write(next);
> >>>>>>> //close everything
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Nov 5, 2010 at 5:56 PM, Yogesh <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have PDFs which I can access through URLs. I want to download and
> >>>>>>>> save them to files. How can I go about it?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> -Yogesh
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>