Pavel Yermolenko wrote:
Judging by the page address, which has the suffix jsp after the last dot '.', it
seems to be a JSP page.
The syntax of the page content also corresponds to the syntax of a JSP page.
When I type such an address in the browser, the JSP code is executed on the server side
and returns some content to the client. This content launches Acrobat Reader, which
displays the .pdf content in the browser window.
When I view the source, I can see this content; it looks like JavaScript.
Here is the first line of this content:
<script type="text/javascript"
src="http://"portal_name"/assets/vendor/jquery/jquery.js?cv=20150325_120000"
charset="utf-8"></script>
Inside this content I can also find the actual link to the .pdf file.
Concerning copyright, my actions are perfectly legal: I have access to these
documents because my university has subscribed to this portal.
When I download an article with the browser, my IP address is recognized and the corresponding message
"Bought by 'university_name'" is displayed at the top of the page.
The problem with downloading through the browser is that it is very slow; in contrast,
downloading programmatically would considerably speed up access to the articles.
Ok then, I'll take you at your word.
From what you are saying above, the piece of code which actually downloads the PDF seems
to be a *JavaScript* function. JavaScript is another programming language, totally
distinct from Java.
This seems to have nothing to do with Tomcat per se, nor even with Java.
So I do not believe that it is appropriate to continue this discussion on the Tomcat Users
list, but if you contact me off-list, I can give you some tips, because this kind of thing
happens to be right in my own area of expertise.
In short:
What you are trying to do is very complex, much more complex than you may believe at
first. So if you do not have a lot of time and/or a big budget for doing this, my first
recommendation would be: "give it up".
My second recommendation would be to examine this site carefully, or contact the people
responsible for the website, to see whether they offer an API (for example, a web
service) to download documents. Quite a few such publisher websites do offer that (but
then, quite a few also don't). The bigger ones (like Springer, Elsevier, Wiley etc.)
generally do offer an API of some sort.
The problem with trying to analyse HTML pages which you download, to extract some specific
content, is that it puts you at the mercy of even the smallest changes that these people
may make to their website logic, which is made for human viewers, not for programs.
So in the end, whatever clever programming you do tends to become a nightmare in terms of
reliability and maintenance. (Note that even the sites which offer an API can also be
problematic, but much less so than HTML pages).
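To make the point about fragility concrete, here is a minimal Python sketch. The two page snippets, the paths in them and the `openViewer` function are invented for illustration; they are not taken from the portal in question. The extractor looks for `href`/`src` attributes ending in `.pdf`, which works on the first page and silently finds nothing once the site starts building the link in a script.

```python
from html.parser import HTMLParser


class PdfLinkFinder(HTMLParser):
    """Collect every href/src attribute value that ends in .pdf."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().endswith(".pdf"):
                self.pdf_links.append(value)


def find_pdf_links(html):
    """Return all .pdf links found in tag attributes of the given HTML."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return finder.pdf_links


# Hypothetical page as the site might serve it today.
PAGE_TODAY = '<html><body><a href="/docs/article123.pdf">Full text</a></body></html>'

# Hypothetical redesign: the link is now assembled by a script at view time.
PAGE_AFTER_REDESIGN = (
    '<html><body><script>openViewer("/docs/" + id + ".pdf");</script></body></html>'
)

print(find_pdf_links(PAGE_TODAY))           # ['/docs/article123.pdf']
print(find_pdf_links(PAGE_AFTER_REDESIGN))  # []
```

Nothing in the second case raises an error; the program simply stops finding documents, which is the kind of maintenance problem described above.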
Regards
Pavel.
-----Original Message-----
From: André Warnier [mailto:a...@ice-sa.com]
Sent: Friday, 27 March 2015 23:20
To: Tomcat Users List
Subject: Re: JSP page exploration scenario
Pavel Yermolenko wrote:
Hello André,
Why do you make it so complicated ?
Why do you not just request the link to the JSP page ? does that not return the
PDF file that you want ?
The JSP page doesn't include a link to the .pdf.
When I "execute" such a JSP page in a browser (e.g. Chrome) and then view its
source, the link to the .pdf is present.
What you propose works perfectly with "ordinary" pages, but not with JSP.
Are you not confusing "Java applets" with "JSP pages" here ? The original meaning of JSP is
"JavaServer Pages", with the word "Server" meaning that whatever execution there is, is on the
server side.
In other words, by the time the page gets to your browser, it should not contain any
"JSP code" anymore. The JSP code will have been run on the server side, and
been transformed into HTML or whatever, before it is even sent to the browser.
On the other hand, if a page contains Java Applets, these Applets will be
executed on the client/browser side, by a local JVM.
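The server-side half of that distinction can be illustrated with a small stand-in. This is a Python sketch, not JSP or Tomcat: the `TEMPLATE` string plays the role of the JSP source, and the handler name and the `/page.jsp` path are made up. The template is filled in on the server before anything is sent, so the client only ever receives the resulting HTML.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a JSP source file: it exists only on the server.
TEMPLATE = "<html><body><p>Generated for {client}</p></body></html>"


class RenderingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The "server-side code" runs here, before anything is sent.
        body = TEMPLATE.format(client=self.client_address[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_rendered_page():
    """Start a throwaway local server, request one page, return what the client sees."""
    server = HTTPServer(("127.0.0.1", 0), RenderingHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/page.jsp")
        html = conn.getresponse().read().decode("utf-8")
        conn.close()
        return html
    finally:
        server.shutdown()
        server.server_close()


html = fetch_rendered_page()
print(html)
```

The `{client}` placeholder never reaches the client; only the generated HTML does. Anything that still runs after that point (a script, an applet) runs in the browser, not in the page's server-side code.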
Following up on that same line, and with a lot of imagination thrown in, if
your purpose is to simulate what a local Java Applet does, to download a PDF
from the server and open it, then what you need is a protocol analyser, that
shows what goes on between the local Java Applet and the server in question,
and /that/ is what you need to simulate.
Not that I encourage you along these lines. Presumably, if someone went
through the trouble of building a website in that way, they probably do not
want people to just download their documents without going through the applet.
Ever heard of "copyright" for documents ? If not, I kindly suggest that you
seriously investigate the matter, before you even make further trials along those lines.
In some countries, even /attempting/ to do that kind of thing can land you in very
serious trouble.
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org