Re: JSP page exploration scenario

André Warnier Sat, 28 Mar 2015 02:26:00 -0700

Pavel Yermolenko wrote:

Judging by the page address, which ends with the suffix "jsp" after the last dot, 
it seems to be a JSP page.
The syntax of the page content also corresponds to that of a JSP page.


When I type such an address in the browser, the JSP code is executed on the 
server side and returns some content to the client. This content launches 
Acrobat Reader, which displays the .pdf content in the browser window.
When I view the page source, I can see this content; it looks like JavaScript. 
Here is the first line of this content:
<script type="text/javascript" 
src="http://"portal_name"/assets/vendor/jquery/jquery.js?cv=20150325_120000" 
charset="utf-8"></script>

Inside this content I can also find the actual link to the .pdf file.

Concerning copyright, my actions are perfectly legal: I have access to these 
documents because my university subscribes to this portal.
When I download an article with the browser, the IP address is recognized and the 
corresponding message "Bought by 'university_name'" is displayed at the top of the page.
The problem with web-based downloading is that it is very slow; in contrast, 
programmatic downloading considerably speeds up access to articles.


Ok then, I'll take you at your word.

From what you are saying above, the piece of code which actually downloads the PDF seems to be a *JavaScript* function. JavaScript is another programming language, totally distinct from Java.

This seems to have nothing to do with Tomcat per se, not even with Java.

So I do not believe that it is appropriate to continue this discussion on the Tomcat Users list, but if you contact me off-list, I can give you some tips, because this kind of thing happens to fall right within my own area of expertise.


In short:

What you are trying to do is very complex, much more complex than you may believe at first. So if you do not have a lot of time and/or a big budget for doing this, my first recommendation would be: "give it up".

My second recommendation would be to examine this site carefully, or contact the people responsible for the website, to see whether they offer an API (for example, a web service) to download documents. Quite a few such publisher websites do offer that (but then, quite a few also don't). The bigger ones (like Springer, Elsevier, Wiley etc.) generally do offer an API of some sort.

The problem with trying to analyse HTML pages which you download, to extract some specific content, is that it puts you at the mercy of even the smallest changes that these people may make to their website logic, which is made for human viewers, not for programs. So in the end, whatever clever programming you do tends to become a nightmare in terms of reliability and maintenance. (Note that even the sites which offer an API can be problematic too, but much less so than HTML pages.)
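
To make that fragility concrete, here is a minimal Python sketch of the HTML-analysis approach, using only the standard library. The sample page, the host portal.example, and the names PdfLinkCollector and find_pdf_links are all hypothetical; the real portal's markup is not known here, and it may well build the link in JavaScript instead, in which case this would find nothing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect href/src/data attribute values whose URL path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "data") and value:
                # Resolve relative links against the page address.
                absolute = urljoin(self.base_url, value)
                # Look at the path only, so query strings do not hide ".pdf".
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.links.append(absolute)


def find_pdf_links(html, base_url):
    """Return absolute URLs of all .pdf links found in the given HTML text."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content; a real portal's markup will differ.
SAMPLE_PAGE = """
<html><body>
  <script type="text/javascript" src="/assets/vendor/jquery/jquery.js"></script>
  <a href="/articles/12345/fulltext.pdf?download=1">Full text</a>
  <a href="/articles/12345/abstract.html">Abstract</a>
</body></html>
"""

if __name__ == "__main__":
    print(find_pdf_links(SAMPLE_PAGE, "http://portal.example/"))
```

A change as small as moving the link out of a plain attribute and into a script would make this return an empty list without any error, which is exactly the maintenance problem described above.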

Regards

Pavel.

-----Original Message-----

From: André Warnier [mailto:a...@ice-sa.com]
Sent: Friday, 27 March 2015 23:20
To: Tomcat Users List
Subject: Re: JSP page exploration scenario

Pavel Yermolenko wrote:

Hello André,

Why do you make it so complicated ?

Why do you not just request the link to the JSP page ? does that not return the 
PDF file that you want ?

The JSP page itself doesn't include a link to the .pdf.

When I "execute" such a JSP page in a browser (e.g. Chrome) and then view its 
source, the link to the .pdf is present.

What you propose works perfectly with "ordinary" pages, but not with JSP.

Are you not confusing "Java applets" with "JSP pages" here ? The original meaning of JSP is
"Java Server Pages", with the word "Server" meaning that whatever execution there is, is on the
server side.
In other words, by the time the page gets to your browser, it should not contain any
"JSP code" anymore. The JSP code will have been run on the server side, and
been transformed into HTML or whatever, before it is even sent to the browser.
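
A rough analogy, sketched in Python rather than JSP: a template is filled in on the server, and only the resulting HTML is sent out. The template syntax, the function name render_on_server and the URL below are illustrative assumptions, not anything taken from the site in question.

```python
from string import Template

# Stand-in for a server-side page; the ${...} placeholders play the role
# of the JSP code that only ever exists on the server.
PAGE_TEMPLATE = Template('<a href="${pdf_url}">Download ${title}</a>')


def render_on_server(title, pdf_url):
    """Fill in the template; the return value is what the browser receives."""
    return PAGE_TEMPLATE.substitute(title=title, pdf_url=pdf_url)


html_sent_to_browser = render_on_server(
    "Article 12345", "/articles/12345/fulltext.pdf"
)

if __name__ == "__main__":
    # No placeholder syntax is left in the output, only plain HTML.
    print(html_sent_to_browser)
```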

On the other hand, if a page contains Java Applets, these Applets will be
executed on the client/browser side, by a local JVM.

Following up on that same line, and with a lot of imagination thrown in, if
your purpose is to simulate what a local Java Applet does, to download a PDF
from the server and open it, then what you need is a protocol analyser, that
shows what goes on between the local Java Applet and the server in question,
and /that/ is what you need to simulate.

Not that I encourage you along these lines. Presumably, if someone went
through the trouble of building a website in that way, they probably do not
want people to just download their documents without going through the applet.
Ever heard of "copyright" for documents ? If not, I kindly suggest that you
seriously investigate the matter, before you even make further trials along those lines.
In some countries, even /attempting/ to do that kind of thing can land you in very
serious trouble.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
