Re: JSP page exploration scenario

2015-03-29 Thread Christopher Schultz

Pavel,

On 3/28/15 5:44 AM, Pavel Yermolenko wrote:
 Thank you for this explanation. You are probably right: the game is
 not worth the candle. I previously worked this way with another
 portal, and it took me half a day to write C# code that did the whole
 job. But in that case the pages were simple: ordinary .html pages
 with easily identifiable links to .pdf files. JSP pages make this
 approach useless. So I think we can close this topic.

I think what you are trying to do is doable, but I am certainly
personally having a difficult time understanding exactly what you are
trying to do.

If you were to provide a flow chart or list of steps that you want to
occur, it might be easier to understand.

-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: JSP page exploration scenario

2015-03-28 Thread André Warnier

Pavel Yermolenko wrote:

Judging by the page address, which ends with the suffix jsp after the last '.',
it seems to be a JSP page.
Also, the syntax of the page content corresponds to that of a JSP page.

When I type such an address in the browser, the JSP code is executed on the
server side and returns some content to the client. This content launches
Acrobat Reader, which displays the .pdf content in the browser window.
When I view the source, I can see this content; it looks like JavaScript.
Here is the first line of this content:
<script type="text/javascript" src="http://portal_name/assets/vendor/jquery/jquery.js?cv=20150325_12" charset="utf-8"></script>

Also, inside this content I can find the actual link to the .pdf file.

Concerning copyright, my actions are perfectly legal: I have access to these
documents because my university subscribes to this portal.
When I download an article with the browser, the IP address is recognized and
the corresponding message "Bought by 'university_name'" is displayed at the
top of the page.
The problem with browser-based downloading is that it takes very long; in
contrast, downloading programmatically would considerably speed up access to
the articles.



Ok then, I'll take you at your word.
From what you are saying above, the piece of code which actually downloads the PDF seems
to be a *JavaScript* function. JavaScript is another programming language, totally
distinct from Java.

This seems to have nothing to do with Tomcat per se, nor even with Java.

So I do not believe that it is appropriate to continue this discussion on the Tomcat Users
list, but if you contact me off-list, I can give you some tips, because this kind of thing
happens to be right in my own area of expertise.


In short:
What you are trying to do is very complex, much more complex than you may believe at
first. So if you do not have a lot of time and/or a big budget for doing this, my first
recommendation would be: give it up.
My second recommendation would be to examine this site carefully, or contact the people
responsible for the website, to see whether they offer an API (for example, a web
service) to download documents. Quite a few such publisher websites do offer that (but
then, quite a few also don't). The bigger ones (like Springer, Elsevier, Wiley etc.)
generally do offer an API of some sort.
The problem with trying to analyse HTML pages which you download, to extract some specific
content, is that it puts you at the mercy of even the smallest changes that these people
may make to their website logic, which is made for human viewers, not for programs.
So in the end, whatever clever programming you do tends to become a nightmare in terms of
reliability and maintenance. (Note that even the sites which offer an API can be
problematic, but much less so than HTML pages.)










RE: JSP page exploration scenario

2015-03-28 Thread Pavel Yermolenko
André,

Thank you for this explanation. You are probably right: the game is not worth
the candle.
I previously worked this way with another portal, and it took me half a day to
write C# code that did the whole job.
But in that case the pages were simple: ordinary .html pages with easily
identifiable links to .pdf files.
JSP pages make this approach useless.
So I think we can close this topic.

Once more, thanks a lot for your assistance.

Regards

Pavel

RE: JSP page exploration scenario

2015-03-27 Thread Pavel Yermolenko
Hello André,



Why do you make it so complicated?

Why do you not just request the link to the JSP page? Does that not return the
PDF file that you want?



The JSP page doesn't include a link to the .pdf.

When I open such a JSP page in a browser (e.g. Chrome) and then view its
source, the link to the .pdf is present.



What you propose works perfectly with ordinary pages, but not with JSP.



Regards



Pavel






---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com


Re: JSP page exploration scenario

2015-03-27 Thread André Warnier

Pavel Yermolenko wrote:

Hello,

 


To speed up retrieving articles (.pdf files) from a portal, I'm trying to
implement the following scenario in a Java application:

1.  The initial page is read into a string object STRING1.

2.  STRING1 is analyzed and an array ARR1 of links associated with articles
is built.

3.  Unfortunately these links (from ARR1) aren't links to simple pages,
but links to Java Server Pages (JSP) pages, whose content can't be accessed
using dedicated classes (e.g. HttpWebRequest, WebClient).

4.  So the idea is to use Tomcat to run the JSP links locally and then
somehow extract the real links to the .pdf files, or directly access the
.pdf content captured by Tomcat.

Any comments on how to implement such a scenario are welcome.
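
Steps 1 and 2 of the scenario above can be sketched in Java roughly as
follows. This is a minimal illustration and not code from the thread: the
class name LinkExtractor, its method names and the href regular expression
are assumptions, and java.net.http.HttpClient requires Java 11 or later. A
regular expression is only a rough stand-in for a real HTML parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class LinkExtractor {

    // Matches href="..." or href='...' (case-insensitive). This is a rough
    // pattern and will miss unquoted or script-generated links.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Step 1: read the initial page into a string (STRING1).
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2: build the list of links (ARR1), resolving relative links
    // against the URL of the page they came from. URI.resolve throws
    // IllegalArgumentException if an href is not a valid URI.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(base.resolve(m.group(1)).toString());
        }
        return links;
    }
}
```

Calling LinkExtractor.extractLinks(LinkExtractor.fetch(pageUrl), pageUrl)
would then give the candidate article links, with pageUrl standing for the
address of the initial page.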


Why do you make it so complicated?
Why do you not just request the link to the JSP page? Does that not return the PDF file
that you want?
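
The suggestion above, simply requesting the JSP link and looking at what comes
back, could be tried with a sketch like the one below. The class name
ResponseSniffer and its methods are hypothetical. It relies on two facts: a
PDF is normally served with the media type application/pdf, and a PDF file
begins with the bytes %PDF-. A portal that needs cookies, a login or
JavaScript execution may well answer a plain GET with an HTML page instead.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only (Java 11 or later).
class ResponseSniffer {

    // A PDF file starts with the ASCII bytes "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the body begins with the PDF signature.
    static boolean looksLikePdf(byte[] body) {
        if (body == null || body.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (body[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // True if a Content-Type header value names application/pdf,
    // ignoring case and parameters such as "; charset=...".
    static boolean isPdfContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mediaType = contentType.split(";", 2)[0].trim();
        return mediaType.equalsIgnoreCase("application/pdf");
    }

    // Sends a plain GET to the given URL and reports whether the server
    // answered with a PDF rather than an HTML page.
    static boolean returnsPdf(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String contentType = response.headers().firstValue("Content-Type").orElse(null);
        return isPdfContentType(contentType) || looksLikePdf(response.body());
    }
}
```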





Re: JSP page exploration scenario

2015-03-27 Thread André Warnier

Pavel Yermolenko wrote:

Hello André,

 


Why do you make it so complicated?

Why do you not just request the link to the JSP page? Does that not return the
PDF file that you want?



The JSP page doesn't include a link to the .pdf.

When I open such a JSP page in a browser (e.g. Chrome) and then view its
source, the link to the .pdf is present.



What you propose works perfectly with ordinary pages, but not with JSP.



Are you not confusing Java applets with JSP pages here? JSP originally stands for
Java Server Pages, with the word Server meaning that whatever execution there is
happens on the server side.
In other words, by the time the page gets to your browser, it should not contain any JSP
code anymore. The JSP code will have been run on the server side, and been transformed
into HTML or whatever, before it is even sent to the browser.


On the other hand, if a page contains Java applets, these applets will be executed on the
client/browser side, by a local JVM.


Following up on that same line, and with a lot of imagination thrown in, if your purpose
is to simulate what a local Java applet does, to download a PDF from the server and open
it, then what you need is a protocol analyser that shows what goes on between the local
Java applet and the server in question, and /that/ is what you need to simulate.


Not that I encourage you along these lines. If someone went through the trouble of
building a website in that way, they probably do not want people to just download their
documents without going through the applet.
Ever heard of copyright for documents? If not, I kindly suggest that you seriously
investigate the matter before you make any further attempts along those lines. In some
countries, even /attempting/ to do that kind of thing can land you in very serious trouble.







RE: JSP page exploration scenario

2015-03-27 Thread Pavel Yermolenko
Judging by the page address, which ends with the suffix jsp after the last '.',
it seems to be a JSP page.
Also, the syntax of the page content corresponds to that of a JSP page.

When I type such an address in the browser, the JSP code is executed on the
server side and returns some content to the client. This content launches
Acrobat Reader, which displays the .pdf content in the browser window.
When I view the source, I can see this content; it looks like JavaScript.
Here is the first line of this content:
<script type="text/javascript" src="http://portal_name/assets/vendor/jquery/jquery.js?cv=20150325_12" charset="utf-8"></script>

Also, inside this content I can find the actual link to the .pdf file.

Concerning copyright, my actions are perfectly legal: I have access to these
documents because my university subscribes to this portal.
When I download an article with the browser, the IP address is recognized and
the corresponding message "Bought by 'university_name'" is displayed at the
top of the page.
The problem with browser-based downloading is that it takes very long; in
contrast, downloading programmatically would considerably speed up access to
the articles.
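
If the returned source really contains the .pdf address as a quoted string, as
described above, it could be pulled out with a sketch along these lines. The
class name PdfLinkFinder and the regular expression are assumptions made for
illustration; nothing here is known about the portal's actual markup. The
pattern only finds URLs written literally as quoted strings ending in .pdf
(optionally followed by a query string), so an address assembled by
JavaScript at run time would not be found.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class PdfLinkFinder {

    // A single- or double-quoted string without whitespace that ends in
    // ".pdf", optionally followed by "?query". Case-insensitive.
    private static final Pattern PDF_URL = Pattern.compile(
            "[\"']([^\"'\\s]+\\.pdf(?:\\?[^\"'\\s]*)?)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the distinct .pdf addresses found in the page source,
    // in order of first appearance.
    static Set<String> findPdfLinks(String source) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = PDF_URL.matcher(source);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Relative addresses found this way would still need to be resolved against the
page URL before being requested.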

Regards

Pavel.
