Re: JSP page exploration scenario
Pavel,

On 3/28/15 5:44 AM, Pavel Yermolenko wrote:
> Thank you for this explanation. You are probably right: the game is not worth the candle. I previously did this kind of thing with another portal, and it took me half a day to write C# code that did the whole job. But in that case the pages were simple: ordinary .html pages with links to .pdf files that were easy to identify. JSP pages make this approach useless. So I think we can close this topic.

I think what you are trying to do is doable, but I am certainly having a difficult time understanding exactly what you are trying to do. If you were to provide a flow chart or a list of the steps you want to occur, it might be easier to understand.

-chris

To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: JSP page exploration scenario
Pavel Yermolenko wrote:
> According to the page address, which ends in the suffix .jsp (after the last '.'), it seems to be a JSP page. The syntax of the page content also corresponds to JSP syntax. When I type such an address in a browser, the JSP code is executed on the server side and returns some content to the client. This content launches Acrobat Reader, which displays the .pdf content in the browser window. When I view the source, I can see this content; it looks like JavaScript. Here is the first line of this content:
>
> <script type="text/javascript" src="http://portal_name/assets/vendor/jquery/jquery.js?cv=20150325_12" charset="utf-8"></script>
>
> Inside this content I can also find the actual link to the .pdf file.
>
> Concerning copyright, my actions are perfectly legal: I have access to these documents because my university subscribes to this portal. When I download an article with the browser, my IP address is recognized and a corresponding message, Bought by 'university_name', is displayed at the top of the page. The problem with browser-based downloading is that it is very slow; downloading programmatically would considerably speed up access to the articles.

Ok then, I'll take you at your word.

From what you are saying above, the piece of code which actually downloads the PDF seems to be a *javascript* function. JavaScript is another programming language, totally distinct from Java. This seems to have nothing to do with Tomcat per se, not even with Java. So I do not believe that it is appropriate to continue this discussion on the Tomcat Users list, but if you contact me off-list, I can give you some tips, because this kind of thing happens to be right in my own area of expertise.

In short: what you are trying to do is very complex, much more complex than you may believe at first. So if you do not have a lot of time and/or a big budget for doing this, my first recommendation would be: give it up.
My second recommendation would be to examine this site carefully, or contact the people responsible for the website, to see whether they offer an API (for example, a web service) for downloading documents. Quite a few such publisher websites do offer one (but then, quite a few also don't). The bigger ones (like Springer, Elsevier, Wiley etc.) generally do offer an API of some sort.

The problem with trying to analyse HTML pages which you download, to extract some specific content, is that it puts you at the mercy of even the smallest changes that these people may make to their website logic, which is made for human viewers, not for programs. So in the end, whatever clever programming you do tends to become a nightmare in terms of reliability and maintenance. (Note that even the sites which offer an API can also be problematic, but much less so than HTML pages.)
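The extraction step André is warning about (scanning the HTML/JavaScript that the JSP generated for the actual .pdf link, as Pavel describes) might look roughly like the Java sketch below. It is only a heuristic, assuming the link appears literally as a quoted string in the page source; the class name, method name, and URLs are hypothetical, and escaped or dynamically assembled URLs would be missed. It also illustrates André's maintenance concern: the regular expression encodes assumptions about how one particular page happens to be written.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfLinkExtractor {

    // Heuristic (hypothetical): a single- or double-quoted string whose path ends in ".pdf",
    // optionally followed by a query string. Would need tuning against the real page.
    private static final Pattern QUOTED_PDF = Pattern.compile(
            "[\"']([^\"'\\s]+?\\.pdf(?:\\?[^\"'\\s]*)?)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct .pdf links found in content, resolved against pageUrl, in order. */
    public static List<String> extractPdfLinks(String pageUrl, String content) {
        URI base = URI.create(pageUrl);
        Set<String> found = new LinkedHashSet<>();
        Matcher m = QUOTED_PDF.matcher(content);
        while (m.find()) {
            try {
                // Relative links such as "/files/a.pdf" are made absolute using the page URL.
                found.add(base.resolve(m.group(1)).toString());
            } catch (IllegalArgumentException e) {
                // Not a syntactically valid URI (e.g. it contains '|' or '{'): skip it.
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String page = "<script type=\"text/javascript\">var doc = '/files/article123.pdf';</script>"
                + "<a href=\"https://example.org/other.PDF?dl=1\">PDF</a>";
        for (String link : extractPdfLinks("http://portal.example/view/article.jsp?id=123", page)) {
            System.out.println(link);
        }
    }
}
```

Running `main` prints `http://portal.example/files/article123.pdf` followed by `https://example.org/other.PDF?dl=1`. A `LinkedHashSet` is used so that a link repeated in the page is reported once, in the order it first appears.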
RE: JSP page exploration scenario
André,

Thank you for this explanation. You are probably right: the game is not worth the candle. I previously did this kind of thing with another portal, and it took me half a day to write C# code that did the whole job. But in that case the pages were simple: ordinary .html pages with links to .pdf files that were easy to identify. JSP pages make this approach useless. So I think we can close this topic.

Once more, thanks a lot for your assistance.

Regards
Pavel

-----Original Message-----
From: André Warnier [mailto:a...@ice-sa.com]
Sent: Saturday, 28 March 2015 10:24
To: Tomcat Users List
Subject: Re: JSP page exploration scenario
RE: JSP page exploration scenario
Hello André,

> Why do you make it so complicated? Why do you not just request the link to the JSP page? Does that not return the PDF file that you want?

The JSP page doesn't include a link to the .pdf. Yet when I open such a JSP page in a browser (e.g. Chrome) and then view its source, the link to the .pdf is present. What you propose works perfectly with ordinary pages, but not with JSP.

Regards
Pavel

-----Original Message-----
From: André Warnier [mailto:a...@ice-sa.com]
Sent: Friday, 27 March 2015 21:37
To: Tomcat Users List
Subject: Re: JSP page exploration scenario
Re: JSP page exploration scenario
Pavel Yermolenko wrote:
> Hello,
>
> To speed up getting articles (.pdf files) from a certain portal, I'm trying to implement the following scenario in a Java application:
>
> 1. The initial page is read into a string object STRING1.
> 2. STRING1 is analyzed and an array ARR1 of links to the articles is built.
> 3. Unfortunately, these links (from ARR1) aren't links to simple pages but links to Java Server Pages (JSP), and their content can't be accessed using dedicated classes (e.g. HttpWebRequest, WebClient).
> 4. So the idea is to use Tomcat to run the JSP links locally and then somehow extract the real links to the .pdf files, or directly access the .pdf content captured by Tomcat.
>
> Any comments about implementing such a scenario are welcome.

Why do you make it so complicated? Why do you not just request the link to the JSP page? Does that not return the PDF file that you want?
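André's question can be checked directly: request the JSP URL like any other URL and look at what comes back. The Java sketch below assumes Java 11+ (for `java.net.http.HttpClient`); the class and method names and the example URL are hypothetical. It covers step 1 of Pavel's scenario and tests whether the response is the PDF itself, using the `Content-Type` header or the `%PDF-` signature that PDF files normally start with. Non-PDF bodies are decoded as UTF-8, which is an assumption; a more careful client would honour the charset given in `Content-Type`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class JspFetch {

    // A PDF file normally begins with this header, e.g. "%PDF-1.4".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the response is a PDF, judged by its Content-Type or by its first bytes. */
    public static boolean looksLikePdf(String contentType, byte[] body) {
        if (contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("application/pdf")) {
            return true;
        }
        return body.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(body, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Sends a plain GET; to the client, a .jsp URL is just another HTTP resource. */
    public static HttpResponse<byte[]> fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofByteArray());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java JspFetch "http://portal.example/view/article.jsp?id=123"
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<byte[]> response = fetch(client, args[0]);
        String contentType = response.headers().firstValue("Content-Type").orElse(null);
        if (looksLikePdf(contentType, response.body())) {
            System.out.println("The URL returned the PDF itself: " + response.body().length + " bytes");
        } else {
            // Not a PDF: this is the HTML/JavaScript the JSP generated on the server (STRING1).
            String string1 = new String(response.body(), StandardCharsets.UTF_8); // assumes UTF-8
            System.out.println("The URL returned generated markup: " + string1.length() + " chars");
        }
    }
}
```

Checking the leading bytes as well as the header is a defensive choice: some servers send PDFs with a generic type such as `application/octet-stream`.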
Re: JSP page exploration scenario
Pavel Yermolenko wrote:
> Hello André,
>
>> Why do you make it so complicated? Why do you not just request the link to the JSP page? Does that not return the PDF file that you want?
>
> The JSP page doesn't include a link to the .pdf. Yet when I open such a JSP page in a browser (e.g. Chrome) and then view its source, the link to the .pdf is present. What you propose works perfectly with ordinary pages, but not with JSP.

Are you not confusing Java applets with JSP pages here?

The original meaning of JSP is Java Server Pages, with the word Server meaning that whatever execution there is happens on the server side. In other words, by the time the page gets to your browser, it should not contain any JSP code anymore. The JSP code will have been run on the server side, and transformed into HTML or whatever, before it is even sent to the browser.

On the other hand, if a page contains Java Applets, these Applets will be executed on the client/browser side, by a local JVM.

Following up on that same line, and with a lot of imagination thrown in: if your purpose is to simulate what a local Java Applet does to download a PDF from the server and open it, then what you need is a protocol analyser that shows what goes on between the local Java Applet and the server in question, and /that/ is what you need to simulate.

Not that I encourage you along these lines. Presumably, if someone went to the trouble of building a website in that way, they probably do not want people to just download their documents without going through the applet. Ever heard of copyright for documents? If not, I kindly suggest that you seriously investigate the matter before you make any further attempts along those lines. In some countries, even /attempting/ to do that kind of thing can land you in very serious trouble.
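André's point that JSP code runs entirely on the server can be demonstrated without Tomcat. The Java sketch below (assuming Java 11+) uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in: it does not compile JSPs, but its handler plays the role of the servlet that a container such as Tomcat generates from a .jsp file, producing HTML at request time. The path `/article.jsp` and the `id=42` parameter are hypothetical. Whatever the URL suffix, the client only ever receives the generated output.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ServerSideDemo {

    /** Starts a local server on a free port with a "/article.jsp" handler. */
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/article.jsp", exchange -> {
            // Stand-in for the compiled JSP: this code runs on the server, once per request.
            String html = "<html><body><p>Generated on the server for "
                    + exchange.getRequestURI().getQuery() + "</p></body></html>";
            byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    /** Fetches a URL as text, as any HTTP client (browser, Java, C#) could. */
    public static String get(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            System.out.println(get("http://127.0.0.1:" + port + "/article.jsp?id=42"));
        } finally {
            server.stop(0);
        }
    }
}
```

Running `main` prints `<html><body><p>Generated on the server for id=42</p></body></html>`: no server-side code reaches the client, only its output. Binding to port 0 lets the OS pick a free port, so the demo does not clash with a running Tomcat.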
RE: JSP page exploration scenario
According to the page address, which ends in the suffix .jsp (after the last '.'), it seems to be a JSP page. The syntax of the page content also corresponds to JSP syntax. When I type such an address in a browser, the JSP code is executed on the server side and returns some content to the client. This content launches Acrobat Reader, which displays the .pdf content in the browser window. When I view the source, I can see this content; it looks like JavaScript. Here is the first line of this content:

<script type="text/javascript" src="http://portal_name/assets/vendor/jquery/jquery.js?cv=20150325_12" charset="utf-8"></script>

Inside this content I can also find the actual link to the .pdf file.

Concerning copyright, my actions are perfectly legal: I have access to these documents because my university subscribes to this portal. When I download an article with the browser, my IP address is recognized and a corresponding message, Bought by 'university_name', is displayed at the top of the page. The problem with browser-based downloading is that it is very slow; downloading programmatically would considerably speed up access to the articles.

Regards
Pavel.

-----Original Message-----
From: André Warnier [mailto:a...@ice-sa.com]
Sent: Friday, 27 March 2015 23:20
To: Tomcat Users List
Subject: Re: JSP page exploration scenario