Re: Serving static files in a cluster
Hassan,

On 2/21/2010 9:36 AM, Hassan Schroeder wrote:
> On Sun, Feb 21, 2010 at 3:23 AM, imrank imran...@gmail.com wrote:
>> Can I use the approach of having all the files sitting on a single NFS
>> file server and have the different tomcat instances read/write the files
>> to that server's filesystem? I guess there's going to be some cost in
>> terms of network latency...
>
> Not to mention creating a single point of failure. Alternatively, keep
> local copies on each server and use rsync to maintain consistent images.

Or Hadoop, which I believe can be configured to copy-to-cluster either synchronously or asynchronously on write.

-chris

---
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Serving static files in a cluster
André,

On 2/21/2010 10:21 AM, André Warnier wrote:
> You can certainly do that on the basis of symbolic links and NFS mounts,
> for instance. Each Tomcat would contain something like:

Just be sure that Tomcat doesn't delete your entire document repository when you undeploy. It would be better to have this document repo /outside/ of the docBase.

-chris
Re: Serving static files in a cluster
Christopher Schultz wrote:
> Just be sure that Tomcat doesn't delete your entire document repository
> when you undeploy. It would be better to have this document repo
> /outside/ of the docBase.

YES. +1. I remember this now. That was a very bad recommendation of mine. Something like that is in the documentation, or in a previous thread in this forum.
Re: Serving static files in a cluster
Imran Khan wrote:
> I am using tomcat 5.5.26 on Ubuntu, currently with a clustered
> configuration, but with the entire cluster on a single box. I have the
> tomcat instances sitting behind apache. Eventually I'd like to move the
> cluster to different physical boxes. Part of our application involves
> serving files that are saved on the local file system. These files are
> uploaded by users. My question is: what is the best way to save these
> files so that they can be served across the different physical boxes?
> Should I be mirroring the files across each physical box, or is there a
> particular distributed file system I should be using for storing the
> files? I don't know if there is any other technique.

Since it seems that you have one single Apache httpd front-ending multiple Tomcat instances, you could set the system up to have the static files in one single location, and serve them up directly with Apache. This single location could be on some particularly reliable network fileserver, accessed through NFS or the like. All you need to do at the Apache level is tune your proxy rules or mod_jk configuration so that Apache does not proxy the requests for static pages to Tomcat.
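[Editor's note: the mod_jk tuning described above might look like the following sketch. The context path, worker name, and NFS mount point are all assumptions for illustration, not from the original setup.]

```apache
# httpd.conf sketch (hypothetical names throughout): mod_jk forwards the
# webapp to Tomcat, but requests under /your_app/static/ are "unmounted"
# and served by httpd itself, straight from the shared NFS location.
JkMount   /your_app/*        worker1
JkUnMount /your_app/static/* worker1

Alias /your_app/static /mnt/NFS/somedir_with_docs
<Directory "/mnt/NFS/somedir_with_docs">
    Order allow,deny
    Allow from all
</Directory>
```

With mod_proxy instead of mod_jk, the equivalent trick is an exclusion rule (`ProxyPass /your_app/static !`) placed before the general `ProxyPass` for the webapp.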
Re: Serving static files in a cluster
Hey,

Thanks for your prompt reply. Unfortunately, the approach you described wouldn't work in our case, because our app needs to run some custom authorization logic before a file can be downloaded (sorry, I should have mentioned that). I don't think I can get httpd to perform this authorization logic.

Can I use the approach of having all the files sitting on a single NFS file server and have the different tomcat instances read/write the files to that server's filesystem? I guess there's going to be some cost in terms of network latency...

Cheers,
Imran

awarnier wrote:
> Since it seems that you have one single Apache httpd front-ending
> multiple Tomcat instances, you could set the system up to have the
> static files in one single location, and serve them up directly with
> Apache. [...]
Re: Serving static files in a cluster
On Sun, Feb 21, 2010 at 3:23 AM, imrank imran...@gmail.com wrote:
> Can I use the approach of having all the files sitting on a single NFS
> file server and have the different tomcat instances read/write the files
> to that server's filesystem? I guess there's going to be some cost in
> terms of network latency...

Not to mention creating a single point of failure. Alternatively, keep local copies on each server and use rsync to maintain consistent images.

--
Hassan Schroeder hassan.schroe...@gmail.com
twitter: @hassan
Re: Serving static files in a cluster
imrank wrote:
> Thanks for your prompt reply. Unfortunately, the approach you described
> wouldn't work in our case, because our app needs to run some custom
> authorization logic before a file can be downloaded (sorry, I should
> have mentioned that). I don't think I can get httpd to perform this
> authorization logic.

I have not yet met an authorization logic that can be done with Tomcat and can't be done with Apache httpd (but I would be curious about the details of yours, just to verify). That being said, since we are on a Tomcat forum, this is maybe not the right place for that kind of discussion. (I am available off-list if you would like to explore it, however.) So let's suppose for now that the authorization logic is unmovable and has to happen at the Tomcat level.

> Can I use the approach of having all the files sitting on a single NFS
> file server and have the different tomcat instances read/write the files
> to that server's filesystem? I guess there's going to be some cost in
> terms of network latency...

You can certainly do that on the basis of symbolic links and NFS mounts, for instance. Each Tomcat would contain something like:

  (tomcat_dir)/webapps/your_app/the_docs --> /mnt/NFS/somedir_with_docs

Unless your network is really slow or these files really large, network latency is probably not going to be the main concern nowadays. The problem may be file and directory locking, however, in a multi-user, multi-Tomcat-instance context. You would have to make sure that no two Tomcats (and webapps within those Tomcats) could conceivably be reading and writing the same file at the same time. Over NFS this is not so easy. Note that you would have the same kind of issue even if you did this through NFS at the Apache level, but it may be easier there because there is only one Apache host.
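[Editor's note: the symlink layout described above can be sketched as follows. A local stand-in directory replaces the real NFS mount here, and all paths are hypothetical; in practice the target would be the mounted NFS export.]

```shell
#!/bin/sh
# Sketch of the layout above. /tmp/mnt_nfs_demo stands in for the NFS
# mount point (e.g. /mnt/NFS); /tmp/webapp_demo/the_docs stands in for
# (tomcat_dir)/webapps/your_app/the_docs. All paths are hypothetical.
NFS_MOUNT=/tmp/mnt_nfs_demo
APP_DOCS=/tmp/webapp_demo/the_docs

mkdir -p "$NFS_MOUNT/somedir_with_docs" "$(dirname "$APP_DOCS")"
ln -sfn "$NFS_MOUNT/somedir_with_docs" "$APP_DOCS"

# A file dropped into the shared directory is immediately visible through
# the webapp path on every host that has the same mount and symlink:
echo "hello" > "$NFS_MOUNT/somedir_with_docs/test.txt"
cat "$APP_DOCS/test.txt"
```

Each Tomcat box would carry the same mount and symlink, so all instances see one shared repository.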
Also, just to get you thinking on the subject of authentication/authorization:

1) It is possible to conceive an AAA method at the Apache level that uses Tomcat as the AAA back-end. The basic idea is this:
- At the Tomcat level, you create a webapp that is basically a dummy, and does nothing other than authenticate/authorize a request to it. Its answer is a simple plain-text response: yes or no.
- At the Apache level, whenever you need an authorization, you send a background request to this dummy webapp on Tomcat and read the response (which could also be the user-id, instead of just "yes"). If the response is positive, you proceed; otherwise you return "forbidden".

2) If you have an Apache front-end anyway, you can do /all/ the authentication/authorization at the Apache level, thus freeing the Tomcat(s) for more interesting things. If Apache authenticates a request, it can forward the obtained user-id to Tomcat when it proxies the request. Check the tomcatAuthentication attribute of the Connector tag.

What ultimately makes more sense, is more efficient, and is easier to maintain is a decision for you to make, in function of your knowledge of the setup and the usage patterns of the application. Instinctively, if your configuration is as follows:

  browser -- Apache + connector -- Tomcat + NFS -- NFS fileserver

then the megabits have to travel through more network and more code than if the configuration is like this:

  browser -- Apache + NFS -- NFS fileserver

For example, in the first case, if your Apache front-end, the Tomcats and the NFS fileserver are on the same network cable, then the same file may end up being transferred several times over that cable before it is sent to the browser. Also, if the serving of the static files is done at the Apache level, you may be able to use one of the caching modules available for Apache, to avoid even more network traffic.
But again, that depends on the application, and on how often the same files would be requested over a period of time.
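[Editor's note: for option (2) above, the tomcatAuthentication attribute lives on the AJP Connector in Tomcat's server.xml. A sketch, not a drop-in config; 8009 is the conventional AJP port.]

```xml
<!-- server.xml sketch: with tomcatAuthentication="false", Tomcat trusts
     the user identity that Apache httpd established and forwarded over
     AJP, instead of authenticating the request again itself. -->
<Connector port="8009" protocol="AJP/1.3"
           tomcatAuthentication="false" />
```

The webapp then sees the httpd-authenticated user via request.getRemoteUser(), and its authorization logic can run against that identity.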
Re: Serving static files in a cluster
Hassan Schroeder wrote:
> Not to mention creating a single point of failure. Alternatively, keep
> local copies on each server and use rsync to maintain consistent images.

Not to mention possible inconsistencies between the different copies... ;-)

Imagine you have 4 load-balanced Tomcats, each of which has its own file repository, and each of which can potentially handle the next upload or download request. To/from where does the file get uploaded/downloaded (until all rsyncs have run)? And if the file is there twice, but different, which one is correct? (How would rsync know?)

There is probably more than one sensible configuration possible. Choosing the best one would really depend on details of the application.
Re: Serving static files in a cluster
On Sun, Feb 21, 2010 at 7:36 AM, André Warnier a...@ice-sa.com wrote:
> Not to mention possible inconsistencies between the different copies... ;-)
> Imagine you have 4 load-balanced Tomcats, each of which has its own file
> repository, and each of which can potentially handle the next upload or
> download request. To/from where does the file get uploaded/downloaded
> (until all rsyncs have run)? And if the file is there twice, but
> different, which one is correct? (How would rsync know?)

Have you used rsync? Because I'm not sure I'm understanding your questions. I don't see how downloads are relevant; it's uploads that add a file to the file system on the Tomcat that processed the request. And that would be the source filesystem to rsync from.

> There is probably more than one sensible configuration possible.
> Choosing the best one would really depend on details of the application.

Based on the original description:

> ... involves serving files that are saved on the local file system.
> These files are uploaded by users.

I'm assuming a write-once, read-many use case, for which rsync is perfect. A newly added file on one file system will be propagated to the others. It's a simple and consistent replication scheme.

If, however, the application allows file *modification*, then you have a concurrency problem no matter what storage mechanism you use. So yes, the best solution does depend on the details of the app...

--
Hassan Schroeder hassan.schroe...@gmail.com
twitter: @hassan
Re: Serving static files in a cluster
Hassan Schroeder wrote:
> Have you used rsync?

Yes, quite a lot. But not necessarily all the options.

> Because I'm not sure I'm understanding your questions. I don't see how
> downloads are relevant;

What I meant was this: a user uploads a file. That file is uploaded to one Tomcat, and there is only a copy on that one Tomcat until rsync has synchronised it to the other Tomcats. If at that point a user (maybe even the same one, just to check) requests the file, the request ends up with one of the Tomcats, not necessarily the same one. What if that Tomcat does not have the file yet? It may be that I am misunderstanding how you would set up the rsync bit.

> it's uploads that add a file to the file system on the Tomcat that
> processed the request. And that would be the source filesystem to rsync
> from.

Yes, but how often? (Again, maybe my incomplete knowledge of rsync's capabilities.) Each Tomcat would need to rsync its repository with each of the others, constantly, no? Isn't that in itself going to generate a lot of traffic? And if each Tomcat pulls the files of the others via rsync, how does one rsync know that the new file it is seeing on the other Tomcat has finished uploading?

Not polemical questions, by the way; I am genuinely trying to learn new tricks, and to inform the OP about alternatives.
In my opinion, the simplest and most reliable scheme is to have one single repository, which could itself be made as reliable as possible via a number of methods (hardware duplication, replication, snapshots, ...). If there are locking issues, the single repository makes them easier to solve. If there is only one uploading host, that again makes it easier.
Re: Serving static files in a cluster
On Sun, Feb 21, 2010 at 8:54 AM, André Warnier a...@ice-sa.com wrote:
> Yes, but how often?

In the simplest case, once each time a file is uploaded :-) How frequent are the uploads? If it's one every few minutes, the simplest case applies; if it's hundreds per second, that's a different situation: tune to suit.

> Each Tomcat would need to rsync its repository with each of the others,
> constantly, no? Isn't that in itself going to generate a lot of traffic?
> And if each Tomcat pulls the files of the others via rsync, how does one
> rsync know that the new file it is seeing on the other Tomcat has
> finished uploading?

Again, the file system *with a new file* is the source for an rsync push. The application itself knows when an upload is complete, so there won't be incomplete files copied.

> In my opinion, the simplest and most reliable scheme is to have one
> single repository, which could itself be made as reliable as possible
> via a number of methods (hardware duplication, replication, snapshots, ...).

Which certainly starts moving away from "simplest" :-)

> If there are locking issues, the single repository makes them easier to
> solve. If there is only one uploading host, that again makes it easier.

I definitely agree with the second statement.

--
Hassan Schroeder hassan.schroe...@gmail.com
twitter: @hassan
Re: Serving static files in a cluster
On Sun, Feb 21, 2010 at 3:01 PM, imrank imran...@gmail.com wrote:
> Hassan, the approach you described is one that I was also considering,
> to keep things consistent across tomcat instances (btw, there are no
> modifications occurring to existing files). I was considering an
> approach whereby, after a file is uploaded by a user, I run rsync to
> synchronize across the nodes. However, my concern with this approach is
> that there will be some delay before the file is available on the other
> nodes (files can be up to a couple hundred MBs in size).

Only you know if that's a problem. Does it *need* to be immediately available? When I upload something to YouTube, I get a message that it's being processed and isn't immediately available -- and that's OK. Patience is a virtue (sometimes)!

That said, how long does it take to transfer a couple-hundred-MB file across a local hardwired ethernet (or fiber) connection? It might not be worth worrying about, until you have to :-)

--
Hassan Schroeder hassan.schroe...@gmail.com
twitter: @hassan
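[Editor's note: Hassan's back-of-envelope can be made concrete. The figures below are assumptions: a 200 MB file and roughly 100 MB/s of realistic throughput on an uncongested gigabit link.]

```shell
#!/bin/sh
# Rough transfer-time estimate with assumed numbers: gigabit ethernet is
# 125 MB/s theoretical; ~100 MB/s is a realistic sustained figure.
FILE_MB=200
THROUGHPUT_MB_PER_S=100
echo "$(( FILE_MB / THROUGHPUT_MB_PER_S )) seconds"
```

So the replication lag for even a large upload is on the order of seconds on a local wired network, which supports the "might not be worth worrying about" point.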
Serving static files in a cluster
Hey,

I am using tomcat 5.5.26 on Ubuntu, currently with a clustered configuration, but with the entire cluster on a single box. I have the tomcat instances sitting behind apache. Eventually I'd like to move the cluster to different physical boxes.

Part of our application involves serving files that are saved on the local file system. These files are uploaded by users. My question is: what is the best way to save these files so that they can be served across the different physical boxes? Should I be mirroring the files across each physical box, or is there a particular distributed file system I should be using for storing them? I don't know if there is any other technique.

Thanks,
Imran