Re: Serving static files in a cluster

2010-02-23 Thread Christopher Schultz

Hassan,

On 2/21/2010 9:36 AM, Hassan Schroeder wrote:
 On Sun, Feb 21, 2010 at 3:23 AM, imrank imran...@gmail.com wrote:
 
 Can I use the approach of having all the files sitting on a single NFS file
 server and have the different tomcat instances read/write the files to that
 server's filesystem? I guess there's gonna be some cost in terms of network
 latency...
 
 Not to mention creating a single point of failure.
 
 Alternatively, keep local copies on each server and use rsync to
 maintain consistent images.

Or Hadoop, which I believe can be configured to copy-to-cluster either
synchronously or asynchronously on write.

-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Serving static files in a cluster

2010-02-23 Thread Christopher Schultz

André,

On 2/21/2010 10:21 AM, André Warnier wrote:
 You can certainly do that with symbolic links and NFS mounts,
 for instance. Each Tomcat would contain something like:

Just be sure that Tomcat doesn't delete your entire document repository
when you undeploy. It would be better to have this document repo
/outside/ of the docBase.

-chris



Re: Serving static files in a cluster

2010-02-23 Thread André Warnier

Christopher Schultz wrote:

 André,

 On 2/21/2010 10:21 AM, André Warnier wrote:
  You can certainly do that with symbolic links and NFS mounts,
  for instance. Each Tomcat would contain something like:

 Just be sure that Tomcat doesn't delete your entire document repository
 when you undeploy. It would be better to have this document repo
 /outside/ of the docBase.

YES. +1.
I remember this now.  That was a very bad recommendation of mine.
A warning along those lines is in the documentation, or in an earlier
thread on this list.








Re: Serving static files in a cluster

2010-02-21 Thread André Warnier

Imran Khan wrote:

 Hey,

 I am using tomcat 5.5.26 on Ubuntu, currently having a clustered
 configuration, but having the entire cluster on a single box. I have the
 tomcat instances sitting behind apache.

 Eventually I'd like to move to cluster on different physical boxes. Part of
 our application involves serving files that are saved on the local file
 system. These files are uploaded by users.

 My question is, what is the best way to save these files so that they can be
 served across the different physical boxes? Should I be mirroring the files
 across each physical box or is there a particular distributed file system I
 should be using for storing the files? I don't know if there is any other
 technique.

Since it seems that you have a single Apache httpd front-ending
multiple Tomcat instances, you could set the system up to keep the
static files in one single location and serve them directly with Apache.
This single location could be on some particularly reliable network
fileserver, accessed through NFS or similar.
All you need to do at the Apache level is tune your proxy rules or
mod_jk configuration so that Apache does not proxy requests
for static files to Tomcat.





Re: Serving static files in a cluster

2010-02-21 Thread imrank

Hey,

Thanks for your prompt reply.

Unfortunately, the approach you described wouldn't work in our case because
our app needs to do some custom authorization logic before a file can be
downloaded (sorry should've mentioned that). I don't think I can get httpd to
perform this authorization logic.

Can I use the approach of having all the files sitting on a single NFS file
server and have the different tomcat instances read/write the files to that
server's filesystem? I guess there's gonna be some cost in terms of network
latency...

Cheers,

Imran



-- 
View this message in context: 
http://old.nabble.com/Serving-static-files-in-a-cluster-tp27672008p27674733.html
Sent from the Tomcat - User mailing list archive at Nabble.com.





Re: Serving static files in a cluster

2010-02-21 Thread Hassan Schroeder
On Sun, Feb 21, 2010 at 3:23 AM, imrank imran...@gmail.com wrote:

 Can I use the approach of having all the files sitting on a single NFS file
 server and have the different tomcat instances read/write the files to that
 server's filesystem? I guess there's gonna be some cost in terms of network
 latency...

Not to mention creating a single point of failure.

Alternatively, keep local copies on each server and use rsync to
maintain consistent images.

-- 
Hassan Schroeder  hassan.schroe...@gmail.com
twitter: @hassan




Re: Serving static files in a cluster

2010-02-21 Thread André Warnier

imrank wrote:

 Hey,

 Thanks for your prompt reply.

 Unfortunately, the approach you described wouldn't work in our case because
 our app needs to do some custom authorization logic before a file can be
 downloaded (sorry should've mentioned that). I don't think I can get httpd to
 perform this authorization logic.

I have not yet met authorization logic that can be done with Tomcat
and can't be done with Apache httpd (but I would be curious about the
details of yours, just to verify).

That said, since this is a Tomcat list, it is perhaps not the right
place for that kind of discussion
(I am available off-list if you would like to explore this, however).
So let's suppose for now that the authorization logic cannot be moved and
has to happen at the Tomcat level.




 Can I use the approach of having all the files sitting on a single NFS file
 server and have the different tomcat instances read/write the files to that
 server's filesystem? I guess there's gonna be some cost in terms of network
 latency...

You can certainly do that with symbolic links and NFS mounts,
for instance. Each Tomcat would contain something like:

(tomcat_dir)/webapps/your_app/the_docs --> /mnt/NFS/somedir_with_docs

Unless your network is really slow or these files are really large, network
latency is probably not going to be the main concern nowadays.
The problem may be file and directory locking, however, in a multi-user,
multi-Tomcat context.
You would have to make sure that no two Tomcats (or webapps within
these Tomcats) could end up with one reading and the other writing the same
file at the same time.  Through NFS this is not so easy.  Note that you
would have the same kind of issue even if you did this through NFS at
the Apache level, but it may be easier because there is only one Apache
host.
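
One common way to narrow the "one reading, one writing" window for
write-once uploads is to write to a temporary name and rename into place
only when the upload is complete. The sketch below is illustrative only:
store_upload and the ".upload-" prefix are made-up names, and it assumes
the rename is atomic on the target filesystem. That holds on a local POSIX
filesystem; over NFS it depends on the server and on client caching, so it
should be tested on the real mount. It does not address concurrent
modification of an existing file.

```python
import os
import tempfile


def store_upload(data: bytes, final_path: str) -> None:
    """Write data to a temp file beside final_path, then rename it into place."""
    directory = os.path.dirname(final_path) or "."
    # Same directory means same filesystem, so the rename is not a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush the bytes to storage before publishing
        # Readers see either no file (or the old one) or the complete new file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A webapp serving downloads would then only ever open final_path and would
ignore names starting with the temporary prefix.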



Also, just to get you thinking on the subject of
authentication/authorization:


1) It is possible to set up an AAA method at the Apache level that
uses Tomcat as the AAA back-end.  The basic idea is this:
- at the Tomcat level, you create a webapp that is basically a dummy
and does nothing but authenticate/authorize a request sent to it.
Its answer is a simple plain-text response, "yes" or "no".
- at the Apache level, whenever you need an authorization, you send a
background request to this dummy webapp on Tomcat and read the
response (which could also be the user-id, instead of just "yes").

Then if the response is positive, you proceed; otherwise you return "forbidden".

2) If you have an Apache front-end anyway, you can do /all/ the
authentication/authorization at the Apache level, thus freeing the Tomcat(s)
for more interesting things.  If Apache authenticates a request, it can
forward the obtained user-id to Tomcat when it proxies the request.

See the tomcatAuthentication attribute of the AJP Connector element (set it
to false to have Tomcat accept the user-id that httpd authenticated).
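
The decision flow of option 1 can be sketched as follows. This is only an
illustration of the logic, not httpd code: inside httpd it would live in
an authorization module or hook. The function name authorize, the
ask_backend callable (standing in for the background request to the dummy
webapp) and the 503 "fail closed" choice are assumptions; only the
"yes" / "no" / user-id plain-text answer comes from the description above.

```python
from typing import Callable, Optional, Tuple


def authorize(ask_backend: Callable[[str, str], str],
              cookie: str, path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, user-id) for a request to download `path`.

    ask_backend(cookie, path) returns the plain-text body sent by the
    dummy webapp: "no", "yes", or a user-id.
    """
    try:
        answer = ask_backend(cookie, path).strip()
    except OSError:
        return 503, None  # back-end unreachable: refuse rather than guess
    if not answer or answer.lower() == "no":
        return 403, None  # forbidden
    # Anything other than a bare "yes" is treated as the user-id.
    user = None if answer.lower() == "yes" else answer
    return 200, user      # proceed and serve the file


# Usage with a stand-in back-end that always answers with a user-id:
status, user = authorize(lambda cookie, path: "imran\n", "sid=1", "/docs/a.pdf")
print(status, user)
```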


Which option ultimately makes more sense, is more efficient and is easier to
maintain is for you to decide, based on your knowledge of
the setup and the usage patterns of the application.


Intuitively, if your configuration is as follows:

browser <--> Apache + connector <--> Tomcat + NFS <--> NFS fileserver

then the megabits have to circulate through more network and more code
than if the configuration is like this:


browser <--> Apache + NFS <--> NFS fileserver

For example, in the first case, if your Apache front-end and Tomcats and 
the NFS fileserver are on the same network cable, then the same file may 
end up being transferred several times over that cable, before it is 
sent to the browser.


Also, if the serving of the static files is done at the Apache level, 
you may be able to use one of the caching modules available at the 
Apache level, to avoid even more network traffic.
But again that depends on the application, and how often the same files 
would be requested over a period of time.






Re: Serving static files in a cluster

2010-02-21 Thread André Warnier

Hassan Schroeder wrote:

 On Sun, Feb 21, 2010 at 3:23 AM, imrank imran...@gmail.com wrote:

  Can I use the approach of having all the files sitting on a single NFS file
  server and have the different tomcat instances read/write the files to that
  server's filesystem? I guess there's gonna be some cost in terms of network
  latency...

 Not to mention creating a single point of failure.

 Alternatively, keep local copies on each server and use rsync to
 maintain consistent images.

Not to mention possible inconsistencies between the different copies..
;-)
Imagine you have 4 balanced Tomcats, each of which has its own file
repository, and each of which can potentially run the next upload
request or download request.  To/from where does the file get
uploaded/downloaded? (until all rsyncs have run).  And if the file is
there twice, but different, which one is correct? (how would rsync know?)


There is probably more than one sensible configuration.
Choosing the best one would really depend on details of the application.






Re: Serving static files in a cluster

2010-02-21 Thread Hassan Schroeder
On Sun, Feb 21, 2010 at 7:36 AM, André Warnier a...@ice-sa.com wrote:

 Not to mention possible inconsistencies between the different copies..
 ;-)
 Imagine you have 4 balanced Tomcats, each of which has its own file
 repository, and each of which can potentially run the next upload request or
 download request.  To/from where does the file get uploaded/downloaded?
 (until all rsyncs have run).  And if the file is there twice, but different,
 which one is correct? (how would rsync know?)

Have you used rsync? Because I'm not sure I'm understanding your
questions. I don't see how downloads are relevant; it's uploads that
add a file to the file system on the Tomcat that processed the request.
And that would be the source filesystem to rsync from.

 There are probably more than one sensible configuration possible. Choosing
 the best one would really depend on details of the application.

Based on the original description:
  ... involves serving files that are saved on the local file
   system. These files are uploaded by users.

:: I'm assuming a write-once, read-multiple use case, for which rsync
is perfect. A newly added file on one file system will be propagated to
the others. It's a simple and consistent replication scheme.

If, however, the application allows file *modification*, then you have
a concurrency problem no matter what storage mechanism you use.

So yes, the best solution does depend on the details of the app...
-- 
Hassan Schroeder  hassan.schroe...@gmail.com
twitter: @hassan




Re: Serving static files in a cluster

2010-02-21 Thread André Warnier

Hassan Schroeder wrote:

 On Sun, Feb 21, 2010 at 7:36 AM, André Warnier a...@ice-sa.com wrote:

  Not to mention possible inconsistencies between the different copies..
  ;-)
  Imagine you have 4 balanced Tomcats, each of which has its own file
  repository, and each of which can potentially run the next upload request or
  download request.  To/from where does the file get uploaded/downloaded?
  (until all rsyncs have run).  And if the file is there twice, but different,
  which one is correct? (how would rsync know?)

 Have you used rsync?

Yes, quite a lot. But not necessarily all the options.

 Because I'm not sure I'm understanding your
 questions. I don't see how downloads are relevant;

What I meant was this: a user uploads a file. That file is uploaded to
one Tomcat, and there is only a copy on that one Tomcat until rsync has
synchronised it to the other Tomcats.
If at that point a user (maybe even the same one, just to check)
requests the file, this request ends up with one of the Tomcats, not
necessarily the same one.  What if that Tomcat does not have the file yet?

It may be that I am misunderstanding how you would set up the rsync bit.

 it's uploads that
 add a file to the file system on the Tomcat that processed the request.
 And that would be the source filesystem to rsync from.

Yes, but how often? (again, maybe my incomplete knowledge of the rsync
capabilities).  Each Tomcat would need to rsync its repository with each
of the others, constantly, not so? Isn't that in itself going to
generate a lot of traffic? And if each Tomcat pulls the files of the
others via rsync, how does one rsync know that the new file it is seeing
on this other Tomcat has finished uploading?

These are not polemical questions, by the way; I am genuinely trying to
learn new tricks, and to inform the OP about alternatives.

In my opinion, the simplest and most reliable scheme is to have one
single repository, which could itself be made as reliable as possible
via a number of methods (hardware duplication, replication,
snapshots,..). If there are locking issues, the single repository makes
them easier to solve.  If there is only one uploading host, that again
makes it easier.







Re: Serving static files in a cluster

2010-02-21 Thread Hassan Schroeder
On Sun, Feb 21, 2010 at 8:54 AM, André Warnier a...@ice-sa.com wrote:

 it's uploads that
 add a file to the file system on the Tomcat that processed the request.
 And that would be the source filesystem to rsync from.

 Yes, but how often?

In the simplest case, once each time a file is uploaded :-)

How frequent are the uploads? If it's one every few minutes, the above
simplest case applies; if it's hundreds per second, different situation,
tune to suit.

 Each Tomcat would need to rsync its repository with each of
 the others, constantly, not so? Isn't that in itself going to generate a
 lot of traffic? And if each Tomcat pulls the files of the others via
 rsync, how does one rsync know that the new file it is seeing on this other
 Tomcat has finished uploading?

Again, the file system *with a new file* is the source for an rsync
push. The application itself knows when an upload is complete, so
there won't be incomplete files copied.
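
A minimal sketch of that push, assuming the webapp (or a helper it calls)
shells out to rsync once the upload has been fully written. The host names,
paths and function names are hypothetical, and passwordless SSH between the
nodes is assumed. By default rsync writes to a temporary file on the
receiving side and renames it when the transfer finishes (unless --inplace
is used), so the other nodes should not see a half-copied file either.

```python
import subprocess
from typing import List


def rsync_push_command(local_path: str, peer: str, remote_dir: str) -> List[str]:
    """Build the rsync command that pushes one finished upload to one peer."""
    # -a (archive) preserves modification times and permissions.
    return ["rsync", "-a", local_path, f"{peer}:{remote_dir.rstrip('/')}/"]


def push_to_peers(local_path: str, peers: List[str], remote_dir: str) -> List[str]:
    """Push local_path to every peer; return the peers whose push failed."""
    failed = []
    for peer in peers:
        result = subprocess.run(rsync_push_command(local_path, peer, remote_dir))
        if result.returncode != 0:
            failed.append(peer)  # the caller can retry these later
    return failed


# Example: show the command that would be run for one (made-up) peer.
print(rsync_push_command("/data/docs/report.pdf", "tomcat2", "/data/docs"))
```

The peers returned by push_to_peers could be retried from a periodic job,
which also covers a node that was down at upload time.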

 In my opinion, the simplest and most reliable scheme is to have one single
 repository, which could itself be made as reliable as possible via a number
 of methods (hardware duplication, replication, snapshots,..).

Which certainly starts moving away from simplest :-)

 If there are
 locking issues, the single repository makes them easier to solve.  If there
 is only one uploading host, that again makes it easier.

I definitely agree with the second statement.

-- 
Hassan Schroeder  hassan.schroe...@gmail.com
twitter: @hassan




Re: Serving static files in a cluster

2010-02-21 Thread imrank
 network cable, then the same file may 
 end up being transferred several times over that cable, before it is 
 sent to the browser.
 
 Also, if the serving of the static files is done at the Apache level, 
 you may be able to use one of the caching modules available at the 
 Apache level, to avoid even more network traffic.
 But again that depends on the application, and how often the same files 
 would be requested over a period of time.
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Serving-static-files-in-a-cluster-tp27672008p27677298.html
Sent from the Tomcat - User mailing list archive at Nabble.com.





Re: Serving static files in a cluster

2010-02-21 Thread Hassan Schroeder
On Sun, Feb 21, 2010 at 3:01 PM, imrank imran...@gmail.com wrote:

 Hassan, the approach you described is one that I was also considering to keep
 things consistent across tomcat instances (btw, there are no modifications
 occurring to existing files). I was considering using an approach whereby
 after a file is uploaded by a user, I run rsync to synchronize across the
 nodes. However my concern with this approach is that there will be some
 delay before the file is available on other nodes (files can be up to a couple
 hundred MB in size).

Only you know if that's a problem. Does it *need* to be immediately
available? When I upload something to youtube, I get a message that
it's being processed and isn't immediately available -- and that's OK.

Patience is a virtue (sometimes)!

That said, how long does it take to transfer a couple-hundred-MB file
across a local hardwired ethernet (or fiber) connection? It might not
be worth worrying about, until you have to :-)
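
As a rough back-of-the-envelope check, the ideal wire time is just size
times 8 divided by link speed. The sketch below assumes decimal megabytes
and ignores protocol overhead, disk speed and competing traffic, so real
transfers will be somewhat slower; the two link speeds are examples only.

```python
def transfer_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time: megabytes * 8 bits per byte / link rate in Mbit/s."""
    return size_mb * 8 / link_mbit_per_s


for name, rate in [("100 Mbit/s", 100), ("1 Gbit/s", 1000)]:
    print(f"200 MB over {name}: {transfer_seconds(200, rate):.1f} s")
```

For a 200 MB file this gives 16.0 s on a 100 Mbit/s link and 1.6 s on a
1 Gbit/s link.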

-- 
Hassan Schroeder  hassan.schroe...@gmail.com
twitter: @hassan




Serving static files in a cluster

2010-02-20 Thread Imran Khan
Hey,

I am using tomcat 5.5.26 on Ubuntu, currently having a clustered
configuration, but having the entire cluster on a single box. I have the
tomcat instances sitting behind apache.

Eventually I'd like to move to cluster on different physical boxes. Part of
our application involves serving files that are saved on the local file
system. These files are uploaded by users.

My question is, what is the best way to save these files so that they can be
served across the different physical boxes? Should I be mirroring the files
across each physical box or is there a particular distributed file system I
should be using for storing the files? I don't know if there is any other
technique.

Thanks,

Imran