The keyword "static" in Java means that a single copy of a static field exists per class, per class loader. Two different class loaders will have separate copies of a static variable, even within the same JVM running on the same host.
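To make the class-loader point concrete, here is a minimal, self-contained sketch (plain Java, not Hadoop code). Two hypothetical child-first class loaders each define their own copy of a small Counter class, and writing to the static field in one copy does not affect the other or the application's own copy. The names StaticPerLoader, Counter and IsolatingLoader are made up for illustration. The sketch assumes Java 9+ and that the compiled .class file can be read as a resource from the parent class loader.

```java
import java.io.IOException;
import java.io.InputStream;

public class StaticPerLoader {

    // Hypothetical class with a static field; it stands in for any class
    // (such as a Parser) that keeps static state.
    public static class Counter {
        public static int value = 0;
    }

    // Child-first loader: defines its own copy of one named class from the
    // parent's .class bytes instead of delegating; every other class is
    // delegated to the parent as usual.
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String path = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(path)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    // Each call creates a fresh loader, and therefore a separate copy of Counter.
    static Class<?> loadIsolatedCounter() throws ClassNotFoundException {
        ClassLoader parent = StaticPerLoader.class.getClassLoader();
        String name = Counter.class.getName();
        return new IsolatingLoader(parent, name).loadClass(name);
    }

    // Writes 42 into one isolated copy, then reads both isolated copies and
    // the copy loaded by the application class loader.
    static int[] demo() {
        try {
            Class<?> a = loadIsolatedCounter();
            Class<?> b = loadIsolatedCounter();
            a.getField("value").setInt(null, 42);
            return new int[] {
                a.getField("value").getInt(null),
                b.getField("value").getInt(null),
                Counter.value
            };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] v = demo();
        // prints "loader A: 42, loader B: 0, app loader: 0"
        System.out.println("loader A: " + v[0] + ", loader B: " + v[1] + ", app loader: " + v[2]);
    }
}
```

Hadoop frameworks may set up their own class loaders, but the principle is the same: "static" is only as shared as the class loader that defined the class.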
Synchronization in Java works based on locks. When the synchronized keyword is applied to a static method, the lock is the class itself (its Class object, e.g. Parser.class), so the same per-class-loader rules apply as above. The only time you would need to synchronize something is if it contains shared mutable state that must be updated atomically.

A static synchronized method won't coordinate anything across parallel tasks, which typically run in separate JVMs (often on different hosts); each one gets its own copy of the class and its own lock. That only works if you first have a shared data structure. Static only guarantees that something is shared within the same class loader (again, see above). A static method is fine if there is no shared state (i.e., if it's just a function that takes parameters and returns a value).

If you need to share state, I would look at writing to HDFS or using an ACID-compliant data store with transactional semantics (e.g., a relational database). You might want to check out this: https://en.wikipedia.org/wiki/Functional_programming

I would try to avoid shared state unless it's absolutely necessary.

-------- Original Message --------
Subject: Is it safe to have static methods in Hadoop Framework
From: Huy Pham <[email protected]>
Date: Thu, July 25, 2013 2:46 pm
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>

Hi All,

I am writing a class (called Parser) with a couple of static functions because I don't want millions of instances of this class to be created during the run. However, I realized that Hadoop will eventually produce parallel jobs, and if all jobs call static functions of this Parser class, would that be safe? In other words, will all Hadoop jobs share the same Parser class, or will each of them have its own Parser?

In the former case, if all jobs share the same class and I make the methods synchronized, the jobs would need to wait until the locks on the functions are released, which would affect performance. In the latter case, that would not cause any problem.

Can someone provide some insights?

Thanks
Huy
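Going back to the Parser question, here is a minimal sketch, assuming Java 9+, of a static method with no shared state being called from several threads at once, plus a static synchronized method that confirms its lock is the Parser.class object. Parser, sumFields and the comma-separated input format are hypothetical, since the actual parsing logic isn't shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParserDemo {

    // Hypothetical Parser. sumFields keeps no shared state: everything it
    // needs arrives as a parameter and lives in local variables, so
    // concurrent calls cannot interfere and no synchronization is needed.
    static final class Parser {
        private Parser() {}

        static int sumFields(String line) {
            int sum = 0;
            for (String field : line.split(",")) {
                sum += Integer.parseInt(field.trim());
            }
            return sum;
        }

        // For contrast: a static synchronized method locks the Parser.class
        // object, so every caller in this class loader queues on one lock.
        static synchronized boolean holdsParserClassLock() {
            return Thread.holdsLock(Parser.class);
        }
    }

    // Calls the stateless method from several threads at once and returns
    // the results in input order.
    static List<Integer> parseConcurrently(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String line : lines) {
                Callable<Integer> task = () -> Parser.sumFields(line);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseConcurrently(List.of("1,2,3", "10, 20", "7"), 4)); // [6, 30, 7]
        System.out.println(Parser.holdsParserClassLock()); // true
        System.out.println(Thread.holdsLock(Parser.class)); // false
    }
}
```

Making sumFields synchronized would not make it any safer here; it would only force the threads to take turns on the Parser.class lock, which is exactly the performance cost the question is worried about.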
