Re: question about setting SPARK_CLASSPATH in spark-env.sh

2014-06-18 Thread santhoma
By the way, any idea how to sync the Spark config dir with the other nodes in the cluster? ~santhosh
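(Not a Spark feature as such, but one common approach is to push the directory from a single node with rsync, e.g. rsync -av /etc/spark/conf/ othernode:/etc/spark/conf/ for each node, or to let whatever configuration-management tool the cluster already uses handle it.)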

Re: question about setting SPARK_CLASSPATH in spark-env.sh

2014-06-17 Thread santhoma
Thanks. I hope this problem will go away once I upgrade to Spark 1.0, where we can ship cluster-wide classpaths using the spark-submit command.
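For reference, a minimal sketch of that spark-submit usage in 1.0 (the class name, master URL, and jar paths here are hypothetical):

    spark-submit --class com.example.MyJob \
      --master spark://master:7077 \
      --jars /opt/libs/dep1.jar,/opt/libs/dep2.jar \
      myjob.jar

The comma-separated --jars list is shipped to the executors along with the application, so the jars no longer have to be pre-installed on every node.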

question about setting SPARK_CLASSPATH in spark-env.sh

2014-06-17 Thread santhoma
Hi, this is about Spark 0.9. I have a 3-node Spark cluster. I want to add a locally available jar file (present on all nodes) to the SPARK_CLASSPATH variable in /etc/spark/conf/spark-env.sh so that all nodes can access it. The question is: should I edit spark-env.sh on all nodes to add the jar?
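For context, the line in question would look something like this (the jar path is an example). Since spark-env.sh is sourced locally by the daemons on each machine, the edit does need to be present on every node (or the file synced across them):

    # in /etc/spark/conf/spark-env.sh on each node
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/libs/mylib.jar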

Re: Configuring distributed caching with Spark and YARN

2014-04-01 Thread santhoma
I think with addJar() there is no 'caching', in the sense that files will be copied every time, per job. Whereas with the Hadoop distributed cache, files are copied only once, and a symlink is created to the cached file for subsequent runs: https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fi
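For comparison, Spark's closest built-in mechanism is SparkContext.addFile plus SparkFiles.get, which ships a file to each executor once per application, not once per cluster as in the Hadoop cache. A minimal sketch, with a hypothetical file name:

    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    val sc = new SparkContext(new SparkConf().setAppName("addfile-demo"))
    sc.addFile("/opt/data/lookup.dat")      // copied to every executor, for this app only

    val paths = sc.parallelize(1 to 3).map { _ =>
      SparkFiles.get("lookup.dat")          // executor-local path to the shipped copy
    }.collect()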

Re: Configuring distributed caching with Spark and YARN

2014-03-27 Thread santhoma
Curious to know, were you able to do distributed caching with Spark? I have done it with Hadoop and Pig, but could not find a way to do it in Spark.

Re: How to set environment variable for a spark job

2014-03-27 Thread santhoma
Got it finally. Pasting it here so that it will be useful for others:

    val conf = new SparkConf()
      .setJars(jarList)
    conf.setExecutorEnv("ORACLE_HOME", myOraHome)
    conf.setExecutorEnv("SPARK_JAVA_OPTS", "-Djava.library.path=/my/custom/path")

Re: How to set environment variable for a spark job

2014-03-26 Thread santhoma
OK, it was working. I printed System.getenv(..) for both env variables and they gave the correct values. However, it did not give me the intended result. My intention was to load a native library from LD_LIBRARY_PATH, but it looks like the library is loaded from the value of -Djava.library.path.
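That matches how the JVM resolves JNI libraries: System.loadLibrary consults java.library.path, which is fixed at JVM startup. On Linux it is typically seeded from LD_LIBRARY_PATH, but if the launch scripts pass an explicit -Djava.library.path, that overrides it, which would explain this behavior. A minimal illustration, with a hypothetical library name:

    // java.library.path is captured once, when the JVM starts.
    println(System.getProperty("java.library.path"))
    println(System.getenv("LD_LIBRARY_PATH"))

    // Resolved against java.library.path, not the current environment.
    System.loadLibrary("myoranative")   // expects libmyoranative.so on Linux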

Re: How to set environment variable for a spark job

2014-03-25 Thread santhoma
I tried it; it did not work:

    conf.setExecutorEnv("ORACLE_HOME", orahome)
    conf.setExecutorEnv("LD_LIBRARY_PATH", ldpath)

Any idea how to set it using java.library.path?

any distributed cache mechanism available in spark ?

2014-03-25 Thread santhoma
I have been writing map-reduce jobs on Hadoop using Pig, and am now trying to migrate to Spark. My cluster consists of multiple nodes, and the jobs depend on a native library (.so files). In Hadoop and Pig I could distribute the files across nodes using the "-files" or "-archives" option, but I could not find an equivalent in Spark.
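One workaround, sketched here assuming an existing SparkContext sc and an RDD rdd, with a hypothetical library name, is to ship the .so with SparkContext.addFile and load it by absolute path inside the tasks. System.load, unlike System.loadLibrary, does not consult java.library.path:

    import org.apache.spark.SparkFiles

    sc.addFile("/opt/native/libmyora.so")     // copied to every executor

    val out = rdd.mapPartitions { iter =>
      // Absolute-path load; repeated loads in the same JVM are no-ops.
      System.load(SparkFiles.get("libmyora.so"))
      iter                                    // call into the JNI wrapper here
    }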

How to set environment variable for a spark job

2014-03-25 Thread santhoma
Hello, I have a requirement to set some env values for my Spark jobs. Does anyone know how to set them? Specifically, the following variables: 1) ORACLE_HOME 2) LD_LIBRARY_PATH. Thanks.

Re: Java API - Serialization Issue

2014-03-25 Thread santhoma
This worked great. Thanks a lot.

Re: java.io.NotSerializableException Of dependent Java lib.

2014-03-24 Thread santhoma
Can someone answer this question please? Specifically, about the Serializable implementation of dependent jars?

Re: Java API - Serialization Issue

2014-03-24 Thread santhoma
I am also facing the same problem. I have implemented Serializable for my code, but the exception is thrown from third-party libraries over which I have no control.

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: (li
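A common workaround when the offending class comes from a library you cannot change is to keep it out of the serialized closure entirely and construct it on the executors instead, e.g. once per partition. A minimal sketch, assuming rdd is an RDD[String]; ThirdPartyClient is a hypothetical stand-in for the non-serializable class:

    // Hypothetical stand-in for a non-serializable third-party class.
    class ThirdPartyClient { def process(s: String): String = s.toUpperCase }

    val results = rdd.mapPartitions { iter =>
      val client = new ThirdPartyClient()   // built on the executor, never shipped
      iter.map(client.process)
    }

Another option along the same lines is to hold the instance in a @transient lazy val on a serializable holder, so it is recreated on each executor rather than serialized with the closure.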