Ultimately it was a PermGen out-of-memory error. I somehow missed it in the log.
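For anyone who hits this later: on Java 7, which Spark 1.3.1 typically runs on, PermGen is sized separately from the heap, so even 180GB executors can exhaust it. A minimal sketch of how one might raise it; the 512m value and the app name are assumptions, not something from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Executor PermGen can be raised from the driver, as long as it is
    // set before the SparkContext starts (512m is an assumed value):
    val conf = new SparkConf()
      .setAppName("permgen-example") // hypothetical app name
      .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=512m")
    val sc = new SparkContext(conf)

    // The driver JVM is already running by this point, so its PermGen
    // must be raised at launch time instead, e.g. in spark-defaults.conf:
    //   spark.driver.extraJavaOptions  -XX:MaxPermSize=512m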
On Thu, May 14, 2015 at 9:24 AM, Lior Chaga <lio...@taboola.com> wrote:

> After profiling with YourKit, I see there's an OutOfMemoryError in the
> context of SQLContext.applySchema. Again, it's a very small RDD. Each
> executor has 180GB RAM.
>
> On Thu, May 14, 2015 at 8:53 AM, Lior Chaga <lio...@taboola.com> wrote:
>
>> Hi,
>>
>> Using Spark SQL with HiveContext. Spark version is 1.3.1.
>> When running Spark locally everything works fine. When running on a
>> Spark cluster I get a ClassNotFoundException for
>> org.apache.hadoop.hive.shims.Hadoop23Shims. This class belongs to
>> hive-shims-0.23, and is a runtime dependency of spark-hive:
>>
>> [INFO] org.apache.spark:spark-hive_2.10:jar:1.3.1
>> [INFO] +- org.spark-project.hive:hive-metastore:jar:0.13.1a:compile
>> [INFO] |  +- org.spark-project.hive:hive-shims:jar:0.13.1a:compile
>> [INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common:jar:0.13.1a:compile
>> [INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20:jar:0.13.1a:runtime
>> [INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common-secure:jar:0.13.1a:compile
>> [INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20S:jar:0.13.1a:runtime
>> [INFO] |  |  \- org.spark-project.hive.shims:hive-shims-0.23:jar:0.13.1a:runtime
>>
>> My Spark distribution is built with:
>> make-distribution.sh --tgz -Phive -Phive-thriftserver -DskipTests
>>
>> If I add this dependency to my driver project, the exception
>> disappears, but then the job gets stuck when registering an RDD as a
>> table (I get a timeout after 30 seconds). I should emphasize that the
>> first RDD I register as a table is very small (about 60K rows), and as
>> I said, it runs swiftly in local mode.
>> I suspect other dependencies may also be missing, but failing silently.
>>
>> Would be grateful if anyone knows how to solve this.
>>
>> Lior
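For the archives, the pattern in the quoted thread boils down to something like the sketch below; the schema, data, and table name are made up for illustration, and in Spark 1.3 createDataFrame is the current name for what applySchema did:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val sc = new SparkContext(new SparkConf().setAppName("hive-shims-repro"))
    // On the cluster, the shims ClassNotFound reportedly surfaced once
    // Hive classes were touched, when hive-shims-0.23 was missing.
    val hiveContext = new HiveContext(sc)

    // A deliberately tiny RDD, mirroring the ~60K-row table in the thread.
    val rows = sc.parallelize(Seq(Row("a"), Row("b")))
    val schema = StructType(Seq(StructField("value", StringType, nullable = true)))

    // applySchema was renamed createDataFrame in 1.3; this is roughly
    // where the PermGen OOM showed up in the YourKit profile.
    val df = hiveContext.createDataFrame(rows, schema)
    df.registerTempTable("small_table") // the step that hung on the cluster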