Can you check in your RM's web UI how much of each resource Yarn
thinks you have available? You can also check that directly in the
Yarn configuration.
Perhaps it's not configured to use all of the available resources. (If
it was set up with Cloudera Manager, CM will reserve some room for
Hi Anson,
We've seen this error when incompatible classes are used in the driver
and executors (e.g., same class name, but the classes are different
and thus the serialized data is different). This can happen for
example if you're including some 3rd party libraries in your app's
jar, or changing
are using
CDH's version of Spark, not trying to run an Apache Spark release on
top of CDH, right? (If that's the case, then we could probably move
this conversation to cdh-us...@cloudera.org, since it would be
CDH-specific.)
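To illustrate the failure mode above with a sketch in plain Python (pickle stands in for Java serialization here; the class names are made up): the serialized bytes are interpreted using whatever class definition the receiving side has loaded, so "same name, different class" yields inconsistent objects.

```python
import pickle

class Point:
    def __init__(self, x):
        self.x = x

# "Driver" side: serialize an instance of the original class.
data = pickle.dumps(Point(42))

# Simulate the "executor" having a different class with the same name,
# e.g. pulled in from a conflicting 3rd-party jar bundled into the app.
class Point:  # same name, different definition
    def __init__(self, x, y=0):
        self.x, self.y = x, y

restored = pickle.loads(data)   # deserialized against the new definition
print(restored.x)               # 42
print(hasattr(restored, "y"))   # False: the instance no longer matches its class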
On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com wrote:
Hi Yiming,
On Wed, Nov 19, 2014 at 5:35 PM, Yiming (John) Zhang sdi...@gmail.com wrote:
Thank you for your reply. I was wondering whether there is a method of
reusing locally-built components without installing them? That is, if I have
successfully built the spark project as a whole, how
Check the --files argument in the output of spark-submit -h.
On Thu, Nov 20, 2014 at 7:51 AM, Matt Narrell matt.narr...@gmail.com wrote:
How do I configure the files to be uploaded to YARN containers? So far, I’ve
only seen --conf spark.yarn.jar=hdfs://… which allows me to specify the
HDFS
Hi Tobias,
With the current Yarn code, packaging the configuration in your app's
jar and adding the -Dlog4j.configuration=log4jConf.xml argument to
the extraJavaOptions configs should work.
That's not the recommended way to get it to work, though, since this
behavior may change in the future.
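For reference, the configuration mentioned above could look like this as spark-defaults.conf entries (a sketch; log4jConf.xml is assumed to sit at the root of the application jar, and the file name is just an example):

```properties
spark.driver.extraJavaOptions    -Dlog4j.configuration=log4jConf.xml
spark.executor.extraJavaOptions  -Dlog4j.configuration=log4jConf.xml
```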
Hello,
On Mon, Nov 24, 2014 at 12:07 PM, aecc alessandroa...@gmail.com wrote:
This is the stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task not
serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
- field (class
On Mon, Nov 24, 2014 at 1:56 PM, aecc alessandroa...@gmail.com wrote:
I checked sqlContext; they use it in the same way I would like to use my
class, they make the class Serializable with transient. Does this somehow
affect the whole pipeline of data movement? I mean, will I get performance
That's an interesting question for which I do not know the answer.
Probably a question for someone with more knowledge of the internals
of the shell interpreter...
On Mon, Nov 24, 2014 at 2:19 PM, aecc alessandroa...@gmail.com wrote:
Ok, great, I'm gonna do it that way, thanks :). However I
Hello,
What exactly are you trying to see? Workers don't generate any events
that would be logged by enabling that config option. Workers generate
logs, and those are captured and saved to disk by the cluster manager,
generally, without you having to do anything.
On Mon, Nov 24, 2014 at 7:46 PM,
On Tue, Dec 2, 2014 at 11:22 AM, Judy Nash
judyn...@exchange.microsoft.com wrote:
Any suggestion on how a user with a custom Hadoop jar can solve this issue?
You'll need to include all the dependencies for that custom Hadoop jar
to the classpath. Those will include Guava (which is not included in
wrote:
Thank you, Marcelo and Sean, mvn install is a good answer for my demands.
-----Original Message-----
From: Marcelo Vanzin [mailto:van...@cloudera.com]
Sent: November 21, 2014 1:47
To: yiming zhang
Cc: Sean Owen; user@spark.apache.org
Subject: Re: How to incrementally compile spark examples using mvn
Hi
and got weird
errors because some toy version I once built was stuck in my local Maven
repo and it somehow got priority over a real Maven repo).
On Fri, Dec 5, 2014 at 5:28 PM, Marcelo Vanzin van...@cloudera.com
wrote:
You can set SPARK_PREPEND_CLASSES=1 and it should pick your new mllib
Hello,
In CDH 5.2 you need to manually add Hive classes to the classpath of
your Spark job if you want to use the Hive integration. Also, be aware
that since Spark 1.1 doesn't really support the version of Hive
shipped with CDH 5.2, this combination is to be considered extremely
experimental.
On
Hello,
What do you mean by app that uses 2 cores and 8G of RAM?
Spark apps generally involve multiple processes. The command line
options you used affect only one of them (the driver). You may want to
take a look at similar configuration for executors. Also, check the
documentation:
it as a public API, but mostly for internal Hive use.
It can give you a few ideas, though. Also, SPARK-3215.
On Thu, Dec 11, 2014 at 5:41 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Manoj,
I'm not aware of any public projects that do something like that,
except for the Ooyala server which you say
Hi Manoj,
I'm not aware of any public projects that do something like that,
except for the Ooyala server which you say doesn't cover your needs.
We've been playing with something like that inside Hive, though:
On Thu, Dec 11, 2014 at 5:33 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi,
Hi Anton,
That could solve some of the issues (I've played with that a little
bit). But there are still some areas where this would be sub-optimal,
because Spark still uses system properties in some places and those
are global, not per-class loader.
(SparkSubmit is the biggest offender here, but
On Fri, Dec 19, 2014 at 4:05 PM, Haopu Wang hw...@qilinsoft.com wrote:
My application doesn’t depend on hadoop-client directly.
It only depends on spark-core_2.10 which depends on hadoop-client 1.0.4.
This can be checked by Maven repository at
How many cores / memory do you have available per NodeManager, and how
many cores / memory are you requesting for your job?
Remember that in Yarn mode, Spark launches num executors + 1
containers. The extra container, by default, reserves 1 core and about
1g of memory (more if running in cluster
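As a back-of-envelope illustration of the sizing above (the 10%-with-384MB-floor overhead rule and the ~1g AM container are assumptions about defaults, which vary by Spark version):

```python
# Rough YARN capacity math for a hypothetical request of 3 executors.
num_executors = 3
executor_mem_mb = 4096
# Per-executor overhead YARN accounts for (assumed: max(384 MB, 10%)).
overhead_mb = max(384, executor_mem_mb // 10)
am_mem_mb = 1024  # the extra container for the AM (~1g, 1 core by default)

total_mb = num_executors * (executor_mem_mb + overhead_mb) + am_mem_mb
print(total_mb)  # 14539 -- memory YARN must have free across NodeManagers
```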
If you don't specify your own log4j.properties, Spark will load the
default one (from
core/src/main/resources/org/apache/spark/log4j-defaults.properties,
which ends up being packaged with the Spark assembly).
You can easily override the config file if you want to, though; check
the Debugging
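A minimal override, placed in conf/log4j.properties or shipped with --files (a sketch; the pattern and the quieted logger below are just examples):

```properties
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Example: quiet a chatty package
log4j.logger.org.apache.spark.storage=WARN
```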
Hi Corey,
When you run on Yarn, Yarn's libraries are placed in the classpath,
and they have precedence over your app's. So, with Spark 1.2, you'll
get Guava 11 in your classpath (with Spark 1.1 and earlier you'd get
Guava 14 from Spark, so still a problem for you).
Right now, the option Markus
Hi Koert,
On Wed, Feb 4, 2015 at 11:35 AM, Koert Kuipers ko...@tresata.com wrote:
do i understand it correctly that on yarn the custom jars are truly
placed before the yarn and spark jars on the classpath? meaning at container
construction time, on the same classloader? that would be great
On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers ko...@tresata.com wrote:
about putting stuff on classpath before spark or yarn... yeah you can shoot
yourself in the foot with it, but since the container is isolated it should
be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with
Hi Corey,
On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet cjno...@gmail.com wrote:
Another suggestion is to build Spark by yourself.
I'm having trouble seeing what you mean here, Marcelo. Guava is already
shaded to a different package for the 1.2.0 release. It shouldn't be causing
conflicts.
As the error message says...
On Wed, Jan 14, 2015 at 3:14 PM, freedafeng freedaf...@yahoo.com wrote:
Error: Cluster deploy mode is currently not supported for python
applications.
Use yarn-client instead of yarn-cluster for pyspark apps.
--
Marcelo
You're specifying the queue in the spark-submit command line:
--queue thequeue
Are you sure that queue exists?
On Thu, Jan 15, 2015 at 11:23 AM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi,
Setup is as follows
Hadoop Cluster 2.3.0 (CDH5.0)
- Namenode HA
- Resource manager HA
-
Hi Kane,
What's the complete command line you're using to submit the app? Where
do you expect these options to appear?
On Fri, Jan 16, 2015 at 11:12 AM, Kane Kim kane.ist...@gmail.com wrote:
I want to add some java options when submitting application:
--conf
Hi Kane,
Here's the command line you sent me privately:
./spark-1.2.0-bin-hadoop2.4/bin/spark-submit --class
SimpleApp --conf
spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder --master local simpleapp.jar ./test.log
You're running the app in local mode. In that
On Thu, Jan 22, 2015 at 10:21 AM, Sean Owen so...@cloudera.com wrote:
I think a Spark site would have a lot less traffic. One annoyance is
that people can't figure out when to post on SO vs Data Science vs
Cross Validated.
Another is that a lot of the discussions we see on the Spark users
list
Hello,
On Tue, Feb 17, 2015 at 8:53 PM, dgoldenberg dgoldenberg...@gmail.com wrote:
I've tried setting spark.files.userClassPathFirst to true in SparkConf in my
program, also setting it to true in $SPARK_HOME/conf/spark-defaults.conf as
Is the code in question running on the driver or in some
https://issues.apache.org/jira/browse/SPARK-2356
Take a look through the comments, there are some workarounds listed there.
On Wed, Jan 28, 2015 at 1:40 PM, Wang, Ningjun (LNG-NPV)
ningjun.w...@lexisnexis.com wrote:
Has anybody successfully install and run spark-1.2.0 on windows 2008 R2 or
Hi Capitão,
Since you're using CDH, your question is probably more appropriate for
the cdh-u...@cloudera.org list.
The problem you're seeing is most probably an artifact of the way CDH
is currently packaged. You have to add Hive jars manually to your Spark
app's classpath if you want to use the
Spark doesn't really shade akka; it pulls a different build (kept
under the org.spark-project.akka group and, I assume, with some
build-time differences from upstream akka?), but all classes are still
in the original location.
The upgrade is a little more unfortunate than just changing akka,
Hi Alessandro,
You can look for a log line like this in your driver's output:
15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local
directory at
/data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d
If you're deploying your application
Short answer: yes.
Take a look at: http://spark.apache.org/docs/latest/running-on-yarn.html
Look for memoryOverhead.
On Mon, Jan 12, 2015 at 2:06 PM, Michael Albert
m_albert...@yahoo.com.invalid wrote:
Greetings!
My executors apparently are being terminated because they are
running beyond
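The setting referenced above, in spark-defaults.conf form (a sketch; the value is in MB, and 1024 is just an example to raise the headroom YARN accounts for per executor):

```properties
spark.yarn.executor.memoryOverhead  1024
```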
Hi Manoj,
As long as you're logged in (i.e. you've run kinit), everything should
just work. You can run klist to make sure you're logged in.
On Thu, Jan 8, 2015 at 3:49 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi,
For running spark 1.2 on Hadoop cluster with Kerberos, what spark
I ran this with CDH 5.2 without a problem (sorry don't have 5.3
readily available at the moment):
$ HBASE='/opt/cloudera/parcels/CDH/lib/hbase/\*'
$ spark-submit --driver-class-path $HBASE --conf
spark.executor.extraClassPath=$HBASE --master yarn --class
org.apache.spark.examples.HBaseTest
On Thu, Jan 8, 2015 at 4:09 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Some old communication (Oct 14) says Spark is not certified with Kerberos.
Can someone comment on this aspect ?
Spark standalone doesn't support kerberos. Spark running on top of
Yarn works fine with kerberos.
--
On Thu, Jan 8, 2015 at 3:33 PM, freedafeng freedaf...@yahoo.com wrote:
I installed the custom build in standalone mode as normal. The master and slaves
started successfully.
However, I got an error when I ran a job. It seems to me from the error message
that some library was compiled against hadoop1,
Disclaimer: this seems more of a CDH question, I'd suggest sending
these to the CDH mailing list in the future.
CDH 5.2 actually has Spark 1.1. It comes with SparkSQL built-in, but
it does not include the thrift server because of incompatibilities
with the CDH version of Hive. To use Hive
Disclaimer: CDH questions are better handled at cdh-us...@cloudera.org.
But the question I'd like to ask is: why do you need your own Spark
build? What's wrong with CDH's Spark that it doesn't work for you?
On Thu, Jan 8, 2015 at 3:01 PM, freedafeng freedaf...@yahoo.com wrote:
Could anyone come
the func1 and func2 from jars that are
already cached into local nodes?
Thanks,
Yitong
2015-02-09 14:35 GMT-08:00 Marcelo Vanzin van...@cloudera.com:
`func1` and `func2` never get serialized. They must exist on the other
end in the form of a class loaded by the JVM.
What gets serialized
`func1` and `func2` never get serialized. They must exist on the other
end in the form of a class loaded by the JVM.
What gets serialized is an instance of a particular closure (the
argument to your map function). That's a separate class. The
instance of that class that is serialized contains
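A rough analogue in plain Python, since pickle behaves much like Java serialization here: a module-level function is stored by reference (module plus name), not by value, so the receiving process must already have the code loaded, just as the executors' JVMs must have the class on their classpath.

```python
import math
import pickle

# Pickle a function: what travels is a reference (module name + attribute
# name), not the code object itself.
payload = pickle.dumps(math.sqrt)

print(b"sqrt" in payload)   # the module-qualified name is in the stream...
print(len(payload) < 100)   # ...but not the function's implementation
```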
and the user
that runs Spark in our case is a unix ID called mapr (in the mapr group).
Therefore, it can't read my job event logs as shown above.
Thanks,
Michael
-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com]
Sent: 07 January 2015 18:10
To: England, Michael (IT/UK
Nevermind my last e-mail. HDFS complains about not understanding 3777...
On Thu, Jan 8, 2015 at 9:46 AM, Marcelo Vanzin van...@cloudera.com wrote:
Hmm. Can you set the permissions of /apps/spark/historyserver/logs
to 3777? I'm not sure HDFS respects the group id bit, but it's worth a
try. (BTW
This could be cause by many things including wrong configuration. Hard
to tell with just the info you provided.
Is there any reason why you want to use your own Spark instead of the
one shipped with CDH? CDH 5.3 has Spark 1.2, so unless you really need
to run Spark 1.1, you should be better off
This particular case shouldn't cause problems since both of those
libraries are java-only (the scala version appended there is just for
helping the build scripts).
But it does look weird, so it would be nice to fix it.
On Wed, Jan 7, 2015 at 12:25 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com
The Spark code generates the log directory with 770 permissions. On
top of that you need to make sure of two things:
- all directories up to /apps/spark/historyserver/logs/ are readable
by the user running the history server
- the user running the history server belongs to the group that owns
Sorry for the noise; but I just remembered you're actually using MapR
(and not HDFS), so maybe the 3777 trick could work...
On Thu, Jan 8, 2015 at 10:32 AM, Marcelo Vanzin van...@cloudera.com wrote:
Nevermind my last e-mail. HDFS complains about not understanding 3777...
On Thu, Jan 8, 2015
Just to add to Sandy's comment, check your client configuration
(generally in /etc/spark/conf). If you're using CM, you may need to
run the Deploy Client Configuration command on the cluster to update
the configs to match the new version of CDH.
On Thu, Jan 8, 2015 at 11:38 AM, Sandy Ryza
You'll need to look at your application's logs. You can use yarn logs
-applicationId [id] to see them.
On Wed, Feb 18, 2015 at 2:39 AM, sachin Singh sachin.sha...@gmail.com wrote:
Hi,
I want to run my spark Job in Hadoop yarn Cluster mode,
I am using below command -
spark-submit --master
Those classes are not part of standard Spark. You may want to contact
Hortonworks directly if they're suggesting you use those.
On Wed, Mar 18, 2015 at 3:30 AM, patcharee patcharee.thong...@uni.no wrote:
Hi,
I am using spark 1.3. I would like to use Spark Job History Server. I added
the
I assume you're running YARN given the exception.
I don't know if this is covered in the documentation (I took a quick
look at the config document and didn't see references to it), but you
need to configure Spark's external shuffle service as an auxiliary
NodeManager service in your YARN
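A yarn-site.xml fragment for the NodeManager along these lines should do it (a sketch; it assumes the Spark YARN shuffle service jar is already on the NodeManager classpath, and keeps the usual mapreduce_shuffle entry):

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```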
Instead of opening a tunnel to the Spark web ui port, could you open a
tunnel to the YARN RM web ui instead? That should allow you to
navigate to the Spark application's web ui through the RM proxy, and
hopefully that will work better.
On Fri, Feb 6, 2015 at 9:08 PM, yangqch
IIRC you have to set that configuration on the Worker processes (for
standalone). The app can't override it (only for a client-mode
driver). YARN has a similar configuration, but I don't know the name
(shouldn't be hard to find, though).
On Thu, Mar 19, 2015 at 11:56 AM, Davies Liu
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote:
I am trying to run a Hive query from Spark using HiveContext. Here is the
code
/ val conf = new SparkConf().setAppName(HiveSparkIntegrationTest)
conf.set(spark.executor.extraClassPath,
It seems from the excerpt below that your cluster is set up to use the
Yarn ATS, and the code is failing in that path. I think you'll need to
apply the following patch to your Spark sources if you want this to
work:
https://github.com/apache/spark/pull/3938
On Thu, Mar 5, 2015 at 10:04 AM, Todd
+ ALT + V for copying
commands in the shell) and that results in closing my shell. In order to
solve this I was wondering if I could just deactivate the CTRL + C combination
altogether. Any ideas?
// Adamantios
On Fri, Mar 13, 2015 at 7:37 PM, Marcelo Vanzin van...@cloudera.com wrote:
You can type :quit.
On Fri, Mar 13, 2015 at 10:29 AM, Adamantios Corais
adamantios.cor...@gmail.com wrote:
Hi,
I want change the default combination of keys that exit the Spark shell
(i.e. CTRL + C) to something else, such as CTRL + H?
Thank you in advance.
// Adamantios
--
Marcelo
I've never tried it, but I'm pretty sure in the very least you want
-Pscala-2.11 (not -D).
On Thu, Mar 5, 2015 at 4:46 PM, Night Wolf nightwolf...@gmail.com wrote:
Hey guys,
Trying to build Spark 1.3 for Scala 2.11.
I'm running with the following Maven command:
-DskipTests -Dscala-2.11
Ah, and you may have to use dev/change-version-to-2.11.sh. (Again,
never tried compiling with scala 2.11.)
On Thu, Mar 5, 2015 at 4:52 PM, Marcelo Vanzin van...@cloudera.com wrote:
I've never tried it, but I'm pretty sure in the very least you want
-Pscala-2.11 (not -D).
On Thu, Mar 5, 2015
spark-submit --files /path/to/hive-site.xml
On Tue, Mar 24, 2015 at 10:31 AM, Udit Mehta ume...@groupon.com wrote:
Another question related to this, how can we propagate the hive-site.xml to
all workers when running in the yarn cluster mode?
On Tue, Mar 24, 2015 at 10:09 AM, Marcelo Vanzin
It does neither. If you provide a Hive configuration to Spark,
HiveContext will connect to your metastore server, otherwise it will
create its own metastore in the working directory (IIRC).
On Tue, Mar 24, 2015 at 8:58 AM, nitinkak001 nitinkak...@gmail.com wrote:
I am wondering if HiveContext
/spark-submit --class App1 --conf
spark.driver.userClassPathFirst=true --conf
spark.executor.userClassPathFirst=true
$HOME/projects/sparkapp/target/scala-2.10/sparkapp-assembly-1.0.jar
Thanks,
Alexey
On Tue, Mar 24, 2015 at 5:03 AM, Marcelo Vanzin van...@cloudera.com wrote:
You could build
That probably means there are not enough free resources in your cluster
to run the AM for the Spark job. Check your RM's web ui to see the
resources you have available.
On Wed, Mar 25, 2015 at 12:08 PM, Khandeshi, Ami
ami.khande...@fmr.com.invalid wrote:
I am seeing the same behavior. I have
Are those config values in spark-defaults.conf? I don't think you can
use ~ there - IIRC it does not do any kind of variable expansion.
On Mon, Mar 30, 2015 at 3:50 PM, Tom thubregt...@gmail.com wrote:
I have set
spark.eventLog.enabled true
as I try to preserve log files. When I run, I get
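In other words, spell the directory out with an absolute path (a sketch; the path is just an example, and the file: scheme makes the intent explicit):

```properties
spark.eventLog.enabled  true
spark.eventLog.dir      file:/home/hduser/spark/spark-events
```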
This sounds like SPARK-6532.
On Mon, Mar 30, 2015 at 1:34 PM, ARose ashley.r...@telarix.com wrote:
So, I am trying to build Spark 1.3.0 (standalone mode) on Windows 7 using
Maven, but I'm getting a build failure.
java -version
java version 1.8.0_31
Java(TM) SE Runtime Environment (build
and spark-env:
Log directory /home/hduser/spark/spark-events does not exist.
(Also, in the default /tmp/spark-events it also did not work)
On 30 March 2015 at 18:03, Marcelo Vanzin van...@cloudera.com wrote:
Are those config values in spark-defaults.conf? I don't think you can
use ~ there - IIRC
So, the error below is still showing the invalid configuration.
You mentioned in the other e-mails that you also changed the
configuration, and that the directory really, really exists. Given the
exception below, the only ways you'd get the error with a valid
configuration would be if (i) the
a text file, closed it and
viewed it, and deleted it (iii). My findings were reconfirmed by my
colleague. Any other ideas?
Thanks,
Tom
On 30 March 2015 at 19:19, Marcelo Vanzin van...@cloudera.com wrote:
So, the error below is still showing the invalid configuration.
You mentioned
?
Thanks a lot for the help
-AJ
On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com wrote:
What are you calling masternode? In yarn-cluster mode, the driver
is running somewhere in your cluster, not on the machine where you run
spark-submit.
The easiest way to get to the Spark UI
What are you calling masternode? In yarn-cluster mode, the driver
is running somewhere in your cluster, not on the machine where you run
spark-submit.
The easiest way to get to the Spark UI when using Yarn is to use the
Yarn RM's web UI. That will give you a link to the application's UI
.compute.amazonaws.com:9026 shows
me all the applications.
Do I have to do anything for port 8088, or is whatever I am seeing at port
9026 good? Attached is a screenshot.
Thanks
AJ
On Mon, Mar 2, 2015 at 4:24 PM, Marcelo Vanzin van...@cloudera.com wrote:
That's the RM's RPC port, not the web UI port
.
--
Kannan
On Thu, Feb 26, 2015 at 6:08 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah kra...@maprtech.com wrote:
Also, I would like to know if there is a localization overhead when we
use
spark.executor.extraClassPath. Again, in the case
(URLClassLoader.java:355)
...
On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:
Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked through the public API.)
If your app needs
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote:
@Marcelo do you mean by modifying spark.executor.extraClassPath on all
workers, that didn’t seem to work?
That's an app configuration, not a worker configuration, so if you're
trying to set it on the worker configuration
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
I changed in the spark master conf, which is also the only worker. I added a
path to the jar that has guava in it. Still can’t find the class.
Sorry, I'm still confused about what config you're changing. I'm
suggesting
Spark applications shown in the RM's UI should have an Application
Master link when they're running. That takes you to the Spark UI for
that application where you can see all the information you're looking
for.
If you're running a history server and add
spark.yarn.historyServer.address to your
On Wed, Mar 4, 2015 at 10:08 AM, Srini Karri skarri@gmail.com wrote:
spark.executor.extraClassPath
D:\\Apache\\spark-1.2.1-bin-hadoop2\\spark-1.2.1-bin-hadoop2.4\\bin\\classes
spark.eventLog.dir
D:/Apache/spark-1.2.1-bin-hadoop2/spark-1.2.1-bin-hadoop2.4/bin/tmp/spark-events
Seems like someone set up m2.mines.com as a mirror in your pom file
or ~/.m2/settings.xml, and it doesn't mirror Spark 1.2 (or does but is
in a messed up state).
On Wed, Mar 4, 2015 at 3:49 PM, kpeng1 kpe...@gmail.com wrote:
Hi All,
I am currently having problem with the maven dependencies for
Weird python errors like this generally mean you have different
versions of python in the nodes of your cluster. Can you check that?
On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prismalytics.io
subscripti...@prismalytics.io wrote:
Hi Friends:
We noticed the following in 'pyspark' happens when
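A quick sanity check for the version-mismatch suggestion above (a sketch: run locally this reports the driver's interpreter; shipped inside a map() closure on a real cluster, the same line would report each worker's interpreter):

```python
import sys

# Major.minor of the interpreter actually executing this code.
version = "%d.%d" % sys.version_info[:2]
print(version)
```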
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote:
So, should the userClassPathFirst flag work and there is a bug?
Sorry for jumping in the middle of conversation (and probably missing
some of it), but note that this option applies only to executors. If
you're trying to
Hi Anny,
You could play with creating your own log4j.properties that will write
the output somewhere else (e.g. to some remote mount, or remote
syslog). Sorry, but I don't have an example handy.
Alternatively, if you can use Yarn, it will collect all logs after the
job is finished and make them
SPARK_CLASSPATH is definitely deprecated, but my understanding is that
spark.executor.extraClassPath is not, so maybe the documentation needs
fixing.
I'll let someone who might know otherwise comment, though.
On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah kra...@maprtech.com wrote:
Hi Dan,
This is a CDH issue, so I'd recommend using cdh-u...@cloudera.org for
those questions.
This is an issue with fixed in recent CM 5.3 updates; if you're not
using CM, or want a workaround, you can manually configure
spark.driver.extraLibraryPath and spark.executor.extraLibraryPath
to
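Something along these lines (a sketch; the native-library path is an assumption, so point it wherever your CDH parcel puts libhadoop):

```properties
spark.driver.extraLibraryPath    /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.executor.extraLibraryPath  /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
```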
On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah kra...@maprtech.com wrote:
Also, I would like to know if there is a localization overhead when we use
spark.executor.extraClassPath. Again, in the case of hbase, these jars would
be typically available on all nodes. So there is no need to localize
bcc: user@, cc: cdh-user@
I recommend using CDH's mailing list whenever you have a problem with CDH.
That being said, you haven't provided enough info to debug the
problem. Since you're using CM, you can easily go look at the History
Server's logs and see what the underlying error is.
On Thu,
Hi there,
On Tue, Mar 24, 2015 at 1:40 PM, Manoj Samel manojsamelt...@gmail.com wrote:
When I run any query, it gives java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
Are you running a custom-compiled Spark by any chance?
Does your application actually fail?
That message just means there's another application listening on that
port. Spark should try to bind to a different one after that and keep
going.
On Tue, Mar 24, 2015 at 12:43 PM, , Roy rp...@njit.edu wrote:
I get following message for each time I run spark
Since you're using YARN, you should be able to download a Spark 1.3.0
tarball from Spark's website and use spark-submit from that
installation to launch your app against the YARN cluster.
So effectively you would have 1.2.0 and 1.3.0 side-by-side in your cluster.
On Wed, Mar 18, 2015 at 11:09
On Mon, Mar 23, 2015 at 2:15 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Found the issue with the above error - the setting for spark_shuffle was incomplete.
Now it is able to ask for and get additional executors. The issue is once they
are released, it is not able to proceed with the next query.
That
You could build a far jar for your application containing both your
code and the json4s library, and then run Spark with these two
options:
spark.driver.userClassPathFirst=true
spark.executor.userClassPathFirst=true
Both only work in 1.3. (1.2 has spark.files.userClassPathFirst, but
that
This happens most probably because the Spark 1.3 you have downloaded
is built against an older version of the Hadoop libraries than those
used by CDH, and those libraries cannot parse the container IDs
generated by CDH.
You can try to work around this by manually adding CDH jars to the
front of
FYI I wrote a small test to try to reproduce this, and filed
SPARK-6688 to track the fix.
On Tue, Mar 31, 2015 at 1:15 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hmmm... could you try to set the log dir to
file:/home/hduser/spark/spark-events?
I checked the code and it might be the case
On top of what's been said...
On Wed, Apr 22, 2015 at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
1) I can go to Spark UI and see the status of the APP but cannot see the
logs as the job progresses. How can i see logs of executors as they progress
?
Spark 1.3 should have links to the
You'd have to use spark.{driver,executor}.extraClassPath to modify the
system class loader. But that also means you have to manually
distribute the jar to the nodes in your cluster, into a common
location.
On Thu, Apr 23, 2015 at 6:38 PM, Night Wolf nightwolf...@gmail.com wrote:
Hi guys,
No, those have to be local paths.
On Thu, Apr 23, 2015 at 6:53 PM, Night Wolf nightwolf...@gmail.com wrote:
Thanks Marcelo, can this be a path on HDFS?
On Fri, Apr 24, 2015 at 11:52 AM, Marcelo Vanzin van...@cloudera.com
wrote:
You'd have to use spark.{driver,executor}.extraClassPath