RE: Enable Parsing Failed or Incomplete jobs on HistoryServer (YARN mode)

2014-07-07 Thread Andrew Lee
in the history server faster. Haven't reliably tested this though. May just be a coincidence of timing. -Suren On Wed, Jul 2, 2014 at 8:01 PM, Andrew Lee wrote: Hi All, I have HistoryServer up and running, and it is great. Is it possible to also enable HistoryServer to parse failed jo

RE: Spark logging strategy on YARN

2014-07-07 Thread Andrew Lee
Hi Kudryavtsev, Here's what I am doing as a common practice and reference. I don't want to say it is best practice, since it requires a lot of customer experience and feedback, but from a development and operating standpoint, it will be great to separate the YARN container logs from the Spark lo

Seattle Spark Meetup slides: xPatterns, Fun Things, and Machine Learning Streams - next is Interactive OLAP

2014-07-07 Thread Denny Lee
Apologies for the delay but we've had a bunch of great slides and sessions at Seattle Spark Meetup this past couple of months, including Claudiu Barbura's "xPatterns on Spark, Shark, Mesos, and Tachyon", Paco Nathan's "Fun Things You Can Do with Spark 1.0", and "Machine Learning Streams with Spar

spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-08 Thread Andrew Lee
Build: Spark 1.0.0 rc11 (git commit tag: 2f1dc868e5714882cf40d2633fb66772baf34789) Hi All, When I enabled the spark-defaults.conf for the eventLog, spark-shell broke while spark-submit works. I'm trying to create a separate directory per user to keep track with their own Spark job event

RE: SPARK_CLASSPATH Warning

2014-07-11 Thread Andrew Lee
As mentioned, SPARK_CLASSPATH is deprecated in Spark 1.0+. Try to use --driver-class-path instead: ./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar Don't use a glob *; specify the JARs one by one, separated by colons. Date: Wed, 9 Jul 2014 13:45:07 -0700 From: kat...@cs.pitt.edu Subject: SPARK_CLASSPATH Warning To
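
To illustrate the advice above, a minimal sketch (the jar paths here are placeholders):

    # Deprecated (Spark 1.0+): export SPARK_CLASSPATH=...
    # Preferred: list each JAR explicitly, colon-separated -- no glob expansion here
    ./bin/spark-shell --driver-class-path /opt/libs/yourlib.jar:/opt/libs/abc.jar:/opt/libs/xyz.jar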

RE: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-11 Thread Andrew Lee
Ok, I found it on JIRA SPARK-2390: https://issues.apache.org/jira/browse/SPARK-2390 So it looks like this is a known issue. From: alee...@hotmail.com To: user@spark.apache.org Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option? Date: Tue, 8 Jul 2014 15:17:00 -070

Seattle Spark Meetup: Evan Chan's Interactive OLAP Queries with Spark and Cassandra

2014-07-17 Thread Denny Lee
We had a great Seattle Spark Meetup session with Evan Chan presenting his  Interactive OLAP Queries with Spark and Cassandra.  You can find his awesome presentation at: http://www.slideshare.net/EvanChan2/2014-07olapcassspark. Enjoy!

SeattleSparkMeetup: Spark at eBay - Troubleshooting the everyday issues

2014-07-18 Thread Denny Lee
We're coming off a great Seattle Spark Meetup session with Evan Chan (@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra  (http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.  Now, we're proud to announce that our next session is Spark at eBay - Troublesho

RE: Hive From Spark

2014-07-21 Thread Andrew Lee
Hi All, Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't work due to the following 2 libraries, which are not consistent with Hive 0.12 and Hadoop as well. (Hive libs align with Hadoop libs, and as a common practice, they should be consistent to interoperate).

RE: Hive From Spark

2014-07-22 Thread Andrew Lee
> problems in theory, and you show it causes a problem in practice. Not > to mention it causes issues for Hive-on-Spark now. > > On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee wrote: > > Hive and Hadoop are using an older version of guava libraries (11.0.1) where > >

akka 2.3.x?

2014-07-23 Thread Lee Mighdoll
using the spark-cassandra-connector rather than the hadoop back end? Cheers, Lee

RE: Spark SQL and Hive tables

2014-07-25 Thread Andrew Lee
Hi Michael, If I understand correctly, the assembly JAR file is deployed onto HDFS under the /user/$USER/.sparkStaging folder, which is used by all computing (worker) nodes when people run in yarn-cluster mode. Could you elaborate on what the document means by this? It is a bit misleading and I

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Andrew Lee
Hi Jianshi, Could you provide which HBase version you're using? By the way, a quick sanity check: can the Workers access HBase? Were you able to manually write one record to HBase with the serialize function? Hardcode and test it? From: jianshi.hu...@gmail.com Date: Fri, 25 Jul 2014 15

Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
Hi All, Not sure if anyone has run into this problem, but it exists in Spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs in order to use the $USER env variable. For example, I'm running the command with user 'test'. In spark-submit,
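
For reference, the conf/spark-defaults.conf entries in question look roughly like this (a sketch based on the thread; the path is the poster's):

    spark.eventLog.enabled  true
    spark.eventLog.dir      hdfs:///user/$USER/spark/logs

Per the thread, spark-submit resolves $USER to the submitting user, while spark-shell in 1.0.0 handles it differently, which is the discrepancy reported here.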

RE: Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
n the path you provide to spark.eventLog.dir. -Andrew 2014-07-28 12:40 GMT-07:00 Andrew Lee : Hi All, Not sure if anyone has ran into this problem, but this exist in spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to u

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Andrew Lee
e files so it got that exception. I appended the resource files explicitly to the --jars option and it worked fine. The "Caused by..." messages were found in the yarn logs actually; I think it might be useful if I could see them from the console which runs spark-submit. Would that be po

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
Hi All, It has been a while, but what I did to make it work is to make sure of the following: 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2. 2. Make sure you have the hive-site.xml from the above Hive configuration. The problem here is that you want the hive-site.xml from the Hive

Re: Spark Deployment Patterns - Automated Deployment & Performance Testing

2014-07-31 Thread Andrew Lee
You should be able to use either SBT or Maven to create your JAR files (not a fat jar), and only deploy the JAR for spark-submit. 1. Sync Spark libs and versions with your development env and CLASSPATH in your IDE (unfortunately this needs to be hard copied, and may result in split-brain syn
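
A minimal SBT sketch of the thin-JAR approach described here (version numbers are illustrative; marking Spark as "provided" keeps it out of your artifact, since the cluster supplies it at runtime):

    // build.sbt -- assumes Scala 2.10 and a Spark 1.0.x cluster
    name := "my-spark-app"
    scalaVersion := "2.10.4"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2" % "provided"

Running sbt package then yields a small JAR suitable for spark-submit.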

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
ring Hive tables by using the SET command. For example: hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse") On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee < alee526@

RE: Spark SQL, Parquet and Impala

2014-08-02 Thread Andrew Lee
Hi Patrick, In Impala 1.3.1, when you update tables and metadata, do you still need to run 'invalidate metadata' in impala-shell? My understanding is that it is a pull architecture to refresh the metastore on the catalogd in Impala; not sure if this still applies to this case since you are updatin

Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-14 Thread Denny Lee
For those who were not able to attend the Seattle Spark Meetup - Spark at eBay - Troubleshooting the Everyday Issues, the slides have now been posted at:  http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf. Enjoy! Denny

Re: Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-15 Thread Denny Lee
Apologies but we had placed the settings for downloading the slides to Seattle Spark Meetup members only - but actually meant to share with everyone.  We have since fixed this and now you can download it.  HTH! On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote: For

Spark-job error on writing result into hadoop w/ switch_user=false

2014-08-20 Thread Jongyoul Lee
Hi, I've used HDFS 2.3.0-cdh5.0.1, Mesos 0.19.1, and a recompiled Spark 1.0.2. For security reasons, we run HDFS and Mesos as 'hdfs', which is an account name not in the root group, and a non-root user submits Spark jobs on Mesos. With no switch_user, a simple job, which only reads data from hdf

Re: Spark-job error on writing result into hadoop w/ switch_user=false

2014-08-21 Thread Jongyoul Lee
remove - that file because of permission. Is there a solution? On Thu, Aug 21, 2014 at 2:46 AM, Jongyoul Lee wrote: > Hi, > > I've used hdfs 2.3.0-cdh5.0.1, mesos 0.19.1 and spark 1.0.2 that is > re-compiled. > > For a security reason, we run hdfs and mesos as hdfs

LDA example?

2014-08-21 Thread Denny Lee
Quick question - is there a handy sample / example of how to use the LDA algorithm within Spark MLlib?   Thanks! Denny

RE: Hive From Spark

2014-08-22 Thread Andrew Lee
(false).setMaster("local").setAppName("test data exchange with Hive") conf.set("spark.driver.host", "localhost") val sc = new SparkContext(conf) val rdd = sc.makeRDD(Seq(rec)) rdd.map((x: MyRe

RE: Hive From Spark

2014-08-25 Thread Andrew Lee
though - might > >be too risky at this point. > > > >I'm not familiar with spark-sql. > > > >On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee wrote: > >> Hopefully there could be some progress on SPARK-2420. It looks like > >>shading > >> ma

Spark / Thrift / ODBC connectivity

2014-08-28 Thread Denny Lee
I’m currently using the Spark 1.1 branch and have been able to get the Thrift service up and running.  The quick questions were whether I should be able to use the Thrift service to connect to SparkSQL generated tables and/or Hive tables?   As well, by any chance do we have any documents that point

SparkSQL HiveContext No Suitable Driver / Cannot Find Driver

2014-08-29 Thread Denny Lee
My issue is similar to the issue as noted  http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccadoad2ks9_qgeign5-w7xogmrotrlbchvfukctgstj5qp9q...@mail.gmail.com%3E. Currently using Spark-1.1 (grabbed from git two days ago) and using Hive 0.12 with my metastore in MySQL.

Re: SparkSQL HiveContext No Suitable Driver / Cannot Find Driver

2014-08-29 Thread Denny Lee
Oh, forgot to add the managed libraries and the Hive libraries within the CLASSPATH.  As soon as I did that, we’re good to go now. On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote: My issue is similar to the issue as noted  http://mail-archives.apache.org/mod_mbox

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread Denny Lee
Oh, you may be running into an issue with your MySQL setup actually, try running alter database metastore_db character set latin1 so that way Hive (and the Spark HiveContext) can execute properly against the metastore. On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com (arthur.hk.c...@g

Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-03 Thread Denny Lee
When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path $CLASSPATH It appears that the thrift server is starting off of localhost as opposed to hostname.  I have set the spark-env.sh to use the hostname, modified the

Re: Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-04 Thread Denny Lee
behavior is inherited from Hive since the Spark SQL Thrift server is a variant of HiveServer2. On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee wrote: When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path $CLASSPATH It appears
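
Since the Thrift server is a HiveServer2 variant, its bind address is controlled through Hive-style settings rather than spark-env.sh. A sketch of the usual workaround (hostname and port are placeholders):

    ./sbin/start-thriftserver.sh --master spark://hostname:7077 \
      --hiveconf hive.server2.thrift.bind.host=hostname \
      --hiveconf hive.server2.thrift.port=10000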

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread Denny Lee
Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed to registerTempTable to better reflect that. The Thrift server runs as a separate process, meaning that it cannot see any of the tables

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
I’m not sure if I’m completely answering your question here, but I’m currently working (on OSX) with Hadoop 2.5 and I used the Spark 1.1 with Hadoop 2.4 package without any issues. On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote: I see the binary packages include hadoop 1, 2.3

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Denny Lee
registerTempTable you mentioned works on SqlContext instead of HiveContext. Thanks, Du On 9/10/14, 1:21 PM, "Denny Lee" wrote: >Actually, when registering the table, it is only available within the sc >context you are running it in. For Spark 1.1, the method name

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
, but in Spark 1.1.0, there are separate packages for hadoop 2.3 and 2.4. That implies some difference in Spark according to hadoop version.   From: Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 9:35 AM To: user@spark.apache.org; Haopu Wang; d...@spark.apache.org

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
to read from HDFS, you’ll need to build Spark against the specific HDFS version in your environment.”   Did you try to read a hadoop 2.5.0 file using Spark 1.1 with hadoop 2.4?   Thanks!   From: Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 10:00 AM To: Patrick

Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt did you clear out the packages first and ensure that the datanucleus jars were generated within lib_managed? I remembered having to do that when I was working testing out different configs. On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 < alexandria.shea...@gmail.com> wrote:

Re: Spark SQL Thrift JDBC server deployment for production

2014-09-11 Thread Denny Lee
Could you provide some context about running this in yarn-cluster mode? The Thrift server that's included within Spark 1.1 is based on Hive 0.12. Hive has been able to work against YARN since Hive 0.10. So when you start the thrift server, provided you copied the hive-site.xml over to the Spark co

Re: SchemaRDD and RegisterAsTable

2014-09-17 Thread Denny Lee
The registered table is stored within the Spark context itself.  To have the table available for the Thrift server to access, you can save the table from the Spark context into the Hive context so that the Thrift server process can see the table.  If you are using derby as your metastore, then the thrift
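
A sketch of the distinction in Spark 1.1 Scala (the input path and table names are hypothetical):

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val events = hiveContext.jsonFile("hdfs:///data/events.json") // hypothetical input
    events.registerTempTable("events_tmp") // visible only within this SparkContext
    events.saveAsTable("events")           // persisted to the Hive metastore, visible to the Thrift server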

RE: SchemaRDD and RegisterAsTable

2014-09-18 Thread Denny Lee
was that the Thrift Server was just a HiveQL frontend and the underlying query execution would be done by Spark.   Regards Santosh   From: Denny Lee [mailto:denny.g@gmail.com] Sent: Wednesday, September 17, 2014 10:14 PM To: user@spark.apache.org; Addanki, Santosh Kumar Subject: Re

Re: SQL shell for Spark SQL?

2014-09-18 Thread Denny Lee
The CLI is the command line connection to SparkSQL and yes, SparkSQL replaces Shark - there’s a great article by Reynold on the Databricks blog that provides the context:  http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html As for SparkSQL and

Re: What is a pre built package of Apache Spark

2014-09-24 Thread Denny Lee
This seems similar to a related Windows issue concerning Python where pyspark couldn't find Python because the PYTHONSTARTUP environment variable wasn't set - by any chance could this be related? On Wed, Sep 24, 2014 at 7:51 PM, christy <760948...@qq.com> wrote: > Hi I have installed standalone on win7

Re: Spark Hive max key length is 767 bytes

2014-09-25 Thread Denny Lee
ql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: > Specified key was too long; max key length is 767 bytes > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > > > > Should I use HIVE 0.12.0 instead of HIVE 0.13.1? > > Regards > Arthur > > On 3

Re: add external jars to spark-shell

2014-10-20 Thread Denny Lee
--jars (ADD_JARS) is a special class loading mechanism for Spark, while --driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and appended to the classpath settings that are used to start the JVM running the driver. You can reference https://www.concur.com/blog/en-us/connect-tableau-to-sparksql on
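
A sketch of the two flags side by side (paths are placeholders):

    # --jars: handled by Spark's own class loading (the old ADD_JARS)
    # --driver-class-path: appended to the classpath used to launch the driver JVM (the old SPARK_CLASSPATH)
    ./bin/spark-shell --jars /path/app-dep.jar --driver-class-path /path/driver-only.jar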

Re: winutils

2014-10-29 Thread Denny Lee
QQ - did you download the Spark 1.1 binaries that included the Hadoop one? Does this happen if you're using the Spark 1.1 binaries that do not include the Hadoop jars? On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub wrote: > Apparently Spark does require Hadoop even if you do not intend to use > Had

Re: Spark + Tableau

2014-10-30 Thread Denny Lee
When you are starting the thrift server service - are you connecting to it locally or is this on a remote server when you use beeline and/or Tableau? On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic wrote: > I use beta driver SQL ODBC from Databricks. > > > > -- > View this message in context: > ht

Spark Streaming not working in YARN mode

2014-11-19 Thread kam lee
I created a simple Spark Streaming program - it received numbers, computed averages, and sent the results to Kafka. It worked perfectly in local mode as well as standalone master/slave mode across a two-node cluster. It did not work, however, in yarn-client or yarn-cluster mode. The job was acce

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Denny Lee
extraction job against multiple data sources via Hadoop streaming. Another good call out about utilizing Scala within Spark is that most of the Spark code is written in Scala. On Sat, Nov 22, 2014 at 08:12 Denny Lee wrote: > There are various scenarios where traditional Hadoop makes more sense t

Re: Spark SQL Programming Guide - registerTempTable Error

2014-11-23 Thread Denny Lee
By any chance are you using Spark 1.0.2? registerTempTable was introduced from Spark 1.1+ while for Spark 1.0.2, it would be registerAsTable. On Sun Nov 23 2014 at 10:59:48 AM riginos wrote: > Hi guys , > Im trying to do the Spark SQL Programming Guide but after the: > > case class Person(name:
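
A sketch of the rename, assuming a SchemaRDD named people:

    people.registerAsTable("people")   // Spark 1.0.x
    people.registerTempTable("people") // Spark 1.1+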

Re: Spark SQL Programming Guide - registerTempTable Error

2014-11-23 Thread Denny Lee
It sort of depends on your environment. If you are running on your local environment, I would just download the latest Spark 1.1 binaries and you'll be good to go. If its a production environment, it sort of depends on how you are setup (e.g. AWS, Cloudera, etc.) On Sun Nov 23 2014 at 11:27:49 A

Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava

2014-11-25 Thread Denny Lee
To determine if this is a Windows vs. other configuration, can you just try to call the Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? On Tue Nov 25 2014 at 5:42:09 PM Judy Nash wrote: > I traced the code and used the following to call: > > Spark-

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I was running this on standalone cluster mode the query finished in 55s but on YARN, the query was still running 30min later. Would the hard coded sleeps potentially be in play here? On Fri, Dec 5, 2014 at 11:23 Sandy Ry

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
s were only at startup, so if jobs are taking significantly >>>> longer on YARN, that should be a different problem. When you ran on YARN, >>>> did you use the --executor-cores, --executor-memory, and --num-executors >>>> arguments? When running aga

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
Okay, my bad for not testing out the documented arguments - once I use the correct ones, the query completes in ~55s (I can probably make it faster). Thanks for the help, eh?! On Fri Dec 05 2014 at 10:34:50 PM Denny Lee wrote: > Sorry for the delay in my response - for my spark ca

Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
This is perhaps more of a YARN question than a Spark question, but I was just curious how memory is allocated in YARN via the various configurations. For example, if I spin up my cluster with 4GB executor memory and a different number of executors as noted below 4GB executor-memory x 10 executors = 46GB (4G

Re: Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
executorMemory. > > When you set executor memory, the yarn resource request is executorMemory > + yarnOverhead. > > - Arun > > On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee wrote: > >> This is perhaps more of a YARN question than a Spark question but i was >> just curious

Re: Spark on YARN memory utilization

2014-12-09 Thread Denny Lee
Thanks Sandy! On Mon, Dec 8, 2014 at 23:15 Sandy Ryza wrote: > Another thing to be aware of is that YARN will round up containers to the > nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults > to 1024. > > -Sandy > > On Sat, Dec 6, 2014 at 3:48
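
Putting the thread together, a rough worked example (assuming the era's defaults of spark.yarn.executor.memoryOverhead = 384 MB and yarn.scheduler.minimum-allocation-mb = 1024; actual totals vary with these settings):

    per-executor request = 4096 MB + 384 MB overhead            = 4480 MB
    container allocated  = 4480 MB rounded up to a 1024 MB step = 5120 MB
    10 executors         = ~50 GB of YARN memory, plus the ApplicationMaster container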

Re: Spark-SQL JDBC driver

2014-12-11 Thread Denny Lee
Yes, that is correct. A quick reference on this is the post https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1 with the pertinent section being: It is important to note that when you create Spark tables (for example

Re: Spark SQL Roadmap?

2014-12-13 Thread Denny Lee
Hi Xiaoyong, SparkSQL has already been released and has been part of the Spark code-base since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark SQL Programming Guide) and we're currently voting on Spark 1.2. Hive

Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
I have a large number of files within HDFS that I would like to do a group by statement ala val table = sc.textFile("hdfs://") val tabs = table.map(_.split("\t")) I'm trying to do something similar to tabs.map(c => (c._(167), c._(110), c._(200)) where I create a new RDD that only has those columns, but that isn't

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
ns looks like > the way to go given the context. What's not working? > > Kr, Gerard > On Dec 14, 2014 5:17 PM, "Denny Lee" wrote: > >> I have a large of files within HDFS that I would like to do a group by >> statement ala >> >> val table = sc

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
Yes - that works great! Sorry for implying I couldn't. Was just more flummoxed that I couldn't make the Scala call work on its own. Will continue to debug ;-) On Sun, Dec 14, 2014 at 11:39 Michael Armbrust wrote: > BTW, I cannot use SparkSQL / case right now because my table has 200 >> columns (a

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
or and not a runtime error -- I believe c is an array of values so I think you want tabs.map(c => (c(167), c(110), c(200)) instead of tabs.map(c => (c._(167), c._(110), c._(200)) On Sun, Dec 14, 2014 at 3:12 PM, Denny Lee wrote: > Yes - that work
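
For reference, a corrected end-to-end sketch (the HDFS path is a placeholder; column indices are from the thread):

    val table = sc.textFile("hdfs:///data/wide_table")
    val tabs  = table.map(_.split("\t"))
    // Array elements are accessed as c(i); the ._n accessor is only for tuples,
    // which is why the original form failed to compile.
    val slim  = tabs.map(c => (c(167), c(110), c(200)))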

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
I'm curious if you're seeing the same thing when using bdutil against GCS? I'm wondering if this may be an issue concerning the transfer rate of Spark -> Hadoop -> GCS Connector -> GCS. On Wed Dec 17 2014 at 10:09:17 PM Alessandro Baretta wrote: > All, > > I'm using the Spark shell to interact w

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
. See the following. alex@hadoop-m:~/split$ time bash -c "gsutil ls gs://my-bucket/20141205/csv/*/*/* | wc -l" 6860 real 0m6.971s user 0m1.052s sys 0m0.096s Alex On Wed, Dec 17, 2014 at 10:29 PM, Denny Lee wro

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
u suggest I run to test this? But more importantly, what > information would this give me? > > On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee wrote: >> >> Oh, it makes sense of gsutil scans through this quickly, but I was >> wondering if running a Hadoop job / bdutil would res

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
To clarify, there isn't a Hadoop 2.6 profile per se, but you can build using the hadoop-2.4 profile with -Dhadoop.version=2.6.0, which works with Hadoop 2.6. On Fri, Dec 19, 2014 at 12:55 Ted Yu wrote: > You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 > > Cheers > > On Fri, Dec 19, 2014 at 12:51 PM, sa wrote: >

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
Sorry Ted! I saw profile (-P) but missed the -D. My bad! On Fri, Dec 19, 2014 at 16:46 Ted Yu wrote: > Here is the command I used: > > mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 > -Dhadoop.version=2.6.0 -Phive -DskipTests > > FYI > > On Fri, Dec 19, 2014 at 4

Re: S3 files , Spark job hungsup

2014-12-23 Thread Denny Lee
You should be able to kill the job using the webUI or via spark-class. More info can be found in the thread: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-kill-a-Spark-job-running-in-cluster-mode-td18583.html. HTH! On Tue, Dec 23, 2014 at 4:47 PM, durga wrote: > Hi All , > > It se

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
Hi All, I have tried to pass the properties via SparkContext.setLocalProperty and HiveContext.setConf; both failed. Based on the results (haven't had a chance to look into the code yet), HiveContext will try to initiate the JDBC connection right away, and I couldn't set other properties dynamica

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
A follow-up on the hive-site.xml: if you 1. specify it in spark/conf, then you can NOT apply it via the --driver-class-path option; otherwise, you will get the following exceptions when initializing SparkContext. org.apache.spark.SparkException: Found both spark.driver.extraClassPath and

Spark 1.2 and Mesos 0.21.0 spark.executor.uri issue?

2014-12-30 Thread Denny Lee
I've been working with Spark 1.2 and Mesos 0.21.0 and while I have set the spark.executor.uri within spark-env.sh (and directly within bash as well), the Mesos slaves do not seem to be able to access the spark tgz file via HTTP or HDFS as per the message below. 14/12/30 15:57:35 INFO SparkILoop:
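
For reference, the two usual ways to point Mesos executors at the Spark bundle (the URI is a placeholder):

    # conf/spark-env.sh
    export SPARK_EXECUTOR_URI=hdfs:///spark/spark-1.2.0-bin-hadoop2.4.tgz
    # or, equivalently, in conf/spark-defaults.conf
    spark.executor.uri  hdfs:///spark/spark-1.2.0-bin-hadoop2.4.tgz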

Spark 0.9.0-incubation + Apache Hadoop 2.2.0 + YARN encounter Compression codec com.hadoop.compression.lzo.LzoCodec not found

2014-03-17 Thread Andrew Lee
Hi All, I have been contemplating this problem and couldn't figure out what is missing in the configuration. I traced the script and tried to look for CLASSPATH to see what is included; however, I couldn't find any place that is honoring/inheriting HADOOP_CLASSPATH (or pulling in any map-reduce

Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/ You need to build Spark with 'sbt/sbt assembly' before running this program. After digging into the cod

RE: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
built in to the jar itself so no need for random class paths. On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee wrote: Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assemb

RE: Using an external jar in the driver, in yarn-standalone mode.

2014-03-25 Thread Andrew Lee
Hi Julien, The ADD_JAR doesn't work on the command line. I checked spark-class, and I couldn't find any Bash code bringing the variable ADD_JAR into the CLASSPATH. Were you able to print out the properties and environment variables from the Web GUI? localhost:4040 This should give you an idea w

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Denny Lee
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com).  I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well).  HTH! On March 31, 2014 at 12:35:38 PM,

CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera Manager, Spark is running healthy. But when I try to run spark-shell, I eventually get the error: 14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master  spark://ip-172-xxx-xxx-xxx:7077... 14/04/02 07:1

Re: CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
ct with it. > Also if you are running in distributed mode the workers should be registered. > > Mayur Rustagi > Ph: +1 (760) 203 3257 > http://www.sigmoidanalytics.com > @mayur_rustagi > > > >> On Wed, Apr 2, 2014 at 12:44 AM, Denny Lee wrote: >> I’ve

Re: Spark Training

2014-05-01 Thread Denny Lee
You may also want to check out Paco Nathan's Introduction to Spark courses: http://liber118.com/pxn/ > On May 1, 2014, at 8:20 AM, Mayur Rustagi wrote: > > Hi Nicholas, > We provide training on spark, hands-on also associated ecosystem. > We gave it recently at a conference in Santa Clara. P

spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
Hi All, I encountered this problem when the firewall is enabled between the spark-shell and the Workers. When I launch spark-shell in yarn-client mode, I notice that Workers in the YARN containers are trying to talk to the driver (spark-shell); however, the firewall is not opened, which caused time

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
y 2014 14:49:23 -0400 Subject: Re: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication From: yana.kadiy...@gmail.com To: user@spark.apache.org I think what you want to do is set spark.driver.port to a fixed port. On Fri, May 2, 2014 at 1:52 PM, Andrew Lee wr

Seattle Spark Meetup Slides

2014-05-02 Thread Denny Lee
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here are the links to the various slides: Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei Zaharia and Pat McDonough Learnings from Running Spark at Twitter sessions Ben Hindman’s Mesos for

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
tp://apache-spark-user-list.1001560.n3.nabble.com/Securing-Spark-s-Network-tp4832p4984.html [2] http://en.wikipedia.org/wiki/Ephemeral_port [3] http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html Jacob D. Eisinger IBM Emerging Technologies jeis...@us.ibm.com - (512

Spark 0.9.1 - saveAsSequenceFile and large RDD

2014-05-05 Thread Allen Lee
llelism to 1 to keep the file from being partitioned sc.makeRDD(kv, 1).saveAsSequenceFile(path) Does anyone have any pointers on how to get past this? Thanks, -- Allen Lee Software Engineer MediaCrossing Inc.

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Andrew Lee
ng Technologies jeis...@us.ibm.com - (512) 286-6075 Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into account, I'm actually thinking about using a separate subnet to From: Andrew Lee To: "user@spark.apache.org" Date: 05/04/2014 09:57 PM Subject:

RE: run spark0.9.1 on yarn with hadoop CDH4

2014-05-06 Thread Andrew Lee
Please check JAVA_HOME. Usually it should point to /usr/java/default on CentOS/Linux. FYI: http://stackoverflow.com/questions/1117398/java-home-directory > Date: Tue, 6 May 2014 00:23:02 -0700 > From: sln-1...@163.com > To: u...@spark.incubator.apache.org > Subject: run spark0.9.1 on yarn wit
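
For example (the path follows the note above; adjust to your JDK install):

    export JAVA_HOME=/usr/java/default
    export PATH=$JAVA_HOME/bin:$PATH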

Is spark 1.0.0 "spark-shell --master=yarn" running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
Does anyone know if: ./bin/spark-shell --master yarn is running yarn-cluster or yarn-client by default? Base on source code: ./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala if (args.deployMode == "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-cl

RE: Is spark 1.0.0 "spark-shell --master=yarn" running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
nd so it falls into the second "if" case you mentioned: if (args.deployMode != "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-client"} 2014-05-21 10:57 GMT-07:00 Andrew Lee : Does anyone know if: ./bin/spark-shell --master yarn
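
Piecing the two quoted fragments together, the SparkSubmit.scala logic cited in this thread is (reconstructed and reformatted):

    // --master yarn with no --deploy-mode falls through to the second branch
    if (args.deployMode == "cluster" && args.master.startsWith("yarn")) {
      args.master = "yarn-cluster"
    }
    if (args.deployMode != "cluster" && args.master.startsWith("yarn")) {
      args.master = "yarn-client"
    }

So ./bin/spark-shell --master yarn runs in yarn-client mode by default.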

Seattle Spark Meetup: xPatterns Slides and @pacoid session next week!

2014-05-23 Thread Denny Lee
For those who were not able to attend the last Seattle Spark Meetup, we had a great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and Mesos - you can find the slides at: http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014. As well, check out the next Seat

Yay for 1.0.0! EC2 Still has problems.

2014-05-30 Thread Jeremy Lee
re Dead-On-Arrival when run according to the instructions. Sorry. Any suggestions on how to proceed? I'll keep trying to fix the webserver, but (a) changes to httpd.conf get blown away by "resume", and (b) anything I do has to be redone every time I provision another cluster. Ugh. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
hon 2.7+. I regularly run them from the AWS Ubuntu 12.04 AMI... that > might be a good place to start. But if there is a straightforward way to > make them compatible with 2.6 we should do that. > > For r3.large, we can add that to the script. It's a newer type. Any > interest in con

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
;hvm" Clearly a masterpiece of hacking. :-) I haven't tested all of them. The r3 set seems to act like i2. On Sun, Jun 1, 2014 at 12:45 AM, Jeremy Lee wrote: > Hi there, Patrick. Thanks for the reply... > > It wouldn't surprise me that AWS Ubuntu has Python 2.7. Ub

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-31 Thread Jeremy Lee
;...can have a spark cluster up and running in five minutes." But it's been three days for me so far. I'm about to bite the bullet and start building my own AMIs from scratch... if anyone can save me from that, I'd be most grateful. -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Spark on EC2

2014-06-01 Thread Jeremy Lee
nt.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-EC2-tp6638.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-01 Thread Jeremy Lee
t; Spot instance requests are not supported for this AMI. >> >> SuSE Linux Enterprise Server 11 sp3 (HVM) - ami-1a88bb5f >> Not tested - costs 10x more for spot instances, not economically viable. >> >> Ubuntu Server 14.04 LTS (HVM) - ami-f64f77b3 >> Provisions ser

Re: Trouble with EC2

2014-06-01 Thread Jeremy Lee
4 INFO master.Master: >> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485 got disassociated, >> removing it. >> 14/05/30 18:05:54 ERROR remote.EndpointWriter: AssociationError >> [akka.tcp://sparkMaster@ip-10-100-184-45.ec2.internal:7077] >> -> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]: Error >> [Association failed with >> [akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485]] [ >> akka.remote.EndpointAssociationException: Association failed with [ >> akka.tcp://spark@ip-10-100-75-70.ec2.internal:38485] >> Caused by: >> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: >> Connection refused: ip-10-100-75-70.ec2.internal/10.100.75.70:38485 >> >> >> > -- Jeremy Lee BCompSci(Hons) The Unorthodox Engineers

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-01 Thread Jeremy Lee
enough version of python. > > Spark-ec2 itself has a flag "-a" that allows you to give a specific > AMI. This flag is just an internal tool that we use for testing when > we spin new AMI's. Users can't set that to an arbitrary AMI because we > tightly control things lik
