Re: Interested in contributing to SPARK-24815
Hi Pavan,

Per the ASF Source Header and Copyright Notice Policy [1], code submitted directly to the ASF should carry the Apache license header without any additional copyright notice.

Kent Yao

[1] https://www.apache.org/legal/src-headers.html#headers

Sean Owen wrote on Tue, Jul 25, 2023 at 07:22:
>
> When contributing to an ASF project, it's governed by the terms of the ASF
> ICLA: https://www.apache.org/licenses/icla.pdf or CCLA:
> https://www.apache.org/licenses/cla-corporate.pdf
>
> I don't believe ASF projects ever retain an original author copyright
> statement; rather, source files carry a statement like:
>
> ...
> * Licensed to the Apache Software Foundation (ASF) under one or more
> * contributor license agreements. See the NOTICE file distributed with
> * this work for additional information regarding copyright ownership.
> ...
>
> While it's conceivable that such a statement could live in a NOTICE file, I
> don't believe that's been done for any of the thousands of other
> contributors. That file is really for noting the license of
> non-Apache-licensed code. Code contributed directly to the project is assumed
> to have been licensed per the above already.
>
> It might be wise to review the CCLA with Twilio and consider establishing
> that to govern contributions.
>
> On Mon, Jul 24, 2023 at 6:10 PM Pavan Kotikalapudi wrote:
>>
>> Hi Spark Dev,
>>
>> My name is Pavan Kotikalapudi; I work at Twilio.
>>
>> I am looking to contribute to this Spark issue:
>> https://issues.apache.org/jira/browse/SPARK-24815
>>
>> There is a clause in the company's OSS policy saying:
>>
>> - The proposed contribution is about 100 lines of code modification in the
>> Spark project, involving two files - this is considered a large
>> contribution. An appropriate Twilio copyright notice needs to be added for
>> the portion of code that is newly added.
>>
>> Please let me know if that is acceptable.
>>
>> Thank you,
>>
>> Pavan

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
[ANNOUNCE] Apache Kyuubi (Incubating) released 1.5.0-incubating
Hi all,

The Apache Kyuubi (Incubating) community is pleased to announce that Apache Kyuubi (Incubating) 1.5.0-incubating has been released!

Apache Kyuubi (Incubating) is a distributed, multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark and designed to support additional engines such as Apache Flink (beta) and Trino (beta).

Kyuubi provides a pure SQL gateway through a Thrift JDBC/ODBC interface for end-users to manipulate large-scale data with pre-programmed and extensible Spark SQL engines.

We are aiming to make Kyuubi an "out-of-the-box" tool for data warehouses and data lakes. This "out-of-the-box" model minimizes the barriers and costs for end-users to use Spark at the client side. At the server side, the multi-tenant architecture of the Kyuubi server and engines provides administrators a way to achieve computing resource isolation, data security, high availability, high client concurrency, and so on.

The full release notes and download links are available at:
Release notes: https://kyuubi.apache.org/release/1.5.0-incubating.html
Download page: https://kyuubi.apache.org/releases.html

To learn more about Apache Kyuubi (Incubating), please see https://kyuubi.apache.org/

Kyuubi Resources:
- Issue Tracker: https://kyuubi.apache.org/issue_tracking.html
- Mailing list: https://kyuubi.apache.org/mailing_lists.html

We would like to thank all contributors of the Kyuubi community and the Incubator community who made this release possible!

Thanks,
On behalf of the Apache Kyuubi (Incubating) community
Re: Spark version verification
Hi Mich,

> What are the correlations among these links and the ability to establish a spark build version

Check the documentation list here: http://spark.apache.org/documentation.html. The `latest` link always points to the list head; for example, http://spark.apache.org/docs/latest/ means http://spark.apache.org/docs/3.1.1/ for now.

The Spark build version in Spark releases is created by `spark-build-info`; see https://github.com/apache/spark/blob/89bf2afb3337a44f34009a36cae16dd0ff86b353/build/spark-build-info#L32

Some other options to check the Spark build info:

1. the `RELEASE` file:

cat RELEASE
Spark 3.0.1 (git revision 2b147c4cd5) built for Hadoop 2.7.4
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver -DzincPort=3036

2. bin/spark-submit --version

The git revision itself does not tell you whether the release is an rc or final. If you have the Spark source code locally, you can use `git show 1d550c4e90275ab418b9161925049239227f3dc9` to get the tag info, like `commit 1d550c4e90275ab418b9161925049239227f3dc9 (tag: v3.1.1-rc3, tag: v3.1.1)`. Or you can compare the revision you have with all tags here: https://github.com/apache/spark/tags

Bests,

Kent Yao
@ Data Science Center, Hangzhou Research Institute, NetEase Corp.
a spark enthusiast
kyuubi: a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.
spark-authorizer: a Spark SQL extension which provides SQL Standard Authorization for Apache Spark.
spark-postgres: a library for reading data from and transferring data to Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
spark-func-extras: a library that brings excellent and useful functions from various modern database management systems to Apache Spark.

On 03/22/2021 00:02, Mich Talebzadeh wrote:

Hi Kent,

Thanks for the links. You have to excuse my ignorance: what are the correlations among these links and the ability to establish a Spark build version?
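As a small illustration of the idea above, the `RELEASE` file line and the output of `SELECT version()` can be pulled apart programmatically. The sample strings below are taken from the messages in this thread; the helper function itself is only a hypothetical sketch:

```python
import re

def parse_release_line(line):
    """Split a Spark RELEASE-file line into (version, git_revision).

    Expects the format shown above, e.g.
    'Spark 3.0.1 (git revision 2b147c4cd5) built for Hadoop 2.7.4'.
    """
    m = re.match(r"Spark (\S+) \(git revision ([0-9a-f]+)\)", line)
    if not m:
        raise ValueError("unrecognized RELEASE line: %r" % line)
    return m.group(1), m.group(2)

# The SQL function version() returns "<version> <revision>", as shown later
# in this thread: "3.1.1 1d550c4e90275ab418b9161925049239227f3dc9".
version, revision = "3.1.1 1d550c4e90275ab418b9161925049239227f3dc9".split()

print(parse_release_line("Spark 3.0.1 (git revision 2b147c4cd5) built for Hadoop 2.7.4"))
# → ('3.0.1', '2b147c4cd5')
```

As the message notes, the revision alone does not distinguish an rc from the final release; for that you still need to compare it against the git tags.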
view my Linkedin profile

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Sun, 21 Mar 2021 at 15:55, Kent Yao <yaooq...@qq.com> wrote:

Please refer to http://spark.apache.org/docs/latest/api/sql/index.html#version

Kent Yao

On 03/21/2021 23:28, Mich Talebzadeh wrote:

Many thanks

spark-sql> SELECT version();
3.1.1 1d550c4e90275ab418b9161925049239227f3dc9

What does 1d550c4e90275ab418b9161925049239227f3dc9 signify, please?
On Sun, 21 Mar 2021 at 15:14, Sean Owen <sro...@gmail.com> wrote:

I believe you can "SELECT version()" in Spark SQL to see the build version.

On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks for the detailed info. I was hoping one could find a simpler answer to the Spark version question than doing a forensic examination of the base code, so to speak.

The trigger for this verification is that on GCP Dataproc, originally built on 3.1.1-rc2, there was an issue with running Spark Structured Streaming (SSS) which I reported to this forum before. After a while, and after I reported it to Google, they have now upgraded the base to Spark 3.1.1 itself. I am not privy to how they did the upgrade.

In the meantime we installed 3.1.1 on-premise and ran it with the same Python code for SSS. It worked fine. However, when I run the same code on GCP Dataproc upgraded to 3.1.1, occasionally I see
Re: Spark version verification
Please refer to http://spark.apache.org/docs/latest/api/sql/index.html#version

Kent Yao

On 03/21/2021 23:28, Mich Talebzadeh wrote:

Many thanks

spark-sql> SELECT version();
3.1.1 1d550c4e90275ab418b9161925049239227f3dc9

What does 1d550c4e90275ab418b9161925049239227f3dc9 signify, please?

On Sun, 21 Mar 2021 at 15:14, Sean Owen <sro...@gmail.com> wrote:

I believe you can "SELECT version()" in Spark SQL to see the build version.

On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks for the detailed info. I was hoping one could find a simpler answer to the Spark version question than doing a forensic examination of the base code, so to speak. The trigger for this verification is that on GCP Dataproc, originally built on 3.1.1-rc2, there was an issue with running Spark Structured Streaming (SSS) which I reported to this forum before. After a while, and after I reported it to Google, they have now upgraded the base to Spark 3.1.1 itself.
I am not privy to how they did the upgrade itself. In the meantime we installed 3.1.1 on-premise and ran it with the same Python code for SSS. It worked fine. However, when I run the same code on GCP Dataproc upgraded to 3.1.1, occasionally I see this error:

21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.util.ConcurrentModificationException at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)

This may be for other reasons, or a consequence of upgrading from 3.1.1-rc2 to 3.1.1?

On Sat, 20 Mar 2021 at 22:41, Attila Zsolt Piros <piros.attila.zs...@gmail.com> wrote:

Hi!

I would check out the Spark source, then diff those two RCs (first just take a look at the list of changed files):

$ git diff v3.1.1-rc1..v3.1.1-rc2 --stat
...

The shell scripts in the release can be checked very easily:

$ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
 bin/docker-image-tool.sh | 6 +-
 dev/create-release/release-build.sh | 2 +-

We are lucky, as docker-image-tool.sh is part of the released version. Is it from v3.1.1-rc2 or v3.1.1-rc1? Of course this only works if docker-image-tool.sh was not changed back from v3.1.1-rc2 to v3.1.1-rc1.

So let's continue with the Python (and later the R) files:

$ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".py "
 python/pyspark/sql/avro/functions.py | 4 +-
 python/pyspark/sql/dataframe.py | 1 +
 python/pyspark/sql/functions.py | 285 +--
 .../pyspark/sql/tests/test_pandas_cogrouped_map.py | 12 +
 python/pyspark/sql/tests/test_pandas_map.py | 8 +
...

After you have enough proof you can stop (what counts as enough is up to you to decide). Finally, you can use javap / scalap on the classes from the jars and check some code changes, which is harder to analyze than a simple text file.

Best Regards,
Attila

On Thu, Mar 18, 2021 at 4:09 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

What would be a signature in the Spark version or binaries that confirms the release is built on Spark 3.1.1, as opposed to 3.1.1-RC1 or RC2?

Thanks,

Mich
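The elimination game described above can be automated: hash a file that ships in the release and compare it against the hash of the same file at each candidate tag. The sketch below is hypothetical; the tag-to-content mapping is a toy stand-in for what `git show <tag>:bin/docker-image-tool.sh` would return:

```python
import hashlib

def sha256(text):
    """Hex digest of a file's text content."""
    return hashlib.sha256(text.encode()).hexdigest()

def matching_tags(shipped_text, candidates):
    """Return the candidate tags whose copy of the file matches the shipped one."""
    shipped = sha256(shipped_text)
    return [tag for tag, text in candidates.items() if sha256(text) == shipped]

# Toy stand-ins for `git show <tag>:bin/docker-image-tool.sh` at each tag.
candidates = {
    "v3.1.1-rc1": "docker-image-tool.sh contents at rc1",
    "v3.1.1-rc2": "docker-image-tool.sh contents at rc2",
}
shipped = "docker-image-tool.sh contents at rc2"

print(matching_tags(shipped, candidates))
# → ['v3.1.1-rc2']
```

As Attila notes, this only discriminates between tags where the file actually changed; if the file is identical at both tags, every tag matches and you must pick another file from the diff.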
Re: [jira] [Commented] (SPARK-34648) Reading Parquet Files in Spark Extremely Slow for Large Number of Files?
Hi Pankaj,

Have you tried spark.sql.parquet.respectSummaryFiles=true?

Bests,
Kent Yao

On 03/10/2021 21:59, 钟雨 wrote:

Hi Pankaj,

Can you show your detailed code and job/stage info? Which stage is slow?

Pankaj Bhootra <pankajbhoo...@gmail.com> wrote on Wed, Mar 10, 2021 at 12:32 PM:

Hi,

Could someone please respond to this?

Thanks,
Pankaj Bhootra

On Sun, 7 Mar 2021, 01:22 Pankaj Bhootra, <pankajbhoo...@gmail.com> wrote:

Hello Team

I am new to Spark and this question may be a possible duplicate of the issue highlighted here: https://issues.apache.org/jira/browse/SPARK-9347

We have a large dataset partitioned by calendar date, and within each date partition we store the data as parquet files in 128 parts.

We are trying to run aggregation on this dataset for 366 dates at a time with Spark SQL on Spark version 2.3.0, so our Spark job is reading 366*128 = 46848 partitions, all of which are parquet files. There is currently no _metadata or _common_metadata file available for this dataset.

The problem we are facing is that when we try to run spark.read.parquet on the above 46848 partitions, our data reads are extremely slow. It takes a long time to run even a simple map task (no shuffling) without any aggregation or group by.

I read through the above issue and I think I generally understand the ideas around the _common_metadata file.
But that issue was raised for Spark 1.3.1, and for Spark 2.3.0 I have not found any documentation related to this metadata file so far. I would like to clarify:

- What's the latest best practice for reading a large number of parquet files efficiently?
- Does this involve using any additional options with spark.read.parquet? How would that work?
- Are there other possible reasons for slow data reads apart from reading metadata for every part?

We are basically trying to migrate our existing Spark pipeline from csv files to parquet, but from my hands-on experience so far, it seems that parquet's read time is slower than csv's. This seems contradictory to the popular opinion that parquet performs better in terms of both computation and storage.

Thanks,
Pankaj Bhootra

-- Forwarded message --
From: Takeshi Yamamuro (Jira) <j...@apache.org>
Date: Sat, 6 Mar 2021, 20:02
Subject: [jira] [Commented] (SPARK-34648) Reading Parquet Files in Spark Extremely Slow for Large Number of Files?
To: <pankajbhoo...@gmail.com>

[ https://issues.apache.org/jira/browse/SPARK-34648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296528#comment-17296528 ]

Takeshi Yamamuro commented on SPARK-34648:
--

Please use the mailing list (user@spark.apache.org) instead. This is not the right place to ask questions.

> Reading Parquet Files in Spark Extremely Slow for Large Number of Files?
>
> Key: SPARK-34648
> URL: https://issues.apache.org/jira/browse/SPARK-34648
> Project: Spark
> Issue Type: Question
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Pankaj Bhootra
> Priority: Major
>
> Hello Team
> I am new to Spark and this question may be a possible duplicate of the issue highlighted here: https://issues.apache.org/jira/browse/SPARK-9347
> We have a large dataset partitioned by calendar date, and within each date partition, we are storing the data as *parquet* files in 128 parts.
> We are trying to run aggregation on this dataset for 366 dates at a time with Spark SQL on spark version 2.3.0, hence our Spark job is reading 366*128=46848 partitions, all of which are parquet files. There is currently no *_metadata* or *_common_metadata* file(s) available for this dataset.
> The problem we are facing is that when we try to run *spark.read.parquet* on the above 46848 partitions, our data reads are extremely slow. It takes a long time to run even a simple map task (no shuffling) without any aggregation or group by.
> I read through the above issue and I think I perhaps generally understand the ideas around *_common_metadata* file. But the above issue was raised f
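For reference, the summary-file knob suggested earlier in this thread is a plain Spark SQL configuration, and Spark also exposes a threshold above which file listing is parallelized across the cluster. A spark-defaults.conf sketch might look like this (the values are illustrative, not recommendations; check the Spark configuration docs for your version):

```properties
# Use _metadata/_common_metadata summary files when present,
# instead of reading the footer of every part file
spark.sql.parquet.respectSummaryFiles                     true

# Parallelize partition/file discovery once the number of paths
# crosses this threshold (default is 32)
spark.sql.sources.parallelPartitionDiscovery.threshold    32
```

Note that summary files only help if they exist for the dataset; the thread above states they are currently absent, so they would have to be generated first.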
Re: spark 3.1.1 support hive 1.2
Hi Li,

Have you tried `Interacting with Different Versions of Hive Metastore`? http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore

Bests,
Kent Yao

On 03/10/2021 10:56, jiahong li wrote:

Hi, sorry to bother you. In Spark 3.0.1, hive-1.2 is supported, but in Spark 3.1.x the hive-1.2 maven profile is removed. Does that mean hive-1.2 is not supported in Spark 3.1.x? How can I use hive-1.2 with Spark 3.1.x? Is there a JIRA for this? Can anyone help me?
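For reference, the page linked above configures the Hive metastore client version through plain settings rather than a build profile. As I understand that page, a sketch might look like the following (the version value is an example; with `maven`, Spark downloads the matching Hive jars at runtime, or you can point Spark at a local directory of Hive jars instead):

```properties
# Talk to a Hive 1.2.x metastore without rebuilding Spark
spark.sql.hive.metastore.version    1.2.1

# Where to get the Hive client jars: "maven" downloads them at runtime;
# alternatively supply your own jars via the metastore-jars settings
spark.sql.hive.metastore.jars       maven
```

This only affects the metastore client; it is separate from the question of which Hive execution libraries are bundled into the Spark build itself.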
Re: [ANNOUNCE] Announcing Apache Spark 3.1.1
Congrats, all!

Bests,
Kent Yao

On 03/3/2021 15:11, Takeshi Yamamuro wrote:

Great work and congrats, all!
Bests,
Takeshi

On Wed, Mar 3, 2021 at 2:18 PM Mridul Muralidharan <mri...@gmail.com> wrote:

Thanks Hyukjin, and congratulations everyone on the release!
Regards,
Mridul

On Tue, Mar 2, 2021 at 8:54 PM Yuming Wang <wgy...@gmail.com> wrote:

Great work, Hyukjin!

On Wed, Mar 3, 2021 at 9:50 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

We are excited to announce Spark 3.1.1 today.

Apache Spark 3.1.1 is the second release of the 3.x line. This release adds Python type annotations and Python dependency management support as part of Project Zen. Other major updates include improved ANSI SQL compliance support, history server support in structured streaming, the general availability (GA) of Kubernetes, and node decommissioning in Kubernetes and Standalone. In addition, this release continues to focus on usability, stability, and polish while resolving around 1500 tickets.

We'd like to thank our contributors and users for their contributions and early feedback on this release. This release would not have been possible without you.

To download Spark 3.1.1, head over to the download page: http://spark.apache.org/downloads.html

To view the release notes: https://spark.apache.org/releases/spark-release-3-1-1.html

--
---
Takeshi Yamamuro
Re: EOFException when reading from HDFS
Can anyone help me with this? I have been stuck on this for a few days and don't know what to try anymore.
EOFException when reading from HDFS
I ran the SimpleApp program from the Spark tutorial (https://spark.apache.org/docs/1.0.0/quick-start.html), which works fine. However, if I change the file location from local to HDFS, I get an EOFException. Some searching online suggests this error is caused by Hadoop version conflicts. I made the suggested modification to my sbt file, but I still get the same error:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0-cdh5.1.0"

I am using CDH 5.1; the full error message is below. Any help is greatly appreciated. Thanks.

[hdfs@plogs001 test1]$ spark-submit --class SimpleApp --master spark://172.16.30.164:7077 target/scala-2.10/simple-project_2.10-1.0.jar
14/09/09 16:56:41 INFO spark.SecurityManager: Changing view acls to: hdfs
14/09/09 16:56:41 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs)
14/09/09 16:56:41 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/09/09 16:56:41 INFO Remoting: Starting remoting
14/09/09 16:56:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@plogs001.sjc.domain.com:34607]
14/09/09 16:56:41 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@plogs001.sjc.domain.com:34607]
14/09/09 16:56:41 INFO spark.SparkEnv: Registering MapOutputTracker
14/09/09 16:56:41 INFO spark.SparkEnv: Registering BlockManagerMaster
14/09/09 16:56:41 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140909165641-375e
14/09/09 16:56:41 INFO storage.MemoryStore: MemoryStore started with capacity 294.9 MB.
14/09/09 16:56:41 INFO network.ConnectionManager: Bound socket to port 40833 with id = ConnectionManagerId(plogs001.sjc.domain.com,40833)
14/09/09 16:56:41 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/09/09 16:56:41 INFO storage.BlockManagerInfo: Registering block manager plogs001.sjc.domain.com:40833 with 294.9 MB RAM
14/09/09 16:56:41 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/09 16:56:41 INFO spark.HttpServer: Starting HTTP Server
14/09/09 16:56:42 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/09 16:56:42 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:47419
14/09/09 16:56:42 INFO broadcast.HttpBroadcast: Broadcast server started at http://172.16.30.161:47419
14/09/09 16:56:42 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-7026d0b6-777e-4dd3-9bbb-e79d7487e7d7
14/09/09 16:56:42 INFO spark.HttpServer: Starting HTTP Server
14/09/09 16:56:42 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/09 16:56:42 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:42388
14/09/09 16:56:42 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/09 16:56:42 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/09/09 16:56:42 INFO ui.SparkUI: Started SparkUI at http://plogs001.sjc.domain.com:4040
14/09/09 16:56:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/09 16:56:42 INFO spark.SparkContext: Added JAR file:/home/hdfs/kent/test1/target/scala-2.10/simple-project_2.10-1.0.jar at http://172.16.30.161:42388/jars/simple-project_2.10-1.0.jar with timestamp 1410307002737
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Connecting to master spark://plogs004.sjc.domain.com:7077...
14/09/09 16:56:42 INFO storage.MemoryStore: ensureFreeSpace(155704) called with curMem=0, maxMem=309225062
14/09/09 16:56:42 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 152.1 KB, free 294.8 MB)
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140909165642-0041
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor added: app-20140909165642-0041/0 on worker-20140902113555-plogs005.sjc.domain.com-7078 (plogs005.sjc.domain.com:7078) with 24 cores
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140909165642-0041/0 on hostPort plogs005.sjc.domain.com:7078 with 24 cores, 1024.0 MB RAM
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor added: app-20140909165642-0041/1 on worker-20140902113555-plogs006.sjc.domain.com-7078 (plogs006.sjc.domain.com:7078) with 24 cores
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140909165642-0041/1 on hostPort plogs006.sjc.domain.com:7078 with 24 cores, 1024.0 MB RAM
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor added: app-20140909165642-0041/2 on worker-20140902113556-plogs004.sjc.domain.com-7078 (plogs004.sjc.domain.com:7078) with 24 cores
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140909165642-0041/2 on hostPort plogs004.sjc.domain.com:7078 with 24 cores, 1024.0 MB RAM
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor updated: app-20140909165642-0041/2 is now
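For what it's worth, a build.sbt sketch for the CDH dependency mentioned in the first message of this thread usually also needs the Cloudera repository, and marking the Hadoop client as "provided" can help avoid shipping jars that conflict with the cluster's own Hadoop version. The repository URL and the provided scope are suggestions, not taken from the original message:

```scala
// Cloudera artifacts are not in Maven Central, so add their repository
resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

// Match the cluster's Hadoop version; "provided" keeps it off the assembly
// classpath so the cluster's own Hadoop jars are used at runtime
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0-cdh5.1.0" % "provided"
```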