Re: Interested in contributing to SPARK-24815

2023-07-25 Thread Kent Yao
Hi Pavan,

Refer to the ASF Source Header and Copyright Notice Policy[1], code
directly submitted to ASF should include the Apache license header
without any additional copyright notice.
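For reference, the standard ASF header from that policy, as it appears at the top of Spark's Scala and Java source files (a block comment only, with no per-author or per-company copyright line), is:

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */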


Kent Yao

[1] https://www.apache.org/legal/src-headers.html#headers

Sean Owen wrote on Tue, Jul 25, 2023 at 07:22:

>
> When contributing to an ASF project, it's governed by the terms of the ASF 
> ICLA: https://www.apache.org/licenses/icla.pdf or CCLA: 
> https://www.apache.org/licenses/cla-corporate.pdf
>
> I don't believe ASF projects ever retain an original author copyright 
> statement, but rather source files have a statement like:
>
> ...
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
> ...
>
> While it's conceivable that such a statement could live in a NOTICE file, I 
> don't believe that's been done for any of the thousands of other 
> contributors. That's really more for noting the license of 
> non-Apache-licensed code. Code directly contributed to the project is assumed 
> to have been licensed per above already.
>
> It might be wise to review the CCLA with Twilio and consider establishing 
> that to govern contributions.
>
> On Mon, Jul 24, 2023 at 6:10 PM Pavan Kotikalapudi 
>  wrote:
>>
>> Hi Spark Dev,
>>
>> My name is Pavan Kotikalapudi, I work at Twilio.
>>
>> I am looking to contribute to this spark issue 
>> https://issues.apache.org/jira/browse/SPARK-24815.
>>
>> There is a clause in the company's OSS policy saying:
>>
>> - The proposed contribution is about 100 lines of code modification in the 
>> Spark project, involving two files - this is considered a large 
>> contribution. An appropriate Twilio copyright notice needs to be added for 
>> the portion of code that is newly added.
>>
>> Please let me know if that is acceptable.
>>
>> Thank you,
>>
>> Pavan
>>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[ANNOUNCE] Apache Kyuubi (Incubating) released 1.5.0-incubating

2022-03-25 Thread Kent Yao
Hi all,

The Apache Kyuubi (Incubating) community is pleased to announce that
Apache Kyuubi (Incubating) 1.5.0-incubating has been released!

Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for
large-scale data processing and analytics, built on top of Apache Spark
and designed to support more engines like Apache Flink (Beta), Trino (Beta)
and so on.

Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface
for end-users to manipulate large-scale data with pre-programmed and
extensible Spark SQL engines.

We are aiming to make Kyuubi an "out-of-the-box" tool for data warehouses
and data lakes.

This "out-of-the-box" model minimizes the barriers and costs for end-users
to use Spark at the client-side.

At the server-side, Kyuubi server and engine's multi-tenant architecture
provides the administrators a way to achieve computing resource isolation,
data security, high availability, high client concurrency, etc.
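As an illustration of the JDBC path, a minimal Scala client could look like the sketch below; it assumes the Hive JDBC driver is on the classpath, and the host name, user, and default frontend port (10009) are placeholders rather than anything from this announcement.

import java.sql.DriverManager

object KyuubiJdbcExample {
  def main(args: Array[String]): Unit = {
    // Register the Hive JDBC driver (the hive-jdbc jar must be on the classpath).
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // "kyuubi-host" is a placeholder; 10009 is Kyuubi's default frontend port.
    val conn = DriverManager.getConnection("jdbc:hive2://kyuubi-host:10009/default", "user", "")
    try {
      val rs = conn.createStatement().executeQuery("SELECT version()")
      while (rs.next()) {
        println(rs.getString(1)) // version of the Spark SQL engine serving this session
      }
    } finally {
      conn.close()
    }
  }
}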

The full release notes and download links are available at:
Release notes: https://kyuubi.apache.org/release/1.5.0-incubating.html
Download page: https://kyuubi.apache.org/releases.html

To learn more about Apache Kyuubi (Incubating), please see
https://kyuubi.apache.org/

Kyuubi Resources:
- Issue Tracker: https://kyuubi.apache.org/issue_tracking.html
- Mailing list: https://kyuubi.apache.org/mailing_lists.html

We would like to thank all contributors of the Kyuubi community and the
Apache Incubator community who made this release possible!

Thanks,
On behalf of the Apache Kyuubi (Incubating) community


Re: Spark version verification

2021-03-21 Thread Kent Yao
Hi Mich,

> What are the correlations among these links and the ability to establish a spark build version

Check the documentation list here: http://spark.apache.org/documentation.html. The `latest` link always points to the head of that list; for example, http://spark.apache.org/docs/latest/ means http://spark.apache.org/docs/3.1.1/ for now.

The Spark build version in Spark releases is created by `spark-build-info`, see https://github.com/apache/spark/blob/89bf2afb3337a44f34009a36cae16dd0ff86b353/build/spark-build-info#L32

Some other options to check the Spark build info:

1. the `RELEASE` file

cat RELEASE
Spark 3.0.1 (git revision 2b147c4cd5) built for Hadoop 2.7.4
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver -DzincPort=3036

2. bin/spark-submit --version

The git revision itself does not tell you whether the release is an rc or the final release. If you have the Spark source code locally, you can run `git show 1d550c4e90275ab418b9161925049239227f3dc9` to get the tag info, like `commit 1d550c4e90275ab418b9161925049239227f3dc9 (tag: v3.1.1-rc3, tag: v3.1.1)`. Or you can compare the revision you have with all tags here: https://github.com/apache/spark/tags

Bests,
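P.S. A small aside (a sketch, not part of the commands above): inside a Spark 3.x session the same build information is also exposed programmatically, via constants read from spark-version-info.properties, the file that spark-build-info generates.

// In spark-shell (Scala)
println(spark.version)                    // e.g. 3.1.1
println(org.apache.spark.SPARK_VERSION)   // same value, read from spark-version-info.properties
println(org.apache.spark.SPARK_REVISION)  // git revision the release was built from
spark.sql("SELECT version()").show(false) // e.g. "3.1.1 1d550c4e90275ab418b9161925049239227f3dc9"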
Kent Yao @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
a spark enthusiast
kyuubi: a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.
spark-authorizer: a Spark SQL extension which provides SQL Standard Authorization for Apache Spark.
spark-postgres: a library for reading data from and transferring data to Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
spark-func-extras: a library that brings excellent and useful functions from various modern database management systems to Apache Spark.
On 03/22/2021 00:02, Mich Talebzadeh wrote:


Hi Kent,

Thanks for the links. You have to excuse my ignorance: what are the correlations among these links and the ability to establish a Spark build version?

   view my Linkedin profile

 Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from such
loss, damage or destruction.  

On Sun, 21 Mar 2021 at 15:55, Kent Yao <yaooq...@qq.com> wrote:
Please refer to http://spark.apache.org/docs/latest/api/sql/index.html#version 
On 03/21/2021 23:28, Mich Talebzadeh wrote:


Many thanks.

spark-sql> SELECT version();
3.1.1 1d550c4e90275ab418b9161925049239227f3dc9

What does 1d550c4e90275ab418b9161925049239227f3dc9 signify, please?




On Sun, 21 Mar 2021 at 15:14, Sean Owen <sro...@gmail.com> wrote:

I believe you can "SELECT version()" in Spark SQL to see the build version.

On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks for the detailed info.

I was hoping that one could find a simpler answer to the Spark version than doing a forensic examination of the code base, so to speak.

The primer for this verification is that on GCP Dataproc, originally built on 3.1.1-rc2, there was an issue with running Spark Structured Streaming (SSS) which I reported to this forum before. After a while, and after I reported it to Google, they have now upgraded the base to Spark 3.1.1 itself. I am not privy to how they did the upgrade.

In the meantime we installed 3.1.1 on-premise and ran it with the same Python code for SSS. It worked fine. However, when I run the same code on GCP Dataproc upgraded to 3.1.1, occasionally I see

Re: Spark version verification

2021-03-21 Thread Kent Yao
Please refer to http://spark.apache.org/docs/latest/api/sql/index.html#version 
On 03/21/2021 23:28, Mich Talebzadeh wrote:


Many thanks.

spark-sql> SELECT version();
3.1.1 1d550c4e90275ab418b9161925049239227f3dc9

What does 1d550c4e90275ab418b9161925049239227f3dc9 signify, please?




On Sun, 21 Mar 2021 at 15:14, Sean Owen <sro...@gmail.com> wrote:

I believe you can "SELECT version()" in Spark SQL to see the build version.

On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks for the detailed info.

I was hoping that one could find a simpler answer to the Spark version than doing a forensic examination of the code base, so to speak.

The primer for this verification is that on GCP Dataproc, originally built on 3.1.1-rc2, there was an issue with running Spark Structured Streaming (SSS) which I reported to this forum before. After a while, and after I reported it to Google, they have now upgraded the base to Spark 3.1.1 itself. I am not privy to how they did the upgrade.

In the meantime we installed 3.1.1 on-premise and ran it with the same Python code for SSS. It worked fine. However, when I run the same code on GCP Dataproc upgraded to 3.1.1, occasionally I see this error:

21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.util.ConcurrentModificationException
        at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)

This may be for other reasons, or a consequence of upgrading from 3.1.1-rc2 to 3.1.1?


On Sat, 20 Mar 2021 at 22:41, Attila Zsolt Piros <piros.attila.zs...@gmail.com> wrote:

Hi!

I would check out the Spark source and then diff those two RCs (first just take a look at the list of the changed files):

$ git diff v3.1.1-rc1..v3.1.1-rc2 --stat
...

The shell scripts in the release can be checked very easily:

$ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
 bin/docker-image-tool.sh                           |   6 +-
 dev/create-release/release-build.sh                |   2 +-

We are lucky, as docker-image-tool.sh is part of the released version. Is it from v3.1.1-rc2 or v3.1.1-rc1? Of course this only works if docker-image-tool.sh was not changed back from v3.1.1-rc2 to v3.1.1-rc1. So let's continue with the Python (and later the R) files:

$ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".py "
 python/pyspark/sql/avro/functions.py               |   4 +-
 python/pyspark/sql/dataframe.py                    |   1 +
 python/pyspark/sql/functions.py                    | 285 +--
 .../pyspark/sql/tests/test_pandas_cogrouped_map.py |  12 +
 python/pyspark/sql/tests/test_pandas_map.py        |   8 +
...

After you have enough proof you can stop (what counts as enough is for you to decide). Finally, you can use javap / scalap on the classes from the jars and check some code changes, which is harder to analyze than a simple text file.

Best Regards,
Attila

On Thu, Mar 18, 2021 at 4:09 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

What would be a signature in the Spark version or binaries that confirms the release is built on Spark 3.1.1, as opposed to 3.1.1-RC-1 or RC-2?

Thanks

Mich


Re: [jira] [Commented] (SPARK-34648) Reading Parquet Files in Spark Extremely Slow for Large Number of Files?

2021-03-10 Thread Kent Yao
Hi Pankaj,

Have you tried spark.sql.parquet.respectSummaryFiles=true?
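For example, a minimal sketch of that in Scala (note: this only helps when _metadata / _common_metadata summary files actually exist for the dataset, and the path below is just a placeholder):

// Sketch: with summary files respected, Spark assumes part files are consistent with
// _metadata/_common_metadata and can skip reading every part-file footer when merging schemas.
spark.conf.set("spark.sql.parquet.respectSummaryFiles", "true")
val df = spark.read.parquet("/data/mytable/date=2021-03-*")  // placeholder path
df.count()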




Bests,



On 03/10/2021 21:59, 钟雨 wrote:


Hi Pankaj,

Can you show your detailed code and Job/Stage info? Which stage is slow?

Pankaj Bhootra <pankajbhoo...@gmail.com> wrote on Wed, Mar 10, 2021 at 12:32 PM:

Hi,

Could someone please revert on this?

Thanks
Pankaj Bhootra

On Sun, 7 Mar 2021, 01:22 Pankaj Bhootra, <pankajbhoo...@gmail.com> wrote:

Hello Team

I am new to Spark and this question may be a possible duplicate of the issue highlighted here: https://issues.apache.org/jira/browse/SPARK-9347

We have a large dataset partitioned by calendar date, and within each date partition, we are storing the data as parquet files in 128 parts.

We are trying to run aggregation on this dataset for 366 dates at a time with Spark SQL on Spark version 2.3.0, hence our Spark job is reading 366*128=46848 partitions, all of which are parquet files. There is currently no _metadata or _common_metadata file(s) available for this dataset.

The problem we are facing is that when we try to run spark.read.parquet on the above 46848 partitions, our data reads are extremely slow. It takes a long time to run even a simple map task (no shuffling) without any aggregation or group by.

I read through the above issue and I think I perhaps generally understand the ideas around the _common_metadata file. But the above issue was raised for Spark 1.3.1, and for Spark 2.3.0 I have not found any documentation related to this metadata file so far.

I would like to clarify:
- What's the latest best practice for reading a large number of parquet files efficiently?
- Does this involve using any additional options with spark.read.parquet? How would that work?
- Are there other possible reasons for slow data reads apart from reading metadata for every part? We are basically trying to migrate our existing Spark pipeline from using csv files to parquet, but from my hands-on experience so far, it seems that parquet's read time is slower than csv's. This seems contradictory to the popular opinion that parquet performs better in terms of both computation and storage.

Thanks
Pankaj Bhootra

-- Forwarded message -
From: Takeshi Yamamuro (Jira) <j...@apache.org>
Date: Sat, 6 Mar 2021, 20:02
Subject: [jira] [Commented] (SPARK-34648) Reading Parquet Files in Spark Extremely Slow for Large Number of Files?
To: <pankajbhoo...@gmail.com>
    [ https://issues.apache.org/jira/browse/SPARK-34648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296528#comment-17296528 ]

Takeshi Yamamuro commented on SPARK-34648:
--

Please use the mailing list (user@spark.apache.org) instead. This is not the right place to ask questions.

> Reading Parquet Files in Spark Extremely Slow for Large Number of Files?
> 
>
>                 Key: SPARK-34648
>                 URL: https://issues.apache.org/jira/browse/SPARK-34648
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Pankaj Bhootra
>            Priority: Major
>
> Hello Team
> I am new to Spark and this question may be a possible duplicate of the issue highlighted here: https://issues.apache.org/jira/browse/SPARK-9347 
> We have a large dataset partitioned by calendar date, and within each date partition, we are storing the data as *parquet* files in 128 parts.
> We are trying to run aggregation on this dataset for 366 dates at a time with Spark SQL on spark version 2.3.0, hence our Spark job is reading 366*128=46848 partitions, all of which are parquet files. There is currently no *_metadata* or *_common_metadata* file(s) available for this dataset.
> The problem we are facing is that when we try to run *spark.read.parquet* on the above 46848 partitions, our data reads are extremely slow. It takes a long time to run even a simple map task (no shuffling) without any aggregation or group by.
> I read through the above issue and I think I perhaps generally understand the ideas around *_common_metadata* file. But the above issue was raised f

Re: spark 3.1.1 support hive 1.2

2021-03-09 Thread Kent Yao
Hi Li,

Have you tried `Interacting with Different Versions of Hive Metastore`? http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
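As an illustration, a sketch of those settings in Scala (the metastore version and jar path below are placeholders; spark.sql.hive.metastore.jars=path requires Spark 3.1+):

import org.apache.spark.sql.SparkSession

// Sketch: keep Spark 3.1.x's built-in Hive execution, but use Hive 1.2.1 client jars
// (loaded in an isolated classloader) to talk to a Hive 1.2 metastore.
val spark = SparkSession.builder()
  .appName("hive-1.2-metastore")
  .config("spark.sql.hive.metastore.version", "1.2.1")
  .config("spark.sql.hive.metastore.jars", "path")
  .config("spark.sql.hive.metastore.jars.path", "/opt/hive-1.2.1/lib/*.jar")  // placeholder path
  .enableHiveSupport()
  .getOrCreate()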




Bests,



On 03/10/2021 10:56, jiahong li wrote:


Hi, sorry to bother you. In Spark 3.0.1, hive-1.2 is supported, but in Spark 3.1.x the hive-1.2 Maven profile has been removed. Does that mean hive-1.2 is no longer supported in Spark 3.1.x? How can I keep supporting hive-1.2 on Spark 3.1.x, and is there a related JIRA? Can anyone help me?






Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-02 Thread Kent Yao
Congrats, all!







Bests,



On 03/3/2021 15:11, Takeshi Yamamuro wrote:


Great work and Congrats, all!

Bests,
Takeshi

On Wed, Mar 3, 2021 at 2:18 PM Mridul Muralidharan <mri...@gmail.com> wrote:

Thanks Hyukjin, and congratulations everyone on the release!

Regards,
Mridul

On Tue, Mar 2, 2021 at 8:54 PM Yuming Wang <wgy...@gmail.com> wrote:

Great work, Hyukjin!

On Wed, Mar 3, 2021 at 9:50 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

We are excited to announce Spark 3.1.1 today.

Apache Spark 3.1.1 is the second release of the 3.x line. This release adds Python type annotations and Python dependency management support as part of Project Zen. Other major updates include improved ANSI SQL compliance support, history server support in structured streaming, and the general availability (GA) of Kubernetes and node decommissioning in Kubernetes and Standalone. In addition, this release continues to focus on usability, stability, and polish while resolving around 1500 tickets.

We'd like to thank our contributors and users for their contributions and early feedback to this release. This release would not have been possible without you.

To download Spark 3.1.1, head over to the download page: http://spark.apache.org/downloads.html

To view the release notes: https://spark.apache.org/releases/spark-release-3-1-1.html


---
Takeshi Yamamuro






Re: EOFException when reading from HDFS

2014-09-12 Thread kent
Can anyone help me with this?  I have been stuck on this for a few days and
don't know what to try anymore.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/EOFException-when-reading-from-HDFS-tp13844p14115.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



EOFException when reading from HDFS

2014-09-09 Thread kent
I ran the SimpleApp program from spark tutorial
(https://spark.apache.org/docs/1.0.0/quick-start.html), which works fine. 

However, if I change the file location from local to hdfs, then I get an
EOFException. 

I did some searching online, which suggests this error is caused by Hadoop
version conflicts. I made the suggested modification in my sbt file, but I
still get the same error.

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0-cdh5.1.0"
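For context, a fuller build.sbt along those lines might look like the sketch below; the Cloudera resolver URL and the "provided" scoping of spark-core are assumptions on my part, not something taken from the original setup.

// Sketch of a build.sbt for a Spark 1.0.x app on CDH 5.1 (versions and URLs are assumptions).
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// Needed so sbt can resolve the CDH-flavoured hadoop-client artifact.
resolvers += "Cloudera repo" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

// Mark Spark as "provided" so the cluster's own Spark/Hadoop jars are used at runtime,
// and pin hadoop-client to the cluster's CDH build to avoid client/server mismatches.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0-cdh5.1.0"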

I am using CDH5.1, full error message is below.  Any help is greatly
appreciated. 

Thanks 


[hdfs@plogs001 test1]$ spark-submit --class SimpleApp --master
spark://172.16.30.164:7077 target/scala-2.10/simple-project_2.10-1.0.jar 
14/09/09 16:56:41 INFO spark.SecurityManager: Changing view acls to: hdfs 
14/09/09 16:56:41 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(hdfs) 
14/09/09 16:56:41 INFO slf4j.Slf4jLogger: Slf4jLogger started 
14/09/09 16:56:41 INFO Remoting: Starting remoting 
14/09/09 16:56:41 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sp...@plogs001.sjc.domain.com:34607] 
14/09/09 16:56:41 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sp...@plogs001.sjc.domain.com:34607] 
14/09/09 16:56:41 INFO spark.SparkEnv: Registering MapOutputTracker 
14/09/09 16:56:41 INFO spark.SparkEnv: Registering BlockManagerMaster 
14/09/09 16:56:41 INFO storage.DiskBlockManager: Created local directory at
/tmp/spark-local-20140909165641-375e 
14/09/09 16:56:41 INFO storage.MemoryStore: MemoryStore started with
capacity 294.9 MB. 
14/09/09 16:56:41 INFO network.ConnectionManager: Bound socket to port 40833
with id = ConnectionManagerId(plogs001.sjc.domain.com,40833) 
14/09/09 16:56:41 INFO storage.BlockManagerMaster: Trying to register
BlockManager 
14/09/09 16:56:41 INFO storage.BlockManagerInfo: Registering block manager
plogs001.sjc.domain.com:40833 with 294.9 MB RAM 
14/09/09 16:56:41 INFO storage.BlockManagerMaster: Registered BlockManager 
14/09/09 16:56:41 INFO spark.HttpServer: Starting HTTP Server 
14/09/09 16:56:42 INFO server.Server: jetty-8.y.z-SNAPSHOT 
14/09/09 16:56:42 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:47419 
14/09/09 16:56:42 INFO broadcast.HttpBroadcast: Broadcast server started at
http://172.16.30.161:47419
14/09/09 16:56:42 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-7026d0b6-777e-4dd3-9bbb-e79d7487e7d7 
14/09/09 16:56:42 INFO spark.HttpServer: Starting HTTP Server 
14/09/09 16:56:42 INFO server.Server: jetty-8.y.z-SNAPSHOT 
14/09/09 16:56:42 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:42388 
14/09/09 16:56:42 INFO server.Server: jetty-8.y.z-SNAPSHOT 
14/09/09 16:56:42 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040 
14/09/09 16:56:42 INFO ui.SparkUI: Started SparkUI at
http://plogs001.sjc.domain.com:4040
14/09/09 16:56:42 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable 
14/09/09 16:56:42 INFO spark.SparkContext: Added JAR
file:/home/hdfs/kent/test1/target/scala-2.10/simple-project_2.10-1.0.jar at
http://172.16.30.161:42388/jars/simple-project_2.10-1.0.jar with timestamp
1410307002737 
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Connecting to master
spark://plogs004.sjc.domain.com:7077... 
14/09/09 16:56:42 INFO storage.MemoryStore: ensureFreeSpace(155704) called
with curMem=0, maxMem=309225062 
14/09/09 16:56:42 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 152.1 KB, free 294.8 MB) 
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Connected to
Spark cluster with app ID app-20140909165642-0041 
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor added:
app-20140909165642-0041/0 on
worker-20140902113555-plogs005.sjc.domain.com-7078
(plogs005.sjc.domain.com:7078) with 24 cores 
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Granted executor
ID app-20140909165642-0041/0 on hostPort plogs005.sjc.domain.com:7078 with
24 cores, 1024.0 MB RAM 
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor added:
app-20140909165642-0041/1 on
worker-20140902113555-plogs006.sjc.domain.com-7078
(plogs006.sjc.domain.com:7078) with 24 cores 
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Granted executor
ID app-20140909165642-0041/1 on hostPort plogs006.sjc.domain.com:7078 with
24 cores, 1024.0 MB RAM 
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor added:
app-20140909165642-0041/2 on
worker-20140902113556-plogs004.sjc.domain.com-7078
(plogs004.sjc.domain.com:7078) with 24 cores 
14/09/09 16:56:42 INFO cluster.SparkDeploySchedulerBackend: Granted executor
ID app-20140909165642-0041/2 on hostPort plogs004.sjc.domain.com:7078 with
24 cores, 1024.0 MB RAM 
14/09/09 16:56:42 INFO client.AppClient$ClientActor: Executor updated:
app-20140909165642-0041/2 is now