Re: Slack for PySpark users

2023-03-30 Thread Xiao Li
… official. > Bests, > Dongjoon. > On Wed, Mar 29, 2023 at 11:32 PM Xiao Li wrote: >> +1 >> + @d...@spark.apache.org >> This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink) have created their own dedicated …

Re: Slack for PySpark users

2023-03-30 Thread Xiao Li
+1 + @d...@spark.apache.org This is a good idea. The other Apache projects (e.g., Pinot, Druid, Flink) have created their own dedicated Slack workspaces for faster communication. We can do the same in Apache Spark. The Slack workspace will be maintained by the Apache Spark PMC. I propose to

Stickers and Swag

2022-06-14 Thread Xiao Li
Hi, all, The ASF has an official store at RedBubble that Apache Community Development (ComDev) runs. If you are interested in buying Spark Swag, 70 products featuring the Spark logo are available: https://www.redbubble.com/shop/ap/113203780 Go …

Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Xiao Li
Thank you, Gengliang! Congrats to our community and all the contributors! Xiao Henrik Peng wrote on Tue, Oct 19, 2021 at 8:26 AM: > Congrats and thanks! > Gengliang Wang wrote on Tue, Oct 19, 2021 at 10:16 PM: >> Hi all, >> Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution …

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-01 Thread Xiao Li
Thank you! Xiao On Tue, Jun 1, 2021 at 9:29 PM Hyukjin Kwon wrote: > awesome! > On Wed, Jun 2, 2021 at 9:59 AM, Dongjoon Hyun wrote: >> We are happy to announce the availability of Spark 3.1.2! >> Spark 3.1.2 is a maintenance release containing stability fixes. This release is based on the …

Re: Spark and Bintray's shutdown

2021-04-12 Thread Xiao Li
Not all the Spark packages in https://spark-packages.org/ are eligible for Maven Central. We are looking for a replacement for Bintray for spark-packages.org. Bo Zhang is actively working on this. Bo, can you share your ideas with the community? Cheers, Xiao On Mon, Apr 12, 2021 at 9:28 AM …

Re: [UPDATE] Apache Spark 3.1.0 Release Window

2020-10-12 Thread Xiao Li
Thank you, Dongjoon. Xiao On Mon, Oct 12, 2020 at 4:19 PM Dongjoon Hyun wrote: > Hi, All. > Apache Spark 3.1.0 Release Window is adjusted like the following today. > Please check the latest information on the official website. …

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Xiao Li
… to the changed release cadence the code freeze should happen in mid-November. > On Sun, Oct 4, 2020 at 6:26 PM Xiao Li wrote: >> Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0. >> I think we made a change in release cadence since …

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Xiao Li
… +1 on pushing the branch cut for increased dev time to match previous releases. > Regards, > Mridul > On Sat, Oct 3, 2020 at 10:22 PM Xiao Li wrote: …

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-03 Thread Xiao Li
Thank you for your updates. Spark 3.0 was released on Jun 18, 2020. If Nov 1st is the target date for the 3.1 branch cut, the feature development window is less than 5 months. This is shorter than what we did for the Spark 2.3 and 2.4 releases. Below are three highly desirable pieces of feature work I am …

Re: Spark UI

2020-07-19 Thread Xiao Li
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc for the Spark UI. Xiao On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu wrote: > Hi, > I'm looking for a tutorial/video/material which explains the content of the various tabs in the Spark web UI. > Can someone direct me with the …

Re: How to disable pushdown predicate in spark 2.x query

2020-06-22 Thread Xiao Li
Just turn off the JDBC option pushDownPredicate, which was introduced in Spark 2.4. https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html Xiao On Mon, Jun 22, 2020 at 11:36 AM Mohit Durgapal wrote: > Hi All, > > I am trying to read a table of a relational database using spark 2.x. >
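A minimal sketch of turning that option off on a read, assuming Spark 2.4+ (the connection details below are placeholders):

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://host:3306/mydb")   // placeholder URL
      .option("dbtable", "my_table")                  // placeholder table
      .option("user", "user")
      .option("password", "pass")
      .option("pushDownPredicate", "false")           // keep filters in Spark instead of the database
      .load()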

Re: Different execution results with wholestage codegen on and off

2020-05-27 Thread Xiao Li
Thanks for reporting it. Please open a JIRA with a test case. Cheers, Xiao On Wed, May 27, 2020 at 1:42 PM Pasha Finkelshteyn <pavel.finkelsht...@gmail.com> wrote: > Hi folks, > I'm implementing Kotlin bindings for Spark and ran into a strange problem. In one corner case Spark works differently …

Re: Why were changes of SPARK-9241 removed?

2020-03-12 Thread Xiao Li
I do not think we intentionally dropped it. Could you open a ticket in Spark JIRA with your query? Cheers, Xiao On Thu, Mar 12, 2020 at 8:24 PM 马阳阳 wrote: > Hi, > I wonder why the changes made in > "[SPARK-9241][SQL] Supporting > multiple DISTINCT columns (2) - > Rewriting Rule" are not

Re: Spark 2.4.4 having worse performance than 2.4.2 when running the same code [pyspark][sql]

2020-01-15 Thread Xiao Li
… 27 vs 5.16, so I can't be sure... Maybe I'll try getting the latest version of Spark locally and make the comparison that way. > Regards, > Kalin > On Wed, Jan 15, 2020 at 7:58 PM Xiao Li wrote: >> EMR has its own fork of Spark, called the EMR runtime. It …

Re: Spark 2.4.4 having worse performance than 2.4.2 when running the same code [pyspark][sql]

2020-01-15 Thread Xiao Li
EMR has its own fork of Spark, called the EMR runtime. It is not Apache Spark. You might need to talk with them instead of posting questions in the Apache Spark community. Cheers, Xiao Kalin Stoyanov wrote on Wed, Jan 15, 2020 at 9:53 AM: > Hi all, > First of all, let me say that I am pretty new to …

Re: Why Apache Spark doesn't use Calcite?

2020-01-15 Thread Xiao Li
In the upcoming Spark 3.0, we introduced a new framework for Adaptive Query Execution in Catalyst. This can adjust the plans based on the runtime statistics. This is missing in Calcite based on my understanding. Catalyst is also very easy to enhance. We also use the dynamic programming approach
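For reference, a minimal sketch of enabling the feature, assuming a Spark 3.0 SparkSession named spark (the framework is off by default in 3.0):

    // Enable Adaptive Query Execution so plans are re-optimized with runtime statistics
    spark.conf.set("spark.sql.adaptive.enabled", "true")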

Re: Fail to use SparkR of 3.0 preview 2

2020-01-07 Thread Xiao Li
We can use R version 3.6.1 if we have concerns about the quality of 3.6.2. On Thu, Dec 26, 2019 at 8:14 PM Hyukjin Kwon wrote: > I was randomly googling out of curiosity, and it seems that's indeed the problem …

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Xiao Li
Thank you all. Happy Holidays! Xiao On Tue, Dec 24, 2019 at 12:53 PM Yuming Wang wrote: > Hi all, > > To enable wide-scale community testing of the upcoming Spark 3.0 release, > the Apache Spark community has posted a new preview release of Spark 3.0. > This preview is *not a stable release in

Happy Diwali everyone!!!

2019-10-27 Thread Xiao Li
Happy Diwali everyone!!! Xiao

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Xiao Li
Thank you for your contributions! This is a great feature for Spark 3.0! We finally achieved it! Xiao On Sat, Aug 24, 2019 at 12:18 PM Felix Cheung wrote: > That's great! > *From:* ☼ R Nair > *Sent:* Saturday, August 24, 2019 10:57:31 AM > *To:* Dongjoon Hyun …

Re: Filter cannot be pushed via a Join

2019-06-18 Thread Xiao Li
Hi, William, Thanks for reporting it. Could you open a JIRA? Cheers, Xiao William Wong wrote on Tue, Jun 18, 2019 at 8:57 AM: > BTW, I noticed a workaround is creating a custom rule to remove the 'empty local relation' from a union table. However, I am not 100% sure if it is the right approach. > On Tue, …

[ANNOUNCE] Announcing Apache Spark 2.4.3

2019-05-09 Thread Xiao Li
… been possible without you. Xiao Li

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Xiao Li
Try to clear your browsing data or use a different web browser. Enjoy it, Xiao On Thu, Nov 8, 2018 at 4:15 PM Reynold Xin wrote: > Do you have a cached copy? I see it here > > http://spark.apache.org/downloads.html > > > > On Thu, Nov 8, 2018 at 4:12 PM Li Gao wrote: > >> this is wonderful !

Happy Diwali everyone!!!

2018-11-07 Thread Xiao Li
Happy Diwali everyone!!! Xiao Li

Re: How to use 'insert overwrite [local] directory' correctly?

2018-08-27 Thread Xiao Li
Open a JIRA? Bang Xiao wrote on Mon, Aug 27, 2018 at 2:46 AM: > Solved the problem by creating the directory on HDFS before executing the SQL, but I hit a new error when I use: > INSERT OVERWRITE LOCAL DIRECTORY '/search/odin/test' row format delimited FIELDS TERMINATED BY '\t' select vrid, query, url, …
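For readers following the thread, a generic sketch of the statement shape under discussion (the path, delimiter, table, and query are placeholders):

    // Write query results to a local directory as tab-delimited text
    spark.sql("""
      INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out'
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      SELECT id, name FROM my_table
    """)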

Re: ORC native in Spark 2.3, with zlib, gives java.nio.BufferUnderflowException during read

2018-03-27 Thread Xiao Li
Hi, Eirik, Yes, please open a JIRA. Thanks, Xiao 2018-03-23 8:03 GMT-07:00 Eirik Thorsnes : > Hi all, > > I'm trying the new ORC native in Spark 2.3 > (org.apache.spark.sql.execution.datasources.orc). > > I've compiled Spark 2.3 from the git branch-2.3 as of March 20th.

Re: There is no UDF0 interface?

2018-02-04 Thread Xiao Li
The upcoming 2.3 will have it. On Sun, Feb 4, 2018 at 12:24 PM kant kodali wrote: > Hi All, > I see the current UDF APIs can take one or more arguments, but I don't see any UDF0 in Spark 2.2.0. Am I correct? > Thanks! >
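A sketch of what registering a zero-argument UDF looks like with the interface added in 2.3 (the UDF name and return value are hypothetical):

    import org.apache.spark.sql.api.java.UDF0
    import org.apache.spark.sql.types.StringType

    // Register a UDF that takes no arguments and returns a constant
    spark.udf.register("app_version", new UDF0[String] {
      override def call(): String = "2.3.0"   // hypothetical constant
    }, StringType)

    spark.sql("SELECT app_version()").show()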

Re: spark 2.0 and spark 2.2

2018-01-22 Thread Xiao Li
Generally, behavior changes in Spark SQL are documented in https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide In the ongoing Spark 2.3 release, all the behavior changes in Spark SQL/DataFrame/Dataset are documented in this section.

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Xiao Li
… why SELECT * will not work. > Regards, > Leena > On Fri, Jul 21, 2017 at 8:21 AM, Xiao Li <gatorsm...@gmail.com> wrote: >> Could you try 2.2? We fixed multiple Oracle-related issues in the latest release. >> Thanks >> Xiao

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Xiao Li
Could you try 2.2? We fixed multiple Oracle-related issues in the latest release. Thanks Xiao On Wed, 19 Jul 2017 at 11:10 PM Cassa L wrote: > Hi, > I am trying to use Spark 2.0 to read from an Oracle (12.1) table. > My table has JSON data. I am getting the below …

Re: Remove .HiveStaging files

2017-02-16 Thread Xiao Li
Maybe you can check this PR? https://github.com/apache/spark/pull/16399 Thanks, Xiao 2017-02-15 15:05 GMT-08:00 KhajaAsmath Mohammed : > Hi, > > I am using spark temporary tables to write data back to hive. I have seen > weird behavior of .hive-staging files after

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread Xiao Li
Which Spark version are you using? 2017-02-06 12:25 GMT-05:00 vaquar khan: > Did you try MSCK REPAIR TABLE? > Regards, > Vaquar Khan > On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" wrote: >> I don't think so, I was able to insert …

Re: Spark 2.0 issue

2016-09-29 Thread Xiao Li
Hi, Ashish, Will take a look at this soon. Thanks for reporting this, Xiao 2016-09-29 14:26 GMT-07:00 Ashish Shrowty : > If I try to inner-join two dataframes which originated from the same initial > dataframe that was loaded using spark.sql() call, it results in an

Re: What are using Spark for

2016-08-01 Thread Xiao Li
Hi, Rohit, The Spark summit has many interesting use cases. Hopefully, it can answer your question. https://spark-summit.org/2015/schedule/ https://spark-summit.org/2016/schedule/ Thanks, Xiao 2016-08-01 22:48 GMT-07:00 Rohit L : > Hi Everyone, > > I want to know

Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Xiao Li
Hi, Dragisa, Just submitted a PR implementing the save API: https://github.com/apache/spark/pull/14077 Let me know if you have any questions, Xiao 2016-07-06 10:41 GMT-07:00 Rabin Banerjee: > Hi Buddy, > I used both, but DataFrame.write.jdbc is old, and …

Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Xiao Li
Hi, Dragisa, Your second way is incomplete, right? To get the error you showed, you need to put save() there. Yeah, we can implement the trait CreatableRelationProvider for JDBC. Then, you will not see that error. Will submit a PR for that. Thanks, Xiao 2016-07-06 10:05 GMT-07:00 Dragisa
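A sketch of the two write paths being compared (connection details are placeholders); before the PR above, the second form failed at save() because the JDBC source did not implement CreatableRelationProvider:

    val props = new java.util.Properties()
    props.setProperty("user", "user")
    props.setProperty("password", "pass")

    // 1) The dedicated JDBC method
    df.write.jdbc("jdbc:postgresql://host/db", "my_table", props)

    // 2) The generic data-source API; nothing executes until save() is called
    df.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db")
      .option("dbtable", "my_table")
      .option("user", "user")
      .option("password", "pass")
      .save()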

Re: Copying all Hive tables from Prod to UAT

2016-04-08 Thread Xiao Li
You also need to ensure no workload is running on either side. 2016-04-08 15:54 GMT-07:00 Ali Gouta: > For Hive, you may use Sqoop to achieve this. In my opinion, you may also run a Spark job to do it. > On Apr 9, 2016 at 00:25, "Ashok Kumar" …

Re: Is Hive CREATE DATABASE IF NOT EXISTS atomic

2016-04-07 Thread Xiao Li
Hi, Assuming you are using 1.6 or earlier, this is a native Hive command; the database creation is executed by Hive itself. Thanks, Xiao Li 2016-04-07 15:23 GMT-07:00 antoniosi <antonio...@gmail.com>: > Hi, > I am using hiveContext.sql("create da…

Re: Spark SQL Optimization

2016-03-21 Thread Xiao Li
Hi, Maybe you can open a JIRA and upload your plan as Michael suggested. This is an interesting feature. Thanks! Xiao Li 2016-03-21 10:36 GMT-07:00 Michael Armbrust <mich...@databricks.com>: > It's helpful if you can include the output of EXPLAIN EXTENDED or > df.explain(true) whe
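For anyone unsure how to produce that output, either form below works, assuming a DataFrame df and a SQLContext (the table name is illustrative):

    // Print the parsed, analyzed, optimized, and physical plans
    df.explain(true)

    // Or the SQL equivalent
    sqlContext.sql("EXPLAIN EXTENDED SELECT * FROM my_table").collect().foreach(println)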

Re: how to interview spark developers

2016-02-23 Thread Xiao Li
This is interesting! I believe the interviewees should AT LEAST subscribe to this mailing list if they are Spark developers. Then they will know your questions before the interview. :) 2016-02-23 22:07 GMT-08:00 charles li: > Hi there, we are going to recruit several …

Re: how to introduce spark to your colleague if he has no background about *** spark related

2016-01-31 Thread Xiao Li
… but the audience is three RDBMS engine experts. I will go over the Spark SQL paper from SIGMOD 2015 with them and show them the source code. Good luck! Xiao Li 2016-01-31 22:35 GMT-08:00 Jörn Franke <jornfra...@gmail.com>: > It depends of course on the background of the people, but how a…

Re: Spark SQL IN Clause

2015-12-04 Thread Xiao Li
https://github.com/apache/spark/pull/9055 This PR explains how to convert IN to joins. Thanks, Xiao Li 2015-12-04 11:27 GMT-08:00 Michael Armbrust <mich...@databricks.com>: > The best way to run this today is probably to manually convert the query into a join, i.e. create …
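A minimal sketch of the rewrite, with illustrative names and assuming the large table is a DataFrame named df: the IN values become a small DataFrame that drives a semi join.

    import sqlContext.implicits._

    val ids = Seq(1, 2, 3).toDF("id")   // the values from the IN clause
    val rewritten = df.join(ids, df("id") === ids("id"), "leftsemi")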

Re: Low Latency SQL query

2015-12-01 Thread Xiao Li
http://cacm.acm.org/magazines/2011/6/108651-10-rules-for-scalable-performance-in-simple-operation-datastores/fulltext Try to read this article. It might help you understand your problem. Thanks, Xiao Li 2015-12-01 16:36 GMT-08:00 Mark Hamstra <m...@clearstorydata.com>: > I'd as

Re: newbie : why are thousands of empty files being created on HDFS?

2015-11-23 Thread Xiao Li
In your case, maybe you can try calling the function coalesce? Good luck, Xiao Li 2015-11-23 12:15 GMT-08:00 Andy Davidson <a...@santacruzintegration.com>: > Hi Sabarish, > I am but a simple padawan :-) I do not understand your answer. Why would Spark be creating so many …
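A sketch of the suggestion, assuming a DataFrame that currently produces thousands of tiny output files (the partition count and path are illustrative):

    df.coalesce(16)                       // merge partitions without a full shuffle
      .write.parquet("hdfs:///tmp/out")   // illustrative output path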

Re: Relation between RDDs, DataFrames and Project Tungsten

2015-11-23 Thread Xiao Li
… They can analyze their data by clicking a few buttons instead of writing programs. :) I hope Spark will become the most popular analytics OS in the world! :) Have a good holiday, everyone! Xiao Li 2015-11-23 17:56 GMT-08:00 Jakob Odersky <joder...@gmail.com>: > Thanks Michael, that helped …

Re: How 'select name,age from TBL_STUDENT where age = 37' is optimized when caching it

2015-11-16 Thread Xiao Li
Your dataframe is cached. Thus, your plan is stored as an InMemoryRelation. You can read the logic in CacheManager.scala. Good luck, Xiao Li 2015-11-16 6:35 GMT-08:00 Todd <bit1...@163.com>: > Hi, > When I cache the dataframe and run the query, > val df = sqlCont…
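A sketch of how the cached plan surfaces, using the query from the subject line (table and column names as given there):

    val df = sqlContext.sql("SELECT name, age FROM TBL_STUDENT WHERE age = 37")
    df.cache()
    df.count()        // materializes the cache
    df.explain(true)  // the plan now reads from an InMemoryRelation node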

Re: [spark1.5.1] HiveQl.parse throws org.apache.spark.sql.AnalysisException: null

2015-10-23 Thread Xiao Li
… to call the other APIs before calling this API. Note that lazy evaluation is a little annoying when you traverse the code base. Good luck, Xiao Li 2015-10-21 3:06 GMT-07:00 Sebastian Nadorp <sebastian.nad...@nugg.ad>: > What we're trying to achieve is a fast way of testing the validi…

Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-10-22 Thread Xiao Li
… [*] --class com.sparkEngine. /Users/smile/spark-1.3.1-bin-hadoop2.3/projects/SparkApps-master/spark-load-from-db/target/-1.0.jar Hopefully, it works for you. Xiao Li 2015-10-22 4:56 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>: > Did you try passing the mysql connector ja…

Re: How to distinguish columns when joining DataFrames with shared parent?

2015-10-22 Thread Xiao Li
… in the node to indicate whether it comes from an alias; otherwise, we need to traverse the underlying tree for each column to confirm it is not from an alias. Good luck, Xiao Li 2015-10-21 16:33 GMT-07:00 Isabelle Phan <nlip...@gmail.com>: > Ok, got it. > Thanks a…

Re: Spark groupby and agg inconsistent and missing data

2015-10-22 Thread Xiao Li
Hi, Saif, Could you post your code here? It might help others reproduce the errors and give you a correct answer. Thanks, Xiao Li 2015-10-22 8:27 GMT-07:00 <saif.a.ell...@wellsfargo.com>: > Hello everyone, > > I am doing some analytics experiments under a 4 server stan

Re: Multiple joins in Spark

2015-10-20 Thread Xiao Li
… non-null; at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:260) … > I cannot paste the entire stack since it's on a company laptop and I am not allowed to copy-paste things! Though if absolutely needed to help, I can figure out some way to provide …

Re: [spark1.5.1] HiveQl.parse throws org.apache.spark.sql.AnalysisException: null

2015-10-20 Thread Xiao Li
… month INT, day INT) STORED AS PARQUET LOCATION 'temp'") Good luck, Xiao Li 2015-10-20 10:23 GMT-07:00 Michael Armbrust <mich...@databricks.com>: > That's not really intended to be a public API, as there is some internal setup that needs to be done for Hive to work. Have you c…

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Xiao Li
… if a value is null for all the nullable data types. Thus, it might cause a problem if you need to use Spark to transfer data between Parquet and an RDBMS. My suggestion is to introduce another external parameter. Thanks, Xiao Li 2015-10-20 10:20 GMT-07:00 Michael Armbrust <m…

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Xiao Li
… are not avoidable, it will issue warnings or errors to the users. Does that make sense? Thanks, Xiao Li 2015-10-20 12:38 GMT-07:00 Michael Armbrust <mich...@databricks.com>: > First, this is not documented in the official document. Maybe we should do …

Re: Dynamic partition pruning

2015-10-16 Thread Xiao Li
Hi, Younes, Maybe you can open a JIRA? Thanks, Xiao Li 2015-10-16 12:43 GMT-07:00 Younes Naguib <younes.nag...@tritondigital.com>: > Thanks, > > Do you have a Jira I can follow for this? > > > > y > > > > *From:* Michael Armbrust [mailto:mich...@databric

Re: Problem of RDD in calculation

2015-10-16 Thread Xiao Li
Hi, Frank, After registering these DFs as temp tables (via the API registerTempTable), you can do it using SQL. I believe this should be much easier. Good luck, Xiao Li 2015-10-16 12:10 GMT-07:00 ChengBo <cheng...@huawei.com>: > Hi all, > I am new to Spark, …
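A minimal sketch of that suggestion, with illustrative names:

    df1.registerTempTable("users")
    df2.registerTempTable("orders")
    val joined = sqlContext.sql(
      "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id")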

Re: Multiple joins in Spark

2015-10-16 Thread Xiao Li
;tab2") df3.registerTempTable("tab3") val exampleSQL = sqlContext.sql("select * from tab1, tab2, tab3 where tab1.name = tab2.name and tab2.id = tab3.id") Good luck, Xiao Li 2015-10-16 17:01 GMT-07:00 Shyam Parimal Katti <spk...@nyu.edu>: > Hello All, >

Re: Problem of RDD in calculation

2015-10-16 Thread Xiao Li
For most programmers, DataFrames are preferred for their flexibility, but SQL syntax is a great option for users who feel more comfortable with SQL. :) 2015-10-16 18:22 GMT-07:00 Ali Tajeldin EDU: > Since DF2 only has the userID, I'm assuming you are using …

Re: Multiple joins in Spark

2015-10-16 Thread Xiao Li
… the actual physical plan to execute your SQL query is generated by the Catalyst optimizer. Good luck, Xiao Li 2015-10-16 20:53 GMT-07:00 Shyam Parimal Katti <spk...@nyu.edu>: > Thanks Xiao! A question about the internals: would you know what happens when createTempTable() is called …

Re: How to speed up reading from file?

2015-10-16 Thread Xiao Li
… in your system. Good luck, Xiao Li 2015-10-16 14:08 GMT-07:00 <saif.a.ell...@wellsfargo.com>: > Hello, > Is there an optimal number of partitions per number of rows when writing to disk, so we can re-read later from the source in a distributed way? > Any thoughts? > Thanks > Saif
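A sketch of setting the partition count explicitly before writing (the count and path are illustrative; a common rule of thumb is to aim for output files on the order of 128 MB):

    df.repartition(64)                    // ~64 output files, one per partition
      .write.parquet("hdfs:///tmp/out")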

Re: SparkSQL: First query execution is always slower than subsequent queries

2015-10-12 Thread Xiao Li
Hi, Lloyd, Are both runs cold, or warm? Memory/cache hits and misses could be a big factor if your application is IO-intensive. You need to monitor your system to understand where your bottleneck is. Good luck, Xiao Li

Re: Best practices to call small spark jobs as part of REST api

2015-10-12 Thread Xiao Li
The design depends largely on your use cases. You have to think about the requirements and rank them. For example, if your application cares about response time and can tolerate stale data, using a NoSQL database as middleware is a good option. Good Luck, Xiao Li 2015-10-11 21:00 GMT-07…

Re: Spark handling parallel requests

2015-10-12 Thread Xiao Li
… Otherwise, you need to read the designs or even the source code of Kafka and Spark Streaming. Best wishes, Xiao Li 2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>: > Instead of pushing your requests to the socket, why don't you push them to Kafka or any other m…

Re: Kafka and Spark combination

2015-10-09 Thread Xiao Li
Please see the following discussion: http://search-hadoop.com/m/YGbbS0SqClMW5T1 Thanks, Xiao Li 2015-10-09 6:17 GMT-07:00 Nikhil Gs <gsnikhil1432...@gmail.com>: > Has anyone worked with Kafka in a scenario where the Streaming data from > the Kafka consumer is picked by

Re: Best storage format for intermediate process

2015-10-09 Thread Xiao Li
… It is time-consuming. If the source side is a mainframe, it could also eat a lot of MIPS. Thus, the best way is to save it to a persistent medium without any data transformation, and then transform and store the data based on your query types. Thanks, Xiao Li 2015-10-09 11:25 GMT-07:00 <saif.a.…

Re: Datastore or DB for spark

2015-10-09 Thread Xiao Li
FYI, in my local environment, Spark is connected to DB2 on z/OS, but that requires a special JDBC driver. Xiao Li 2015-10-09 8:38 GMT-07:00 Rahul Jeevanandam <rahu...@incture.com>: > Hi Jörn Franke, > I was sure that a relational database wouldn't be a good option for Spark. …
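A sketch of that kind of setup over JDBC (all connection values are placeholders; the special driver jar must be on the classpath):

    val df = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:db2://host:446/LOCATION")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "SCHEMA.TABLE")
      .load()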