Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-09-29 Thread Saisai Shao
I like this proposal. Since Kafka already provides a delegation token mechanism, we can also leverage Spark's delegation token framework to add built-in Kafka support. BTW, I think there's not much difference between supporting Structured Streaming and DStream, so maybe we can set both as goals. Thanks

Re: Python friendly API for Spark 3.0

2018-09-29 Thread Stavros Kontopoulos
Regarding the Python 3.x upgrade referenced earlier: some people have already gone down that path of upgrading: https://blogs.dropbox.com/tech/2018/09/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever They describe some good reasons. Stavros On Tue, Sep 18, 2018 at 6:35 PM, Erik Erlandson

saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-29 Thread Jacek Laskowski
Hi, The following query fails in 2.3.2: scala> spark.range(10).write.saveAsTable("t1") ... 2018-09-29 20:48:06 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0 2018-09-29 20:48:07 ERROR Utils:91 - Aborting task java.io.IOException: Mkdirs

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-29 Thread Stavros Kontopoulos
+1 Stavros On Sat, Sep 29, 2018 at 5:59 AM, Sean Owen wrote: > +1, with comments: > > There are 5 critical issues for 2.4, and no blockers: > SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4 > SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs >

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-09-29 Thread Jungtaek Lim
Hi Gabor, Thanks for proposing the feature. I'm definitely interested in seeing this feature, but honestly I'm not familiar with how Spark deals with delegation tokens for HDFS and HBase. I'll try to review the doc in general, learn how it works, and review again based on my understanding. Thanks,

Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
Yes. We should create a SPIP for each major breaking change. On Fri, Sep 28, 2018 at 11:05 PM, Reynold Xin wrote: > i think we should create spips for some of them, since they are pretty > large ... i can create some tickets to start with > > -- > excuse the brevity and lower case due to wrist injury > > > On

Re: [DISCUSS] Syntax for table DDL

2018-09-29 Thread Xiao Li
Are they consistent with the current syntax defined in SqlBase.g4? I think we are following the Hive DDL syntax: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column On Fri, Sep 28, 2018 at 3:47 PM, Ryan Blue wrote: > Hi everyone, > > I’m currently

Re: time for Apache Spark 3.0?

2018-09-29 Thread Reynold Xin
i think we should create spips for some of them, since they are pretty large ... i can create some tickets to start with -- excuse the brevity and lower case due to wrist injury On Fri, Sep 28, 2018 at 11:01 PM Xiao Li wrote: > Based on the above discussions, we have a "rough consensus" that

Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
Based on the above discussions, we have a "rough consensus" that the next release will be 3.0. Now, we can start working on the API-breaking changes (e.g., the ones mentioned in the original email from Reynold). Cheers, Xiao On Thu, Sep 6, 2018 at 2:21 PM, Matei Zaharia wrote: > Yes, you can start with