Re: On Java 9+ support, Cleaners, modules and the death of reflection

2018-11-12 Thread Sean Owen
For those following, I have a PR up at https://github.com/apache/spark/pull/22993. The implication is that ignoring MaxDirectMemorySize no longer works out of the box in Java 9+. However, you can make it work by setting JVM flags to allow access to the new Cleaner class, or by setting MaxDirectMemorySize.
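For readers who want a concrete picture, here is a minimal sketch (not the PR itself; the object and method names are made up) of the Java 9+ Cleaner route: JDK 9 added a public invokeCleaner method to sun.misc.Unsafe, which frees a direct buffer eagerly without opening java.nio internals.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch of eagerly freeing a direct buffer on Java 9+.
object CleanerSketch {
  def freeDirectBuffer(buf: ByteBuffer): Unit = {
    require(buf.isDirect, "only direct buffers have a Cleaner")
    // sun.misc.Unsafe is still reachable via the jdk.unsupported module.
    val unsafeClass = Class.forName("sun.misc.Unsafe")
    val field = unsafeClass.getDeclaredField("theUnsafe")
    field.setAccessible(true)
    val unsafe = field.get(null)
    // JDK 9 added the public invokeCleaner(ByteBuffer) method to Unsafe.
    unsafeClass
      .getMethod("invokeCleaner", classOf[ByteBuffer])
      .invoke(unsafe, buf)
  }
}
```

If even this path is closed off, respecting -XX:MaxDirectMemorySize and letting GC reclaim the buffers is the fallback.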

Re: DataSourceV2 capability API

2018-11-12 Thread Wenchen Fan
I think this works, but there are also other solutions, e.g. mixin traits and runtime exceptions, assuming the general abstraction is: table -> scan builder -> scan -> batch/batches (see alternative #2 in the doc).
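For readers without the doc handy, a rough sketch of the two shapes being compared follows; every name here is illustrative, not the actual DataSourceV2 API.

```scala
// Alternative 1: an explicit capability API the planner can query up front.
sealed trait TableCapability
case object SupportsBatchScan extends TableCapability
case object SupportsContinuousScan extends TableCapability

trait Table {
  def capabilities: Set[TableCapability]
  def newScanBuilder(): ScanBuilder
}

// Alternative 2 (mixin traits): a capability is signalled by implementing a
// trait, and an unsupported mode surfaces as a runtime failure when the
// planner's pattern match finds no matching trait.
trait ScanBuilder { def build(): Scan }
trait Scan
trait SupportsBatch extends Scan { def toBatch: Batch }
trait Batch  // stand-in for the batch/batches step of the abstraction
```

The capability set lets the planner fail fast before a scan is ever built; the mixin style defers the check to a pattern match at execution-planning time.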

Re: Spark Utf 8 encoding

2018-11-12 Thread lsn24
My terminal can display UTF-8 encoded characters; I already verified that, but I will double-check again. Thanks!
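One quick way to run that check outside Spark (a hypothetical snippet, not from the thread): if the sample string below prints as question marks or mojibake, the terminal or locale is mangling UTF-8, not Spark itself.

```scala
object Utf8Check extends App {
  // The JVM's notion of the platform encoding.
  println("file.encoding = " + System.getProperty("file.encoding"))
  println("default charset = " + java.nio.charset.Charset.defaultCharset())
  // Known non-ASCII characters: é ü 中文
  println("UTF-8 sample: \u00e9 \u00fc \u4e2d\u6587")
}
```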

Re: time for Apache Spark 3.0?

2018-11-12 Thread Vinoo Ganesh
Quickly following up on this – is there a target date for when Spark 3.0 may be released, and/or a list of the likely API breaks that are anticipated?

Re: time for Apache Spark 3.0?

2018-11-12 Thread Reynold Xin
The master branch now tracks the 3.0.0-SNAPSHOT version, so the next release will be 3.0. In terms of timing, unless we change anything specifically, Spark feature releases are on a 6-month cadence. Spark 2.4 was just released last week, so 3.0 will be roughly 6 months from now.

Re: time for Apache Spark 3.0?

2018-11-12 Thread Vinoo Ganesh
Makes sense, thanks Reynold.

Re: time for Apache Spark 3.0?

2018-11-12 Thread Matt Cheah
I wanted to clarify what categories of APIs are eligible to be broken in Spark 3.0. Specifically: are we removing all deprecated methods? If we're only removing some subset of deprecated methods, what is that subset? I see a bunch were removed in https://github.com/apache/spark/pull/22921.

Re: time for Apache Spark 3.0?

2018-11-12 Thread Reynold Xin
All API removal and deprecation JIRAs should be tagged "releasenotes", so we can reference them when we build release notes. I don't know if everybody is still following that practice, but it'd be great to do that. Since we don't have that many PRs, we should still be able to tag retroactively.

Re: time for Apache Spark 3.0?

2018-11-12 Thread Sean Owen
My non-definitive takes -- I would personally like to remove all deprecated methods for Spark 3. I started by removing 'old' deprecated methods in that commit. It's less clear whether things deprecated in 2.4 should be removed. Everything's fair game for removal or change in a major release.
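For context, a removal candidate is simply a method carrying Scala's @deprecated annotation, whose "since" version tells you how old the deprecation is. A made-up example (names, message, and version are all hypothetical):

```scala
class Example {
  // Deprecated in a long-ago release: the kind of "old" method removed first.
  @deprecated("use newRange instead", "1.6.0")
  def oldRange(start: Long, end: Long): Unit = newRange(start, end)

  def newRange(start: Long, end: Long): Unit = ()
}
```

Under this view, methods deprecated since 1.x or early 2.x go immediately, while 2.4-era deprecations are the grey area.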

Re: DataSourceV2 capability API

2018-11-12 Thread JackyLee
I don't know if it is the right thing to shape the table API as ContinuousScanBuilder -> ContinuousScan -> ContinuousBatch; it makes batch/micro-batch/continuous too different from each other. In my opinion, these are basically similar at the table level. So is it possible to design an API like this? ScanBuilder -> ...
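A hypothetical rendering of the unified shape this seems to suggest (all trait names are illustrative): a single ScanBuilder -> Scan chain, where batch, micro-batch, and continuous modes differ only in the object the Scan finally hands back.

```scala
trait ScanBuilder { def build(): Scan }

// One Scan abstraction shared by all three execution modes.
trait Scan {
  def toBatch: Batch                        // batch read
  def toMicroBatchStream: MicroBatchStream  // micro-batch streaming read
  def toContinuousStream: ContinuousStream  // continuous streaming read
}

trait Batch
trait MicroBatchStream
trait ContinuousStream
```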