Design proposal for streaming APIs in data source V2

2018-05-24 Thread Joseph Torres
Hi all,

https://docs.google.com/document/d/1VzxEuvpLfuHKL6vJO9qJ6ug0x9J_gLoLSH_vJL3-Cho

I've finished a full design proposal for streaming APIs in data source V2,
following up on my earlier doc with just the writer. Please take a look.

(Note that slightly different versions of the APIs already exist as
prototypes - this proposal is intended to override them, now that we have a
better handle on what's required.)


Jose


Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-24 Thread Li Jin
I'd like to bring https://issues.apache.org/jira/browse/SPARK-24373 to
people's attention because it could be a regression from 2.2.

I will leave it to more experienced people to decide whether this should be
a blocker or not.

On Wed, May 23, 2018 at 12:54 PM, Marcelo Vanzin wrote:

> Sure. Also, I'd appreciate if these bugs were properly triaged and
> targeted, so that we could avoid creating RCs when we know there are
> blocking bugs that will prevent the RC vote from succeeding.
>
> On Wed, May 23, 2018 at 9:02 AM, Xiao Li wrote:
> > -1
> >
> > Yeah, we should fix it in Spark 2.3.1.
> > https://issues.apache.org/jira/browse/SPARK-24257 is a correctness bug.
> > The PR can be merged soon. Thus, let us have another RC?
> >
> > Thanks,
> >
> > Xiao
> >
> >
> > 2018-05-23 8:04 GMT-07:00 chenliang613:
> >>
> >> Hi
> >>
> >> Agree with Wenchen, it is better to fix this issue.
> >>
> >> Regards
> >> Liang
> >>
> >>
> >> cloud0fan wrote
> >> > We found a critical bug in tungsten that can lead to silent data
> >> > corruption: https://github.com/apache/spark/pull/21311
> >> >
> >> > This is a long-standing bug that has existed since Spark 2.0 (not a
> >> > regression), but since we are going to release 2.3.1, I think it's a
> >> > good chance to include this fix.
> >> >
> >> > We will also backport this fix to Spark 2.0, 2.1, 2.2, and then we can
> >> > discuss if we should do a new release for 2.0, 2.1, 2.2 later.
> >> >
> >> > Thanks,
> >> > Wenchen
> >> >
> >> > On Wed, May 23, 2018 at 9:54 PM, Sean Owen <srowen@> wrote:
> >> >
> >> >> +1 Same result for me as with RC1.
> >> >>
> >> >>
> >> >> On Tue, May 22, 2018 at 2:45 PM Marcelo Vanzin <vanzin@> wrote:
> >> >>
> >> >>> Please vote on releasing the following candidate as Apache Spark
> >> >>> version 2.3.1.
> >> >>>
> >> >>> The vote is open until Friday, May 25, at 20:00 UTC and passes if
> >> >>> at least 3 +1 PMC votes are cast.
> >> >>>
> >> >>> [ ] +1 Release this package as Apache Spark 2.3.1
> >> >>> [ ] -1 Do not release this package because ...
> >> >>>
> >> >>> To learn more about Apache Spark, please see
> >> >>> http://spark.apache.org/
> >> >>>
> >> >>> The tag to be voted on is v2.3.1-rc2 (commit 93258d80):
> >> >>> https://github.com/apache/spark/tree/v2.3.1-rc2
> >> >>>
> >> >>> The release files, including signatures, digests, etc. can be found
> >> >>> at:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-bin/
> >> >>>
> >> >>> Signatures used for Spark RCs can be found in this file:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >>>
> >> >>> The staging repository for this release can be found at:
> >> >>>
> >> >>> https://repository.apache.org/content/repositories/orgapachespark-1270/
> >> >>>
> >> >>> The documentation corresponding to this release can be found at:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-docs/
> >> >>>
> >> >>> The list of bug fixes going into 2.3.1 can be found at the following
> >> >>> URL:
> >> >>> https://issues.apache.org/jira/projects/SPARK/versions/12342432
> >> >>>
> >> >>> FAQ
> >> >>>
> >> >>> =================================
> >> >>> How can I help test this release?
> >> >>> =================================
> >> >>>
> >> >>> If you are a Spark user, you can help us test this release by taking
> >> >>> an existing Spark workload and running it on this release candidate,
> >> >>> then reporting any regressions.
> >> >>>
> >> >>> If you're working in PySpark, you can set up a virtual env, install
> >> >>> the current RC, and see if anything important breaks. In Java/Scala,
> >> >>> you can add the staging repository to your project's resolvers and
> >> >>> test with the RC (make sure to clean up the artifact cache
> >> >>> before/after so you don't end up building with an out-of-date RC
> >> >>> going forward).
> >> >>>
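> >> >>> =========================================================
> >> >>> What should happen to JIRA tickets still targeting 2.3.1?
> >> >>> =========================================================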
> >> >>>
> >> >>> The current list of open tickets targeted at 2.3.1 can be found at:
> >> >>> https://s.apache.org/Q3Uo
> >> >>>
> >> >>> Committers should look at those and triage. Extremely important bug
> >> >>> fixes, documentation, and API tweaks that impact compatibility
> >> >>> should be worked on immediately. Please retarget everything else to
> >> >>> an appropriate release.
> >> >>>
> >> >>> =======================
> >> >>> But my bug isn't fixed?
> >> >>> =======================
> >> >>>
> >> >>> In order to make timely releases, we will typically not hold the
> >> >>> release unless the bug in question is a regression from the previous
> >> >>> release. That being said, if there is a regression that has not
> >> >>> been correctly targeted, please ping me or a committer to help
> >> >>> target the issue.
> >> >>>
> >> >>>