Re: Structured Streaming with Kafka sources/sinks

2016-08-30 Thread Reynold Xin
In this case simply not much progress has been made, because people might
be busy with other stuff.

Ofir, it looks like you have spent a non-trivial amount of time thinking about
this topic and have even designed something that works. Can you chime in on
the JIRA ticket with your thoughts and your prototype? That would be
tremendously useful to the project.





Re: Structured Streaming with Kafka sources/sinks

2016-08-30 Thread Nicholas Chammas
> I personally find it disappointing that a big chunk of Spark's design and
> development is happening behind closed curtains.

I'm not too familiar with Streaming, but I see design docs and proposals
for ML and SQL published here and on JIRA all the time, and they are
discussed extensively.

For example, here are some ML JIRAs with extensive design discussions:
SPARK-6725, SPARK-13944, SPARK-16365


Nick



Re: Structured Streaming with Kafka sources/sinks

2016-08-30 Thread Cody Koeninger
Not that I wouldn't rather have more open communication around this
issue...but what are people actually expecting to get out of
structured streaming with regard to Kafka?

There aren't any realistic pushdown-type optimizations available, and
from what I could tell the last time I looked at structured streaming,
resolving the event time vs processing time issue was still a ways
off.


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Structured Streaming with Kafka sources/sinks

2016-08-30 Thread Ofir Manor
I personally find it disappointing that a big chunk of Spark's design and
development is happening behind closed curtains. It makes it harder than
necessary for me to work with Spark. In recent weeks we had to improvise a
temporary solution for reading from Kafka (from Structured Streaming) to
unblock our development, and I feel that if the design and development of
that feature had been done in the open, it would have saved us a lot of
hassle (and would have reduced the refactoring of our code base).

It is hard not to compare this to other Apache projects. For example, I believe
most of the Apache Kafka full-time contributors work at a single company,
but they manage as a community to have a very transparent design and
development process, which seems to work great.

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io



Re: Structured Streaming with Kafka sources/sinks

2016-08-29 Thread Fred Reiss
I think that the community really needs some feedback on the progress of
this very important task. Many existing Spark Streaming applications can't
be ported to Structured Streaming without Kafka support.

Is there a design document somewhere? Or can someone from the Databricks
team break down the existing monolithic JIRA issue into smaller steps that
reflect the current development plan?

Fred




Re: Structured Streaming with Kafka sources/sinks

2016-08-27 Thread Koert Kuipers
That's great.

Is this effort happening anywhere that is publicly visible? GitHub?



Re: Structured Streaming with Kafka sources/sinks

2016-08-16 Thread Reynold Xin
We (the team at Databricks) are working on one currently.




Re: Structured Streaming with Kafka sources/sinks

2016-08-15 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-15406

I'm not working on it (yet?); I never got an answer to the question of
who was planning to work on it.

On Mon, Aug 15, 2016 at 9:12 PM, Guo, Chenzhao wrote:
> Hi all,
>
> I’m trying to write Structured Streaming test code that will deal with a
> Kafka source. Currently Spark 2.0 doesn’t support Kafka sources/sinks.
>
> I found some Databricks slides saying that Kafka sources/sinks will be
> implemented in Spark 2.0, so is there anybody working on this? And when will
> it be released?
>
> Thanks,
>
> Chenzhao Guo
