Re: Question about upgrading Kafka client version

2017-03-10 Thread Shixiong(Ryan) Zhu
I did some investigation yesterday and just posted my findings in the ticket.
Please read my latest comment in
https://issues.apache.org/jira/browse/SPARK-18057

On Fri, Mar 10, 2017 at 11:41 AM, Cody Koeninger  wrote:

> There are existing tickets on the issues around kafka versions, e.g.
> https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten
> any committer weigh-in on direction.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Question about upgrading Kafka client version

2017-03-10 Thread Cody Koeninger
There are existing tickets on the issues around kafka versions, e.g.
https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten
any committer weigh-in on direction.

On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori  wrote:
> Guys,
>
> To change the subject from meta-voting...
>
> We are doing Spark Streaming against a Kafka setup, everything is pretty
> standard, and pretty current. In particular we are using Spark 2.1, and
> Kafka 0.10.1, with batch windows that are quite large (5-10 minutes). The
> problem we are having is pretty well described in the following excerpt from
> the Spark documentation:
> "For possible kafkaParams, see Kafka consumer config docs. If your Spark
> batch duration is larger than the default Kafka heartbeat session timeout
> (30 seconds), increase heartbeat.interval.ms and session.timeout.ms
> appropriately. For batches larger than 5 minutes, this will require changing
> group.max.session.timeout.ms on the broker. Note that the example sets
> enable.auto.commit to false, for discussion see Storing Offsets below."
>
> In our case "group.max.session.timeout.ms" is set to the default value, and our
> processing time per batch easily exceeds that value. I did some further
> hunting around and found the following SO post:
> "KIP-62 decouples heartbeats from calls to poll() via a background
> heartbeat thread. This allows for a longer processing time (i.e., time between
> two consecutive poll() calls) than the heartbeat interval."
>
> This pretty accurately describes our scenario: effectively our per batch
> processing time is 2-6 minutes, well within the batch window, but in excess
> of the max session timeout between polls, causing the consumer to be kicked
> out of the group.
>
> Are there any plans to move the Kafka client up to 0.10.1 and make this
> feature available to consumers? Or have I missed some helpful configuration
> that would ameliorate this problem? I recognize changing
> "group.max.session.timeout.ms" is one solution, though doing
> heartbeat checking outside of implicitly piggybacking on polling seems more
> elegant.
>
> -Oscar
>
>
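As a back-of-the-envelope sketch of the sizing the quoted doc excerpt implies (the 10-minute batch duration, the one-minute margin, and the 1/3 heartbeat ratio below are illustrative assumptions, not values from the thread):

```python
# Sizing sketch (illustrative numbers): Kafka consumer timeouts for a Spark
# batch window larger than the broker's 5-minute default cap.
batch_duration_ms = 10 * 60 * 1000        # assumed 10-minute batch window
margin_ms = 60 * 1000                     # assumed headroom beyond the batch

# session.timeout.ms must outlast the whole batch, or the consumer is
# kicked out of the group between polls (the failure described above).
session_timeout_ms = batch_duration_ms + margin_ms
# Common guidance: heartbeat.interval.ms well below session.timeout.ms (~1/3).
heartbeat_interval_ms = session_timeout_ms // 3

# Broker side: group.max.session.timeout.ms (default 5 minutes = 300000 ms,
# per the doc excerpt) must be raised to at least session.timeout.ms.
broker_group_max_session_timeout_ms = max(300000, session_timeout_ms)

kafka_params = {
    "session.timeout.ms": str(session_timeout_ms),
    "heartbeat.interval.ms": str(heartbeat_interval_ms),
    "enable.auto.commit": "false",        # as in the quoted example
}
print(kafka_params)
```

The real constraint is only the ordering heartbeat.interval.ms < session.timeout.ms <= group.max.session.timeout.ms, with session.timeout.ms outlasting the batch; the KIP-62 background heartbeat thread in the 0.10.1+ client decouples session.timeout.ms from per-batch processing time, which is exactly the upgrade being asked about.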




Re: Spark Improvement Proposals

2017-03-10 Thread Reynold Xin
We can just start using spip label and link to it.



On Fri, Mar 10, 2017 at 9:18 AM, Cody Koeninger  wrote:

> So to be clear, if I translate that google doc to markup and submit a
> PR, you will merge it?
>
> If we're just using "spip" label, that's probably fine, but we still
> need shared filters for open and closed SPIPs so the page can link to
> them.
>
> I do not believe I have jira permissions to share filters, I just
> attempted to edit one of mine and do not see an add shares field.


Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
Can someone with filter share permissions make a filter for open
SPIPs and one for closed SPIPs and share them?

e.g.

project = SPARK AND status in (Open, Reopened, "In Progress") AND
labels=SPIP ORDER BY createdDate DESC

and another with the equivalent closed statuses
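A sketch of what that closed-side filter might look like, mirroring the open one above (the exact terminal statuses are an assumption; the Spark JIRA may use a different set):

```
project = SPARK AND status in (Resolved, Closed) AND
labels=SPIP ORDER BY createdDate DESC
```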

I just made an open ticket with the SPIP label, so it should show up.

On Fri, Mar 10, 2017 at 11:19 AM, Reynold Xin  wrote:
> We can just start using spip label and link to it.
>
>
>



Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
So to be clear, if I translate that google doc to markup and submit a
PR, you will merge it?

If we're just using "spip" label, that's probably fine, but we still
need shared filters for open and closed SPIPs so the page can link to
them.

I do not believe I have jira permissions to share filters, I just
attempted to edit one of mine and do not see an add shares field.

On Fri, Mar 10, 2017 at 10:54 AM, Sean Owen  wrote:
> Sure, that seems OK to me. I can merge anything like that.
> I think anyone can make a new label in JIRA; I don't know if even the admins
> can make a new issue type unfortunately. We may just have to mention a
> convention involving title and label or something.



Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
I think it ought to be its own page, linked from the more / community
menu dropdowns.

We also need the jira tag, and for the page to clearly link to filters
that show proposed / completed SPIPs

On Fri, Mar 10, 2017 at 3:39 AM, Sean Owen  wrote:
> Alrighty, if nobody is objecting, and nobody calls for a VOTE, then, let's
> say this document is the SPIP 1.0 process.
>
> I think the next step is just to translate the text to some suitable
> location. I suggest adding it to
> https://github.com/apache/spark-website/blob/asf-site/contributing.md



Re: Will .count() always trigger an evaluation of each row?

2017-03-10 Thread zero323
Technically speaking it is still possible to force materialization without calling an action, since CACHE TABLE is eager by default:


df.createOrReplaceTempView("df")
spark.sql("CACHE TABLE df")
spark.table("df")







Re: Spark Improvement Proposals

2017-03-10 Thread Sean Owen
Alrighty, if nobody is objecting, and nobody calls for a VOTE, then, let's
say this document is the SPIP 1.0 process.

I think the next step is just to translate the text to some suitable
location. I suggest adding it to
https://github.com/apache/spark-website/blob/asf-site/contributing.md

On Thu, Mar 9, 2017 at 4:55 PM Sean Owen  wrote:

> I think a VOTE is over-thinking it, and is rarely used, but, can't hurt.
> Nah, anyone can call a vote. This really isn't that formal. We just want to
> declare and document consensus.
>
> I think SPIP is just a remix of existing process anyway, and don't think
> it will actually do much anyway, which is why I am sanguine about the whole
> thing.
>
> To bring this to a conclusion, I will just put the contents of the doc in
> an email tomorrow for a VOTE. Raise any objections now.
>