[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284507#comment-17284507 ] Jungtaek Lim commented on SPARK-34427: -- OK, I agree that continuing this argument is going to be meaningless. I should have raised the discussion on the dev@ mailing list. Will do. Please don't get me wrong. My original concern is that you're trying to preempt two major efforts, each of which would take non-trivial time. There's no proof that there's ongoing work internally - you should have created a design doc or a WIP PR if you had made meaningful progress internally, but you shared nothing, just assigned both issues to yourself, and said you're working on both. Sorry, but that's not something I can understand. Again, I'm not concerned about this "just" because it conflicts with SPARK-10816. You want it? I can give up SPARK-10816 if you want it, though I'd -1 if you don't provide a design doc, perf test, etc. to bring the effort on par. I just don't think you can take up multiple major efforts at once when none of them has even reached a PR (not even a WIP one). I would have no argument if you did things one by one, leaving space for contributors to play with. (Say, I'd have no concern if you let another contributor take over the RocksDB work so that you can focus on this one, or vice versa.)

> Session window support in SS
>
> Key: SPARK-34427
> URL: https://issues.apache.org/jira/browse/SPARK-34427
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: L. C. Hsieh
> Priority: Major
>
> Currently structured streaming supports two kinds of windows: tumbling window and sliding window. Another useful window function is session window, which is not supported by SS. We have a user requirement to use session window. We'd like to have this support in the upstream.
> About session window, there is some info: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#session-windows.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284504#comment-17284504 ] L. C. Hsieh commented on SPARK-34427: - Sigh... did you ever see me say in my previous comments that I want to ignore SPARK-10816? Did I say I don't want to consider the existing effort? I just said (you can look at the previous comments, it is unchanged):

> From the code size, that (yours) PR is much larger than another. I'm not sure if from feature perspective they are the same. As it comes to the weekend, I can take another look at the previous two PRs.
> From my side, I'd like to push this feature as we have real use case and requirement. But I'm not sure if we want to follow up with previous PRs.

I was not aware of SPARK-10816 when I created this JIRA with an assignee. That's all. I don't know why this JIRA irritates you so much. What I did is NOT that I created SPARK-34427, then saw there was an existing SPARK-10816, and then immediately assigned SPARK-34427 or SPARK-10816 to myself to occupy the issue and prevent others from working on it... The assignee works like a placeholder to notify others that the issue is ongoing work or planned work. It is not strict and, as you did, it can easily be removed or changed. If I don't set it, other folks might think it is an open issue and put effort into working on it. That is what is called not stepping on others' toes. Once we figure out, through communication with all parties, the best way to implement the feature, we can definitely change the assignee. I cannot accept your point that this assignee case is different. If I were going to assign SPARK-10816 to myself, that would not be acceptable. But I just created, with an assignee, a new JIRA for something we plan to do. I don't know what is wrong with this usual practice. So sorry, but your point doesn't make sense to me. It is also not what I have seen in past years and now in the Spark community. I guess you are unhappy that I assigned this JIRA because you were working on it, and you think I am occupying it. But again, when I created this JIRA with an assignee, I didn't know there was SPARK-10816 and that you had worked on it before. I don't mean to occupy work you have already done. Is that clear to you? I don't really want to continue this argument. It is meaningless to me and wastes my weekend time. Let me be clear again: I created this JIRA with an assignee because we plan to have this feature. Setting the assignee is to prevent others (especially contributors who are not familiar with the Spark community) from accidentally thinking it is open and putting their time into working on it. We will respect existing efforts. I did not know there was an existing SPARK-10816. I need to take some time to look at the existing work (both are big changes). Note that there is more than one implementation even in SPARK-10816, and I don't see any cooperation between the two implementations. We can have communication among all parties involved and see what is the best way to have the feature. I would like to focus on real work instead of arguing about this. If you are interested in continuing to push session window, I think I need some time to look at the details of the design and code in SPARK-10816 and think about how to get the feature into the best shape.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284496#comment-17284496 ] Jungtaek Lim commented on SPARK-34427: -- This assignee case is quite different from what I've seen committers doing, because these issues are not "new" (there have been existing efforts, just not at the right time) and the idea is well known, so many contributors can simply plan in parallel. E.g. in SPARK-34198 you'd realize one contributor at FB is also working on a solution in parallel. I don't think we are happy with someone occupying a major feature without even providing a design doc or the like. No one knows about the plan - no one knows whether the effort has started or is actually still in the backlog. In parallel, someone else may have made more progress. Stepping on others' toes has been normal in the Spark community and setting the assignee never properly avoids it. It just creates unfair competition between contributor and committer. If you want to make the ownership of a major feature clear, then please prepare a SPIP and raise it on the dev@ mailing list. That ensures recognition that you're already making meaningful progress, and others can help with reviewing. (Even in that case, if someone counters with another SPIP, then either collaboration or competition should happen. I don't think a committer can simply preempt.) Also, I think we should try to find the JIRA issue which did the same or similar, and leverage that one. There's lots of information and history of efforts which we can leverage "even" if we take a different PR. Once you file a new JIRA issue and let the old one be ignored, those efforts are lost. I don't think you could simply raise a PR for SPARK-34427 and ask for review, as in SPARK-10816 we found there are various ways to implement it, which requires a design doc to make sure the implementation considers these designs as well and picks the best one. The implementation should also be run through performance tests to ensure it's superior or at least on par. That establishes the "minimum bar" for the effort. Until that is achieved, consider my voice as -1 on the proposal. To make the comparison easier I think you should really continue your work in SPARK-10816, not here. I'm happy to see that another committer has finally found the necessity of the feature, but also unhappy that resurrecting the existing effort was not considered "at first", which would have saved a bunch of time for all of us. The existing effort wasn't discarded because of a technical issue; that said, the design and implementation are still valid. It just wasn't proposed at the right time.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284368#comment-17284368 ] L. C. Hsieh commented on SPARK-34427: - Please check the JIRA history. I don't think it is unconventional to assign a JIRA issue when there is ongoing work internally without PRs submitted. This has worked for many years in the Spark community. Again, conventionally I do see committers assign JIRA issues to themselves or other contributors because they are working on them (even if a PR is not submitted yet), or plan to. That is how the Spark community has worked in the past and works now. So again, if you are against the convention, please raise a discussion to disallow it. Otherwise I don't know why these issues are special for you. We all need to plan what we want to do in the Spark community. Opening a JIRA issue early can help gather thoughts from others. If we don't assign it, we can easily step on others' toes. From your perspective, once a JIRA issue is created and we cannot assign it, it is open for others to work on. How does planning work then? I think no one would be willing to create a JIRA issue before actually submitting a PR. We are experimenting with RocksDB work internally, so we created SPARK-34198 and assigned it. I don't know why that means we occupy major efforts in parallel and block others. So we can only work on one JIRA issue at a time? I think these issues have not been active in past years. I don't know why, when we want to push and work on them, we are now blocking others??? I'm not saying that we definitely want to push our own implementation for SPARK-10816 by abandoning the other two past efforts. But before any communication has happened, it sounds too harsh to me that, after we explicitly put the feature on our plan, there comes the claim that we should leave the work or else we are blocking others.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284363#comment-17284363 ] Jungtaek Lim commented on SPARK-34427: -- If you'd like to say the assignee is there to avoid stepping on others' toes, what if I assign myself to SPARK-10816 and claim I own the issue and have made more progress than this? Once I make the claim, then my claim is quite true, right?
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284362#comment-17284362 ] Jungtaek Lim commented on SPARK-34427: -- I'd say SPARK-10816 has had no active progress so far because of a lack of interest. Once you found the necessity, SPARK-10816 can be unblocked instead of spending more non-trivial time reinventing the wheel, no? See SPARK-10816: there are SPIP docs, discussions, ideas around them, even perf tests on WIP PRs. These efforts actually took more than a month. What does this issue provide? Just a link to the Flink documentation. No design doc, no implementation, no test. This JIRA issue basically does nothing yet. That said, it's far behind the existing effort. You can't simply say the existing PR is more complicated than what you have in mind unless you can prove it via a similar sort of effort: a SPIP doc with a design doc. I pointed out the JIRA assignee issue because you're trying to take up multiple major efforts while there are folks in the community who want to take something up. You've also assigned yourself to SPARK-34198, which doesn't even have a PR up for review, right? Don't try to occupy major efforts in parallel.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284348#comment-17284348 ] L. C. Hsieh commented on SPARK-34427: - If I am not missing anything, SPARK-10816 has had no active progress in more than two years. I don't know your intention here, but I don't really see that it was actively pushed during such a long time. It seems to me the effort was abandoned, and it looks totally okay to me for others to work on it, doesn't it? About assigning JIRA issues, I'm not sure if you really do not know, but as far as I remember committers can assign JIRA issues to themselves if they are working on them. We don't assign JIRA issues created by contributors to ourselves, because that is really unfair. For JIRA issues created for our own ongoing work, this is something of a convention among other committers too. It is to avoid stepping on others' toes. If you are really against it in general, maybe you can raise a discussion to formally disallow it in the Spark community. I'm happy to follow it if we eventually reach a consensus on it. Thanks.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284342#comment-17284342 ] Jungtaek Lim commented on SPARK-34427: -- Just to be clear, SPARK-10816 has SPIP docs from two different groups, and details/comparison docs as well.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284334#comment-17284334 ] Jungtaek Lim commented on SPARK-34427: -- We can close this one and continue from SPARK-10816. This JIRA issue loses all the input and effort in SPARK-10816, which was worth a month. As for complexity, I can simply back out the linked-list version, which would cut 1000+ lines. That was added to address one of the new requirements in SPARK-10816 and I don't think it should be addressed. I'm also OK to revisit [~XuanYuan]'s PR and decide which of the two to take. Both [~XuanYuan] and I are active in the community, so any minor issues could be handled without a takeover or a new implementation. One thing I would like to say is, let's not assign the JIRA issue - that is against what we do with most JIRA issues, and simply "unfair" to contributors. I'd like to see major efforts well distributed across the community.

> Session window support in SS
>
> Key: SPARK-34427
> URL: https://issues.apache.org/jira/browse/SPARK-34427
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> Currently structured streaming supports two kinds of windows: tumbling window and sliding window. Another useful window function is session window, which is not supported by SS. We have a user requirement to use session window. We'd like to have this support in the upstream.
> About session window, there is some info: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#session-windows.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284076#comment-17284076 ] L. C. Hsieh commented on SPARK-34427: - Thanks for the quick feedback. I didn't find the previous PRs, although I did a quick search for session window + spark structured streaming on the Internet. Roughly, I think the requirement should be the same as what Flink provides: static and dynamic gap. But I can confirm again with the customer. It seems to me there were two PRs working on the session window feature. I took a quick look at [~kabhwan]'s, and the PR looks much more complicated than what I have in mind. From the code size, that PR is much larger than another. I'm not sure if from feature perspective they are the same. As it comes to the weekend, I can take another look at the previous two PRs. From my side, I'd like to push this feature as we have real use case and requirement. But I'm not sure if we want to follow up with previous PRs. As for some implementations on flatMapGroupsWithState, it seems to me they don't actually achieve the same feature as a real session window.
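For readers unfamiliar with the "static and dynamic gap" distinction mentioned in the comment above, here is a minimal pure-Python sketch of gap-based session windowing for the events of a single key. It is not Spark or Flink code and does not reflect either implementation discussed in SPARK-10816; `sessionize`, `static_gap`, and the `gap_for` callback are hypothetical names used only for illustration, and treating an event that arrives exactly at the current session end as the start of a new session is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

# (session_start, session_end, event_count); hypothetical shape for illustration
Session = Tuple[int, int, int]


def sessionize(timestamps: Iterable[int],
               gap_for: Callable[[int], int]) -> List[Session]:
    """Group the event timestamps of one key into sessions.

    Each event at time t opens a provisional window [t, t + gap_for(t)).
    A window that starts before the current session ends is merged into it.
    """
    sessions: List[Session] = []
    for t in sorted(timestamps):
        end = t + gap_for(t)
        if sessions and t < sessions[-1][1]:
            # Event falls inside the open session: extend it.
            start, prev_end, count = sessions[-1]
            sessions[-1] = (start, max(prev_end, end), count + 1)
        else:
            # Inactivity gap elapsed: start a new session.
            sessions.append((t, end, 1))
    return sessions


def static_gap(gap: int) -> Callable[[int], int]:
    """Static gap: every event uses the same inactivity timeout."""
    return lambda _t: gap


if __name__ == "__main__":
    # Static gap of 5 time units.
    print(sessionize([1, 3, 20, 4], static_gap(5)))
    # Dynamic gap: the timeout depends on the event (here, on its timestamp).
    print(sessionize([1, 2, 12, 18, 40], lambda t: 2 if t < 10 else 10))
```

With a static gap the timeout is a constant; with a dynamic gap it is computed per event, which is why the sketch takes a callback rather than a number.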
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284041#comment-17284041 ] Jungtaek Lim commented on SPARK-34427: -- You'd need to provide the actual use case (yes, the user requirement) for session window, as the previous concern was that in most cases a session is not defined only by an inactivity gap, and users ended up implementing their own via flatMapGroupsWithState.
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284038#comment-17284038 ] Jungtaek Lim commented on SPARK-34427: -- Duplicate of SPARK-10816. I'm happy to revisit it and propose a PR again if we succeed in persuading the community of the necessity. (Note that the PR was closed due to lack of interest.)
[jira] [Commented] (SPARK-34427) Session window support in SS
[ https://issues.apache.org/jira/browse/SPARK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284033#comment-17284033 ] L. C. Hsieh commented on SPARK-34427: - cc [~dbtsai]