Re: [build system] our colo is having power issues again. there will be a few 'events' this week

2019-09-24 Thread Jungtaek Lim
On Tue, Sep 24, 2019 at 1:50 PM Jungtaek Lim wrote: > > > > Hi Shane, > > > > Thanks for the update, and take care of build system! > > > > Looks like some of build just got failed without test failures, looks > like env. issue. > > > https://amp

Re: [build system] our colo is having power issues again. there will be a few 'events' this week

2019-09-24 Thread Jungtaek Lim
are not supported. Could you please check on this? Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Sep 25, 2019 at 2:48 AM Shane Knapp wrote: > quick update from our colo admin: they are going to keep the colo on > generator power until monday morning and not switch back and forth. > this is great

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Jungtaek Lim
ackport once you consider >>>> the parts needed to make dsv2 stable. >>>> >>>> >>>> >>>> On Fri, Sep 20, 2019 at 10:47 AM, Ryan Blue >>>> wrote: >>>> >>>> Hi everyone, >>>> >>>> In the DSv2 sync this week, we talked about a possible Spark 2.5 >>>> release based on the latest Spark 2.4, but with DSv2 and Java 11 support >>>> added. >>>> >>>> A Spark 2.5 release with these two additions will help people migrate >>>> to Spark 3.0 when it is released because they will be able to use a single >>>> implementation for DSv2 sources that works in both 2.5 and 3.0. Similarly, >>>> upgrading to 3.0 won't also require also updating to Java 11 because users >>>> could update to Java 11 with the 2.5 release and have fewer major changes. >>>> >>>> Another reason to consider a 2.5 release is that many people are >>>> interested in a release with the latest DSv2 API and support for DSv2 SQL. >>>> I'm already going to be backporting DSv2 support to the Spark 2.4 line, so >>>> it makes sense to share this work with the community. >>>> >>>> This release line would just consist of backports like DSv2 and Java 11 >>>> that assist compatibility, to keep the scope of the release small. The >>>> purpose is to assist people moving to 3.0 and not distract from the 3.0 >>>> release. >>>> >>>> Would a Spark 2.5 release help anyone else? Are there any concerns >>>> about this plan? >>>> >>>> >>>> rb >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Software Engineer >>>> Netflix >>>> >>>> >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Software Engineer >>>> Netflix >>>> >>>> >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Software Engineer >>>> Netflix >>>> >>> >>> >> >> -- >> Ryan Blue >> Software Engineer >> Netflix >> > -- Name : Jungtaek Lim Blog : http://medium.com/@heartsavior Twitter : http://twitter.com/heartsavior LinkedIn : http://www.linkedin.com/in/heartsavior

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Jungtaek Lim
small correction: confusion -> conflict, so I had to go through and understand parts of the changes On Sat, Sep 21, 2019 at 1:25 PM Jungtaek Lim wrote: > Just 2 cents, I haven't tracked the change of DSv2 (though I needed to > deal with this as the change made confusion on my PRs...

FYI - filed bunch of issues for flaky tests in recent CI builds

2019-09-17 Thread Jungtaek Lim
ere yesterday, but for master build) so any helps are appreciated. Thanks, Jungtaek Lim (HeartSaVioR)

Re: Weird build failures in PR builder

2019-09-16 Thread Jungtaek Lim
ing tomorrow is pause builds, wipe out the > ivy/sbt caches and SparkPullRequestBuilder* dirs on all workers and > see if that helps. > > shane > -- > Shane Knapp > UC Berkeley EECS Research / RISELab Staff Technical Lead > https://rise.cs.berkeley.edu > --

Weird build failures in PR builder

2019-09-16 Thread Jungtaek Lim
lab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110688/consoleFull Any guess/suspect? Thanks, Jungtaek Lim (HeartSaVioR)

Re: Thoughts on Spark 3 release, or a preview release

2019-09-12 Thread Jungtaek Lim
uss necessary info and access in barrier mode + Mesos >>> SPARK-25074 Implement maxNumConcurrentTasks() in >>> MesosFineGrainedSchedulerBackend >>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2 >>> SPARK-25186 Stabilize Data Source V2 API >>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier >>> execution mode >>> SPARK-25390 data source V2 API refactoring >>> SPARK-7768 Make user-defined type (UDT) API public >>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition >>> Spec >>> SPARK-15691 Refactor and improve Hive support >>> SPARK-15694 Implement ScriptTransformation in sql/core >>> SPARK-16217 Support SELECT INTO statement >>> SPARK-16452 basic INFORMATION_SCHEMA support >>> SPARK-18134 SQL: MapType in Group BY and Joins not working >>> SPARK-18245 Improving support for bucketed table >>> SPARK-19842 Informational Referential Integrity Constraints Support in >>> Spark >>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested >>> list of structures >>> SPARK-22632 Fix the behavior of timestamp values for R's DataFrame to >>> respect session timezone >>> SPARK-22386 Data Source V2 improvements >>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN >>> >>> - >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>> >>> >>> >>> -- Name : Jungtaek Lim Blog : http://medium.com/@heartsavior Twitter : http://twitter.com/heartsavior LinkedIn : http://www.linkedin.com/in/heartsavior

Re: Welcoming some new committers and PMC members

2019-09-09 Thread Jungtaek Lim
nd the Spark PMC >> > >> > >> > - >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> > >> >> >> -- >> Shane Knapp >> UC Berkeley EECS Research / RISELab Staf

Re: Design review of SPARK-28594

2019-09-01 Thread Jungtaek Lim
Great, thanks for reviewing, Felix! On Mon, Sep 2, 2019 at 2:16 AM Felix Cheung wrote: > I did review it and solving this problem makes sense. I will comment in > the JIRA. > > -- > *From:* Jungtaek Lim > *Sent:* Sunday, August 25, 2019 3:34

Re: Providing a namespace for third-party configurations

2019-08-30 Thread Jungtaek Lim
parties to scope their custom configurations to > that area? e.g. Something like `spark.external.[vendor].[whatever]`. > > > > Nick > > --------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
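A minimal sketch of how a vendor could read settings scoped under the `spark.external.[vendor].[whatever]` pattern suggested above. The prefix is only the proposal from this thread, not an existing Spark setting, and `vendor_confs`, the `acme` vendor name, and the sample keys are hypothetical.

```python
# Hypothetical prefix, taken from the `spark.external.[vendor].[whatever]`
# pattern floated in the thread; Spark itself does not define it.
EXTERNAL_PREFIX = "spark.external."


def vendor_confs(conf, vendor):
    """Return one vendor's settings with the namespace prefix stripped."""
    prefix = f"{EXTERNAL_PREFIX}{vendor}."
    return {k[len(prefix):]: v for k, v in conf.items() if k.startswith(prefix)}


# Sample configuration: one built-in key and keys from two made-up vendors.
conf = {
    "spark.sql.shuffle.partitions": "200",
    "spark.external.acme.endpoint": "https://example.invalid",
    "spark.external.acme.retries": "3",
    "spark.external.other.flag": "true",
}

acme = vendor_confs(conf, "acme")
print(acme)
```

Scoping by prefix keeps third-party keys from colliding with built-in ones and lets each vendor pick out only its own settings.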

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jungtaek Lim
> > On Mon, Aug 26, 2019 at 3:57 PM Jungtaek Lim wrote: > >> Nice finding! I don't see any reason to not use >> KafkaSourceInitialOffsetWriter from KafkaSource, as they're identical. I >> guess it was copied and pasted sometime before and not addressed yet. >> As you

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jungtaek Lim
for your patch if you plan to do it. Please let me know. Thanks, Jungtaek Lim (HeartSaVioR) On Mon, Aug 26, 2019 at 8:03 PM Jacek Laskowski wrote: > Hi, > > Just found out that KafkaSource [1] does not > use KafkaSourceInitialOffsetWriter (of KafkaMicroBatchStream) [2] for >

Design review of SPARK-28594

2019-08-25 Thread Jungtaek Lim
and the proposal works orthogonal to current feature. Please let me know if it's not the case and SPIP process is necessary. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-28594 2. https://docs.google.com/document/d/12bdCC4nA58uveRxpeo8k7kGOI2NRTXmXyBOweSi4YcY/edit?usp

Re: My curation of pending structured streaming PRs to review

2019-08-18 Thread Jungtaek Lim
e in a while. > > One common theme here is 'structured streaming' -- who amongst the > committers feels they are able to review these changes? I sense we > have a shortage there. > -- Name : Jungtaek Lim Blog : http://medium.com/@heartsavior Twitter : http://twitter.com/heartsavior LinkedIn : http://www.linkedin.com/in/heartsavior

Re: Recognizing non-code contributions

2019-08-06 Thread Jungtaek Lim
g chance to get various merits what ASF committers have been enjoying. I hope there's other way to provide these merits while we don't grant "unnecessary" privilege. -Jungtaek Lim (HeartSaVioR) On Tue, Aug 6, 2019 at 10:08 PM Hyukjin Kwon wrote: > I usually make such judgement

Re: DataSourceV2 : Transactional Write support

2019-08-02 Thread Jungtaek Lim
ry instead is intermediate output: inserting into temporal table in executors, and move inserted records to the final table in driver (must be atomic). Thanks, Jungtaek Lim (HeartSaVioR) On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood wrote: > All, > > I understood that DataSourceV2 supports
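The staging approach described above can be sketched with an in-memory SQLite database. This is only an illustration of the idea, not Spark's DataSourceV2 API: `task_write` stands in for executor-side writes, `driver_commit` for the driver-side move, and the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_table (id INTEGER, value TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER, value TEXT)")


def task_write(rows):
    # "Executor" side: each task writes only to the staging table.
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.commit()


def driver_commit():
    # "Driver" side: move all staged rows in a single transaction, so the
    # final table receives either none or all of them.
    with conn:
        conn.execute("INSERT INTO final_table SELECT id, value FROM staging")
        conn.execute("DELETE FROM staging")


task_write([(1, "a"), (2, "b")])
task_write([(3, "c")])
before = conn.execute("SELECT COUNT(*) FROM final_table").fetchone()[0]
driver_commit()
after = conn.execute("SELECT COUNT(*) FROM final_table").fetchone()[0]
staged = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

The final table stays empty until the driver-side step runs, which is the property the message asks for: task output is not visible in the destination until one atomic move.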

My curation of pending structured streaming PRs to review

2019-07-15 Thread Jungtaek Lim
ermark. Please chime in and share your curation if I'm missing something. Thanks, Jungtaek Lim (HeartSaVioR)

Re: Spark SQL upgrade / migration guide: discoverability and content organization

2019-07-14 Thread Jungtaek Lim
release note for this, but migration guide would be better to help for some users from 2.4.x to 3.0.x since release note would be bound to only 3.0.0. -Jungtaek Lim (HeartSaVioR) On Mon, Jul 15, 2019 at 8:25 AM Xiao Li wrote: > Yeah, Josh! All these ideas sound good to me. All the

Re: correctness issue on chained streaming-streaming join

2019-06-12 Thread Jungtaek Lim
to consider per-operator watermark to make things pretty easier for end users to understand. Would like to hear voices on this. -Jungtaek Lim (HeartSaVioR) On Wed, Jun 12, 2019 at 4:41 PM Jungtaek Lim wrote: > Hi devs, > > While helping user in user mailing list, I start to suspect tha

Re: [StructuredStreaming] HDFSBackedStateStoreProvider is leaking .crc files.

2019-06-12 Thread Jungtaek Lim
Nice finding! Given you already pointed out previous issue which fixed similar issue, it would be also easy for you to craft the patch and verify whether the fix resolves your issue. Looking forward to see your patch. Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Jun 12, 2019 at 8:23 PM Gerard

correctness issue on chained streaming-streaming join

2019-06-12 Thread Jungtaek Lim
rim mitigation. For long-term solution, we may want to visit SPARK-26655 [1] which addresses operator-wise watermarks. Thanks, Jungtaek Lim (HeartSaVioR)

Re: [ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-05 Thread Jungtaek Lim
> this release. This release would not have been possible without you. > > > > - > > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > -- Name : Jungtaek Lim Blog : http://medium.com/@heartsavior Twitter : http://twitter.com/heartsavior LinkedIn : http://www.linkedin.com/in/heartsavior

Re: Request review for long-standing PRs

2019-02-26 Thread Jungtaek Lim
contributors have lots of risks on putting major efforts - shouldn't conflict to what others have been doing privately, should be accepted after putting numerous effort to design and have POC. On Wed, Feb 27, 2019 at 8:14 AM, Jungtaek Lim wrote: > Thanks Sean, as always, to share your thought quickly!

Re: Request review for long-standing PRs

2019-02-26 Thread Jungtaek Lim
uite a lot. I don't know whether they > should be merged. This isn't a 'bug' though; not all changes should be > committed. Simple and targeted is much easier to say yes to, because > you implicitly here ask a lot of people to assume responsibility for > your change. > > On Tue,

Request review for long-standing PRs

2019-02-26 Thread Jungtaek Lim
regardless of size of code diff to merge once committer(s) gave a focus on PR and reviewed. Thanks, Jungtaek Lim (HeartSaVioR) ps. I may agree all committers in SS area could be busy (It might clearly represent SS area lacks committers), but I may not agree they're involved in DSv2 and DSv2

Re: [SS] Allowing stream Sink metadata as part of checkpoint?

2019-02-26 Thread Jungtaek Lim
both things together and finding a way to deal with them. -Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-26411 2. https://issues.apache.org/jira/browse/SPARK-24295 On Tue, Feb 26, 2019 at 4:28 PM, Arun Mahadevan wrote: > Unless its some sink metadata to be maintai

[SS] Allowing stream Sink metadata as part of checkpoint?

2019-02-25 Thread Jungtaek Lim
intentional to share between queries - but sometimes we may want to make it coupled with query checkpoint. What do you think about passing metadata path to sink (we have only one for query) so that sink metadata can be coupled with query checkpoint? Thanks, Jungtaek Lim (HeartSaVioR) 1. https

Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Jungtaek Lim
Given SPARK-26154 [1] is a correctness issue and PR [2] is submitted, I hope it can be reviewed and included within Spark 2.4.1 - otherwise it will be a long-live correctness issue. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-26154 2. https://github.com

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-10 Thread Jungtaek Lim
behavior doesn’t seem broken... > > > > I could be wrong though. > > > > > > ____ > > From: Ryan Blue > > Sent: Friday, February 8, 2019 4:39 PM > > To: Sean Owen > > Cc: Jungtaek Lim; dev > > Subject: Re: [DISCUSS] Change default executor log

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-10 Thread Jungtaek Lim
FYI: I've been working on stabilizing tests on streaming join and Kafka continuous mode (they're somewhat coupled with - Kafka continuous mode fails after porting back commit on streaming join) for branch-2.3, and I think it's done. https://github.com/apache/spark/pull/23757 On Mon, Feb 11, 2019, AM

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-08 Thread Jungtaek Lim
> implement the behavior you're suggesting: to link to the logs path in YARN >> instead of directly to stderr and stdout. >> > >> > On Fri, Feb 8, 2019 at 3:33 PM Jungtaek Lim wrote: >> >> >> >> Ryan, >> >> >> >> actually I'm not cle

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-08 Thread Jungtaek Lim
ost of one more hop to get to logs. >> I don't feel strongly about it but think that's a reasonable thing to do. >> >> On Fri, Feb 8, 2019 at 4:57 PM Jungtaek Lim wrote: >> > >> > Let me quote some voices here: seems like they don't participate this >> thr

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-08 Thread Jungtaek Lim
f there are logs going to other files, then I think making this an option > is reasonable. Otherwise, I think we should leave links as they are. > > rb > > On Thu, Feb 7, 2019 at 12:31 PM Jungtaek Lim wrote: > >> New URL shows all of local logs which includes stdout

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-07 Thread Jungtaek Lim
the content (one-click) but users have to remove file part manually from URL to access list page. Instead of this we may be able to change default URL to show all of local logs and let users choose which file to read. (though it would be two-clicks to access to actual file) -Jungtaek Lim (HeartSaVioR

[DISCUSS] Change default executor log URLs for YARN

2019-02-07 Thread Jungtaek Lim
Spark UI as well, but I've got suggestion to just change the default log URL. Thanks again, Jungtaek Lim (HeartSaVioR)

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-06 Thread Jungtaek Lim
Might be out of topic: regarding SPARK-24211 (flaky tests in StreamingJoinSuite) I might volunteer to take a look, but if things are not flaky with branch 2.4 and EOL on branch 2.3 is coming sooner (in some months), I wonder we still want to tackle it in any way. On Thu, Feb 7, 2019 at 2:21 PM, Sean

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-30 Thread Jungtaek Lim
ue, Jan 29, 2019 at 4:57 AM Jungtaek Lim wrote: > >> Regarding PR 23634, it is waiting for getting consensus on the approach >> for the fix, as well as it also needs to have some time to clean up some >> code and move focus to concern backward compatibility. I'm postponing these

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread Jungtaek Lim
Congrats Jose! Well deserved. - Jungtaek Lim (HeartSaVioR) On Wed, Jan 30, 2019 at 5:19 AM, Dongjoon Hyun wrote: > Congrats, Jose! :) > > Bests, > Dongjoon. > > On Tue, Jan 29, 2019 at 11:41 AM Arun Mahadevan wrote: > >> Congrats Jose! Well deserved. >> >> On

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Jungtaek Lim
Regarding PR 23634, it is waiting for getting consensus on the approach for the fix, as well as it also needs to have some time to clean up some code and move focus to concern backward compatibility. I'm postponing these works since I haven't reached consensus on the approach. So it may take some

Re: DSv2 question

2019-01-24 Thread Jungtaek Lim
I guess explaining rationalization would be better to understanding the situation. It's related to skip converting params to lowercase before assigning to Kafka parameter. (https://github.com/apache/spark/pull/23612) If we guarantee lowercase key on interface(s) we can simply pass them to Kafka

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Jungtaek Lim
Yes I understand what Reynold stated (as Michael Armbrust stated earlier), and I agree it's major great thing that improvements on CORE/SQL also benefit to SS as well. I just concerned that both of SQL / SS are being impacted with DSv2, but things are going differently between SQL and SS. SQL is

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Jungtaek Lim
> > > There isn't a way to make people work on it, and I personally am not > > interested in it nor have a background in SS. > > > > I did leave some comments on your PR and will see if we can get > > comfortable with merging it, as I presume you are pretty knowledg

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Jungtaek Lim
as I presume you are pretty knowledgeable > about the change. > > On Sun, Jan 13, 2019 at 4:55 PM Jungtaek Lim wrote: > > > > Sean, this is actually a fail-back on pinging committers. I know who can > review and merge in SS area, and pinged to them, didn't work. Even there's > a PR w

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Jungtaek Lim
contributors and committers for such module, but SS is not. Maybe either other committers who weren't familiar with should try to get familiar and cover the area, or the area needs more committers. -Jungtaek Lim (HeartSaVioR) On Sun, Jan 13, 2019 at 11:37 PM, Sean Owen wrote: > Jungtaek, the best strat

Re: Ask for reviewing on Structured Streaming PRs

2019-01-12 Thread Jungtaek Lim
I'm sorry but let me remind this, as non-SS PRs are being reviewed accordingly, whereas many of SS PRs (regardless of who create) are still not reviewed and merged in time. On Thu, Jan 3, 2019 at 7:57 AM, Jungtaek Lim wrote: > Spark devs, happy new year! > > I would like to remind this kind

Re: Ask for reviewing on Structured Streaming PRs

2019-01-02 Thread Jungtaek Lim
Spark devs, happy new year! I would like to remind this kindly, since there was actually no review after initiating the thread. Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Dec 12, 2018 at 11:12 PM, Vaclav Kosar wrote: > I am also waiting for any finalization of my PR [3]. It seems that SS

Ask for reviewing on Structured Streaming PRs

2018-12-12 Thread Jungtaek Lim
Hi devs, Would I kindly ask for reviewing on PRs for Structured Streaming? I have 5 open pull requests on SS side [1] (earliest PR was opened around 4 months so far), and there looks like couple of PR for others [2] which looks good to be reviewed, too. Thanks in advance, Jungtaek Lim

Re: Why does join use rows that were sent after watermark of 20 seconds?

2018-12-10 Thread Jungtaek Lim
+ INTERVAL 1 HOUR), 2. Join on event-time windows (e.g. ...JOIN ON leftTimeWindow = rightTimeWindow). So yes, join condition should directly deal with timestamp column, otherwise state will grow infinitely. Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Dec 11, 2018 at 2:52 PM
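A simplified model of why the join condition has to bound event time. It is not Spark's actual state-cleanup logic: `evict`, the row layout, and the one-hour gap are assumptions made for illustration. A buffered row with event time `ts` can only match rows from the other side up to `ts + max_gap`, so it becomes droppable once the other side's watermark passes that point; with no time bound nothing can ever be dropped.

```python
from datetime import datetime, timedelta


def evict(buffered, other_side_watermark, max_gap):
    """Keep only rows that a future row from the other side could still match.

    max_gap=None models a join condition with no time bound: no row is ever
    safe to drop, so the buffer only grows.
    """
    if max_gap is None:
        return list(buffered)
    return [r for r in buffered if r["ts"] + max_gap >= other_side_watermark]


t0 = datetime(2018, 12, 10, 12, 0)
buffered = [
    {"id": 1, "ts": t0},
    {"id": 2, "ts": t0 + timedelta(hours=2)},
]
# The other side's watermark has advanced 90 minutes past t0.
watermark = t0 + timedelta(hours=1, minutes=30)

bounded = evict(buffered, watermark, timedelta(hours=1))
unbounded = evict(buffered, watermark, None)
```

With the one-hour bound the first row is released because no later row can satisfy the condition; without a bound both rows have to stay buffered indefinitely.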

Re: [SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-25 Thread Jungtaek Lim
Just filed a new JIRA issue [1] as well as PR [2]. - Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-26170 2. https://github.com/apache/spark/pull/23142 On Mon, Nov 26, 2018 at 3:26 PM, Jungtaek Lim wrote: > Looks like just a kind of missing spot. Just crafted the pa

Re: [SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-25 Thread Jungtaek Lim
Looks like just a kind of missing spot. Just crafted the patch and now in progress of testing. Once done with testing I'll file a new issue and submit a patch. Thanks, Jungtaek Lim (HeartSaVioR) On Mon, Nov 26, 2018 at 4:49 AM, Burak Yavuz wrote: > Probably just oversight. Anyone is welcome to

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
OK thanks for clarifying. I guess it is one of major features in streaming area and nice to add, but also agree it would require huge investigation. On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust wrote: > Agree. Just curious, could you explain what do you mean by "negation"? >> Does it mean

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
er and > also create a high performance (i.e. whole stage code-gened) aggregation > operator that understands negation). > Agree. Just curious, could you explain what do you mean by "negation"? Does it mean applying retraction on aggregated? > Thanks again for starting the d

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
change) On Mon, Oct 22, 2018 at 12:25 PM, Jungtaek Lim wrote: > Yeah, the main intention of this thread is to collect interest on possible > feature list for structured streaming. From what I can see in Spark > community, most of the discussions as well as contributions are for SQL, > and I'd

Re: queryable state & streaming

2018-10-21 Thread Jungtaek Lim
alternative approach when it doesn't. Sounds like it is a huge item and can be handled individually. - Jungtaek Lim (HeartSaVioR) On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote: > Nice I was looking for a jira. So I agree we should justify why we are > building something. Now to tha

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Jungtaek Lim
discussion thread. For queryable state, at least there seems no workaround in Spark to provide similar thing, especially state is getting bigger. I may have some concerns on the details, but I'll add my thought on the discussion thread. - Jungtaek Lim (HeartSaVioR) On Mon, Oct 22, 2018 at 1:15 AM, Stavros

Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Jungtaek Lim
Stavros, if my memory is right, you were trying to drive queryable state, right? Could you summary the progress and the reason why the progress got stopped? On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > That is a very interesting list thanks. I

Re: Plan on Structured Streaming in next major/minor release?

2018-10-18 Thread Jungtaek Lim
ggregation, whereas State TTL is just work as its name is represented (self-explanatory). Hence State TTL looks valid for all the cases. On Fri, Oct 19, 2018 at 12:20 PM, Jungtaek Lim wrote: > Hi devs, > > While Spark 2.4.0 is still in progress of release votes, I'm seeing some > pull reques

Re: DataSourceWriter V2 Api questions

2018-10-18 Thread Jungtaek Lim
"move" its data from staging to final destination within storage. So could we consider lessen the contract on DataSource V2 writer, or have a new representation of guarantee for such case so it is not "fully transactional" but another kind of "exactly-once" and not &

Plan on Structured Streaming in next major/minor release?

2018-10-18 Thread Jungtaek Lim
ke to hear others opinions about this. Please also share if there're ongoing efforts on other items for structured streaming. Happy to help out if it needs another hand. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-10816 2. https://issues.apache.org/jira/br

Re: Structured Streaming with Watermark

2018-10-18 Thread Jungtaek Lim
is still added, please let me know about the version of Spark as well as physical plan (if you don't mind) and I can take a look. Thanks, Jungtaek Lim (HeartSaVioR) On Thu, Oct 18, 2018 at 5:51 PM, sandeep_katta wrote: > Now I ve added same aggregation query as below but still it is didn't > filter

Re: welcome a new batch of committers

2018-10-03 Thread Jungtaek Lim
Congrats all! You all deserved it. On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido wrote: > Congrats you all! > > On Wed, Oct 3, 2018 at 11:29, Liang-Chi Hsieh > wrote: > >> >> Congratulations to all new committers! >> >> >> rxin wrote >> > Hi all, >> > >> > The Apache Spark PMC has

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-02 Thread Jungtaek Lim
. Jungtaek Lim (HeartSaVioR) On Wed, Oct 3, 2018 at 2:48 AM, chandan prakash wrote: > Thanks a lot Steve and Jungtaek for your answers. > Steve, > You explained really well in depth. > > I understood that the existing old implementation was not correct for > object store like S3. The

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-09-30 Thread Jungtaek Lim
file is left, both speculative task and task in retrying batch could skip writing and mark as successful, result in partial delta being considered for correct delta file. Does it make sense? Thanks, Jungtaek Lim (HeartSaVioR) On Sun, Sep 30, 2018 at 4:51 PM, chandan prakash wrote: > Anyone who can cl
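The failure mode above (a partial file sitting under the final name and being mistaken for a complete delta) is what a write-to-temp-then-rename pattern is meant to avoid. Below is a minimal local-file-system sketch, not the state store's actual code: `write_delta_atomically`, the `.tmp-delta-` prefix, and the `1.delta` file name are hypothetical. It also assumes rename is atomic, which holds on POSIX local file systems but, as this thread notes, not on object stores like S3.

```python
import os
import tempfile


def write_delta_atomically(final_path, payload):
    """Write payload to a temp file, then rename it into place."""
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-delta-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # The final name appears only once the content is complete.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file so a failed attempt leaves nothing behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


workdir = tempfile.mkdtemp()
delta = os.path.join(workdir, "1.delta")
write_delta_atomically(delta, b"key=value\n")

with open(delta, "rb") as f:
    content = f.read()
leftovers = [n for n in os.listdir(workdir) if n.startswith(".tmp-delta-")]
```

Because a reader can only ever see a complete file under the final name, a retried or speculative task that finds the file already present can safely skip its own write.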

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-09-29 Thread Jungtaek Lim
, Jungtaek Lim (HeartSaVioR) On Thu, Sep 27, 2018 at 8:58 PM, Gabor Somogyi wrote: > Hi all, > > I am writing this e-mail in order to discuss the delegation token support > for kafka feature which is reported in SPARK-25501 > <https://issues.apache.org/jira/browse/SPARK-25501>. I've pre

Re: [DISCUSS] SPIP: Native support of session window

2018-09-28 Thread Jungtaek Lim
to avoid drawing something which would take non-trivial efforts. New classes are linked to the actual source code so that we can read the code directly whenever curious/wonders about something. Please let me know anytime if something is unclear and need elaboration. -Jungtaek Lim (HeartSaVioR) 2018

Re: [DISCUSS] SPIP: Native support of session window

2018-09-28 Thread Jungtaek Lim
most of UTs I've added fail but some UTs are for update mode, and the patch doesn't provide same experience with select only session window, so I'm pointing only one UT which is testing basic session window.) -Jungtaek Lim (HeartSaVioR) On Fri, Sep 28, 2018 at 9:22 PM, Yuanjian Li wrote: > Hi Jungt

[DISCUSS] SPIP: Native support of session window

2018-09-27 Thread Jungtaek Lim
to go too deep on SPIP doc so anyone could review and see the benefit of adopting this. Looking forward to hear your feedback. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-10816 2. https://docs.google.com/document/d/1_rMLmUSyGzb62RnP2A3WX6D6uRxox8Q_7WcoI_HrTw

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-21 Thread Jungtaek Lim
Fan wrote: > Thanks! If both versions are specified, yes we can just remove 3.0.0 > > On Fri, Sep 21, 2018 at 1:38 PM Jungtaek Lim wrote: > >> OK got it. Thanks for clarifying. >> >> I can help checking and modifying version, but not sure the case both >> versi

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-20 Thread Jungtaek Lim
hen resolving a ticket, the > default fixed version is 3.0.0. I guess someone forgot to type the fixed > version and lead to this mistake. > > On Fri, Sep 21, 2018 at 1:15 PM Jungtaek Lim wrote: > >> Ah these issues were resolved before branch-2.4 is cut, like SPARK-24441 >>

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-20 Thread Jungtaek Lim
we set the resolved version to 2.4.1 and then if roll a > new RC we switch the 2.4.1 issues to 2.4.0. > > On Thu, Sep 20, 2018 at 9:55 PM Jungtaek Lim wrote: > >> I also noticed there're some fixed issues which are included in >> branch-2.4 but its versions are still

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-20 Thread Jungtaek Lim
I also noticed there're some fixed issues which are included in branch-2.4 but its versions are still 3.0.0. Would we want to update versions to 2.4.0? If we are not planning to run some automations to correct it, I'm happy to fix them. On Thu, Sep 20, 2018 at 9:22 PM, Weichen Xu wrote: > We need to

Re: DataSourceWriter V2 Api questions

2018-09-10 Thread Jungtaek Lim
>> care about multi-client transaction. Or using a staging table like Ryan >> described before. >> >> >> >> On Tue, Sep 11, 2018 at 5:10 AM Jungtaek Lim wrote: >> >>> > And regarding the issue that Jungtaek brought up, 2PC doesn't require >>

Re: DataSourceWriter V2 Api questions

2018-09-10 Thread Jungtaek Lim
window of potential failure is pretty >> short for appends. For writers at the partition level it is fine because it >> is just renaming directory, which is atomic. >> >> On Mon, Sep 10, 2018 at 1:40 PM Jungtaek Lim wrote: >> >>> When network partitioning ha

Re: DataSourceWriter V2 Api questions

2018-09-10 Thread Jungtaek Lim
FS, the move is a pretty fast operation so while it is > not completely transactional, the window of potential failure is pretty > short for appends. For writers at the partition level it is fine because it > is just renaming directory, which is atomic. > > On Mon, Sep 10, 2018 at 1:40 PM Jun

Re: DataSourceWriter V2 Api questions

2018-09-10 Thread Jungtaek Lim
transaction to move data from staging table to > final table. > > > > > > On Mon, Sep 10, 2018 at 12:56 PM Jungtaek Lim wrote: > >> I guess we all are aware of limitation of contract on DSv2 writer. >> Actually it can be achieved only with HDFS sink (or other

Re: DataSourceWriter V2 Api questions

2018-09-10 Thread Jungtaek Lim
transaction ends normally means aborting transaction). Spark should also integrate 2PC with its checkpointing mechanism to guarantee completeness of batch. And it might require different integration for continuous mode. Jungtaek Lim (HeartSaVioR) On Tue, Sep 11, 2018 at 4:37 AM, Arun Mahadevan wrote

Re: data source api v2 refactoring

2018-08-31 Thread Jungtaek Lim
Nice suggestion Reynold and great news to see that Wenchen succeeded prototyping! One thing I would like to make sure is, how continuous mode works with such abstraction. Would continuous mode be also abstracted with Stream, and createScan would provide unbounded Scan? Thanks, Jungtaek Lim

Re: [Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-05 Thread Jungtaek Lim
018 Aug 5 (Sun), 7:28 PM, Jungtaek Lim wrote: > "coalesce" looks like working: I misunderstood it as an efficient version > of "repartition" which does shuffle, so expected it would trigger shuffle. > My proposal would be covered as using "coalesce": thanks

Re: [Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-05 Thread Jungtaek Lim
eally matter for scalability / elasticity. Thanks, Jungtaek Lim (HeartSaVioR) On Sat, Aug 4, 2018 at 3:10 AM, Joseph Torres wrote: > I'd agree it might make sense to bundle this into an API. We'd have to > think about whether it's a common enough use case to justify the API > complexity. > > I

Re: [Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-03 Thread Jungtaek Lim
Joseph Torres wrote: > Scheduling multiple partitions in the same task is basically what > coalesce() does. Is there a reason that doesn't work here? > > On Fri, Aug 3, 2018 at 5:55 AM, Jungtaek Lim wrote: > >> Here's a link for Google docs (anyone can comment): >>

Re: [Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-03 Thread Jungtaek Lim
h couple of external storages like Redis or HBase or so, but I would avoid the step which requires end users to maintain other system as well. Spark is coupled with specific version of Hadoop, so we could expect that end users could run and maintain HDFS. > Thanks, > > Arun > > > On 2 Augus

[Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-03 Thread Jungtaek Lim
posal: opinions on whether to accept or decline it, things to correct in my mail, any suggestions for improvement, etc. Please also let me know if it would be better to move this to a Google doc or PDF and file a JIRA issue. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://github.com/apache/spark/pull/21718

Re: Review notification bot

2018-07-31 Thread Jungtaek Lim
explicitly) stop contributing to the project for various reasons, so taking activeness (or date of commit) into account would be ideal. I admit the above might be ideal rather than realistic; I'm just thinking out loud about how to make the review notification bot more useful for contributors and less annoying for others. Thanks, Jun

Re: Asking for reviewing PRs regarding structured streaming

2018-07-26 Thread Jungtaek Lim
by a couple of contributors. They have been open for at least 17 days and, at most, more than 2 months. I'm not pushing committers to merge them in 2.4, but I hope to see some reactions / reviews so that I can reflect them and move the patches forward until they are ready to merge. - Jungtaek Lim (HeartSaVioR) On Jul 16, 2018

Re: Asking for reviewing PRs regarding structured streaming

2018-07-12 Thread Jungtaek Lim
. tests. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK-24763 2. https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16541367=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16541367 On Mon, Jul 9, 2018 at 5:28 PM, Jungtaek Lim

Re: Asking for reviewing PRs regarding structured streaming

2018-07-09 Thread Jungtaek Lim
additional operations like projection and join, but a smaller state row would also give a performance benefit, so the two can offset each other. Please refer to the comment in JIRA issue [2] to see the numbers from a simple perf. test. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPARK

Re: Asking for reviewing PRs regarding structured streaming

2018-07-05 Thread Jungtaek Lim
find a more flexible way to resolve the issue (SPARK-24717) that I mentioned in the tl;dr. So 3 of the 5 issues are coupled so far, in order to track and resolve one issue. I hope this helps explain why these patches are worth reviewing. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/br

Re: Asking for reviewing PRs regarding structured streaming

2018-07-05 Thread Jungtaek Lim
, Jungtaek Lim (HeartSaVioR) On Sun, Jul 1, 2018 at 6:21 AM, Jungtaek Lim wrote: > Kind reminder, since around 2 weeks have passed. I've added more PRs during these 2 > weeks and am even planning to do more. > > On Tue, Jun 19, 2018 at 6:34 PM, Jungtaek Lim wrote: > >> Hi Spark devs, >> >>

Re: Asking for reviewing PRs regarding structured streaming

2018-06-30 Thread Jungtaek Lim
Kind reminder, since around 2 weeks have passed. I've added more PRs during these 2 weeks and am even planning to do more. On Tue, Jun 19, 2018 at 6:34 PM, Jungtaek Lim wrote: > Hi Spark devs, > > I have a couple of pull requests for structured streaming which are getting > older and fading out from e

Re: RepartitionByKey Behavior

2018-06-21 Thread Jungtaek Lim
It is not possible, because the cardinality of the partitioning key is non-deterministic while the partition count must be fixed. There's a chance that the cardinality exceeds the partition count, and then the system can't guarantee the requirement. Thanks, Jungtaek Lim (HeartSaVioR) On Fri, Jun 22, 2018, AM 8
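
The counting argument in the reply above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the requirement is taken to be "at most one distinct key per partition" (the snippet does not spell it out), and `partition_of` and `keys_per_partition` are hypothetical helpers that use CRC32 as a stand-in for whatever hash function a real partitioner uses.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Set


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def keys_per_partition(keys: Iterable[str], num_partitions: int) -> Dict[int, Set[str]]:
    """Return the set of distinct keys that end up in each non-empty partition."""
    assignment: Dict[int, Set[str]] = defaultdict(set)
    for key in keys:
        assignment[partition_of(key, num_partitions)].add(key)
    return dict(assignment)


if __name__ == "__main__":
    # Five distinct keys but only three partitions: by the pigeonhole
    # principle at least one partition must hold two or more keys.
    layout = keys_per_partition(["a", "b", "c", "d", "e"], 3)
    for partition_id, keys in sorted(layout.items()):
        print(partition_id, sorted(keys))
```

Since the number of distinct keys is only known once the data has been seen, a partition count fixed in advance cannot guarantee one key per partition, whichever hash function is chosen.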

Asking for reviewing PRs regarding structured streaming

2018-06-19 Thread Jungtaek Lim
. Thanks in advance, Jungtaek Lim (HeartSaVioR)

Re: TextSocketMicroBatchReader no longer supports nc utility

2018-06-05 Thread Jungtaek Lim
FYI: Filed https://issues.apache.org/jira/browse/SPARK-24466 and provided the patch https://github.com/apache/spark/pull/21497 On Tue, Jun 5, 2018 at 11:30 AM, Jungtaek Lim wrote: > Yeah that's why I initiated this thread, especially socket source is > expected to be used from examples on of

Re: TextSocketMicroBatchReader no longer supports nc utility

2018-06-04 Thread Jungtaek Lim
ree that this is a bug. It's kinda silly that nc does this, > but a socket connector that doesn't work with netcat will surely seem > broken to users. It wouldn't be a huge change to defer opening the socket > until a read is actually required. > > On Sun, Jun 3, 2018 at 9:55 PM, Jungtaek Lim w
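
The suggestion quoted above, deferring the opening of the socket until a read is actually required, can be sketched as follows. This is a minimal standalone illustration and not the code of TextSocketMicroBatchReader or of the patch discussed in the thread; the class name `LazySocketLineReader` and its methods are hypothetical.

```python
import socket
from typing import Optional


class LazySocketLineReader:
    """Reads text lines from host:port, connecting only on the first read."""

    def __init__(self, host: str, port: int) -> None:
        # Only remember the address here; no connection is made yet.
        self.host = host
        self.port = port
        self._sock: Optional[socket.socket] = None
        self._file = None

    @property
    def connected(self) -> bool:
        return self._sock is not None

    def read_line(self) -> Optional[str]:
        """Return the next line without its newline, or None at end of stream."""
        if self._sock is None:
            # First read: this is the point where the socket is opened.
            self._sock = socket.create_connection((self.host, self.port))
            self._file = self._sock.makefile("r", encoding="utf-8", newline="\n")
        line = self._file.readline()
        return line.rstrip("\n") if line else None

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
        if self._sock is not None:
            self._sock.close()
        self._file = None
        self._sock = None
```

With this shape, constructing the reader (for example while only inspecting the source's schema) does not touch the network, so the server on the other end sees a single connection, made when data is first requested.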

TextSocketMicroBatchReader no longer supports nc utility

2018-06-03 Thread Jungtaek Lim
and contribute a fix if we think this is a bug (otherwise we need to replace the nc utility with another one, maybe our own implementation?), but I'm not sure we would be happy to apply a workaround for a specific source. I would like to hear opinions before giving it a shot. Thanks, Jungtaek Lim (HeartSaVioR)
