2020 at 3:06 PM Jungtaek Lim
wrote:
> Hi German,
>
> option 1 isn't about "deleting" the old files, as your input directory may
> be accessed by multiple queries. Kafka centralizes the maintenance of input
> data, hence it is possible to apply retention without problems.
>
compatible?
>
> As I see it, I think it would be interesting to have a retention period
> to delete old files and/or the possibility of indicating an offset
> (timestamp). It would be very "similar" to how we do it with Kafka.
>
> WDYT?
>
> On Thu, 30 Jul 2020 at 2
+1 (non-binding, I guess)
Thanks for raising the issue and sorting it out!
On Fri, Jul 31, 2020 at 6:47 AM Holden Karau wrote:
> Hi Spark Developers,
>
> After the discussion of the proposal to amend Spark committer guidelines,
> it appears folks are generally in agreement on policy clarificati
d to consider is the listing cost. Is there
> any way we can avoid listing the entire base directory and then filtering
> out the new files? If the data is organized as partitions using date, will
> it help to list only those partitions where new files were added?
>
>
> On Thu, J
Bump, is there any interest in this topic?
On Mon, Jul 20, 2020 at 6:21 AM Jungtaek Lim
wrote:
> (Just to add rationalization, you can refer to the original mail thread on
> dev@ list to see efforts on addressing problems in file stream source /
> sink -
> https://lists.apache.org
at 6:18 AM Jungtaek Lim
wrote:
> Hi devs,
>
> As I have been going through the various issues on metadata log growth,
> it's not only an issue of the sink, but also an issue of the source.
> Unlike the sink metadata log, whose entries should be available to the readers,
> the sour
timestamp, which Spark will read from such timestamp in forward order.
This doesn't cover all use cases of "latestFirst", but "latestFirst"
doesn't seem natural with the concept of SS (think about watermarks), so
I'd prefer to support alternatives instead of struggling with "latestFirst".
Would like to hear your opinions.
Thanks,
Jungtaek Lim (HeartSaVioR)
For me the merge script worked with Python 2.7, but I ran into an encoding
issue (probably from a contributor's name), so now I run the merge script
in a virtualenv with Python 3.7.7.
"python3" would be OK for me as well, as it doesn't break a virtualenv with
Python 3.
On Sat, Jul 18, 2020 at 6:1
On Fri, Jul 17, 2020 at 8:06 AM Holden Karau wrote:
>
>
> On Thu, Jul 16, 2020 at 3:34 PM Jungtaek Lim
> wrote:
>
>> I agree with Wenchen that there are different topics.
>>
> I agree. I mentioned it in my postscript because I wanted to provide the
> context
I agree with Wenchen that there are different topics.
The veto policy is clear, as the ASF doc describes it, explicitly
saying it is non-overridable per project. In any case, the approach to
resolving the situation should lead to voters withdrawing their vetoes.
There's nothing to interpret differen
wrote:
>>
>>>
>>> Congratulations !
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Tue, Jul 14, 2020 at 12:37 PM Matei Zaharia
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> The Spark PMC recently voted t
Just submitted the patch: https://github.com/apache/spark/pull/29077
On Tue, Jun 16, 2020 at 3:40 PM Jungtaek Lim
wrote:
> Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
> couple of days if no one voices a reason we should keep it.
>
> 1. https://
As a side note, I've raised patches addressing two frequently flaky
tests, CliSuite [1] and HiveSessionImplSuite [2]. Hope this helps
mitigate the situation.
1. https://github.com/apache/spark/pull/29036
2. https://github.com/apache/spark/pull/29039
On Thu, Jul 9, 2020 at 11:51 AM Hyukjin Kw
at 5:35 AM Jungtaek Lim
wrote:
> Could this be a flaky or persistent issue? It failed with Scala gendoc but
> it didn't fail with the part the PR modified. It ran from worker-05.
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125121/consoleFull
>
>
>>> Shane, can we remove .m2 in worker machine 4?
>>>
>>> On Fri, Jul 3, 2020 at 8:18 AM, Jungtaek Lim
>>> wrote:
>>>
>>>> Looks like Jenkins service itself becomes unstable. It took
>>>> considerable time to just open the test report f
Looks like the Jenkins service itself has become unstable. It took considerable
time just to open the test report for a specific build, and Jenkins doesn't
pick up the rebuild request ("retest this, please") in the GitHub comment.
On Thu, Jul 2, 2020 at 2:12 PM Hyukjin Kwon wrote:
> Ah, okay. Actually there
help.
> >>
> >>
> >>
> >> https://issues.apache.org/jira/browse/SPARK-32136
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Jason.
> >>
> >>
> >>
> >> From: Jungtaek Lim
>
>> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰 wrote:
>>
>>> I volunteer to be a release manager of 3.0.1, if nobody is working on
>>> this.
>>>
>>>
>>> -- Original Message --
>>> *From:* "Gengliang Wang";
&g
Does this count only "new features" (probably major), or also count
"improvements"? I'm aware of a couple of improvements which should
ideally be included in the next release, but if this counts only major new
features then I don't feel they should be listed.
On Tue, Jun 30, 2020 at 1:32 AM Holden K
coder in
Spark 3.0.0 pulls the schema from serializer, which removes the problem.
The remaining question is, would we like to fix it in 2.4.x?
On Tue, May 26, 2020 at 2:54 PM Jungtaek Lim
wrote:
> I meant that the way Java Beans are interpreted in Spark is not consistently defined.
>
> Unlike you&
which was throwing OOME.
1. https://github.com/apache/spark/pull/28904
On Sun, Jun 14, 2020 at 4:14 PM Jungtaek Lim
wrote:
> Bump again - hope to get some traction because these issues are either
> long-standing problems or noticeable improvements (each PR has numbers/UI
> graph to s
+1 on a 3.0.1 soon.
Probably it would be nice if some Scala experts can take a look at
https://issues.apache.org/jira/browse/SPARK-32051 and include the fix into
3.0.1 if possible.
Looks like APIs designed to work with Scala 2.11 & Java bring ambiguity in
Scala 2.12 & Java.
On Wed, Jun 24, 2020 a
Great, thanks all for your efforts on the huge step forward!
On Fri, Jun 19, 2020 at 12:13 PM Hyukjin Kwon wrote:
> Yay!
>
> On Fri, Jun 19, 2020 at 4:46 AM, Mridul Muralidharan wrote:
>
>> Great job everyone ! Congratulations :-)
>>
>> Regards,
>> Mridul
>>
>> On Thu, Jun 18, 2020 at 10:21 AM Reynold
Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a
couple of days if no one voices a reason we should keep it.
1. https://issues.apache.org/jira/browse/SPARK-31985
On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim
wrote:
> Let me share the effect on remo
m/apache/spark/pull/28422
2. https://github.com/apache/spark/pull/28363
3. https://github.com/apache/spark/pull/27620
4. https://github.com/apache/spark/pull/27649
5. https://github.com/apache/spark/pull/27694
On Fri, May 22, 2020 at 12:50 PM Jungtaek Lim
wrote:
> Worth noting that I got s
ong-standing issue or the feature
has been provided for a long time in competitive products.
Thanks,
Jungtaek Lim (HeartSaVioR)
1.
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979
On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue
wrote:
> +1 for a
I'm seeing the effort to include the correctness issue SPARK-28067 [1] in
3.0.0 via SPARK-31894 [2]. That doesn't seem to be a regression, so it
technically doesn't block the release; still, it'd be good to weigh its
worth (it requires some SS users to discard the state so might bring less
frighten
are seeing is that it's not checking the property names, just using
> ordering, in your reproducer. That seems different?
>
> On Sun, May 24, 2020 at 3:00 AM Jungtaek Lim
> wrote:
> >
> > OK I just went through the change, and the change breaks bunch of
> existi
eaking change and the
difference would be confusing if we don't explain it enough.
Any thoughts?
On Mon, May 11, 2020 at 1:36 PM Jungtaek Lim
wrote:
> First case is not tied to the batch / streaming as Encoders.bean simply
> fails when inferring schema.
>
> Second case is tied to
Worth noting that I got a similar question from the local community as well.
These reporters didn't encounter the edge case; they encountered the
critical issue in the normal running of a streaming query.
On Fri, May 8, 2020 at 4:49 PM Jungtaek Lim
wrote:
> (bump to expose the discu
Looks like there are newly discovered blocker issues.
* https://issues.apache.org/jira/browse/SPARK-31786
* https://issues.apache.org/jira/browse/SPARK-31761 (not yet marked as
blocker but according to JIRA comment it's a regression issue as well as
correctness issue IMHO)
Let's collect the l
> has to be added to maintain the mode.
>
> I mean, I would want all pipelines that I build to work magically without
> me having to put any thought into it, but then I feel most people in this
> email list would be out of jobs. These are typical considerations that you
>
e to drop complete mode. But before then it's more important to build
a consensus that complete mode is only used for few use case (we need to
collect these use cases of course) and the cost of maintenance exceeds the
benefit. For sure I'm open for disagreement.
Thanks,
Jungtaek Lim (Hear
about compatibility, etc. while it may never be used in production.
On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim
wrote:
> Hi devs,
>
> during experiment on complete mode I realized we left some incomplete code
> parts on supporting aggregation for continuous mode. (shuffle & coalesce)
&g
ct anyone is working
on this). The functionality is undocumented (as the work was only done
partially) and continuous mode is experimental, so I don't see risks in
getting rid of that part.
What do you think? If it makes sense then I'll raise a PR to get rid of the
incomplete code.
T
n make a consensus on the viewpoint of complete mode and drop support for
it if we agree.
Would like to hear everyone's opinions. It would be great if someone brings
the valid cases where complete mode is being used in production.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache
Looks like the priority of SPARK-31706 [1] is incorrectly marked - it
sounds like a blocker, as SPARK-26785 [2] / SPARK-26956 [3] dropped the
"update" streaming output mode feature (as a result) and SPARK-31706
restores it. SPARK-31706 is not yet resolved, which may be a valid reason to
roll a
the parameter (even if it
> is hidden)
>
> On Tue, May 12, 2020 at 12:46 PM Ryan Blue wrote:
>
>> +1 for the approach Jungtaek suggests. That will avoid needing to support
>> behavior that is not well understood with minimal changes.
>>
>> On Tue, May
Before we forget, we should also change the doc, as the create table
doc appears to describe the current syntax, which will be incorrect later.
On Tue, May 12, 2020 at 5:32 PM Jungtaek Lim
wrote:
> It's not only for end users, but also for us. Spark itself uses the config
> &
> wrote:
>
>> I'm all for getting the unified syntax into master. The only issue
>> appears to be whether or not to pass the presence of the EXTERNAL keyword
>> through to a catalog in v2. Maybe it's time to start a discuss thread for
>> that issue so we're
Btw, another question here: is it good to retain the flag on master as
an intermediate step? Wouldn't it be better for us to start the "unified
create table syntax" from scratch?
On Tue, May 12, 2020 at 6:50 AM Jungtaek Lim
wrote:
> I'm sorry, but I have to agree with Ry
.
>>
>> Unless we plan to NOT support the behavior
>> when spark.sql.legacy.createHiveTableByDefault.enabled is disabled, we
>> should not ship Spark 3.0 with SPARK-30098. Otherwise, we will have to deal
>> with this problem for years to come.
>>
>> On Mon,
only relying on the sequence of the columns while
matching rows with the schema, then it could be affected.)
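To make the failure mode above concrete, here is a small illustration outside Spark (plain Python, with hypothetical field names) of how purely positional binding silently misassigns values once the inferred field order changes:

```python
# Illustration (not Spark code) of why matching rows to a schema by
# column position breaks when the inferred field order changes.
schema_v1 = ["name", "age"]   # field order inferred one way
schema_v2 = ["age", "name"]   # same fields, different inferred order

row = ("Alice", 30)           # a row produced against schema_v1

def bind(row, schema):
    # Positional binding: column i of the row is assigned to schema[i].
    return dict(zip(schema, row))

assert bind(row, schema_v1) == {"name": "Alice", "age": 30}
# With the other ordering, values silently land in the wrong columns:
assert bind(row, schema_v2) == {"age": "Alice", "name": 30}
```

No error is raised in the second case; the corruption only shows up downstream, which is what makes the inconsistency dangerous.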
On Mon, May 11, 2020 at 1:24 PM Wenchen Fan wrote:
> is it a problem only for streaming or it affects batch queries as well?
>
> On Fri, May 8, 2020 at 11:42 PM Jungtaek Lim
> wr
or EXTERNAL is
> specified. This gives us more time to think about how to do it in 3.1.
>
> If you have other ideas, please reply to this thread.
>
> Thanks,
> Wenchen
>
> On Thu, Mar 26, 2020 at 7:28 AM Jungtaek Lim
> wrote:
>
>> Thanks, filed SPARK-31257
>> &
, 2020 at 5:50 PM Wenchen Fan wrote:
> Can you give some simple examples to demonstrate the problem? I think the
> inconsistency would bring problems but don't know how.
>
> On Fri, May 8, 2020 at 3:49 PM Jungtaek Lim
> wrote:
>
>> (bump to expose the discussion
(bump to expose the discussion to more readers)
On Mon, May 4, 2020 at 5:45 PM Jungtaek Lim
wrote:
> Hi devs,
>
> I'm seeing more and more structured streaming end users encountered the
> metadata issues on file stream source and sink. They have been known-issues
> an
(bump to expose the discussion to more readers)
On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim
wrote:
> Hi devs,
>
> There're couple of issues being reported on the user@ mailing list which
> results in being affected by inconsistent schema on Encoders.bean.
>
> 1. Ty
I don't see any new features/functions among these blockers.
For SPARK-31257 (which I filed and marked as a blocker), I agree
unifying the create table syntax shouldn't be a blocker for Spark 3.0.0, as
that is a new feature, but even if we put the proposal aside, the problem
remains the same and I
know there're a couple of
alternatives, but I don't think a starter would start from there. End users
may just try to find alternatives: not an alternative data source, but an
alternative streaming processing framework.
Thanks,
Jungtaek Lim (HeartSaVioR)
1.
https://lists.apache.org/thread.
want to at least
document the ideal form of the bean Spark expects.
Would like to hear opinions on this.
Thanks,
Jungtaek Lim (HeartSaVioR)
1.
https://lists.apache.org/thread.html/r8f8e680e02955cdf05b4dd34c60a9868288fd10a03f1b1b8627f3d84%40%3Cuser.spark.apache.org%3E
2.
http://mail-archives.apach
Please correct me if I'm missing something. At a glance, your statements
look correct if I understand correctly. I guess it might be simply missed,
but it sounds like a pretty trivial one, as only a line can be removed safely,
which won't affect anything. (filterNot should be retained even if we remove
the
Nice addition, looks pretty good!
On Tue, Apr 14, 2020 at 1:17 AM Xiao Li wrote:
> Looks great!
>
> Thanks for making this happen. This is pretty helpful.
>
> Xiao
>
> On Sun, Apr 12, 2020 at 11:52 PM Hyukjin Kwon wrote:
>
>> Okay, now it started to work. Let's see if it works well!
>>
>> 2020년
, 2020 at 10:01 AM Xiao Li wrote:
>
>> Only the low-risk or high-value bug fixes, and the documentation changes
>> are allowed to merge to branch-3.0. I expect all the committers are
>> following the same rules like what we did in the previous releases.
>>
>> Xiao
>
hesitate to test the RC1 (see how many people have tested RC1 in this
thread), as they probably need to test the same with RC2.
On Thu, Apr 9, 2020 at 5:50 PM Jungtaek Lim
wrote:
> I went through some manual tests for the new features of Structured
> Streaming in Spark 3.0.0. (Please let m
I went through some manual tests for the new features of Structured
Streaming in Spark 3.0.0. (Please let me know if there are more features
we'd like to test manually.)
* file source cleanup - both "archive" and "delete" work. Query fails as
expected when the input directory is the output direct
On Fri, Apr 3, 2020 at 12:31 AM Sean Owen wrote:
> On Wed, Apr 1, 2020 at 10:28 PM Jungtaek Lim
> wrote:
> > The definition of "latest version" would matter, especially when we
> prepare a minor+ version release.
> >
> > For example, lots of p
>>>> an Improvement applies to; it just isn't that useful. We aren't
>>>> generally going to back-port improvements anyway.
>>>>
>>>> Even for bugs, we don't really need to know that a bug in master
>>>> affects 2.4.5, 2.4.4, 2.4.3
and in the worst case (there's no such UT) we would have to
do E2E manual verification, which I would give up on.
There should be some balance/threshold, and the balance should be something
the community has a consensus on.
Would like to hear everyone's voice on this.
Thanks,
Jungtaek Lim (HeartSaVioR)
-1 (non-binding)
I filed SPARK-31257 as a blocker, and now others are starting to agree that
it's a critical issue which should be dealt with before releasing Spark 3.0.
Please refer to recent comments in https://github.com/apache/spark/pull/28026
It shouldn't delay the release much, as we can either revert
set it to "false"
and deal with it. WDYT?
On Tue, Mar 31, 2020 at 7:48 AM Jungtaek Lim
wrote:
> I'm not sure I understand the direction of resolution. I'm not saying it's
> just a confusion - it's "ambiguous" and "indeterministic".
>
e:
>
>> I don't have a dog in this race, but: Would it be OK to ship 3.0 with
>> some release notes and/or prominent documentation calling out this issue,
>> and then fixing it in 3.0.1?
>>
>> On Sat, Mar 28, 2020 at 8:45 PM Jungtaek Lim <
>> kabhwan.open
Sat, Mar 28, 2020 at 11:51 AM, Sean Owen wrote:
>
>> I'm also curious - there no open blockers for 3.0 but I know a few are
>> still floating around open to revert changes. What is the status there?
>> From my field of view I'm not aware of other blocking i
entrate these things.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, Mar 25, 2020 at 1:52 PM Xiao Li wrote:
> Let us try to finish the remaining major blockers in the next few days.
> For example, https://issues.apache.org/jira/browse/SPARK-31085
>
> +1 to cut the RC even if we still have the b
bes
> this and doesn't appear to be done.
>
> On Wed, Mar 25, 2020 at 4:03 PM Jungtaek Lim
> wrote:
>
>> UPDATE: Sorry I just missed the PR (
>> https://github.com/apache/spark/pull/28026). I still think it'd be nice
>> to avoid recycling the JIRA iss
UPDATE: Sorry I just missed the PR (
https://github.com/apache/spark/pull/28026). I still think it'd be nice to
avoid recycling the JIRA issue which was resolved before. Shall we have a
new JIRA issue with linking to SPARK-30098, and set proper priority?
On Thu, Mar 26, 2020 at 7:59 AM Jun
Would it be better to prioritize this to make sure the change is included
in Spark 3.0? (Maybe file an issue and set it as a blocker.)
Looks like there's consensus that SPARK-30098 brought an ambiguity issue which
should be fixed (though the assessment of severity seems to be
different), and once we
Anything would be OK if the create table DDL provides a "clear way" to
expect the table provider "before" they run the query. Great news that it
doesn't require major rework - looking forward to the PR.
Thanks again to jump in and sort this out.
- Jungtaek Lim (HeartSaVi
if Hive specific clauses are being used.
Yes, as I said earlier, it may require end users' queries to be changed, but
that's better than uncertainty.
Btw, if the main purpose of adding native syntax and changing the default is
to discontinue supporting the Hive create table rule sooner, simply dropping
rule 2 with
their query fits and they don't need to spend a lot
> of time understanding the subtle difference between these 2 syntaxes.
>
> On Wed, Mar 18, 2020 at 7:01 PM Jungtaek Lim
> wrote:
>
>> A bit correction: the example I provided for vice versa is not really a
>> corr
A small correction: the example I provided for vice versa is not really a
correct case for vice versa. It's actually the same case (intended to use rule
2, which is not the default) but with a different result.
On Wed, Mar 18, 2020 at 7:22 PM Jungtaek Lim
wrote:
> My concern is that although we simp
; fields?
>>
>>
>> On Wed, Mar 18, 2020 at 4:38 PM Wenchen Fan wrote:
>>
>>> I think the general guideline is to promote Spark's own CREATE TABLE
>>> syntax instead of the Hive one. Previously these two rules are mutually
>>> exclusive
just writes CREATE TABLE without USING or ROW
> FORMAT or STORED AS, does it matter what table we create? Internally the
> parser rules conflict and we pick the native syntax depending on the rule
> order. But the user-facing behavior looks fine.
>
> CREATE EXTERNAL TABLE is a prob
ry if they intend to
create a Hive table. (Given we will also provide a legacy option, I feel
this is acceptable.)
2. Define "ROW FORMAT" or "STORED AS" as mandatory.
Pros: Less invasive for existing queries.
Cons: Less intuitive, because they have been optional and now be
Context.jsonRDD
>>> - SQLContext.load
>>> - SQLContext.jdbc
>>>
>>> If you think these APIs should not be added back, let me know and we can
>>> discuss the items further. In general, I think we should provide more
>>> evidences and discuss
+1 for Sean as well.
Moreover, as I added a voice on the previous thread, if we want to be strict
about retaining public APIs, what we really need to do along with this is
to have a similar or stricter level of policy for adding public APIs. If we
don't apply the policy symmetrically, problems would go wors
e balance on this to avoid restricting
ourselves too much, but I feel there's no balance now - most things are
just going through PRs without discussion. It would be ideal we have time
to consider on this.
On Thu, Feb 20, 2020 at 8:50 AM Jungtaek Lim
wrote:
> Apache Spark 2.0 was release
Apache Spark 2.0 was released in July 2016. Assuming the project has been
trying its best to follow semantic versioning, it is "more than three
years" to wait for the breaking changes. The necessary breaking changes the
community fails to address will become technical debt for
anoth
address in this thread.
>
> Shall we conclude this thread by deciding to document the direct
> relationship between configurations preferably in one prevailing style?
>
>
> On Fri, Feb 14, 2020 at 11:36 AM, Jungtaek Lim
> wrote:
>
>> Even spark.dynamicAllocation.* doesn't f
's do it as our final goal. Otherwise, let's
> simplify it to reduce the overhead rather than having a policy for the
> mid-term specifically.
>
>
> On Thu, Feb 13, 2020 at 12:24 PM, Jungtaek Lim
> wrote:
>
>> I tend to agree that there should be a time to make thing be consi
+1 Thanks for the proposal. Looks very reasonable to me.
On Thu, Feb 13, 2020 at 10:53 AM Hyukjin Kwon wrote:
> +1.
>
> On Thu, Feb 13, 2020 at 9:30 AM, Gengliang Wang
> wrote:
>
>> +1, this is really helpful. We should make the SQL configurations
>> consistent and more readable.
>>
>> On Wed, Feb 12,
ts on here. Could I
>>>>>> ask what you guys think about this in general?
>>>>>>
>>>>>> On Wed, Feb 12, 2020 at 12:02 PM, Hyukjin Kwon wrote:
>>>>>>
>>>>>>> To do that, we should explicitly document such st
k setting 'spark.eventLog.rolling.maxFileSize'
> automatically enables rolling. Then, they realise the log is not rolling
> later after the file
> size becomes bigger.
>
>
> On Wed, Feb 12, 2020 at 10:47 AM, Jungtaek Lim
> wrote:
>
>> I'm sorry if I miss somethi
ations should have
a redundant part of the doc, more redundant if the condition is nested. I
agree this is a good step toward "being kind" but less pragmatic.
I'd be happy to follow whatever consensus we reach in this thread.
Appreciate more voices.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, F
Nice work, Dongjoon! Thanks for the huge efforts on sorting out with
correctness things as well.
On Tue, Feb 11, 2020 at 12:40 PM Wenchen Fan wrote:
> Great Job, Dongjoon!
>
> On Mon, Feb 10, 2020 at 4:18 PM Hyukjin Kwon wrote:
>
>> Thanks Dongjoon!
>>
>> On Sun, Feb 9, 2020 at 10:49 AM, Takeshi Yamam
Once we decided to cancel the RC1, what about including SPARK-29450 (
https://github.com/apache/spark/pull/27209) into RC2?
SPARK-29450 was merged into master, and Xiao figured out it fixed a
regression, long lasting one (broken at 2.3.0). The link refers the PR for
2.4 branch.
Thanks,
Jungtaek
+1 to have another Spark 2.4 release, as Spark 2.4.4 was released 4
months ago and there's a release window for this.
On Mon, Jan 6, 2020 at 12:38 PM Hyukjin Kwon wrote:
> Yeah, I think it's nice to have another maintenance release given Spark
> 3.0 timeline.
>
> On Mon, Jan 6, 2020 at 7:58 AM, Dongjo
You seem to have hit the wrong mailing list; please send this to the Kafka dev mailing list.
On Fri, Dec 27, 2019 at 8:10 PM jelmer wrote:
> Hi folks,
>
> A while back I opened a pull request (
> https://github.com/apache/kafka/pull/7567 ) that makes it possible to
> produce messages with a null body using the
to get reviewed and
merged later?
Happy Holiday!
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro
wrote:
> Looks nice, happy holiday, all!
>
> Bests,
> Takeshi
>
> On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun
> wrote:
>
>> +1
Great work, Yuming! Happy Holidays.
On Wed, Dec 25, 2019 at 9:08 AM Dongjoon Hyun
wrote:
> Indeed! Thank you again, Yuming and all.
>
> Bests,
> Dongjoon.
>
>
> On Tue, Dec 24, 2019 at 13:38 Takeshi Yamamuro
> wrote:
>
>> Great work, Yuming!
>>
>> Bests,
>> Takeshi
>>
>> On Wed, Dec 25, 2019 at
If I understand correctly, you'll just want to package your implementation,
which registers your dialect into JdbcDialects, with your preferred build
tool (Maven, sbt, etc.), and pass the jar along so end users can load it.
That will automatically do everything and they can us
ll/26845)
On Thu, Dec 12, 2019 at 3:53 AM Nicholas Chammas
wrote:
> Is this something that would be exposed/relevant to the Python API? Or is
> this just for people implementing their own Spark data source?
>
> On Wed, Dec 11, 2019 at 12:35 AM Jungtaek Lim <
> kabhwan.opensou...
Nice, thanks for the answer! I'll craft a PR soon. Thanks again.
On Thu, Dec 12, 2019 at 3:32 AM Ryan Blue wrote:
> Sounds good to me, too.
>
> On Wed, Dec 11, 2019 at 1:18 AM Jungtaek Lim
> wrote:
>
>> Thanks for the quick response, Wenchen!
>>
>> I'
me
> for DataWriter.
>
> On Wed, Dec 11, 2019 at 1:35 PM Jungtaek Lim
> wrote:
>
>> Hi devs,
>>
>> I'd like to propose to add close() on DataWriter explicitly, which is the
>> place for resource cleanup.
>>
>> The rationalization of the propo
tible
changes in Spark 3.0, so I feel it may not matter.
Would love to hear your thoughts.
Thanks in advance,
Jungtaek Lim (HeartSaVioR)
> There are 2 open questions we need to answer:
> 1. How to make sure all tasks are launched at the same time to implement
> 2PC? barrier execution?
> 2. To reach "eventually consistent", we must retry the job until success.
> How shall we guarantee the job retry?
>
&g
ing on how the input is broken down into multiple batches. By the
definition of the ground rule, streaming aggregation is required to be stateful.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Thu, Nov 28, 2019 at 9:17 PM Chitral Verma
wrote:
> Hi Devs,
> I have a query regarding stateless agg
t mitigate the issue heavily... so please treat my idea as
> rough idea just for possible optimization.)
> >
> > But again that's very rough idea, and it won't make sense if the
> expected output is not acceptable as representation.
> >
> > -Jungtaek Lim (HeartSaV
wait... Hmm... Looks like I missed another point of optimization
here which might mitigate the issue heavily... so please treat my idea as a
rough idea just for possible optimization.)
But again, that's a very rough idea, and it won't make sense if the expected
output is not acceptable as a representat
is 100, and
replace sorting 100 elements with sorting 10 elements 11 times. The
difference would be bigger if the number of tasks is bigger.
Just a rough idea, so any feedback is appreciated.
Thanks,
Jungtaek Lim (HeartSaVioR)
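The rough idea above can be sketched outside Spark in plain Python, using the hypothetical element counts from the mail: sort each small task output independently, then do a single streaming merge of the pre-sorted runs instead of one big sort.

```python
import heapq
import random

# Hypothetical sizes from the discussion: 10 tasks, 10 elements each.
random.seed(42)
partitions = [[random.randint(0, 999) for _ in range(10)] for _ in range(10)]

# Each task sorts only its own 10 elements (10 small sorts).
sorted_runs = [sorted(p) for p in partitions]

# One k-way merge combines the pre-sorted runs into a fully sorted output,
# replacing a single sort over all 100 elements.
merged = list(heapq.merge(*sorted_runs))

assert merged == sorted(x for p in partitions for x in p)
```

This is only an illustration of the shape of the optimization; whether it pays off in Spark depends on task counts and how the merge is scheduled.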
Jacek,
would you mind if I ask for a query to reproduce this? I'm not sure I follow
without an example of it "not working".
Thanks,
Jungtaek Lim (HeartSaVioR)
On Tue, Nov 12, 2019 at 12:04 AM Jacek Laskowski wrote:
> Hi,
>
> I think watermark does not work for StreamingSy