On Tue, Sep 24, 2019 at 1:50 PM Jungtaek Lim wrote:
> >
> > Hi Shane,
> >
> > Thanks for the update, and thanks for taking care of the build system!
> >
> > It looks like some builds failed without test failures, which looks
> like an environment issue.
> >
> https://amp
are not supported.
Could you please check on this?
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, Sep 25, 2019 at 2:48 AM Shane Knapp wrote:
> quick update from our colo admin: they are going to keep the colo on
> generator power until monday morning and not switch back and forth.
> this is great
ackport once you consider
>>>> the parts needed to make dsv2 stable.
>>>>
>>>>
>>>>
>>>> On Fri, Sep 20, 2019 at 10:47 AM, Ryan Blue
>>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> In the DSv2 sync this week, we talked about a possible Spark 2.5
>>>> release based on the latest Spark 2.4, but with DSv2 and Java 11 support
>>>> added.
>>>>
>>>> A Spark 2.5 release with these two additions will help people migrate
>>>> to Spark 3.0 when it is released because they will be able to use a single
>>>> implementation for DSv2 sources that works in both 2.5 and 3.0. Similarly,
>>>> upgrading to 3.0 won't require also updating to Java 11 because users
>>>> could update to Java 11 with the 2.5 release and have fewer major changes.
>>>>
>>>> Another reason to consider a 2.5 release is that many people are
>>>> interested in a release with the latest DSv2 API and support for DSv2 SQL.
>>>> I'm already going to be backporting DSv2 support to the Spark 2.4 line, so
>>>> it makes sense to share this work with the community.
>>>>
>>>> This release line would just consist of backports like DSv2 and Java 11
>>>> that assist compatibility, to keep the scope of the release small. The
>>>> purpose is to assist people moving to 3.0 and not distract from the 3.0
>>>> release.
>>>>
>>>> Would a Spark 2.5 release help anyone else? Are there any concerns
>>>> about this plan?
>>>>
>>>>
>>>> rb
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
small correction: confusion -> conflict, so I had to go through and
understand parts of the changes
On Sat, Sep 21, 2019 at 1:25 PM Jungtaek Lim wrote:
> Just my 2 cents: I haven't tracked the changes in DSv2 (though I've needed to
> deal with them, as the changes caused confusion on my PRs...
ere yesterday, but for master build) so any
help is appreciated.
Thanks,
Jungtaek Lim (HeartSaVioR)
ing tomorrow is pause builds, wipe out the
> ivy/sbt caches and SparkPullRequestBuilder* dirs on all workers and
> see if that helps.
>
> shane
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
--
lab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110688/consoleFull
Any guess/suspect?
Thanks,
Jungtaek Lim (HeartSaVioR)
uss necessary info and access in barrier mode + Mesos
>>> SPARK-25074 Implement maxNumConcurrentTasks() in
>>> MesosFineGrainedSchedulerBackend
>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>> SPARK-25186 Stabilize Data Source V2 API
>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier
>>> execution mode
>>> SPARK-25390 data source V2 API refactoring
>>> SPARK-7768 Make user-defined type (UDT) API public
>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition
>>> Spec
>>> SPARK-15691 Refactor and improve Hive support
>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>> SPARK-16217 Support SELECT INTO statement
>>> SPARK-16452 basic INFORMATION_SCHEMA support
>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>> SPARK-18245 Improving support for bucketed table
>>> SPARK-19842 Informational Referential Integrity Constraints Support in
>>> Spark
>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested
>>> list of structures
>>> SPARK-22632 Fix the behavior of timestamp values for R's DataFrame to
>>> respect session timezone
>>> SPARK-22386 Data Source V2 improvements
>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>>
nd the Spark PMC
>> >
>> >
>> >
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staf
Great, thanks for reviewing, Felix!
On Mon, Sep 2, 2019 at 2:16 AM Felix Cheung
wrote:
> I did review it and solving this problem makes sense. I will comment in
> the JIRA.
>
> --
> *From:* Jungtaek Lim
> *Sent:* Sunday, August 25, 2019 3:34
parties to scope their custom configurations to
> that area? e.g. Something like `spark.external.[vendor].[whatever]`.
> >
> > Nick
>
>
>
> On Mon, Aug 26, 2019 at 3:57 PM Jungtaek Lim wrote:
>
>> Nice finding! I don't see any reason not to use
>> KafkaSourceInitialOffsetWriter from KafkaSource, as they're identical. I
>> guess it was copied and pasted at some point and never cleaned up.
>> As you
for your patch if you plan to do it. Please
let me know.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Mon, Aug 26, 2019 at 8:03 PM Jacek Laskowski wrote:
> Hi,
>
> Just found out that KafkaSource [1] does not
> use KafkaSourceInitialOffsetWriter (of KafkaMicroBatchStream) [2] for
>
and the proposal is orthogonal to the current
feature. Please let me know if that's not the case and the SPIP process is
necessary.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-28594
2.
https://docs.google.com/document/d/12bdCC4nA58uveRxpeo8k7kGOI2NRTXmXyBOweSi4YcY/edit?usp
e in a while.
>
> One common theme here is 'structured streaming' -- who amongst the
> committers feels they are able to review these changes? I sense we
> have a shortage there.
>
--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
g chance to get the various merits that
ASF committers have been enjoying. I hope there's another way to provide
these merits without granting "unnecessary" privileges.
-Jungtaek Lim (HeartSaVioR)
On Tue, Aug 6, 2019 at 10:08 PM Hyukjin Kwon wrote:
> I usually make such judgement
ry instead is intermediate output: inserting into a temporary
table in executors, and moving inserted records to the final table in the
driver (which must be atomic).
Thanks,
Jungtaek Lim (HeartSaVioR)
On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood
wrote:
> All,
>
> I understood that DataSourceV2 supports
ermark.
Please chime in and share your curation if I'm missing something.
Thanks,
Jungtaek Lim (HeartSaVioR)
release note for this, but a
migration guide would be better to help users moving from 2.4.x to 3.0.x,
since the release note would be bound only to 3.0.0.
-Jungtaek Lim (HeartSaVioR)
On Mon, Jul 15, 2019 at 8:25 AM Xiao Li wrote:
> Yeah, Josh! All these ideas sound good to me. All the
to consider per-operator
watermarks to make things easier for end users to understand.
I would like to hear voices on this.
-Jungtaek Lim (HeartSaVioR)
On Wed, Jun 12, 2019 at 4:41 PM Jungtaek Lim wrote:
> Hi devs,
>
> While helping user in user mailing list, I start to suspect tha
Nice finding!
Given that you already pointed out a previous fix for a similar issue, it
should also be easy for you to craft the patch and verify whether the fix
resolves your issue. Looking forward to seeing your patch.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, Jun 12, 2019 at 8:23 PM Gerard
rim mitigation. For a long-term solution, we may want
to revisit SPARK-26655 [1], which addresses operator-wise watermarks.
Thanks,
Jungtaek Lim (HeartSaVioR)
gt; this release. This release would not have been possible without you.
> >
contributors take a lot of risk in putting in major effort - it shouldn't
conflict with what others have been doing privately, and it should be accepted
only after putting significant effort into design and building a POC.
On Wed, Feb 27, 2019 at 8:14 AM, Jungtaek Lim wrote:
> Thanks Sean, as always, to share your thought quickly!
>
uite a lot. I don't know whether they
> should be merged. This isn't a 'bug' though; not all changes should be
> committed. Simple and targeted is much easier to say yes to, because
> you implicitly here ask a lot of people to assume responsibility for
> your change.
>
> On Tue,
regardless of the size of the code diff, which can be merged once
committer(s) focus on the PR and review it.
Thanks,
Jungtaek Lim (HeartSaVioR)
ps. I might agree that all committers in the SS area could be busy (which
might clearly show the SS area lacks committers), but I don't agree they're
all involved in DSv2 and DSv2
both things together and finding a way to deal with them.
-Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-26411
2. https://issues.apache.org/jira/browse/SPARK-24295
On Tue, Feb 26, 2019 at 4:28 PM, Arun Mahadevan wrote:
> Unless its some sink metadata to be maintai
intentional to share between queries - but sometimes
we may want to couple it with the query checkpoint.
What do you think about passing the metadata path to the sink (we have only
one per query) so that sink metadata can be coupled with the query checkpoint?
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https
Given that SPARK-26154 [1] is a correctness issue and PR [2] has been
submitted, I hope it can be reviewed and included in Spark 2.4.1 - otherwise
it will be a long-lived correctness issue.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-26154
2. https://github.com
behavior doesn’t seem broken...
> >
> > I could be wrong though.
> >
> >
> > ____
> > From: Ryan Blue
> > Sent: Friday, February 8, 2019 4:39 PM
> > To: Sean Owen
> > Cc: Jungtaek Lim; dev
> > Subject: Re: [DISCUSS] Change default executor log
FYI: I've been working on stabilizing tests on streaming join and Kafka
continuous mode (they're somewhat coupled with - Kafka continuous mode
fails after porting back commit on streaming join) for branch-2.3, and I
think it's done. https://github.com/apache/spark/pull/23757
On Mon, Feb 11, 2019
; implement the behavior you're suggesting: to link to the logs path in YARN
>> instead of directly to stderr and stdout.
>> >
>> > On Fri, Feb 8, 2019 at 3:33 PM Jungtaek Lim wrote:
>> >>
>> >> Ryan,
>> >>
>> >> actually I'm not cle
ost of one more hop to get to logs.
>> I don't feel strongly about it but think that's a reasonable thing to do.
>>
>> On Fri, Feb 8, 2019 at 4:57 PM Jungtaek Lim wrote:
>> >
>> > Let me quote some voices here: seems like they don't participate this
>> thr
f there are logs going to other files, then I think making this an option
> is reasonable. Otherwise, I think we should leave links as they are.
>
> rb
>
> On Thu, Feb 7, 2019 at 12:31 PM Jungtaek Lim wrote:
>
>> New URL shows all of local logs which includes stdout
the
content (one click), but users have to manually remove the file part from the
URL to access the list page. Instead, we could change the default URL to
show all local logs and let users choose which file to read (though it
would be two clicks to reach the actual file).
-Jungtaek Lim (HeartSaVioR
Spark UI as well, but I've
got suggestion to just change the default log URL.
Thanks again,
Jungtaek Lim (HeartSaVioR)
Might be off topic: regarding SPARK-24211 (flaky tests in
StreamingJoinSuite), I might volunteer to take a look, but if things are not
flaky on branch 2.4 and EOL for branch 2.3 is coming soon (in a few
months), I wonder whether we still want to tackle it at all.
On Thu, Feb 7, 2019 at 2:21 PM, Sean
ue, Jan 29, 2019 at 4:57 AM Jungtaek Lim wrote:
>
>> Regarding PR 23634, it is waiting for getting consensus on the approach
>> for the fix, as well as it also needs to have some time to clean up some
>> code and move focus to concern backward compatibility. I'm postponing these
Congrats Jose! Well deserved.
- Jungtaek Lim (HeartSaVioR)
On Wed, Jan 30, 2019 at 5:19 AM, Dongjoon Hyun wrote:
> Congrats, Jose! :)
>
> Bests,
> Dongjoon.
>
> On Tue, Jan 29, 2019 at 11:41 AM Arun Mahadevan wrote:
>
>> Congrats Jose! Well deserved.
>>
>> On
Regarding PR 23634, it is waiting on consensus on the approach for the fix,
and it also needs some time to clean up some code and shift focus to
backward-compatibility concerns. I'm postponing this work since consensus on
the approach hasn't been reached.
So it may take some
I guess explaining the rationale would help in understanding the
situation.
It's related to skipping the conversion of params to lowercase before
assigning them to Kafka parameters (https://github.com/apache/spark/pull/23612).
If we guarantee lowercase keys on the interface(s), we can simply pass them to Kafka
Yes, I understand what Reynold stated (as Michael Armbrust stated earlier),
and I agree it's a great thing that improvements in core/SQL also
benefit SS.
I'm just concerned that both SQL and SS are impacted by DSv2, but
things are going differently between the two. SQL is
>
> > There isn't a way to make people work on it, and I personally am not
> > interested in it nor have a background in SS.
> >
> > I did leave some comments on your PR and will see if we can get
> > comfortable with merging it, as I presume you are pretty knowledg
as I presume you are pretty knowledgeable
> about the change.
>
> On Sun, Jan 13, 2019 at 4:55 PM Jungtaek Lim wrote:
> >
> > Sean, this is actually a fail-back on pinging committers. I know who can
> review and merge in SS area, and pinged to them, didn't work. Even there's
> a PR w
contributors
and committers for such a module, but SS does not. Maybe other
committers who aren't familiar with the area should try to get familiar and
cover it, or the area needs more committers.
-Jungtaek Lim (HeartSaVioR)
On Sun, Jan 13, 2019 at 11:37 PM, Sean Owen wrote:
> Jungtaek, the best strat
I'm sorry to bring this up again, but non-SS PRs are being reviewed in due
course, whereas many SS PRs (regardless of who created them) are still not
reviewed and merged in time.
On Thu, Jan 3, 2019 at 7:57 AM, Jungtaek Lim wrote:
> Spark devs, happy new year!
>
> I would like to remind this kind
Spark devs, happy new year!
I would like to kindly send a reminder, since there has actually been no
review since this thread was initiated.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, Dec 12, 2018 at 11:12 PM, Vaclav Kosar wrote:
> I am also waiting for any finalization of my PR [3]. It seems that SS
Hi devs,
May I kindly ask for reviews on PRs for Structured Streaming? I have 5
open pull requests on the SS side [1] (the earliest was opened around 4 months
ago), and there appear to be a couple of PRs from others [2] that look ready
to be reviewed, too.
Thanks in advance,
Jungtaek Lim
+ INTERVAL 1 HOUR),
2. Join on event-time windows (e.g. ...JOIN ON leftTimeWindow =
rightTimeWindow).
So yes, the join condition should directly deal with the timestamp column;
otherwise state will grow infinitely.
Thanks,
Jungtaek Lim (HeartSaVioR)
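A plain-Python sketch of why the time bound matters (this is a hedged model, not Spark's implementation): rows whose event time has fallen more than the maximum join delay behind the watermark can never match future input, so they can be evicted from join state; without a time bound in the condition, no row is ever safe to drop.

```python
# State eviction model for a stream-stream join with a bounded
# time condition. "max_delay" stands for the interval in the join
# condition (e.g. INTERVAL 1 HOUR above).
def evictable(state_rows, watermark, max_delay):
    """Rows that can no longer join with any future input."""
    return [r for r in state_rows if r["event_time"] < watermark - max_delay]

left_state = [
    {"key": "a", "event_time": 10},   # far behind the watermark
    {"key": "b", "event_time": 55},   # still within the join window
]

# Watermark at 60, join condition bounds the other side within 30 units:
old_rows = evictable(left_state, watermark=60, max_delay=30)
print([r["key"] for r in old_rows])  # ['a']
```

With no bound (`max_delay` effectively infinite), `evictable` returns nothing, which is exactly the unbounded state growth described above.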
On Tue, Dec 11, 2018 at 2:52 PM
Just filed a new JIRA issue [1] as well as PR [2].
- Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-26170
2. https://github.com/apache/spark/pull/23142
On Mon, Nov 26, 2018 at 3:26 PM, Jungtaek Lim wrote:
> Looks like just a kind of missing spot. Just crafted the pa
Looks like just a missing spot. I've crafted the patch and am now
testing it.
Once done with testing, I'll file a new issue and submit the patch.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Mon, Nov 26, 2018 at 4:49 AM, Burak Yavuz wrote:
> Probably just oversight. Anyone is welcome to
OK, thanks for clarifying. I guess it is one of the major features in the
streaming area and would be nice to add, but I also agree it would require
significant investigation.
On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust wrote:
> Agree. Just curious, could you explain what do you mean by "negation"?
>> Does it mean
er and
> also create a high performance (i.e. whole stage code-gened) aggregation
> operator that understands negation).
>
Agree. Just curious, could you explain what you mean by "negation"? Does
it mean applying retraction to aggregated results?
> Thanks again for starting the d
change)
On Mon, Oct 22, 2018 at 12:25 PM, Jungtaek Lim wrote:
> Yeah, the main intention of this thread is to collect interest on possible
> feature list for structured streaming. From what I can see in Spark
> community, most of the discussions as well as contributions are for SQL,
> and I'd
alternative approach when it doesn't. Sounds like it is
a huge item and can be handled individually.
- Jungtaek Lim (HeartSaVioR)
On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote:
> Nice I was looking for a jira. So I agree we should justify why we are
> building something. Now to tha
discussion thread.
For queryable state, there seems to be no workaround in Spark to provide a
similar capability, especially as state gets bigger. I have some concerns
about the details, but I'll add my thoughts on the discussion thread.
- Jungtaek Lim (HeartSaVioR)
On Mon, Oct 22, 2018 at 1:15 AM, Stavros
Stavros, if my memory is right, you were trying to drive queryable state,
right?
Could you summarize the progress and the reason it stopped?
On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos
<stavros.kontopou...@lightbend.com> wrote:
> That is a very interesting list thanks. I
ggregation,
whereas State TTL just works as its name suggests (self-explanatory).
Hence State TTL looks valid for all cases.
On Fri, Oct 19, 2018 at 12:20 PM, Jungtaek Lim wrote:
> Hi devs,
>
> While Spark 2.4.0 is still in progress of release votes, I'm seeing some
> pull reques
"move" its data
from staging to the final destination within storage.
So could we consider loosening the contract on the DataSource V2 writer, or
having a new representation of the guarantee for such cases, so it is not
"fully transactional" but another kind of "exactly-once" and not
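The staging-then-move pattern under discussion can be sketched on a local filesystem, where `os.replace` is atomic; that atomicity is exactly the property an object store like S3 lacks, which is why the guarantee weakens there. Function names below are illustrative, not the DSv2 API.

```python
import os
import tempfile

def task_write(staging_dir, task_id, rows):
    # Each "task" writes its output under a unique staging name, so
    # retries/speculative attempts never clobber committed data.
    path = os.path.join(staging_dir, f"part-{task_id}.txt")
    with open(path, "w") as f:
        f.write("\n".join(rows))

def driver_commit(staging_dir, final_dir):
    # The "driver" commits by moving every staged file into place.
    # os.replace is atomic per file on a local FS; on an object store
    # the "move" is a copy + delete, which is where partial visibility
    # can leak in.
    for name in os.listdir(staging_dir):
        os.replace(os.path.join(staging_dir, name),
                   os.path.join(final_dir, name))

staging, final = tempfile.mkdtemp(), tempfile.mkdtemp()
task_write(staging, 0, ["a", "b"])
task_write(staging, 1, ["c"])
driver_commit(staging, final)
print(sorted(os.listdir(final)))  # ['part-0.txt', 'part-1.txt']
```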
ke to hear others' opinions about this. Please also share if
there are ongoing efforts on other items for structured streaming. Happy to
help out if another hand is needed.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-10816
2. https://issues.apache.org/jira/br
is still added,
please let me know the version of Spark as well as the physical plan (if
you don't mind), and I can take a look.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Thu, Oct 18, 2018 at 5:51 PM, sandeep_katta wrote:
> Now I've added the same aggregation query as below, but it still didn't
> filter
>
Congrats all! You all deserved it.
On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido wrote:
> Congrats you all!
>
> Il giorno mer 3 ott 2018 alle ore 11:29 Liang-Chi Hsieh
> ha scritto:
>
>>
>> Congratulations to all new committers!
>>
>>
>> rxin wrote
>> > Hi all,
>> >
>> > The Apache Spark PMC has
.
Jungtaek Lim (HeartSaVioR)
On Wed, Oct 3, 2018 at 2:48 AM, chandan prakash wrote:
> Thanks a lot Steve and Jungtaek for your answers.
> Steve,
> You explained really well in depth.
>
> I understood that the existing old implementation was not correct for
> object store like S3. The
file is left, both a speculative task and a task in a retrying batch
could skip writing and be marked as successful, resulting in a partial delta
being treated as a correct delta file.
Does that make sense?
Thanks,
Jungtaek Lim (HeartSaVioR)
On Sun, Sep 30, 2018 at 4:51 PM, chandan prakash wrote:
> Anyone who can cl
,
Jungtaek Lim (HeartSaVioR)
On Thu, Sep 27, 2018 at 8:58 PM, Gabor Somogyi wrote:
> Hi all,
>
> I am writing this e-mail in order to discuss the delegation token support
> for kafka feature which is reported in SPARK-25501
> <https://issues.apache.org/jira/browse/SPARK-25501>. I've pre
to avoid
drawing something that would take non-trivial effort. New classes are
linked to the actual source code so that we can read the code directly
whenever we're curious about something.
Please let me know anytime if something is unclear and needs elaboration.
-Jungtaek Lim (HeartSaVioR)
2018
most of the UTs I've added fail, but some UTs are for update mode, and
the patch doesn't provide the same experience with select-only session
windows, so I'm pointing to only one UT, which tests a basic session window.)
-Jungtaek Lim (HeartSaVioR)
On Fri, Sep 28, 2018 at 9:22 PM, Yuanjian Li wrote:
> Hi Jungt
to go too deep in the SPIP doc so anyone can review it and see the
benefit of adopting this.
Looking forward to hearing your feedback.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-10816
2.
https://docs.google.com/document/d/1_rMLmUSyGzb62RnP2A3WX6D6uRxox8Q_7WcoI_HrTw
Fan wrote:
> Thanks! If both versions are specified, yes we can just remove 3.0.0
>
> On Fri, Sep 21, 2018 at 1:38 PM Jungtaek Lim wrote:
>
>> OK got it. Thanks for clarifying.
>>
>> I can help checking and modifying version, but not sure the case both
>> versi
hen resolving a ticket, the
> default fixed version is 3.0.0. I guess someone forgot to type the fixed
> version and lead to this mistake.
>
> On Fri, Sep 21, 2018 at 1:15 PM Jungtaek Lim wrote:
>
>> Ah, these issues were resolved before branch-2.4 was cut, like SPARK-24441
>>
>
we set the resolved version to 2.4.1, and then if we roll a
> new RC we switch the 2.4.1 issues to 2.4.0.
>
> On Thu, Sep 20, 2018 at 9:55 PM Jungtaek Lim wrote:
>
>> I also noticed there're some fixed issues which are included in
>> branch-2.4 but its versions are still
I also noticed there are some fixed issues that are included in branch-2.4
but whose fix versions are still 3.0.0. Do we want to update the versions to
2.4.0? If we aren't planning to run automation to correct them, I'm
happy to fix them.
On Thu, Sep 20, 2018 at 9:22 PM, Weichen Xu wrote:
> We need to
t;> care about multi-client transaction. Or using a staging table like Ryan
>> described before.
>>
>>
>>
>> On Tue, Sep 11, 2018 at 5:10 AM Jungtaek Lim wrote:
>>
>>> > And regarding the issue that Jungtaek brought up, 2PC doesn't require
>>
window of potential failure is pretty
>> short for appends. For writers at the partition level it is fine because it
>> is just renaming directory, which is atomic.
>>
>> On Mon, Sep 10, 2018 at 1:40 PM Jungtaek Lim wrote:
>>
>>> When network partitioning ha
FS, the move is a pretty fast operation so while it is
> not completely transactional, the window of potential failure is pretty
> short for appends. For writers at the partition level it is fine because it
> is just renaming directory, which is atomic.
>
> On Mon, Sep 10, 2018 at 1:40 PM Jun
transaction to move data from staging table to
> final table.
>
>
>
>
>
> On Mon, Sep 10, 2018 at 12:56 PM Jungtaek Lim wrote:
>
>> I guess we all are aware of limitation of contract on DSv2 writer.
>> Actually it can be achieved only with HDFS sink (or other
transaction ends normally means aborting the transaction). Spark should also
integrate 2PC with its checkpointing mechanism to guarantee completeness of a
batch. And it might require different integration for continuous mode.
Jungtaek Lim (HeartSaVioR)
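A minimal in-memory sketch of the two-phase commit being discussed, under the assumption of cooperating participants; real integration would tie the commit decision to the query checkpoint as described above, and class/method names here are illustrative.

```python
class Participant:
    """One sink taking part in the transaction."""
    def __init__(self):
        self.staged = None
        self.committed = []

    def prepare(self, data):
        # Phase 1: stage the data durably and vote yes/no.
        self.staged = data
        return True

    def commit(self):
        # Phase 2: make staged data visible.
        self.committed.append(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None

def two_phase_commit(participants, data):
    # Commit only if every participant votes yes in the prepare phase;
    # otherwise abort everywhere so nothing becomes visible.
    if all(p.prepare(data) for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

sinks = [Participant(), Participant()]
print(two_phase_commit(sinks, "batch-1"))  # True: all voted yes
```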
On Tue, Sep 11, 2018 at 4:37 AM, Arun Mahadevan wrote
Nice suggestion, Reynold, and great news that Wenchen succeeded in
prototyping!
One thing I would like to make sure of is how continuous mode works with
such an abstraction. Would continuous mode also be abstracted with Stream,
with createScan providing an unbounded Scan?
Thanks,
Jungtaek Lim
On Sun, Aug 5, 2018 at 7:28 PM, Jungtaek Lim wrote:
> "coalesce" looks like it works: I had misunderstood it as an efficient
> version of "repartition" (which does shuffle), so I expected it would
> trigger a shuffle. My proposal would be covered by using "coalesce": thanks
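The distinction above can be modeled in plain Python (a hedged sketch, not Spark's implementation): coalesce merges whole existing partitions into fewer buckets without moving individual records between arbitrary partitions, while repartition redistributes every record by hash, which is the shuffle.

```python
def coalesce(partitions, n):
    # Assign each existing partition wholesale to one of n buckets:
    # no per-record redistribution, hence no shuffle.
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i % n].extend(part)
    return out

def repartition(partitions, n):
    # Redistribute every individual record by hash: the shuffle.
    out = [[] for _ in range(n)]
    for part in partitions:
        for rec in part:
            out[hash(rec) % n].append(rec)
    return out

parts = [[1, 2], [3], [4, 5], [6]]
print(coalesce(parts, 2))  # [[1, 2, 4, 5], [3, 6]]
```

Note the trade-off this models: coalesce is cheap but inherits whatever skew the input partitions had, while repartition pays the shuffle to rebalance.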
eally
matter for scalability / elasticity.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Sat, Aug 4, 2018 at 3:10 AM, Joseph Torres wrote:
> I'd agree it might make sense to bundle this into an API. We'd have to
> think about whether it's a common enough use case to justify the API
> complexity.
>
> I
Joseph Torres
wrote:
> Scheduling multiple partitions in the same task is basically what
> coalesce() does. Is there a reason that doesn't work here?
>
> On Fri, Aug 3, 2018 at 5:55 AM, Jungtaek Lim wrote:
>
>> Here's a link for Google docs (anyone can comment):
>>
h a couple of external storages like Redis or HBase,
but I would avoid a step that requires end users to maintain
another system as well. Spark is coupled with a specific version of Hadoop,
so we can expect that end users can run and maintain HDFS.
> Thanks,
>
> Arun
>
>
> On 2 Augus
posal: opinions on accepting or
declining, things to correct in my mail, any suggestions for improvement, etc.
Please also let me know if it would be better to move this to a Google doc or
PDF and file a JIRA issue.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://github.com/apache/spark/pull/21718
explicitly) stop contributing to the project for various reasons, so
considering activeness (or date of last commit) would be ideal.
I admit the above might be ideal rather than realistic, but I'm just thinking
out loud to make the review notification bot more useful for contributors and
less annoying for others.
Thanks,
Jun
by a couple of contributors. They've been open for at least 17 days and
more than 2 months at most. I'm not pushing committers to
merge them in 2.4, but I hope to see reactions/reviews so that I can
incorporate feedback and move them forward to be ready to merge.
- Jungtaek Lim (HeartSaVioR)
On Jul 16, 2018
. tests.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK-24763
2.
https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16541367=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16541367
On Mon, Jul 9, 2018 at 5:28 PM, Jungtaek Lim
additional operations
like projection and join, but a smaller state row would also give a
performance benefit, and the two can offset each other.
Please refer to the comment in the JIRA issue [2] for the numbers from a
simple perf test.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/browse/SPARK
find a more flexible way to resolve
the issue (SPARK-24717) that I mentioned in the tl;dr.
So 3 of the 5 issues are coupled so far in tracking and resolving one issue.
I hope this helps explain the value of reviewing these patches.
Thanks,
Jungtaek Lim (HeartSaVioR)
1. https://issues.apache.org/jira/br
,
Jungtaek Lim (HeartSaVioR)
On Sun, Jul 1, 2018 at 6:21 AM, Jungtaek Lim wrote:
> A kind reminder, since around 2 weeks have passed. I've added more PRs
> during those 2 weeks and am planning to do more.
>
> On Tue, Jun 19, 2018 at 6:34 PM, Jungtaek Lim wrote:
>
>> Hi Spark devs,
>>
>&g
A kind reminder, since around 2 weeks have passed. I've added more PRs
during those 2 weeks and am planning to do more.
On Tue, Jun 19, 2018 at 6:34 PM, Jungtaek Lim wrote:
> Hi Spark devs,
>
> I have couple of pull requests for structured streaming which are getting
> older and fading out from e
It is not possible because the cardinality of the partitioning key is
non-deterministic, while the partition count must be fixed. There's a chance
that cardinality > partition count, in which case the system can't ensure the
requirement.
Thanks,
Jungtaek Lim (HeartSaVioR)
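The constraint above follows from the pigeonhole principle, which a short sketch (illustrative names, not Spark's API) makes concrete: with a fixed partition count, distinct keys must share partitions whenever key cardinality exceeds that count, so "one partition per key" cannot be guaranteed.

```python
def assign(keys, num_partitions):
    # Map each key to a fixed partition by hash and record which keys
    # land where.
    placement = {}
    for k in keys:
        placement.setdefault(hash(k) % num_partitions, set()).add(k)
    return placement

keys = [f"user-{i}" for i in range(10)]   # key cardinality: 10
placement = assign(keys, 4)               # only 4 partitions available
shared = any(len(ks) > 1 for ks in placement.values())
print(shared)  # True: 10 distinct keys cannot fit one-per-partition in 4
```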
On Fri, Jun 22, 2018 at 8
.
Thanks in advance,
Jungtaek Lim (HeartSaVioR)
FYI: Filed https://issues.apache.org/jira/browse/SPARK-24466 and provided
the patch https://github.com/apache/spark/pull/21497
On Tue, Jun 5, 2018 at 11:30 AM, Jungtaek Lim wrote:
> Yeah that's why I initiated this thread, especially socket source is
> expected to be used from examples on of
ree that this is a bug. It's kinda silly that nc does this,
> but a socket connector that doesn't work with netcat will surely seem
> broken to users. It wouldn't be a huge change to defer opening the socket
> until a read is actually required.
>
> On Sun, Jun 3, 2018 at 9:55 PM, Jungtaek Lim w
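The "defer opening the socket until a read is actually required" idea quoted above amounts to lazy initialization; a hedged sketch, with `LazySocketSource` as a hypothetical class rather than Spark's socket source:

```python
import socket

class LazySocketSource:
    """Connects on first read instead of at construction time, so the
    peer (e.g. nc) doesn't see a connection that is immediately left
    idle or dropped."""
    def __init__(self, host, port):
        self.host, self.port = host, port
        self._sock = None          # nothing opened yet

    def _ensure_open(self):
        if self._sock is None:     # connect only on first use
            self._sock = socket.create_connection((self.host, self.port))

    def read(self, n=1024):
        self._ensure_open()
        return self._sock.recv(n)

src = LazySocketSource("localhost", 9999)
print(src._sock is None)  # True: constructing the source opens nothing
```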
and contribute a fix if we think
this is a bug (otherwise we need to replace the nc utility with another one,
maybe our own implementation?), but I'm not sure we'd be happy to apply a
workaround for a specific source.
I would like to hear opinions before giving it a shot.
Thanks,
Jungtaek Lim (HeartSaVioR)