+1 (binding)
Best Regards,
Ryan
On Tue, Jun 9, 2020 at 4:24 AM Wenchen Fan wrote:
> +1 (binding)
>
> On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>>
Hey Jungtaek,
I totally agree with you about the issues of complete mode that you raised
here. However, not all streaming queries have unbounded state that grows
quickly to an enormous size.
Actually, I have found complete mode pretty useful when the state is
bounded and small. For example,
"spark.sql("set -v")" returns a Dataset that has all non-internal SQL
configurations, so it should be pretty easy to automatically generate a SQL
configuration page from it.
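To illustrate that last point, here is a minimal Python sketch (not Spark code) of rendering rows shaped like the output of "set -v" — key, default, and meaning — as a Markdown configuration page. The two rows below are made-up placeholders, not real Spark configurations.

```python
# Hypothetical sketch: render (key, default, meaning) rows, shaped like
# the output of spark.sql("set -v"), as a Markdown configuration table.
# The two rows below are made-up placeholders, not real Spark configs.
rows = [
    ("spark.sql.example.flag", "true", "A made-up boolean option."),
    ("spark.sql.example.size", "200", "A made-up numeric option."),
]

def to_markdown(rows):
    lines = ["| Key | Default | Meaning |", "| --- | --- | --- |"]
    for key, default, meaning in rows:
        lines.append("| {} | {} | {} |".format(key, default, meaning))
    return "\n".join(lines)

page = to_markdown(rows)
print(page)
```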
Best Regards,
Ryan
On Wed, Jan 15, 2020 at 5:47 AM Hyukjin Kwon wrote:
> I think automatically creating a configuration page isn't a
Should we also add a guideline for non-Scala tests? Other languages (Java,
Python, R) don't support using a string as a test name.
Best Regards,
Ryan
On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon wrote:
> I opened a PR - https://github.com/apache/spark-website/pull/231
>
> 2019년 11월 13일 (수) 오전
Yep, historical reasons. And Netty 4 is under another namespace, so we can
use Netty 3 and Netty 4 in the same JVM.
On Tue, Sep 3, 2019 at 6:15 AM Sean Owen wrote:
> It was for historical reasons; some other transitive dependencies needed
> it.
> I actually was just able to exclude Netty 3 last
We were worried about regressions when adding Kafka source v2 because it had
lots of changes. Hence we copy-pasted the code to keep Kafka source v1
untouched and provided a config to fall back to v1.
On Mon, Aug 26, 2019 at 7:05 AM Jungtaek Lim wrote:
> Thanks! The patch is here:
+1. I have tested it and it looks good!
Best Regards,
Ryan
On Sun, Apr 21, 2019 at 8:49 PM Wenchen Fan wrote:
> Yea these should be mentioned in the 2.4.1 release notes.
>
> It seems we only have one ticket that is labeled as "release-notes" for
> 2.4.2:
Forgot to link the ticket that removed the global ScalaReflectionLock:
https://issues.apache.org/jira/browse/SPARK-19810
Best Regards,
Ryan
On Fri, Mar 15, 2019 at 10:40 AM Shixiong(Ryan) Zhu
wrote:
> Hey Sean,
>
> Sounds good to me. At least, it's not worse than any versions prior t
Hey Sean,
Sounds good to me. At least, it's not worse than any version prior to
2.3.0, which had a global ScalaReflectionLock. In addition, if someone hits
a performance regression caused by this, they are probably creating too
many Encoders. Reusing Encoders is a better solution in that case.
-1. Found an issue in a new 2.4 Java API:
https://issues.apache.org/jira/browse/SPARK-25644 We should fix it in 2.4.0
to avoid future breaking changes.
Best Regards,
Ryan
On Mon, Oct 1, 2018 at 7:22 PM Michael Heuer wrote:
> FYI I’ve open two new issues against 2.4.0 rc2
>
>
Structured Streaming supports the same standard SQL as batch queries, so
users can switch their queries between batch and streaming easily. Could
you clarify what problems SqlStreaming solves and what the benefits of the
new syntax are?
Best Regards,
Ryan
On Thu, Jun 14, 2018 at 7:06 PM, JackyLee
@apache.org>
>>>>>>> wrote:
>>>>>>> > Sure, please feel free to backport.
>>>>>>> >
>>>>>>> > On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com>
>>>>>>> wrote:
>>>>>
I'm -1 because of the UI regression
https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page may
be too slow and cause a "read timeout" when there are lots of jobs and
stages. This is one of the most important pages, because when it's broken
it's pretty hard to use the Spark Web UI.
On
+ Jose
On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun
wrote:
> SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue.
>
> For the hang issues, it seems not to be marked as a failure correctly in
> Apache Spark Jenkins history.
>
>
> On Thu, Jan 25, 2018
FYI, we reverted a commit in
https://github.com/apache/spark/commit/55dbfbca37ce4c05f83180777ba3d4fe2d96a02e
to fix the issue.
On Fri, Jan 12, 2018 at 11:45 AM, Xin Lu wrote:
> seems like someone should investigate what caused the build time to go up
> an hour and if it's
SQL metrics are collected using SparkListener. If there are no
tasks, org.apache.spark.sql.execution.ui.SQLListener cannot collect any
metrics.
On Thu, Nov 16, 2017 at 1:53 AM, Jacek Laskowski wrote:
> Hi,
>
> I seem to have figured out why the metric is not in the web UI for
+1
On Tue, Nov 7, 2017 at 1:34 PM, Joseph Bradley
wrote:
> +1
>
> On Mon, Nov 6, 2017 at 5:11 PM, Michael Armbrust
> wrote:
>
>> +1
>>
>> On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li wrote:
>>
>>> +1
>>>
>>> 2017-11-04 11:00
stateStoreCoordinator uses runId to deal with the small chance that Spark
cannot shut a bad task down. Please see
https://github.com/apache/spark/pull/18355
On Fri, Oct 27, 2017 at 3:40 AM, Jacek Laskowski wrote:
> Hi,
>
> I'm wondering why
Can we just create those tables once locally using official Spark versions
and commit them? Then the unit tests can just read these files and don't
need to download Spark.
On Thu, Sep 14, 2017 at 8:13 AM, Sean Owen wrote:
> I think the download could use the Apache mirror,
Right now they are safe because the caller also synchronizes when using
them. This is to avoid copying objects. It's probably a bad design. If you
want to refactor them, a PR is welcome.
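For readers unfamiliar with the pattern, here is a minimal Python sketch (not the actual Spark code) of caller-side locking: the collection itself is unsynchronized, and every caller must hold the shared lock while touching it. This avoids copying the collection, but the contract is easy to break, which is why it reads as questionable design.

```python
import threading

# Hypothetical sketch of caller-side locking (not the Spark code): the
# list is plain and unsynchronized; correctness relies on every caller
# holding the shared lock while reading or mutating it.
class Registry:
    def __init__(self):
        self.lock = threading.Lock()
        self.items = []  # only touch this while holding self.lock

reg = Registry()

def add(item):
    # The caller, not the collection, takes the lock.
    with reg.lock:
        reg.items.append(item)

threads = [threading.Thread(target=add, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
with reg.lock:
    print(len(reg.items))
```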
On Mon, Jun 26, 2017 at 2:27 AM, Oleksandr Vayda
wrote:
> Hi all,
>
> Reading
Hey Assaf,
You need to change "v2.2.0" to "v2.2.0-rc5" in the GitHub links because
there is no v2.2.0 tag right now.
On Mon, Jun 26, 2017 at 12:57 AM, assaf.mendelson
wrote:
> Not a show stopper, however, I was looking at the structured streaming
> programming guide and under
I created https://issues.apache.org/jira/browse/SPARK-21123. PR is welcome.
On Thu, Jun 15, 2017 at 10:55 AM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:
> Good catch. These are file source options. Could you submit a PR to fix
> the doc? Thanks!
>
> On Thu, Jun 15
Good catch. These are file source options. Could you submit a PR to fix the
doc? Thanks!
On Thu, Jun 15, 2017 at 10:46 AM, Mendelson, Assaf
wrote:
> Hi,
>
> I have started to play around with structured streaming and it seems the
> documentation (structured streaming
I took a look at ChannelTrafficShapingHandler. It looks like it's because
it doesn't support FileRegion, which Spark's messages use.
See org.apache.spark.network.protocol.MessageWithHeader.
On Tue, Jun 13, 2017 at 4:17 AM, Niu Zhaojie wrote:
> Hi All:
>
> I am trying
I did some investigation yesterday and just posted my findings in the
ticket. Please read my latest comment in
https://issues.apache.org/jira/browse/SPARK-18057
On Fri, Mar 10, 2017 at 11:41 AM, Cody Koeninger wrote:
> There are existing tickets on the issues around kafka
@Sean, I'm using Java 8 but don't see these errors until I manually build
the API docs. Hence I think dropping Java 7 support may not help.
Right now we don't build the docs in most builds, as building the docs
takes a long time (e.g.,
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs/2889/
You used one Spark version to compile your code but a newer version to run
it. As the Source APIs are not stable, Spark doesn't guarantee that they
are binary compatible.
On Tue, Jan 31, 2017 at 1:39 PM, Sam Elamin wrote:
> Hi Folks
>
>
> I am getting a weird
Congrats Burak & Holden!
On Tue, Jan 24, 2017 at 10:39 AM, Joseph Bradley
wrote:
> Congratulations Burak & Holden!
>
> On Tue, Jan 24, 2017 at 10:33 AM, Dongjoon Hyun
> wrote:
>
>> Great! Congratulations, Burak and Holden.
>>
>> Bests,
>> Dongjoon.
Could you post your code, please?
On Wed, Jan 11, 2017 at 3:53 PM, Kalvin Chau <kalvinnc...@gmail.com> wrote:
> "spark.speculation" is not set, so it would be whatever the default is.
>
>
> On Wed, Jan 11, 2017 at 3:43 PM Shixiong(Ryan) Zhu <
> shixi...@dat
be documented anywhere.
>
> Does the Kafka 0.10 require the number of cores on an executor be set to
> 1? I didn't see that documented anywhere either.
>
> On Wed, Jan 11, 2017 at 3:27 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Do you change
of the worker threads.
>
> On Wed, Jan 11, 2017 at 2:53 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
> I think you may reuse the kafka DStream (the DStream returned by
> createDirectStream). If you need to read from the same Kafka source, you
> need to create
I think you may reuse the kafka DStream (the DStream returned by
createDirectStream). If you need to read from the same Kafka source, you
need to create another DStream.
On Wed, Jan 11, 2017 at 2:38 PM, Kalvin Chau wrote:
> Hi,
>
> We've been running into
Hi Niek,
That's expected. Just answered on stackoverflow.
On Sun, Dec 25, 2016 at 8:07 AM, Niek wrote:
> Hi,
>
> I described my issue in full detail on
> http://stackoverflow.com/questions/41300223/spark-
>
Hey Prashant. Thanks for your code. I did some investigation and it turned
out that ContextCleaner is too slow and its "referenceQueue" keeps growing.
My hunch is that cleaning broadcasts is very slow since it's a blocking
call.
On Mon, Dec 19, 2016 at 12:50 PM, Shixiong(Ryan) Z
Hey, Prashant. Could you track the GC root of byte arrays in the heap?
On Sat, Dec 17, 2016 at 10:04 PM, Prashant Sharma
wrote:
> Furthermore, I ran the same thing with 26 GB as the memory, which would
> mean 1.3GB per thread of memory. My jmap
>
Sean, "stress test for failOnDataLoss=false" is because the Kafka consumer
may throw an NPE when a topic is deleted. I added some logic to retry on
such failures; however, it may still fail when topic deletion is too
frequent (the stress test). Just reopened
No. I meant only updating master. It's not worth updating a maintenance
branch unless there are critical issues.
On Mon, Dec 5, 2016 at 5:39 PM, Nicholas Chammas <nicholas.cham...@gmail.com
> wrote:
> You mean just for branch-2.0, right?
>
>
> On Mon, Dec 5, 2016 at 8:35 PM
Hey Nick,
It should be safe to upgrade Netty to the latest 4.0.x version. Could you
submit a PR, please?
On Mon, Dec 5, 2016 at 11:47 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
> That file is in Netty 4.0.29, but I believe the PR I referenced is not.
> It's only in Netty 4.0.37
RDD.sparkContext is public:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@sparkContext:org.apache.spark.SparkContext
On Mon, Dec 5, 2016 at 1:04 PM, Teng Long wrote:
> Thank you for providing another answer, Holden.
>
> So I did what
If you create a HiveContext before starting StreamingContext, then
`SQLContext.getOrCreate` in foreachRDD will return the HiveContext you
created. You can just call asInstanceOf[HiveContext] to convert it to
HiveContext.
On Tue, Nov 22, 2016 at 8:25 AM, Dirceu Semighini Filho <
I remember it's because you need to run `mvn install` before running
lint-java if the Maven cache is empty, and `mvn install` is pretty heavy.
On Tue, Nov 15, 2016 at 1:21 PM, Marcelo Vanzin wrote:
> Hey all,
>
> Is there a reason why lint-java is not run during PR builds?
+1
On Tue, Nov 8, 2016 at 5:50 AM, Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:
> +1 (non-binding)
>
> over Ubuntu 16.10, Java 8 (OpenJDK 1.8.0_111) built with Hadoop 2.7.3,
> YARN, Hive
>
>
> On 8 November 2016 at 12:38, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com>
This is my test pom:

<modelVersion>4.0.0</modelVersion>
<groupId>bar</groupId>
<artifactId>foo</artifactId>
<version>1.0</version>
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.0.1</version>
  </dependency>
</dependencies>

scalatest is in the compile scope:

[INFO] bar:foo:jar:1.0
[INFO] \- org.apache.spark:spark-core_2.10:jar:2.0.1:compile
[INFO]    +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:compile
[INFO]
You can just exclude scalatest from Spark.
On Fri, Oct 28, 2016 at 12:51 PM, Jeremy Smith
wrote:
> spark-core depends on spark-launcher (compile)
> spark-launcher depends on spark-tags (compile)
> spark-tags depends on scalatest (compile)
>
> To be honest I'm not all
spark-tags is in the compile scope of spark-core...
On Fri, Oct 28, 2016 at 12:27 PM, Sean Owen wrote:
> It's required because the tags module uses it to define annotations for
> tests. I don't see it in compile scope for anything but the tags module,
> which is then in test
Seems the Spark version at runtime is different from the one you compiled
against. You should mark the Spark components as "provided". See
https://issues.apache.org/jira/browse/SPARK-9219
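For reference, a sketch of what that looks like in a Maven pom — the artifact and version here are just examples; match them to your own build:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>2.0.1</version>
  <!-- "provided": compile against Spark, but do not bundle it in your
       jar; the cluster supplies the matching runtime jars. -->
  <scope>provided</scope>
</dependency>
```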
On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote:
>
> I tried SpanBy but look like there is a strange error
Congrats!
On Tue, Oct 4, 2016 at 9:09 AM, Yanbo Liang wrote:
> Congrats and welcome!
>
> On Tue, Oct 4, 2016 at 9:01 AM, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
>> Congratulations Xiao! Very well deserved!
>>
>> On Mon, Oct 3, 2016 at 10:46
Hey Mark,
I can reproduce the failure locally using your command. There were a lot of
OutOfMemoryErrors in the unit test log. I increased the heap size from 3g to
4g at https://github.com/apache/spark/blob/v2.0.1-rc4/pom.xml#L2029 and the
tests passed. I think the patch you mentioned increased the
+1
On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee wrote:
> +1
>
>
> On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
>> +1 (non-binding)
>>
>> On Sun, Sep 25, 2016 at 2:05 PM, Ricardo Almeida <
>>
r call.
>
> Cheers,
>
> On Tue, Jun 21, 2016 at 6:40 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Hey Pete,
>>
>> I didn't backport it to 1.6 because it just affects tests in most cases.
>> I'm sure we also have other places calling
Hey Pete,
I didn't backport it to 1.6 because it just affects tests in most cases.
I'm sure we also have other places calling blocking methods in the event
loops, so similar issues are still there even after applying this patch.
Hence, I don't think it's a blocker for 1.6.2.
On Tue, Jun 21, 2016
Congrats, Yanbo!
On Sun, Jun 5, 2016 at 6:25 PM, Liwei Lin wrote:
> Congratulations Yanbo!
>
> On Mon, Jun 6, 2016 at 7:07 AM, Bryan Cutler wrote:
>
>> Congratulations Yanbo!
>> On Jun 5, 2016 4:03 AM, "Kousuke Saruta"
>> wrote:
Just to prevent restarting LiveListenerBus. The internal thread cannot be
restarted.
On Wed, May 25, 2016 at 12:59 PM, Jacek Laskowski wrote:
> Hi,
>
> I'm wondering why LiveListenerBus has two AtomicBoolean flags [1]?
> Could it not have just one, say started? Why does
There is a fix: https://github.com/apache/spark/pull/11567
On Mon, Mar 7, 2016 at 11:39 PM, Reynold Xin wrote:
> +Sean, who was playing with this.
>
>
>
>
> On Mon, Mar 7, 2016 at 11:38 PM, Jacek Laskowski wrote:
>
>> Hi,
>>
>> Got the BUILD FAILURE.
Could you rebuild the whole project? I changed the Python function
serialization format in https://github.com/apache/spark/pull/11535 to fix a
bug. This exception looks like some place is still using the old code.
On Sun, Mar 6, 2016 at 6:24 PM, Hyukjin Kwon wrote:
> Just
You can take a look at
"org.apache.spark.streaming.scheduler.ReceiverTracker#getExecutors"
On Thu, Mar 3, 2016 at 3:10 PM, Reynold Xin wrote:
> What do you mean by consistent? Throughout the life cycle of an app, the
> executors can come and go and as a result really has no
Congrats!!! Herman and Wenchen!!!
On Mon, Feb 8, 2016 at 10:44 AM, Luciano Resende
wrote:
>
>
> On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia
> wrote:
>
>> Hi all,
>>
>> The PMC has recently added two new Spark committers -- Herman van Hovell
>>
fileStream has a parameter "newFilesOnly". By default it's true, which
means only new files are processed and existing files in the directory are
ignored. So you need to ***move*** the files into the directory; otherwise
it will ignore the existing files.
You can also set "newFilesOnly" to false. Then in the
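As a side note on getting files picked up reliably, here is a minimal Python sketch (hypothetical paths, not Spark code) of the "move" pattern: write the file in a staging directory first, then atomically rename it into the monitored directory so it appears as a complete, new file.

```python
import os
import tempfile

# Hypothetical sketch (not Spark code): stage the file elsewhere, then
# atomically rename it into the monitored directory so the stream never
# sees a partially written file.
watch_dir = tempfile.mkdtemp(prefix="stream-input-")
stage_dir = tempfile.mkdtemp(prefix="stream-stage-")

staged = os.path.join(stage_dir, "part-0000.txt")
with open(staged, "w") as f:
    f.write("event-1\n")

# os.rename is atomic when source and target are on the same filesystem.
final = os.path.join(watch_dir, "part-0000.txt")
os.rename(staged, final)
print(sorted(os.listdir(watch_dir)))
```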
Hi Jan, could you post your code? I could not reproduce this issue in my
environment.
Best Regards,
Shixiong Zhu
2015-12-29 10:22 GMT-08:00 Shixiong Zhu :
> Could you create a JIRA? We can continue the discussion there. Thanks!
>
> Best Regards,
> Shixiong Zhu
>
>