Hi Karthick,
Sorry to say it, but there's not enough data to help you. There should be
something above or below the exception snippet you posted that would
pinpoint the root cause.
Pozdrawiam,
Jacek Laskowski
"The Internals Of" Online Books <https://bo
l/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L60
[4]
https://github.com/apache/spark/blob/c124037b97538b2656d29ce547b2a42209a41703/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala#L24
d and used properly using the custom catalog impl.
HTH
Screenshots won't give you that level of detail. You'd have to intercept
execution events and correlate them. Not an easy task, yet doable. HTH.
Hi,
You could use QueryExecutionListener or Spark listeners to intercept query
execution events and extract whatever is required. That's what the web UI
does (it's simply a bunch of SparkListeners --> https://youtu.be/mVP9sZ6K__Y
;-)).
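A minimal sketch of the QueryExecutionListener approach (the listener class and what it prints are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Prints the name of the action, its duration, and the physical plan
// of every successfully executed query.
class PrintingListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName took ${durationNs / 1e6} ms:\n${qe.executedPlan}")

  override def onFailure(funcName: String, qe: QueryExecution, error: Exception): Unit =
    println(s"$funcName failed: $error")
}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.listenerManager.register(new PrintingListener)
spark.range(5).count()  // triggers onSuccess
```

A lower-level SparkListener (the bus the web UI sits on) is registered similarly, via spark.sparkContext.addSparkListener.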
/github.com/apache/spark/blob/e60ce3e85081ca8bb247aeceb2681faf6a59a056/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L91
Yoohoo! Thanks, Yuming, for driving this release. A tiny step for Spark, a
huge one for my clients (who are still on 3.2.1 or even older :))
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski
Hi Raj,
Do you want to do the following?
spark.read.format("prometheus").load...
I haven't heard of such a data source / format before.
What would you like it for?
g and am really curious (not implying that one is better or
worse than the other(s)).
part of Spark?
You should not really be doing such risky config changes (unless you've got
no other choice and you know what you're doing).
a thought but wanted to share as I think it's worth investigating.
rrors
coming from broadcast joins perhaps?
w are the above different from yours?
Hi Bobby,
What a great summary of what happens behind the scenes! Enjoyed every
sentence!
"The default shuffle implementation will always write out to disk." <--
that's what I wasn't sure about the most. Thanks again!
/me On digging deeper...
k to avoid OOMEs).
Hi Pedro,
No idea what might be causing it. Do you perhaps have some code to
reproduce it locally?
Big shout-out to you, Dongjoon! Thank you.
Hi,
The easiest (but perhaps not necessarily the most flexible) is simply to
use two different versions of spark-submit script with the env var set to
two different values. Have you tried it yet?
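For instance, two thin wrapper scripts (the env var name, paths, and app details below are placeholders, nothing Spark-specific):

```shell
#!/usr/bin/env bash
# submit-v1.sh -- hypothetical wrapper: pin one value of the env var
MY_ENV_VAR=value1 /opt/spark-a/bin/spark-submit --class com.example.App app.jar

# submit-v2.sh would be identical except for the value (and possibly the Spark build):
# MY_ENV_VAR=value2 /opt/spark-b/bin/spark-submit --class com.example.App app.jar
```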
hat's what happens in Kafka Streams too
Hi Vaquar,
Thanks a lot! Accepted as the answer (yet there was the other answer that
was very helpful too). Tons of reading ahead to understand it more.
That once again makes me feel that Hadoop MapReduce experience would help a
great deal (and I've got none).
Hi Bartosz,
This is not a question about whether the data source supports fixed or
user-defined schema but what schema to use when requested for a streaming
batch in Source.getBatch.
uot;safe" and "safety" meanings.
many HTTP calls are there under the
covers? How to know it for GCS?
Thank you for any help you can provide. Merci beaucoup mes amis :)
[1] https://stackoverflow.com/q/66933229/1305344
a3a4954291b74f8c8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala#L61
[2]
https://github.com/apache/spark/blob/053dd858d38e6107bc71e0aa3a4954291b74f8c8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala#L35
Hi,
I think I found it. I should be using OnDemand claim name so it gets
replaced to be unique per executor (?)
wski.github.io/spark-kubernetes-book/demo/persistentvolumeclaims/
Please help. Thank you!
Hi,
On GCP I'd go for buckets in Google Storage. Not sure how reliable it is in
production deployments though. Only demo experience here.
Hi,
> as Executors terminates after their work completes.
--conf spark.kubernetes.executor.deleteOnTermination=false ?
case they're
not deleted as they simply wait forever. I might be mistaken here though.
What property is this for "this timeout of 60 sec."?
Hi Filip,
Care to share the code behind "The only thing I found so far involves using
forEachBatch and manually updating my aggregates. "?
I'm not completely sure I understand your use case and hope the code could
shed more light on it. Thank you.
Hi,
Never heard of it (and was once tasked to explore a similar use case). I'm
curious how you'd like it to work. (No idea how Hive does this either.)
Hi,
Can you use console sink and make sure that the pipeline shows some
progress?
Hi Brett,
No idea why it happens, but got curious about this "Cores" column being 0.
Is this always the case?
Hi,
I'd look at stages and jobs as it's possible that the only task running is
the missing one in a stage of a job. Just guessing...
life as a container of a driver pod.
There's no point using cluster deploy mode...ever. Makes sense?
Hi Marco,
A Scala dev here.
In short: yet another reason against Python :)
Honestly, I've got no idea why the code gives the output. Ran it with
3.1.1-rc1 and got the very same results. Hoping pyspark/python devs will
chime in and shed more light on this.
pment IMHO).
ed and also forward that to ElasticSearch via log4j
for monitoring
I think the SparkListener API would help here too.
n of a query is used to look up
any cached queries.
Again, I'm not really sure and if I'd have to answer it (e.g. as part of an
interview) I'd say nothing would be shared / re-used.
Hey Yurii,
> which is unavailable from executors.
Register it on the driver and use accumulators on executors to update the
values (on the driver)?
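A rough sketch of that pattern (the accumulator name and the work done on the executors are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// State lives on the driver; executors only report back through the
// accumulator, which is the supported way to push values driver-ward.
val badRecords = spark.sparkContext.longAccumulator("bad-records")

spark.sparkContext.parallelize(Seq("1", "oops", "3")).foreach { s =>
  // runs on executors: add() is the only safe operation here
  if (scala.util.Try(s.toInt).isFailure) badRecords.add(1)
}

println(badRecords.value)  // read back on the driver only
```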
a.schema.names: _*)
.write
.insertInto(sqlView)
In summary, you should report this to JIRA, but don't expect it to get fixed
beyond catching this case just to throw the exception
from ResolveRelations: "Inserting into a view is not allowed".
Unless I'm mistaken...
Hi,
Can you post the whole message? I'm trying to find what might be causing
it. A small reproducible example would be of help too. Thank you.
Hi,
Start with DataStreamWriter.foreachBatch.
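A minimal foreachBatch sketch (the rate source and the parquet path are just examples; the point is that each micro-batch arrives as a plain DataFrame you can write with any batch API):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val query = spark.readStream
  .format("rate")  // built-in test source
  .load()
  .writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // any batch sink works here, e.g. parquet, JDBC, ...
    batch.write.mode("append").parquet(s"/tmp/out/batch-$batchId")
  }
  .start()
```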
ay back.
I wish someone with more skills in this area would chime in...
ble (as it's
on a stable HDFS file system not on an ephemeral executor). In either case,
the lineage should be the same = cut.
treaming queries, respectively.
Please note that I'm not a PMC member or even a committer so I'm speaking
for myself only (not representing the project in an official way).
Hi Emma,
I'm curious about the purpose of the email. Mind elaborating?
Hi Neeraj,
I'd start from "Contributing Documentation Changes" in
https://spark.apache.org/contributing.html
Hi,
It's been a while since I worked with Spark Standalone, but I'd check the
logs of the workers. How do you spark-submit the app?
Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
e very least and/or use YARN as the
cluster manager".
Another thought was that the user code (your code) could be leaking
resources so Spark eventually reports heap-related errors that may not
necessarily be Spark's.
" on twitter @
https://twitter.com/adamwathan/status/1257641015835611138. You could borrow
some ideas from the docs that are claimed to be "the best".
Hi,
Thanks Dongjoon Hyun for stepping up as a release manager!
Much appreciated.
If there's a volunteer to cut a release, I'm always happy to support it.
In addition, the more frequent the releases, the better for end users: they
have a choice to upgrade and get all the latest fixes, or wait. It's their
Hi,
I've got a talk "The internals of stateful stream processing in Spark
Structured Streaming" at https://dataxday.fr/ today and am going to include
the tool on the slides to thank you for the work. Thanks.
treaming systems, among which Flink and DataFlow allow changing
>> parallelism number, and by my knowledge of Spark Streaming, it seems it is
>> also able to do that: if some “key interval” concept is used, then state
>> can somehow decoupled from partition number by consistent
Hi,
It's not allowed to change the number of partitions after your streaming
query is started.
The reason is exactly the number of state stores which is exactly the
number of partitions (perhaps multiplied by the number of stateful
operators).
I think you'll even get a warning or an exception whe
://spark.apache.org/docs/latest/sql-programming-guide.html and then hop
onto
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package
to
know the Spark API better. I'm sure you'll quickly find out the answer(s).
Hi,
What are "the spark driver and executor threads information" and "spark
application logging"?
Spark uses log4j so set up logging levels appropriately and you should be
done.
'm mistaken, in your case, what you really need is to replace
`withColumn` with `select("id")` itself and you're done.
As I'm writing this (saying exactly what you already have), I'm feeling
confused.
; cast "timestamp").show
+----+
|  ts|
+----+
|null|
|null|
+----+
scala> Seq("1", "2").toDF("ts").select($"ts" cast "long").select($"ts" cast
"timestamp").show
+-------+
| ts|
+---
bstr($"b", $"e" - $"b" + 1) as "demo").show
+-----+
| demo|
+-----+
|hello|
+-----+
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Hi,
I'm curious about "I found the bug code". Can you point me at it? Thanks.
Hi,
Not possible. What are you really trying to do? Why do you need to share
dataframes? They're nothing but metadata of a distributed computation (no
data inside) so what would be the purpose of such sharing?
Hi,
Add the following line to conf/log4j.properties and you should have all the
logs:
log4j.logger.org.apache.spark.scheduler.DAGScheduler=ALL
Hi,
Can you show the plans with explain(extended=true) for both versions?
That's where I'd start to pinpoint the issue. Perhaps the underlying
execution engine changed in a way that affects keyBy? Dunno and guessing...
Hi Lian,
"What have you tried?" would be a good starting point. Any help on this?
How do you read the JSONs? readStream.json? You could use readStream.text
followed by filter to include/exclude good/bad JSONs.
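A sketch of the readStream.text-plus-filter idea (the schema and the input path are examples); from_json yields null on malformed input, which gives you the good/bad split:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Expected shape of a "good" JSON record (example schema)
val schema = new StructType().add("id", LongType).add("name", StringType)

val good = spark.readStream
  .text("/tmp/in")                                    // each line in a 'value' column
  .select(from_json(col("value"), schema) as "json")  // null for bad JSON
  .filter(col("json").isNotNull)                      // drop lines that didn't parse
  .select("json.*")
```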
're talking about an RDD, this
abstraction is planned as a set of tasks (one per partition of the RDD).
And yes, the tasks are sent out over the wire to executors. It's been like
this from Spark 1.0 (and even earlier).
Hope I helped a bit.
Hope that helps.
Hi Elior,
Could you show the query that led to the exception?
Hi,
What about https://issues.apache.org/jira/projects/SPARK/versions/12342385?
Streaming since the
former uses Dataset API while the latter RDD API.
Don't touch RDD API and Spark Streaming unless you know what you're doing :)
import org.apache.spark.sql.streaming.Trigger
df.writeStream.trigger(Trigger.ProcessingTime("1 second"))
See
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
Hi,
The exception message should be self-explanatory and says that you cannot
join two streaming Datasets. This feature was added in 2.3 if I'm not
mistaken.
Just to be sure that you work with two streaming Datasets, can you show the
query plan of the join query?
y?
(I am not saying that I'm 100% sure that the query is indeed the same since
I'm working on a reproducible test case and only when I got it I'll really
be).
Sorry for the vague description, but I've got nothing more to share yet.
Hi,
What's the deployment process then (if not using spark-submit)? How is the
AM deployed? Why would you want to skip spark-submit?
Jacek
On 19 Mar 2018 00:20, "Serega Sheypak" wrote:
> Hi, Is it even possible to run spark on yarn as usual java application?
> I've built jat using maven with s
Hi,
Filled https://issues.apache.org/jira/browse/SPARK-23731 and am working on
a workaround (aka fix).
scala:345)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Hi,
Since I'm new to Hive, what does `stored by` do? I might help a bit in
Spark if I only knew a bit about Hive :)
Hi,
What would you expect? The data is simply dropped as that's the purpose of
watermarking it. That's my understanding at least.
Hi,
Start with spark.executor.memory 2g. You may also
give spark.yarn.executor.memoryOverhead a try.
See https://spark.apache.org/docs/latest/configuration.html and
https://spark.apache.org/docs/latest/running-on-yarn.html for more in-depth
information.
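On the command line that would look like (class name and jar are placeholders):

```shell
spark-submit \
  --master yarn \
  --conf spark.executor.memory=2g \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --class com.example.App \
  app.jar
```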
o filter out records which are lagging behind (based on event
time) by a certain amount of time."
Hi Esa,
I'd say https://stackoverflow.com/questions/tagged/apache-spark is where
many active sparkians hang out :)
| id|
+---+---+
| 0| 0|
+---+---+
Am I missing something? When aliasing a table, use the alias identifier in
the column references.
Hi Michael,
-dev +user
What's the query? How do you "fool spark"?
lt
tolerant)
> Do you have any other suggestion/recommendation ?
What's wrong with the current solution? I don't think you should change how
you do things currently. You should just avoid collect on large datasets
(which you have to avoid anywhere in Spark anyway).
ee if it's available on the driver in
cluster deploy mode? That should give you a definitive answer (or at least
get you closer).
Thanks Silvio!
In the meantime, with help of Adam and code review of WholeStageCodegenExec
and CollapseCodegenStages, I found out that anything that's codegen'd is as
fast as the tasks in a stage. In this case, union of two codegen'd subtrees
is indeed parallel.
artitionsRDD[14] at rdd at :26 []
| MapPartitionsRDD[13] at rdd at :26 []
| ParallelCollectionRDD[12] at rdd at :26 []
What am I missing and how to be certain whether and what parts of a query
are going to be executed in parallel?
Please help...
You cannot have datasets of different schemas in a query. You'd have to use
the widest schema to cover all of them.
p.s. Have you tried anything...spark-shell's your friend, my friend :)
Hi,
https://issues.apache.org/jira/browse/SPARK-19076
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
Hi,
What about a custom streaming Sink that would stop the query after addBatch
has been called?
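A rough sketch of the idea. Sink is an internal API (org.apache.spark.sql.execution.streaming), so details may vary across versions, and the StreamSinkProvider/registration plumbing needed to actually plug it in is omitted here:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Flips a flag once the first batch has been delivered; the driver
// can watch the flag and stop the query.
class OneBatchSink extends Sink {
  @volatile var batchSeen = false

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    data.collect()   // consume the batch (example only)
    batchSeen = true
  }
}
```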
Hi,
What about memory sink? That could work.
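A minimal sketch with the built-in rate source (the query name is just an example):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val query = spark.readStream
  .format("rate")          // test source
  .load()
  .writeStream
  .format("memory")        // batches land in an in-memory table
  .queryName("snapshots")  // the table name to query later
  .outputMode("append")
  .start()

query.processAllAvailable()
spark.table("snapshots").show()
```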
Hi,
Not that I'm aware of, but in your case checking whether a JSON message
fits your schema and the pipeline would've taken pyspark alone with JSONs on
disk, wouldn't it?
Hi Satyajit,
That's exactly what Dataset.rdd does -->
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala?utf8=%E2%9C%93#L2916-L2921
Hi,
Use explode function, filter operator and collect_list function.
Or "heavier" flatMap.
Hi,
You're right...killing the spark streaming job is the way to go. If a batch
was completed successfully, Spark Streaming will recover from the
controlled failure and start where it left off. I don't think there's any
other way to do it.
Hi,
Guessing it's a timing issue: once you started the query, batch 0 either had
no rows to save or hadn't started yet (it runs on a separate thread), so
spark.sql ran once and saved nothing.
You should rather use a foreach writer to save results to Hive.
cribe to, e.g. topic\d [2]
[1]
https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html#subscribe
[2]
https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html#subscribepattern
Hi,
Ah, right! Start the queries and once they're running, awaitTermination
them.
Hi,
What's the code in readFromKafka to read from hello2 and hello1?