Hello,
First, make sure that Java is installed, and install it if it is
not. Then install Scala and a build tool (sbt or Maven). In my
view, IntelliJ IDEA is a good option for developing your Spark
applications. Finally, you have to install a distributed
file system
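
The prerequisites listed above can be checked with a short script before
installing anything. This is only a minimal sketch: the command names
(java, scala, sbt, mvn) are assumed to be the usual launcher names, and
find_tools is a hypothetical helper, not part of Spark. Finding a command
on the PATH does not prove that the tool works (macOS, for example, ships a
java stub even when no JDK is installed), so treat the result as a first hint.

```python
import shutil

# Assumption: these are the usual launcher names for each prerequisite.
PREREQUISITES = {
    "Java": ["java"],
    "Scala": ["scala"],
    "build tool (sbt or Maven)": ["sbt", "mvn"],
}


def find_tools(prerequisites, which=shutil.which):
    """Map each prerequisite to the first command path found, or None."""
    found = {}
    for name, commands in prerequisites.items():
        # Try each candidate command in order and keep the first hit.
        paths = (which(cmd) for cmd in commands)
        found[name] = next((p for p in paths if p), None)
    return found


if __name__ == "__main__":
    for name, path in find_tools(PREREQUISITES).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Anything reported as "not found on PATH" is a candidate for installation;
for the build tool, either sbt or Maven is enough.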
Dear list,
I am very new to Spark, and I am having trouble installing it on my Mac. I have
the following questions; please give me some guidance. Thank you very much.
1. Which software packages, and how many, should I install before installing Spark? I have
been searching online, people discussing their
Hi,
I am curious how records are assigned to tasks, since, as you can see in
the image below, one specific executor has more tasks than
the others.
The setup is this:
- Spark version 2.3.1
- Spark Streaming job runs on Spark Standalone with the following
configuration:
-
Hi Magnus,
Yes, I was also thinking about the partitioning approach, and I think it is
the best solution for this type of scenario.
My scenario also matches your last paragraph: the dates that come in
are very random. I can get updates from 2012 and from 2019.
Therefore, this strategy
Hi all
sorry, tl;dr
I'm working on my first Python Spark Structured Streaming app, which ends up
joining messages from ~10 different Kafka topics. I recently upgraded to Spark
2.4.3, which resolved all the issues with time handling (watermarks,
join windows) that I had with Spark 2.3.2.
My
The case without explain also takes a long time to submit
19/06/04 05:56:37 DEBUG SparkSQLOperationManager: Created Operation for
> select count(*) from perf_as_reportads with
> session=org.apache.hive.service.cli.session.HiveSessionImpl@1f30fc84,
> runInBackground=true
> 19/06/04 05:56:37 INFO
Hi,
I am running the Thrift server on YARN.
It is fast when the beeline client sends a query to the Thrift server, but it takes a
while (about 90s) to submit it to the YARN cluster.
From the Thrift server log:
> *19/06/04 05:48:27* DEBUG SparkSQLOperationManager: Created Operation for
> explain select count(*) from