Re: installation of spark

2019-06-04 Thread Jack Kolokasis
Hello, first make sure that Java is installed, and install it if it is not. Then install Scala and a build tool (sbt or Maven). In my view, IntelliJ IDEA is a good option for creating your Spark applications. Finally, you have to install a distributed file system

installation of spark

2019-06-04 Thread ya
Dear list, I am very new to Spark, and I am having trouble installing it on my Mac. I have the following questions; please give me some guidance. Thank you very much. 1. Which software, and how much of it, should I install before installing Spark? I have been searching online, people discussing their

Spark Streaming: Task not distributed

2019-06-04 Thread Pipster Neko
Hi, I am curious how records are assigned to tasks, since, as you can see in the picture below, one specific executor receives more tasks than the others. The setup is this: - Spark version 2.3.1 - Spark Streaming job runs on Spark Standalone with the following configuration: -

Re: Upsert for hive tables

2019-06-04 Thread tkrol
Hi Magnus, Yes, I was also thinking about the partitioning approach, and I think it is the best solution in this type of scenario. My scenario also matches your last paragraph: the dates that come in are very random. I can get updates from 2012 and from 2019. Therefore, this strategy

Spark structured streaming leftOuter join not working as I expect

2019-06-04 Thread Joe Ammann
Hi all, sorry, tl;dr: I'm on my first Python Spark structured streaming app, in the end joining messages from ~10 different Kafka topics. I've recently upgraded to Spark 2.4.3, which has resolved all the issues with time handling (watermarks, join windows) that I had before with Spark 2.3.2. My

Re: Spark Thriftserver on yarn, sql submit take long time.

2019-06-04 Thread Jun Zhu
The case without explain also takes a long time to submit:
> 19/06/04 05:56:37 DEBUG SparkSQLOperationManager: Created Operation for select count(*) from perf_as_reportads with session=org.apache.hive.service.cli.session.HiveSessionImpl@1f30fc84, runInBackground=true
> 19/06/04 05:56:37 INFO

Spark Thriftserver on yarn, sql submit take long time.

2019-06-04 Thread Jun Zhu
Hi, I am running the Thrift server on YARN. The beeline client sends the query to the Thrift server quickly, but it then takes a while (about 90s) to submit it to the YARN cluster. From the Thrift server log:
> *19/06/04 05:48:27* DEBUG SparkSQLOperationManager: Created Operation for explain select count(*) from