Remove the master configuration from the code and then submit it to any
cluster; it should work.
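For what it's worth, a minimal sketch of the idea (class name, app name, and paths are made up): build the session without .master(...) and let spark-submit supply it.

    import org.apache.spark.sql.SparkSession;

    public final class MyApp {
        public static void main(String[] args) {
            // No .master(...) here: the master URL is supplied by spark-submit,
            // so the same jar runs unchanged on any cluster.
            SparkSession spark = SparkSession.builder()
                .appName("my-app")
                .getOrCreate();
            spark.range(10).show();  // trivial sanity check
            spark.stop();
        }
    }

Then, for example: spark-submit --master yarn --class MyApp my-app.jar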
On Fri, Nov 2, 2018 at 10:52 AM 崔苗 (Data and Artificial Intelligence Product Development Department) <0049003...@znv.com> wrote:
>
> Then what about Spark SQL and Spark MLlib? We use them most of the time.
> 0049003208
> 0049003...@znv.com
>
>
Then what about Spark SQL and Spark MLlib? We use them most of the time.
Please read about Spark Streaming or Spark Structured Streaming. Your web
application can easily communicate with it through some API, and you won't have
the overhead of starting a new Spark job, which is pretty heavy.
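For illustration, a minimal sketch of such a long-running job, using the built-in "rate" test source (the source, names, and output mode are my own choices, not from the thread); the point is that the application starts once and keeps processing:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class LongRunningJob {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                .appName("long-running")
                .getOrCreate();

            // The "rate" source just generates rows; swap in kafka/socket/etc.
            Dataset<Row> stream = spark.readStream()
                .format("rate")
                .option("rowsPerSecond", "1")
                .load();

            // The query runs until stopped, so there is no per-request
            // job-startup cost.
            stream.writeStream()
                .format("console")
                .outputMode("append")
                .start()
                .awaitTermination();
        }
    }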
On Thu, Nov 1, 2018 at 23:01 崔苗 (Data and Artificial Intelligence Product Development Department) <0049003...@znv.com> wrote:
>
> Hi,
> we
Hi,
we want to execute Spark code without submitting an application.jar, like this code:

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession
            .builder()
            .getOrCreate();  // builds or reuses the session; master/appName can come from the launcher
    }
Hi,
I ran into an interesting scenario with no profile output today. I have
a PySpark application that primarily uses the Spark SQL APIs. I
understand that parts of the Spark SQL API may not generate data in the
PySpark profile dumps, but I was surprised when I had code containing a
UDF that
I am able to read S3 files that use Server-Side Encryption (SSE-KMS). I added
the KMS key ID to the IAM role and can read them seamlessly.
Recently I have been receiving S3 files that are Client-Side Encrypted (AWS
KMS-Managed Customer Master Key (CMK)); when I try to read these files, I see
a count of 0.
To
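For context, the SSE-KMS case needs no client-side decryption, and with S3A the algorithm can also be set explicitly, along the lines of the hedged sketch below (bucket, path, format, and key ARN are made up, and the property names are as I recall them from the S3A docs; double-check against your Hadoop version). Client-side-encrypted objects are a different matter: the stored bytes are ciphertext, and as far as I know plain S3A in Hadoop 2.x has no client-side decryption path, so a client that understands CSE (e.g., EMRFS with CSE enabled) is needed.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class SseKmsRead {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("s3-sse-read").getOrCreate();

            // S3A server-side encryption settings; decryption happens inside S3,
            // so reads are transparent once the IAM role may use the key.
            Configuration hc = spark.sparkContext().hadoopConfiguration();
            hc.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS");
            hc.set("fs.s3a.server-side-encryption.key",
                   "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE");  // placeholder ARN

            Dataset<Row> df = spark.read().parquet("s3a://my-bucket/data/");  // placeholder path/format
            System.out.println(df.count());
        }
    }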
(Sorry, the first one was sent to the incubator mailing list, which probably
doesn't come through here)
Hi, I have been stuck at this for a week.
I have a relatively simple dataframe like this:
+-----+--------+-------+------+
| item| item_id| target| start|
+-----+--------+-------+------+
When I run spark.read.orc("hdfs://test").filter("conv_date = 20181025").count
with "spark.sql.orc.filterPushdown=true", I see the following in the executor
logs, so predicate pushdown is happening:
18/11/01 17:31:17 INFO OrcInputFormat: ORC pushdown predicate: leaf-0 =
(IS_NULL conv_date)
leaf-1 = (EQUALS
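As a side note, a Java equivalent of that read, with the pushdown flag set in code (a sketch only; path and column are taken from the report above, and in Spark 2.3 the ORC reader implementation, spark.sql.orc.impl, is also worth checking since the old Hive reader handles pushdown differently):

    import org.apache.spark.sql.SparkSession;

    public final class OrcPushdown {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("orc-pushdown").getOrCreate();
            spark.conf().set("spark.sql.orc.filterPushdown", "true");
            spark.conf().set("spark.sql.orc.impl", "native");  // Spark 2.3+ reader; "hive" is the old one

            long n = spark.read()
                .orc("hdfs://test")
                .filter("conv_date = 20181025")
                .count();
            System.out.println(n);
        }
    }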
Why don't you explore Livy? You can use its REST API to submit jobs:
https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html
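For example, submitting a batch job through Livy boils down to a POST to its /batches endpoint. A hedged sketch (host, jar path, and class name are placeholders; 8998 is Livy's default port):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public final class LivySubmit {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://livy-host:8998/batches");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            // Minimal batch spec: the jar to run and its main class.
            String body = "{\"file\": \"hdfs:///jars/my-app.jar\","
                        + " \"className\": \"com.example.MyApp\"}";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes("UTF-8"));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }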
On Thu, Nov 1, 2018 at 12:52 PM 崔苗 (Data and Artificial Intelligence Product Development Department) <0049003...@znv.com> wrote:
> Hi,
> we want to use spark in our
Refer: https://spark.apache.org/docs/latest/quick-start.html
1. Create a singleton SparkContext at initialization of your cluster; the Spark
context or Spark SQL would then be accessible through a static method anywhere
in your application. I recommend using fair scheduling on your context, to share
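A hedged sketch of that first step (class and app names are mine): a lazily initialized holder that every request handler can call, with fair scheduling enabled on the shared context:

    import org.apache.spark.sql.SparkSession;

    public final class SparkHolder {
        private static volatile SparkSession session;

        private SparkHolder() {}

        public static SparkSession get() {
            if (session == null) {
                synchronized (SparkHolder.class) {
                    if (session == null) {
                        session = SparkSession.builder()
                            .appName("web-service")
                            // FAIR scheduling lets concurrent requests share executors.
                            .config("spark.scheduler.mode", "FAIR")
                            .getOrCreate();
                    }
                }
            }
            return session;
        }
    }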
Reported as SPARK-25907, if anyone is still interested.
Hi,
we want to use Spark in our Java web service and compute data on the Spark cluster according to each request. Now we have two problems: 1. How do we get a SparkSession for a remote Spark cluster (Spark on YARN mode)? We want to keep one SparkSession to execute all data
A lot of small files is very inefficient in itself, and predicate pushdown will
not help you much there unless you merge them into one large file (one large
file can be processed much more efficiently). A hedged sketch of such a merge
follows (paths, format, and the target file count are made up):
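    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class CompactFiles {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("compact").getOrCreate();

            // Read the many small files, then rewrite them as one large file.
            Dataset<Row> df = spark.read().orc("hdfs:///data/small_files/");
            df.coalesce(1)  // raise this if a single file would be too big
              .write()
              .orc("hdfs:///data/merged/");
        }
    }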
How did you validate that predicate pushdown did not work on Hive? Your Hive
version is also
Hello all,
I am using spark-2.3.0 and hadoop-2.7.4.
I have a Spark Streaming application which listens to a Kafka topic, does some
transformations, and writes to an Oracle database using a JDBC client.
Step 1.
Read events from Kafka as shown below;
--
Dataset
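The snippet is cut off in the archive after "Dataset". For reference, a Structured Streaming version of that pipeline could look roughly like the sketch below; the broker, topic, Oracle URL, credentials, table, and the ForeachWriter sink are all my assumptions, not the author's code (Spark 2.3 has no built-in JDBC streaming sink, so foreach is one common route):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.ForeachWriter;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class KafkaToOracle {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("kafka-to-oracle").getOrCreate();

            // Step 1: read events from Kafka (broker and topic are placeholders).
            Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(value AS STRING) AS payload");

            // Step 2: write each row to Oracle through JDBC.
            events.writeStream()
                .foreach(new ForeachWriter<Row>() {
                    private Connection conn;
                    private PreparedStatement stmt;

                    @Override
                    public boolean open(long partitionId, long epochId) {
                        try {
                            conn = DriverManager.getConnection(
                                "jdbc:oracle:thin:@//dbhost:1521/SVC", "user", "pass");
                            stmt = conn.prepareStatement(
                                "INSERT INTO events (payload) VALUES (?)");
                            return true;
                        } catch (SQLException e) {
                            return false;  // skip this partition on connect failure
                        }
                    }

                    @Override
                    public void process(Row row) {
                        try {
                            stmt.setString(1, row.getString(0));
                            stmt.executeUpdate();
                        } catch (SQLException e) {
                            throw new RuntimeException(e);
                        }
                    }

                    @Override
                    public void close(Throwable errorOrNull) {
                        try { if (conn != null) conn.close(); } catch (SQLException ignored) {}
                    }
                })
                .option("checkpointLocation", "/tmp/ckpt")  // placeholder
                .start()
                .awaitTermination();
        }
    }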