Re: how to use cluster sparkSession like localSession

2018-11-01 Thread Arbab Khalil
Remove the master configuration from the code (see the sketch below) and then submit it to any cluster; it should work. On Fri, Nov 2, 2018 at 10:52 AM 崔苗(数据与人工智能产品开发部) <0049003...@znv.com> wrote: > > then how about Spark SQL and Spark MLlib, we use them most of the time
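
For illustration, a minimal sketch of what that looks like (class and app names are hypothetical): the builder sets no master, so whatever spark-submit passes via --master wins.

    import org.apache.spark.sql.SparkSession;

    public class ClusterJob {
        public static void main(String[] args) {
            // No .master(...) here; the master is supplied at submit time,
            // e.g. spark-submit --master yarn --deploy-mode cluster app.jar
            SparkSession spark = SparkSession.builder()
                    .appName("ClusterJob")
                    .getOrCreate();
            spark.sql("SELECT 1").show();
            spark.stop();
        }
    }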

Re: how to use cluster sparkSession like localSession

2018-11-01 Thread 数据与人工智能产品开发部
Then how about Spark SQL and Spark MLlib? We use them most of the time.

Re: how to use cluster sparkSession like localSession

2018-11-01 Thread Daniel de Oliveira Mantovani
Please read about Spark Streaming or Spark Structured Streaming. Your web application can easily communicate with it through some API, and you won't have the overhead of starting a new Spark job, which is pretty heavy. On Thu, Nov 1, 2018 at 23:01 崔苗(数据与人工智能产品开发部) <0049003...@znv.com> wrote: > > Hi, > we
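
A minimal Structured Streaming sketch of that pattern, assuming a Kafka source (broker and topic names are placeholders): the job is started once and stays up, rather than being launched per request.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class StreamingApp {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("StreamingApp").getOrCreate();
            // Long-running streaming query; a web app would communicate with it
            // indirectly (e.g. via Kafka) instead of starting new Spark jobs.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
                    .option("subscribe", "requests")                  // placeholder topic
                    .load();
            events.selectExpr("CAST(value AS STRING)")
                    .writeStream()
                    .format("console")
                    .start()
                    .awaitTermination();
        }
    }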

how to use cluster sparkSession like localSession

2018-11-01 Thread 数据与人工智能产品开发部
Hi, we want to execute Spark code without submitting an application .jar, like this code: public static void main(String args[]) throws Exception { SparkSession spark = SparkSession.builder()
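
The snippet is cut off by the archive; a plausible completion, purely as a hedged sketch (the class name, app name, and local master are assumptions, not the poster's code):

    import org.apache.spark.sql.SparkSession;

    public class MyApp { // hypothetical class name
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("MyApp")    // assumed
                    .master("local[*]")  // a local session; the thread's replies advise
                    .getOrCreate();      // against hard-coding a cluster master here
            spark.stop();
        }
    }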

[PySpark Profiler]: Does empty profile mean no execution in Python Interpreter?

2018-11-01 Thread Alex
Hi, I ran into an interesting scenario with no profile output today. I have a PySpark application that primarily uses the Spark SQL APIs. I understand that parts of the Spark SQL API may not generate data in the PySpark profile dumps, but I was surprised when I had code containing a UDF that

Can Spark read files from S3 that are client-side encrypted with an AWS KMS-managed Customer Master Key (CMK)?

2018-11-01 Thread mytramesh
I am able to read S3 files that use Server-Side Encryption (SSE-KMS); I added the KMS ID to the IAM role and can read them seamlessly. Recently I have been receiving S3 files that are client-side encrypted (AWS KMS-managed Customer Master Key (CMK)); when I try to read these files I see a count of 0. To
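
For reference, the working server-side case can be configured roughly like this (a sketch assuming the s3a connector; the bucket, path, and key ARN are placeholders, and the exact property names vary across Hadoop versions). Client-side encryption is a different code path that older s3a versions do not decrypt transparently, which would be consistent with the empty reads.

    import org.apache.spark.sql.SparkSession;

    public class S3Read {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("S3Read").getOrCreate();
            // SSE-KMS: S3 decrypts on the server side, so s3a only needs the algorithm and key
            spark.sparkContext().hadoopConfiguration()
                    .set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS");
            spark.sparkContext().hadoopConfiguration()
                    .set("fs.s3a.server-side-encryption.key", "arn:aws:kms:region:acct:key/id"); // placeholder
            System.out.println(spark.read().orc("s3a://my-bucket/data/").count()); // placeholder path
        }
    }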

StackOverflowError for simple map (not to incubator mailing list)

2018-11-01 Thread Chris Olivier
(Sorry, the first one was sent to the incubator mailing list, which probably doesn't come here.) Hi, I have been stuck at this for a week. I have a relatively simple dataframe like this:
+------+---------+--------+-------+
| item | item_id | target |

StackOverflowError for simple map

2018-11-01 Thread Chris Olivier
Hi, I have been stuck at this for a week. I have a relatively simple dataframe like this:
+------+---------+--------+-------+
| item | item_id | target | start |
+------+---------+--------+-------+

Re: Apache Spark orc read performance when reading large number of small files

2018-11-01 Thread gpatcham
When I run spark.read.orc("hdfs://test").filter("conv_date = 20181025").count with "spark.sql.orc.filterPushdown=true", I see the below in the executor logs; predicate pushdown is happening:
18/11/01 17:31:17 INFO OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL conv_date) leaf-1 = (EQUALS
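
The call in question, reconstructed as a runnable sketch (the path and column come from the message itself; the class name is illustrative):

    import org.apache.spark.sql.SparkSession;

    public class OrcPushdown {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("OrcPushdown")
                    .config("spark.sql.orc.filterPushdown", "true") // enables ORC predicate pushdown
                    .getOrCreate();
            // The EQUALS leaf in the executor log corresponds to this filter
            long n = spark.read().orc("hdfs://test").filter("conv_date = 20181025").count();
            System.out.println(n);
        }
    }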

Re: use spark cluster in java web service

2018-11-01 Thread hemant singh
Why don't you explore Livy? You can use the REST API to submit the jobs - https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html On Thu, Nov 1, 2018 at 12:52 PM 崔苗(数据与人工智能产品开发部) <0049003...@znv.com> wrote: > Hi, > we want to use spark in our
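
A minimal sketch of such a REST submission from Java, posting to Livy's batch endpoint (the Livy host, jar path, and class name are placeholders):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class LivySubmit {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://livy-host:8998/batches"); // placeholder host
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            // Batch submission: Livy launches the jar on the cluster for us
            String body = "{\"file\": \"hdfs:///jars/app.jar\", \"className\": \"com.example.Main\"}";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes("UTF-8"));
            }
            System.out.println("Livy responded: " + conn.getResponseCode());
        }
    }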

Fwd: use spark cluster in java web service

2018-11-01 Thread onmstester onmstester
Refer to: https://spark.apache.org/docs/latest/quick-start.html 1. Create a singleton SparkContext at the initialization of your cluster; the SparkContext or Spark SQL session would then be accessible through a static method anywhere in your application (see the sketch below). I recommend using fair scheduling on your context, to share
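
A sketch of that singleton pattern, assuming a SparkSession shared across request threads (the class and app names are illustrative):

    import org.apache.spark.sql.SparkSession;

    public final class SparkHolder {
        private static volatile SparkSession spark;

        public static SparkSession get() {
            if (spark == null) {
                synchronized (SparkHolder.class) {
                    if (spark == null) {
                        spark = SparkSession.builder()
                                .appName("web-service")
                                // FAIR scheduling shares the context between concurrent requests
                                .config("spark.scheduler.mode", "FAIR")
                                .getOrCreate();
                    }
                }
            }
            return spark;
        }
    }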

Re: SIGBUS (0xa) when using DataFrameWriter.insertInto

2018-11-01 Thread alexzautke
Reported as SPARK-25907, if anyone is still interested.

use spark cluster in java web service

2018-11-01 Thread 数据与人工智能产品开发部
Hi, we want to use Spark in our Java web service, computing data on the Spark cluster according to requests. Now we have two problems: 1. How to get a sparkSession of the remote Spark cluster (Spark on YARN mode); we want to keep one sparkSession to execute all data

Re: Apache Spark orc read performance when reading large number of small files

2018-11-01 Thread Jörn Franke
A lot of small files is very inefficient in itself, and predicate pushdown will not help you much there unless you merge them into one large file (one large file can be processed much more efficiently; a sketch of such a merge follows below). How did you validate that predicate pushdown did not work in Hive? Your Hive version is also
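
One common way to do that merge, sketched here (paths are placeholders; the repartition count would be tuned to the data size):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CompactOrc {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("CompactOrc").getOrCreate();
            Dataset<Row> df = spark.read().orc("hdfs://test/small-files/"); // placeholder input
            // Rewrite many small ORC files as a handful of large ones
            df.repartition(1).write().orc("hdfs://test/compacted/");        // placeholder output
            spark.stop();
        }
    }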

How to use Dataset forEachPartion and groupByKey together

2018-11-01 Thread Kuttaiah Robin
Hello all, I am using spark-2.3.0 and hadoop-2.7.4. I have a Spark streaming application which listens to a Kafka topic, does some transformations, and writes to an Oracle database using a JDBC client. Step 1: Read events from Kafka as shown below; -- Dataset
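
The snippet is cut off after "Dataset"; a hedged reconstruction of such a Step 1, assuming the Structured Streaming Kafka source (broker and topic names are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KafkaToOracle {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("KafkaToOracle").getOrCreate();
            // Step 1: read events from Kafka as a streaming Dataset
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
                    .option("subscribe", "events")                    // placeholder topic
                    .load()
                    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
            // Later steps (transformation, grouping, JDBC writes) would follow;
            // foreachPartition lets each partition open a single JDBC connection.
            events.writeStream().format("console").start().awaitTermination();
        }
    }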