Re: Upgrading from Spark SQL 3.2 to 3.3 failed

2023-02-15 Thread lk_spark
47:25, "lk_spark" wrote: hi, all: I have a SQL statement which runs on Spark 3.2.1 but not on Spark 3.3.1. When I try to explain it, I get an error with the message: org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast

Upgrading from Spark SQL 3.2 to 3.3 failed

2023-02-15 Thread lk_spark
hi, all: I have a SQL statement which runs on Spark 3.2.1 but not on Spark 3.3.1. When I try to explain it, I get an error with the message: org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast. When I execute the SQL, the error stack is

Is 'Stage cancelled because SparkContext was shut down' an error?

2022-09-28 Thread lk_spark
hi, all: when I try to merge an Iceberg table with Spark, I can see a failed job on the Spark UI, but the Spark application's final state is SUCCEEDED. I submitted an issue: https://github.com/apache/iceberg/issues/5876 I would like to know whether this is a real error. Thanks.

Re:NoSuchMethodError: org.apache.spark.sql.execution.command.CreateViewCommand.copy

2022-03-21 Thread lk_spark
sorry, it was a problem with my environment. At 2022-03-21 14:00:01, "lk_spark" wrote: hi, all: I got a strange error: bin/spark-shell --deploy-mode client Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use set

NoSuchMethodError: org.apache.spark.sql.execution.command.CreateViewCommand.copy

2022-03-20 Thread lk_spark
hi, all: I got a strange error: bin/spark-shell --deploy-mode client Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 22/03/21 13:51:39 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.

Why does an NPE happen with multi-threading in cluster mode but not in client mode

2020-12-02 Thread lk_spark
hi, all: I'm using Spark 2.4 and trying to use the SparkContext from multiple threads. I found an example: https://hadoopist.wordpress.com/2017/02/03/how-to-use-threads-in-spark-job-to-achieve-parallel-read-and-writes/ with code like this: for (a <- 0 until 4) { val thread = new Thread {
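
A minimal sketch of that pattern, assuming the SparkSession is created on the driver before any threads are spawned (the input paths are placeholders); submitting jobs from several threads against one shared context is supported:

  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("parallel-jobs").getOrCreate()

  // run several independent jobs concurrently against the one shared session
  val jobs = (0 until 4).map { i =>
    Future {
      spark.read.parquet(s"/data/input_$i").count()   // hypothetical input paths
    }
  }
  jobs.foreach(f => Await.result(f, Duration.Inf))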

Re: how can Spark Structured Streaming write to Kudu

2019-11-25 Thread lk_spark
I found that _sqlContext is null; how can I resolve it? 2019-11-25 lk_spark From: "lk_spark" Sent: 2019-11-25 16:00 Subject: how can Spark Structured Streaming write to Kudu To: "user.spark" Cc: hi, all: I'm using Spark 2.4.4 to read a stream from Kafka and want to write to
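
On Spark 2.4 one way to avoid a null _sqlContext inside a custom writer is foreachBatch, which hands each micro-batch to the driver as a full DataFrame. A rough sketch, assuming the kudu-spark connector is on the classpath; parsedDF, the Kudu master address, and the table name are placeholders, and older connector versions need the full format name org.apache.kudu.spark.kudu instead of "kudu":

  val query = parsedDF.writeStream
    .foreachBatch { (batchDF: org.apache.spark.sql.DataFrame, batchId: Long) =>
      batchDF.write
        .format("kudu")   // older connectors: "org.apache.kudu.spark.kudu"
        .options(Map("kudu.master" -> "kudu-master:7051",
                     "kudu.table"  -> "impala::default.cstore_new"))
        .mode("append")
        .save()
    }
    .start()
  query.awaitTermination()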

how can Spark Structured Streaming write to Kudu

2019-11-25 Thread lk_spark
CstoreNew2KUDU$$anon$1.process(CstoreNew2KUDU.scala:122) ... and SQLImplicits.scala:228 is: 227: implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = { 228: DatasetHolder(_sqlContext.createDataset(s)) 229: } Can anyone give me some help? 2019-11-25 lk_spark

how to limit the number of tasks when reading a Hive ORC table

2019-11-11 Thread lk_spark
hi, all: I have a Hive table STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'. Many of its files are very small, and when I use Spark to read it, thousands of tasks start. How can I limit the number of tasks? 2019-11-12 lk_spark
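
A sketch of two common knobs, assuming the table is read through Spark's file-based ORC path (the table name and numbers are placeholders); the file-source settings let Spark pack many small files into each task, while coalesce simply caps the parallelism:

  // pack many small files into one read task
  spark.conf.set("spark.sql.files.maxPartitionBytes", 256L * 1024 * 1024)
  spark.conf.set("spark.sql.files.openCostInBytes", 8L * 1024 * 1024)

  // or cap the number of partitions (and therefore tasks) after reading
  val df = spark.table("db.orc_table").coalesce(200)   // hypothetical table name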

Re: Re: how can I dynamically parse JSON from Kafka when using Structured Streaming

2019-09-17 Thread lk_spark
I want to parse the struct of the data dynamically and then write the data to Delta Lake; I think it can merge the schema automatically. 2019-09-17 lk_spark From: Tathagata Das Sent: 2019-09-17 16:13 Subject: Re: how can I dynamically parse JSON from Kafka when using Structured Streaming To: "lk_spar

how can I dynamically parse JSON from Kafka when using Structured Streaming

2019-09-16 Thread lk_spark
implicit evidence$6: org.apache.spark.sql.Encoder[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema])org.apache.spark.sql.Dataset[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema]. Unspecified value parameter evidence$6. val words = lines.map(line => { 2019-09-17 lk_spark
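
Structured Streaming needs the schema before the query starts, so the usual pattern is to parse the Kafka value with from_json against an explicit (or sampled) schema instead of mapping to GenericRowWithSchema, which has no Encoder. A sketch with hypothetical field names, where lines is the Kafka source DataFrame from the original post:

  import org.apache.spark.sql.functions.{col, from_json}
  import org.apache.spark.sql.types._

  val schema = new StructType()     // hypothetical fields
    .add("id", StringType)
    .add("event", StringType)
    .add("ts", StringType)

  val parsed = lines
    .select(from_json(col("value").cast("string"), schema).as("j"))
    .select("j.*")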

how to get spark-sql lineage

2019-05-15 Thread lk_spark
lk_spark

Re: Re: how to generate a large dataset in parallel

2018-12-14 Thread lk_spark
sorry, for now what I can do is this: var df5 = spark.read.parquet("/user/devuser/testdata/df1").coalesce(1) df5 = df5.union(df5).union(df5).union(df5).union(df5) 2018-12-14 lk_spark From: 15313776907 <15313776...@163.com> Sent: 2018-12-14 16:39 Subject: Re: how to generat
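
If the goal is simply to produce a large amount of test data in parallel, a sketch built on spark.range avoids the repeated unions (column names, row count, and output path are arbitrary):

  import org.apache.spark.sql.functions.{col, concat, lit, rand}

  // 100 million rows spread across 200 partitions, generated in parallel
  val big = spark.range(0L, 100000000L, 1L, 200)
    .withColumn("name", concat(lit("user_"), col("id")))
    .withColumn("score", rand())

  big.write.mode("overwrite").parquet("/user/devuser/testdata/big")   // hypothetical output path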

Re: Re: how to generate a large dataset in parallel

2018-12-13 Thread lk_spark
generate some data in Spark. 2018-12-14 lk_spark From: Jean Georges Perrin Sent: 2018-12-14 11:10 Subject: Re: how to generate a large dataset in parallel To: "lk_spark" Cc: "user.spark" Do you just want to generate some data in Spark, or ingest a large dataset from outside of Spark?

how to generate a large dataset in parallel

2018-12-13 Thread lk_spark
cluster. 2018-12-14 lk_spark

Re: about LIVY-424

2018-11-11 Thread lk_spark
le have 5760749 rows of data. After running about 10 times, the driver's physical memory grows beyond 4.5 GB and it gets killed by YARN. I saw the old-generation memory keep growing, and GC cannot release it. 2018-11-12 lk_spark From: "lk_hadoop" Sent: 2018-11-12 09:37 Subject: about LIVY-424 To: "user

Re: Spark 2.3 on Kubernetes

2018-04-07 Thread lk_spark
resolved. I needed to add "kubernetes.default.svc" to the k8s API server TLS config. 2018-04-08 lk_spark From: "lk_spark" Sent: 2018-04-08 11:15 Subject: Spark 2.3 on Kubernetes To: "user" Cc: hi, all: I am trying Spark on k8s with the Pi sample. I got an error with the driver

Spark 2.3 on Kubernetes

2018-04-07 Thread lk_spark
spark-examples_2.11-2.3.0.jar 2018-04-08 lk_spark

Re: Re: Re: spark2.1 kafka0.10

2017-06-22 Thread lk_spark
thank you Kumar, I will try it later. 2017-06-22 lk_spark From: Pralabh Kumar Sent: 2017-06-22 20:20 Subject: Re: Re: spark2.1 kafka0.10 To: "lk_spark" Cc: "user.spark" It looks like the replicas for your partition are failing. If you have more brokers, can you try increasin

Re: Re: spark2.1 kafka0.10

2017-06-21 Thread lk_spark
each topic has 5 partitions and 2 replicas. 2017-06-22 lk_spark From: Pralabh Kumar Sent: 2017-06-22 17:23 Subject: Re: spark2.1 kafka0.10 To: "lk_spark" Cc: "user.spark" How many replicas do you have for this topic? On Thu, Jun 22, 2017 at 9:19

Re: spark2.1 kafka0.10

2017-06-21 Thread lk_spark
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 2017-06-22 lk_spark From: "lk_spark" Sent: 2017-06-22 11:13 Subject: spark2.1 kafka0.10 To: "user.spark" Cc: hi, all: when I run the streaming application for a few minut

spark2.1 kafka0.10

2017-06-21 Thread lk_spark
ERROR JobScheduler: Error generating jobs for time 1498098896000 ms java.lang.IllegalStateException: No current assignment for partition pages-2. I don't know why. 2017-06-22 lk_spark

spark2.1 and kafka0.10

2017-06-20 Thread lk_spark
hi, all: regarding https://issues.apache.org/jira/browse/SPARK-19680, is there any way to patch this issue? I have met the same problem. 2017-06-20 lk_spark

Re: Re: Re: how to call udf with parameters

2017-06-15 Thread lk_spark
thanks Kumar, that is really helpful!! 2017-06-16 lk_spark From: Pralabh Kumar Sent: 2017-06-16 18:30 Subject: Re: Re: how to call udf with parameters To: "lk_spark" Cc: "user.spark" val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2)) data
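
The reply above is the key point: a UDF's arguments are Columns, so literal parameters must be wrapped in lit(). A small sketch against a DataFrame with id/text columns as in the original question:

  import org.apache.spark.sql.functions.{col, lit, udf}

  val getlength = udf((data: String, idx1: Int, idx2: Int) => data.substring(idx1, idx2))

  // wrap plain Int arguments in lit(); otherwise Spark tries to resolve them as column names
  val result = df.withColumn("words", getlength(col("text"), lit(0), lit(2)))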

Re: Re: how to call udf with parameters

2017-06-15 Thread lk_spark
thanks Kumar. I want to know how to call a udf with multiple parameters, for example a udf implementing a substr function. How can I pass parameters for the begin and end index? I tried it and got errors. Can udf parameters only be of Column type? 2017-06-16 lk_spark From: Pralabh Kumar Sent: 2017

how to call udf with parameters

2017-06-15 Thread lk_spark
) org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input columns: [id, text];; 'Project [UDF(text#6, 'true, 'true, '2) AS words#16] +- Project [_1#2 AS id#5, _2#3 AS text#6] +- LocalRelation [_1#2, _2#3] I need help!! 2017-06-16 lk_spark

spark on yarn cluster mode can't use saveAsTable?

2017-05-15 Thread lk_spark
ybody give me some clue? 2017-05-15 lk_spark

Re: Re: Re: how to call recommend method from ml.recommendation.ALS

2017-03-15 Thread lk_spark
Thank you, that's what I wanted to confirm. 2017-03-16 lk_spark From: Yuhao Yang Sent: 2017-03-16 13:05 Subject: Re: Re: how to call recommend method from ml.recommendation.ALS To: "lk_spark" Cc: "任弘迪", "user.spark" This is something that was just added to ML and

Re: Re: how to call recommend method from ml.recommendation.ALS

2017-03-15 Thread lk_spark
thanks for your reply. What I exactly want to know is: in the package mllib.recommendation, MatrixFactorizationModel has methods like recommendProducts, but I didn't find them in the package ml.recommendation. How can I do the same thing with ml as with mllib? 2017-03-16 lk_spark From: 任弘迪
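
For reference, later releases closed this gap: Spark 2.2 added recommendForAllUsers and recommendForAllItems on ml.recommendation.ALSModel. A minimal sketch, assuming a ratings DataFrame with the named columns:

  import org.apache.spark.ml.recommendation.ALS

  val als = new ALS().setUserCol("userId").setItemCol("itemId").setRatingCol("rating")
  val model = als.fit(ratings)                  // ratings: DataFrame(userId, itemId, rating)

  val top10PerUser = model.recommendForAllUsers(10)
  val top10PerItem = model.recommendForAllItems(10)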

how to call recommend method from ml.recommendation.ALS

2017-03-15 Thread lk_spark
hi, all: under Spark 2.0, I want to know how to make recommendations after training an ml.recommendation.ALSModel. I tried to save the model and load it with MatrixFactorizationModel but got an error. 2017-03-16 lk_spark

java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext

2017-02-26 Thread lk_spark
value().matches("\\d{4}.*")).map(record => { val assembly = record.topic() val value = record.value val datatime = value.substring(0, 22) val level = value.substring(24, 27) (assembly, value, datatime, level) }) How can I pass a parameter to the map function? 2017-02-27 lk_spark
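
The exception usually means the map closure is (indirectly) capturing the StreamingContext or the enclosing class. Copying the needed parameters into local vals before the transformation keeps the closure small and serializable; a sketch with hypothetical parameter values, where kafkaStream is the input stream from the original post:

  // only these local, serializable values end up captured by the closure
  val dateLen  = 22
  val levelPos = (24, 27)

  val parsed = kafkaStream
    .filter(_.value().matches("\\d{4}.*"))
    .map { record =>
      val value = record.value()
      (record.topic(), value, value.substring(0, dateLen), value.substring(levelPos._1, levelPos._2))
    }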

help, I want to call spark-submit from a Java shell

2017-01-20 Thread lk_spark
in 120 seconds ... 8 more 17/01/20 06:39:05 ERROR CoarseGrainedExecutorBackend: Driver 192.168.0.136:51197 disassociated! Shutting down. 2017-01-20 lk_spark

how to dynamically partition a dataframe

2017-01-17 Thread lk_spark
01-18 lk_spark
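
For writing a DataFrame out partitioned by column values, a minimal sketch (the partition columns and output path are hypothetical):

  df.write
    .partitionBy("year", "month")    // hypothetical partition columns
    .mode("overwrite")
    .parquet("/user/devuser/partitioned_out")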

how to use newAPIHadoopFile

2017-01-16 Thread lk_spark
2017-01-17 lk_spark
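
A minimal sketch of reading a text file through the new Hadoop API (the path is a placeholder and TextInputFormat is just one possible input format):

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  val rdd = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("/data/input")   // hypothetical path
    .map { case (_, line) => line.toString }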

Re: Re: Re: how to change datatype by using StructType

2017-01-12 Thread lk_spark
} else { ab += attributes(i) } } new GenericRow(ab.toArray) } } 2017-01-13 lk_spark From: "lk_spark" Sent: 2017-01-13 09:49 Subject: Re: Re: Re: how to change datatype by using StructType To: "Nicholas Hakobian" Cc: "user.sp

Re: Re: Re: how to change datatype by using StructType

2017-01-12 Thread lk_spark
Thank you Nicholas; if the source data is in CSV format, the CSV reader works well. 2017-01-13 lk_spark From: Nicholas Hakobian Sent: 2017-01-13 08:35 Subject: Re: Re: Re: how to change datatype by using StructType To: "lk_spark" Cc: "ayan guha", "user.spark" Have you t
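
A sketch of that suggestion: read the comma-separated data with an explicit schema so the types are applied at load time (the path is a placeholder):

  import org.apache.spark.sql.types._

  val schema = StructType(Seq(
    StructField("name", StringType),
    StructField("age",  IntegerType),
    StructField("year", IntegerType)))

  val people = spark.read.schema(schema).csv("/user/devuser/people.csv")   // hypothetical path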

Re: Re: Re: how to change datatype by using StructType

2017-01-12 Thread lk_spark
nsafeProjection.apply_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290) all the fields are Any; what should I do? 2017-01-12 lk_spark From: "lk_sp

Re: Re: how to change datatype by using StructType

2017-01-11 Thread lk_spark
yes, field year is in my data: kevin,30,2016 shen,30,2016 kai,33,2016 wei,30,2016. This will not work: val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1), attributes(2))) but I need to read the data in a configurable way. 2017-01-12 lk_

how to change datatype by using StructType

2017-01-11 Thread lk_spark
level row object), 0, name), StringType), true). If I change my code it will work: val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).toInt)) but this is not a good idea. 2017-01-12 lk_spark

Re: Re: Re: how to add column to dataframe

2016-12-06 Thread lk_spark
...| |MzIzMjQ4NzQwOA==|http://mp.weixin| |MzAwOTIxMTcyMQ==|http://mp.weixin| |MzA3OTAyNzY2OQ==|http://mp.weixin| |MjM5NDAzMDAwMA==|http://mp.weixin| |MzAwMjE4MzU0Nw==|http://mp.weixin....| |MzA4NzcyNjI0Mw==|http://mp.weixin| |MzI5OTE5Nzc5Ng==|http://mp.weixin| 2016-12-06 lk

Re: Re: how to add column to dataframe

2016-12-06 Thread lk_spark
thanks for the reply. I will look into how to use na.fill, but I don't know how to get the value of a column and do operations like substr or split on it. 2016-12-06 lk_spark From: Pankaj Wahane Sent: 2016-12-06 17:39 Subject: Re: how to add column to dataframe To: "lk_spark", "user
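
For the substr/split part, the built-in column functions cover it; a sketch with hypothetical column names:

  import org.apache.spark.sql.functions.{col, split, substring}

  val df2 = df
    .withColumn("prefix", substring(col("url"), 1, 16))   // first 16 characters
    .withColumn("parts", split(col("url"), "/"))          // array of path segments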

how to add column to dataframe

2016-12-06 Thread lk_spark
| | null|http://mp.weixin| | null|http://mp.weixin| | null|http://mp.weixin| | null|http://mp.weixin| | null|http://mp.weixin| Why is what I got null? 2016-12-06 lk_spark

Re: RE: how to merge dataframe write output files

2016-11-10 Thread lk_spark
e of input (if you try to input this parquet). Again, the important question is – Why do you need it to be one file? Are you planning to use it externally? If yes, can you not use fragmented files there? If the data is too big for the Spark executor, it’ll most certainly be too much

how to merge dataframe write output files

2016-11-09 Thread lk_spark
-10 15:11 /parquetdata/weixin/biztags/biztag2/part-r-00176-0f61afe4-23e8-40bb-b30b-09652ca677bc more and more... 2016-11-10 lk_spark
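
If a single output file is really required (with the caveat that the whole result then flows through one task), the usual sketch is to reduce the partition count just before writing; the output path is a placeholder:

  df.coalesce(1)                  // or repartition(1) to force a full shuffle first
    .write
    .mode("overwrite")
    .parquet("/parquetdata/weixin/biztags/biztag2_merged")   // hypothetical output path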

Re: Re: How to iterate over the elements of an array in a DataFrame?

2016-10-21 Thread lk_spark
: string (nullable = true) 2016-10-21 lk_spark From: 颜发才(Yan Facai) Sent: 2016-10-21 15:35 Subject: Re: How to iterate over the elements of an array in a DataFrame? To: "user.spark" Cc: I don't know how to construct `array>`. Could anyone help me? I try to get the array by: scala> mb

Spark ExternalTable doesn't recognize subdir

2016-10-19 Thread lk_spark
sh the metadata. Spark doesn't recognize the data in the subdirectories. How can I do it? 2016-10-20 lk_spark
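
A commonly suggested workaround, not guaranteed for every Spark version or file format, is to enable recursive input listing before querying the external table; these are Hadoop/Hive settings rather than Spark-specific ones:

  spark.sparkContext.hadoopConfiguration
    .set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
  spark.sql("SET mapred.input.dir.recursive=true")
  spark.sql("SET hive.mapred.supports.subdirectories=true")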

Re: Re: how to extract arraytype data to file

2016-10-18 Thread lk_spark
Thank you, all of you. explode() is helpful: df.selectExpr("explode(bizs) as e").select("e.*").show() 2016-10-19 lk_spark From: Hyukjin Kwon Sent: 2016-10-19 13:16 Subject: Re: how to extract arraytype data to file To: "Divya Gehlot" Cc: "lk_spark"
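
Since the subject asks about getting the data to a file, a sketch that extends that answer by writing the exploded rows out (the output path is a placeholder):

  import org.apache.spark.sql.functions.{col, explode}

  df.select(explode(col("bizs")).as("e"))
    .select("e.*")
    .write.mode("overwrite").csv("/tmp/bizs_flat")   // hypothetical output path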

how to extract arraytype data to file

2016-10-18 Thread lk_spark
code| +++ |[4938200, 4938201...|[罗甸网警, 室内设计师杨焰红, ...| |[4938300, 4938301...|[SDCS十全九美, 旅梦长大, ...| |[4938400, 4938401...|[日重重工液压行走回转, 氧老家,...| |[4938500, 4938501...|[PABXSLZ, 陈少燕, 笑蜜...| |[4938600, 4938601...|[税海微云, 西域美农云家店, 福...| +++ what I want is to read the column values as normal rows. How can I do it? 2016-10-19 lk_spark