Re: Reply: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread Enrico Minack
…columns to a new data frame. It seems that there is no direct API to do this. - Original Message - From: Sean Owen To: ckgppl_...@sina.cn Cc: user Subject: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame Date: 2022-03-16 11:55 Are you just…

Reply: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread ckgppl_yan
…columns and one specific column after groupby the spark data frame Date: 2022-03-16 11:55 Are you just trying to avoid writing the function call 30 times? Just put this in a loop over all the columns instead, which adds a new corr col every time to a list. On Tue, Mar 15, 2022, 10:30 PM wrote…

Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-15 Thread Sean Owen
Are you just trying to avoid writing the function call 30 times? Just put this in a loop over all the columns instead, which adds a new corr col every time to a list. On Tue, Mar 15, 2022, 10:30 PM wrote: Hi all, I am stuck at a correlation calculation problem. I have a dataframe like…
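A minimal sketch of that loop in Scala, using the column names from the question below (groupid, datacol1..datacol30, corr_col); the dataframe name df is assumed:

    import org.apache.spark.sql.functions.corr

    // Build one corr(datacolX, corr_col) aggregate per data column in a loop,
    // instead of writing the corr() call 30 times by hand.
    val dataCols = (1 to 30).map(i => s"datacol$i")
    val corrCols = dataCols.map(c => corr(c, "corr_col").alias(s"corr_$c"))

    val result = df.groupBy("groupid").agg(corrCols.head, corrCols.tail: _*)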

calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-15 Thread ckgppl_yan
Hi all, I am stuck at a correlation calculation problem. I have a dataframe like below:

    groupid  datacol1  datacol2  datacol3  datacol*  corr_col
    1        1         2         3         4         5
    1        2         3         4         6         5
    2        4         2         1         7         5
    2        8         9         3         2         5
    3        7         1         2         3         5
    3        3         5         3         1         5

I want to calculate the correlation between all datacol columns and the corr_col column by each groupid. So…

Re: Type Casting Error in Spark Data Frame

2018-01-31 Thread vijay.bvp
Assuming the MessageHelper.sqlMapping schema is correctly mapped to the input JSON (it would help if the schema and a sample JSON were shared), here is the explode function with dataframes; similar functionality is available with SQL: import sparkSession.implicits._ import org.apache.spark.sql.functions._ val routeEve…
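The snippet above is cut off mid-code; a self-contained sketch of the same explode-with-dataframes idea, using a hypothetical nested schema since the actual schema was never shared (routeEvents, routeId, and events are illustrative names):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val sparkSession = SparkSession.builder().appName("explode-example").getOrCreate()
    import sparkSession.implicits._

    // Hypothetical input: each row holds a routeId and an array of event structs.
    val routeEvents = sparkSession.read.json("events.json")  // path is illustrative

    // explode() turns each element of the array column into its own row,
    // after which nested fields can be selected with dot notation.
    val flattened = routeEvents
      .select($"routeId", explode($"events").as("event"))
      .select($"routeId", $"event.eventType", $"event.timestamp")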

Re: Type Casting Error in Spark Data Frame

2018-01-29 Thread Jean Georges Perrin
You can try to create new columns with the nested value. On Jan 29, 2018, at 15:26, Arnav kumar wrote: Hello Experts, I would need your advice in resolving the below issue when I am trying to retrieve the data from a dataframe. Can you please let me know where I am going wrong…

Re: Type Casting Error in Spark Data Frame

2018-01-29 Thread Patrick McCarthy
You can't select from an array like that; try instead using 'lateral view explode' in the query for that element, or before the sql stage (py)spark.sql.functions.explode. On Mon, Jan 29, 2018 at 4:26 PM, Arnav kumar wrote: Hello Experts, I would need your advice in resolving the below issue…
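A sketch of the 'lateral view explode' form Patrick mentions, run through Spark SQL; the table and column names are hypothetical, carried over from the sketch above:

    // Register the nested dataframe and explode the array column in SQL.
    routeEvents.createOrReplaceTempView("route_events")

    val flattenedSql = sparkSession.sql(
      """SELECT routeId, e.eventType, e.timestamp
        |FROM route_events
        |LATERAL VIEW explode(events) t AS e""".stripMargin)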

Type Casting Error in Spark Data Frame

2018-01-29 Thread Arnav kumar
Hello Experts, I would need your advice in resolving the below issue when I am trying to retrieve the data from a dataframe. Can you please let me know where I am going wrong. code: // create the dataframe by parsing the json // Message Helper describes the JSON Struct // data out is the jso…

Re: Spark Data Frame. PreSorted partitions

2017-11-28 Thread Michael Artz
I'm not sure, other than retrieving from a hive table that is already sorted. This sounds cool though, would be interested to know this as well. On Nov 28, 2017 10:40 AM, "Николай Ижиков" wrote: Hello, guys! I work on implementation of custom DataSource for Spark Data Frame…

Spark Data Frame. PreSorted partitions

2017-11-28 Thread Николай Ижиков
Hello, guys! I work on an implementation of a custom DataSource for the Spark Data Frame API and have a question: if I have a `SELECT * FROM table1 ORDER BY some_column` query, I can sort data inside a partition in my data source. Do I have a built-in option to tell Spark that data from each partition…
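For reference, later Spark releases added exactly such an option: under the assumption of Spark 3.2+'s DataSource V2 API, a Scan can report the ordering of data in each partition via SupportsReportOrdering (a sketch; this interface did not exist when the thread was written):

    import org.apache.spark.sql.connector.expressions.{Expressions, SortDirection, SortOrder}
    import org.apache.spark.sql.connector.read.{Scan, SupportsReportOrdering}
    import org.apache.spark.sql.types.StructType

    class PreSortedScan(schema: StructType) extends Scan with SupportsReportOrdering {
      override def readSchema(): StructType = schema

      // Reports that the data in each partition is already sorted by
      // some_column, letting Spark skip redundant per-partition sorts.
      override def outputOrdering(): Array[SortOrder] =
        Array(Expressions.sort(Expressions.column("some_column"), SortDirection.ASCENDING))
    }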

Re: Spark Data Frame Writer - Range Partitioning

2017-07-25 Thread Jain, Nishit
…@underarmour.com, "user@spark.apache.org" Subject: Re: Spark Data Frame Writer - Range Partitioning How about creating a partition column and using it? On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit wrote…

Re: Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread ayan guha
How about creating a partition column and using it? On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit wrote: Is it possible to have Spark Data Frame Writer write based on RangePartitioning? For Ex - I have 10 distinct values for column_a, say 1 to 10. df.write…
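A minimal sketch of that suggestion against the example below: derive a coarse range column from column_a and partition the output by it (the bucket boundaries and output path are illustrative):

    import org.apache.spark.sql.functions._

    // Map column_a (1..10) into two range buckets instead of ten folders.
    val withRange = df.withColumn("range_bucket",
      when(col("column_a") <= 5, "1-5").otherwise("6-10"))

    withRange.write
      .partitionBy("range_bucket")  // creates range_bucket=1-5 and range_bucket=6-10
      .parquet("/tmp/range_partitioned")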

Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread Jain, Nishit
Is it possible to have Spark Data Frame Writer write based on RangePartitioning? For Ex - I have 10 distinct values for column_a, say 1 to 10. df.write .partitionBy("column_a") The above code by default will create 10 folders: column_a=1, column_a=2 ... column_a=10. I want to see if it i…

Re: Spark data frame map problem

2017-03-22 Thread Yan Facai
Could you give more details of your code? On Wed, Mar 22, 2017 at 2:40 AM, Shashank Mandil wrote: Hi All, I have a spark data frame which has 992 rows inside it. When I run a map on this data frame I expect that the map should work for all the 992 rows…

Spark data frame map problem

2017-03-21 Thread Shashank Mandil
Hi All, I have a spark data frame which has 992 rows inside it. When I run a map on this data frame, I expect that the map should work for all the 992 rows. As a mapper runs on an executor on a cluster, I did a distributed count of the number of rows the mapper is being run on. dataframe.map(r…
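A sketch of one way to do that distributed count with a LongAccumulator; the 992-row dataframe is from the question, the rest is illustrative. Note that map is lazy, so an action such as count() must run before the accumulator holds anything:

    // Executors add to the accumulator as the mapper touches each row.
    val rowsSeen = spark.sparkContext.longAccumulator("rowsSeen")

    dataframe.rdd.map { row =>
      rowsSeen.add(1)
      row
    }.count()  // the action that actually triggers the lazy map

    println(rowsSeen.value)  // expected: 992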

RE: as.Date can't be applied to Spark data frame in SparkR

2016-09-19 Thread xingye
Update: the job can finish, but takes a long time on 10M rows of data. Is there a better solution? From: xing_ma...@hotmail.com To: user@spark.apache.org Subject: as.Date can't be applied to Spark data frame in SparkR Date: Tue, 20 Sep 2016 10:22:17 +0800 Hi, all. I've noticed that as.…
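A UDF-free sketch in Scala that pushes the work to the built-in date_add function (SparkR exposes the same function, which should avoid the slow dapply round-trip); the integer column aa and the 1960-01-01 origin are taken from the question below:

    import org.apache.spark.sql.functions.expr

    // Interpret integer column "aa" as days since the 1960-01-01 origin.
    val withDate = df.withColumn("aa_date",
      expr("date_add(to_date('1960-01-01'), aa)"))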

as.Date can't be applied to Spark data frame in SparkR

2016-09-19 Thread xingye
Hi, all. I've noticed that as.Date can't be applied to a Spark data frame. I've created the following UDF and used dapply to change an integer column "aa" to a date with origin 1960-01-01. change_date <- function(df){ df <- as.POSIXlt(as.Date(df$aa, or…

Re: Spark data frame

2015-12-22 Thread Dean Wampler
…rwal, "user@spark.apache.org" Subject: Re: Spark data frame Dean, RDD in memory and then the collect() resulting in a collection, where both are alive at the same time. (Again not sure how Tungsten plays into thi…

Re: Spark data frame

2015-12-22 Thread Silvio Fiorito
…@hotmail.com Date: Tuesday, December 22, 2015 at 4:26 PM To: Dean Wampler Cc: Gaurav Agarwal, "user@spark.apache.org" Subject: Re…

Re: Spark data frame

2015-12-22 Thread Dean Wampler
You can call the collect() method to return a collection, but be careful: if your data is too big to fit in the driver's memory, it will crash. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe @d…

Spark data frame

2015-12-22 Thread Gaurav Agarwal
We are able to retrieve the data frame by filtering the rdd object. I need to convert that data frame into a Java POJO. Any idea how to do that?
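A sketch of one way to do the conversion, assuming a hypothetical JavaBean Person whose properties match the frame's columns; Encoders.bean maps each row onto the bean, and (per Dean's warning above) collect() then pulls everything into the driver:

    import org.apache.spark.sql.Encoders

    // Person is a hypothetical bean with getters/setters matching df's columns.
    val people: Array[Person] = df.as(Encoders.bean(classOf[Person])).collect()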

Re: spark data frame write.mode("append") bug

2015-12-12 Thread Michael Armbrust
…exists for all // SQL database systems, considering "table" could also include the database name. Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess } Thanks…

Re: spark data frame write.mode("append") bug

2015-12-12 Thread sri hari kali charan Tummala
…ob/master/src/main/java/com/kali/db/SaprkSourceToTargetBulkLoad.scala Spring Config File: https://github.com/kali786516/ScalaDB/blob/master/src/main/resources/SourceToTargetBulkLoad.xml Thanks Sri…

Re: spark data frame write.mode("append") bug

2015-12-12 Thread kali.tumm...@gmail.com
…" could also include the database name. Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess } Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-data-frame-write-mode-append-bug-tp25650p25693.html
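Reassembled from the quoted fragments above, the check under discussion appears to be a JDBC table-existence probe; a sketch with the surrounding signature and imports assumed:

    import java.sql.Connection
    import scala.util.Try

    def tableExists(conn: Connection, table: String): Boolean = {
      // SELECT 1 ... WHERE 1=2 returns no rows but fails if the table is
      // missing; per the quoted comment, this should work for all SQL
      // database systems, considering "table" could also include the
      // database name.
      Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess
    }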

Re: spark data frame write.mode("append") bug

2015-12-09 Thread Seongduk Cheon
…Spring Config File: https://github.com/kali786516/ScalaDB/blob/master/src/main/resources/SourceToTargetBulkLoad.xml Thanks Sri -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-data-fr…

spark data frame write.mode("append") bug

2015-12-09 Thread kali.tumm...@gmail.com
…ToTargetBulkLoad.xml Thanks Sri -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-data-frame-write-mode-append-bug-tp25650.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Hive ORC Malformed while loading into spark data frame

2015-10-03 Thread Umesh Kacha
…Zhan Zhang Sent from my iPhone On Sep 29, 2015, at 1:47 PM, unk1102 wrote: Hi I have a spark job which creates hive tables…

Re: Hive ORC Malformed while loading into spark data frame

2015-10-03 Thread Umesh Kacha
…wrote: Hi, I have a spark job which creates hive tables in orc format with partitions. It works well; I can read data back into the hive table using the hive console. But if I try to further process the orc files g…

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Umesh Kacha
…But if I try to further process the orc files generated by the Spark job by loading them into a dataframe, then I get the following exception: Caused by: java.io.IOException: Malformed ORC file hdfs://l…

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Umesh Kacha
…by loading them into a dataframe, then I get the following exception: Caused by: java.io.IOException: Malformed ORC file hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript. Dataframe d…

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Hortonworks
…Dataframe df = hiveContext.read().format("orc").load(to/path); Please guide. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble…

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Umesh Kacha
…tscript. Dataframe df = hiveContext.read().format("orc").load(to/path); Please guide. -- View this message in context: http://apache-spark-user-lis…

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Hortonworks

Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread unk1102
Hi, I have a spark job which creates hive tables in orc format with partitions. It works well; I can read data back into the hive table using the hive console. But if I try to further process the orc files generated by the Spark job by loading them into a dataframe, then I get the following exception: Caused by: java.io.IOException: Malformed ORC file hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript. Dataframe df = hiveContext.read().format("orc").load(to/path); Please guide. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-ORC-Malformed-while-loading-into-spark-data-frame-tp24876.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
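The replies in this thread are truncated past recovery, so the advice actually given is unclear. One hedged sketch of a workaround consistent with the question (the Hive console reads the table fine, and the failing file part_tiny.txt is plainly not ORC) is to go through the metastore instead of pointing format("orc") at the raw warehouse path; the table name here is guessed from that path:

    // Read via the metastore (as the Hive console does) rather than loading
    // raw files, so stray non-ORC files are not parsed as ORC.
    val df = hiveContext.table("partorc")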