Which SQL flavor does Spark SQL follow?

2020-05-06 Thread Aakash Basu
Hi, I wish to know which type of SQL syntax is followed when we write a plain SQL query inside spark.sql. Is it MySQL or PostgreSQL? I know it isn't SQL Server or Oracle because, while migrating, I had to convert a lot of SQL functions. Also, please point to documentation which clearly states the above

Re: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread maqy
) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244) at org.apache.hadoop.hdfs.DFSOutputStream$DataStream$ResponseProcessor.run(DFSOutputStream.java:733)  More information can be seen here:   https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io

Re: How does spark sql evaluate case statements?

2020-04-17 Thread kant kodali
ond.isNull} && ${cond.value}) { > ${res.code} > $resultState = (byte)(${res.isNull} ? $HAS_NULL : $HAS_NONNULL); > ${ev.value} = ${res.value}; > continue; > } > > > } while (false) > > Refer to: > https://github.com/apach

Re: How does spark sql evaluate case statements?

2020-04-16 Thread ZHANG Wei
HAS_NULL : $HAS_NONNULL); ${ev.value} = ${res.value}; continue; } } while (false) Refer to: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L208 Here is a full generated cod
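For anyone wanting to reproduce this locally, here is a minimal sketch (the query below is made up, and `spark` is the usual spark-shell SparkSession) of how to dump the whole-stage generated Java for a CASE expression, which contains the `do { ... } while (false)` block referenced above:

```scala
import org.apache.spark.sql.execution.debug._   // adds debug()/debugCodegen() to Datasets

// Made-up example query; debugCodegen prints the generated Java source for the plan,
// including the CaseWhen codegen discussed in this thread.
val df = spark.range(10).selectExpr("CASE WHEN id % 2 = 0 THEN 'even' ELSE 'odd' END AS parity")
df.debugCodegen()
```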

Re: How does spark sql evaluate case statements?

2020-04-14 Thread Yeikel
I do not know the answer to this question so I am also looking for it, but @kant maybe the generated code can help with this.

unix_timestamp() equivalent in plain Spark SQL Query

2020-04-02 Thread Aakash Basu
Hi, What is the unix_timestamp() function equivalent in a plain Spark SQL query? I want to subtract one timestamp column from another, but in plain SQL I am getting the error "Should be numeric or calendarinterval and not timestamp." But when I did it through the above function inside
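A hedged sketch of one way to do the subtraction in a plain SQL query; the table and column names (events, start_ts, end_ts) are invented for illustration:

```scala
// unix_timestamp(col) converts a timestamp column to epoch seconds, so the difference
// becomes a plain numeric subtraction. Assumes a view named "events" exists.
val diff = spark.sql(
  """SELECT unix_timestamp(end_ts) - unix_timestamp(start_ts) AS duration_seconds
    |FROM events""".stripMargin)
diff.show()
```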

Re: Integration testing Framework Spark SQL Scala

2020-02-25 Thread Ruijing Li
Just wanted to follow up on this. If anyone has any advice, I’d be interested in learning more! On Thu, Feb 20, 2020 at 6:09 PM Ruijing Li wrote: > Hi all, > > I’m interested in hearing the community’s thoughts on best practices to do > integration testing for spark sql jobs.

Integration testing Framework Spark SQL Scala

2020-02-20 Thread Ruijing Li
Hi all, I’m interested in hearing the community’s thoughts on best practices to do integration testing for spark sql jobs. We run a lot of our jobs with cloud infrastructure and hdfs - this makes debugging a challenge for us, especially with problems that don’t occur from just initializing

Performance advantage of Spark SQL versus CSL API

2019-12-24 Thread Rajev Agarwal
Hello, I am wondering whether there is a clear-cut performance advantage to using the CSL API instead of Spark SQL for queries in Java? I am interested in Joins, Aggregates, and Group By (with several fields) clauses. Thank you. RajevA

Re: Issue With mod function in Spark SQL

2019-12-17 Thread Enrico Minack
l even or all odd? On Tue, Dec 17, 2019 at 11:01 AM Tzahi File wrote: I have in my spark sql query a calculated field that gets the value of field1 % 3. I'm using this field as a partition so I expected to get 3

Re: Issue With mod function in Spark SQL

2019-12-17 Thread Tzahi File
no.. there're 100M records both even and odd On Tue, Dec 17, 2019 at 8:13 PM Russell Spitzer wrote: > Is there a chance your data is all even or all odd? > > On Tue, Dec 17, 2019 at 11:01 AM Tzahi File > wrote: > >> I have in my spark sql query a calculated fiel

Re: Issue With mod function in Spark SQL

2019-12-17 Thread Russell Spitzer
Is there a chance your data is all even or all odd? On Tue, Dec 17, 2019 at 11:01 AM Tzahi File wrote: > I have in my spark sql query a calculated field that gets the value of > field1 % 3. > > I'm using this field as a partition so I expected to get 3 partitions in > the mentio

Issue With mod function in Spark SQL

2019-12-17 Thread Tzahi File
I have in my spark sql query a calculated field that gets the value of field1 % 3. I'm using this field as a partition, so I expected to get 3 partitions in the mentioned case, and I do get them. The issue happened with even numbers (4, 2, ... instead of 3). When I tried to use even numbers
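A small sketch of the sanity check discussed in this thread: materialize the mod column and count how many distinct buckets actually appear before partitioning on it. `df`, `field1`, and the output path are stand-ins for the poster's data:

```scala
import org.apache.spark.sql.functions.col

val bucketed = df.withColumn("bucket", col("field1") % 3)
bucketed.groupBy("bucket").count().show()                          // shows which remainders really occur
bucketed.write.partitionBy("bucket").parquet("/tmp/bucketed_out")  // hypothetical output path
```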

Temporary tables for Spark SQL

2019-11-12 Thread Laurent Bastien Corbeil
Hello, I am new to Spark, so I have a basic question to which I couldn't find an answer online. If I want to run SQL queries on a Spark dataframe, do I have to create a temporary table first? I know I could use the Spark SQL API, but is there a way of simply reading the data and running SQL queries
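A minimal sketch of the usual pattern: register the DataFrame as a temporary view and then query it with SQL. `df` and all names are placeholders, and no Hive table is created, only a session-scoped view:

```scala
df.createOrReplaceTempView("people")
val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()
```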

Re: Using Percentile in Spark SQL

2019-11-11 Thread Jerry Vinokurov
t;>> On Mon, Nov 11, 2019 at 9:46 AM Tzahi File >>> wrote: >>> >>>> Hi, >>>> >>>> Currently, I'm using hive huge cluster(m5.24xl * 40 workers) to run a >>>> percentile function. I'm trying to improve this job by movin

Re: Using Percentile in Spark SQL

2019-11-11 Thread Tzahi File
huge cluster(m5.24xl * 40 workers) to run a >>> percentile function. I'm trying to improve this job by moving it to run >>> with spark SQL. >>> >>> Any suggestions on how to use a percentile function in Spark? >>> >>> >>> Thanks, >>> -

Re: Using Percentile in Spark SQL

2019-11-11 Thread Muthu Jayakumar
; Hi, >>> >>> Currently, I'm using hive huge cluster(m5.24xl * 40 workers) to run a >>> percentile function. I'm trying to improve this job by moving it to run >>> with spark SQL. >>> >>> Any suggestions on how to use a percentile function in Spa

Re: Using Percentile in Spark SQL

2019-11-11 Thread Patrick McCarthy
>> >> Currently, I'm using hive huge cluster(m5.24xl * 40 workers) to run a >> percentile function. I'm trying to improve this job by moving it to run >> with spark SQL. >> >> Any suggestions on how to use a percentile function in Spark? >>

Re: Using Percentile in Spark SQL

2019-11-11 Thread Jerry Vinokurov
for this task? Because I bet that's what's slowing you down. On Mon, Nov 11, 2019 at 9:46 AM Tzahi File wrote: > Hi, > > Currently, I'm using hive huge cluster(m5.24xl * 40 workers) to run a > percentile function. I'm trying to improve this job by moving it to run > with spa

Using Percentile in Spark SQL

2019-11-11 Thread Tzahi File
Hi, Currently, I'm using a huge Hive cluster (m5.24xl * 40 workers) to run a percentile function. I'm trying to improve this job by moving it to run with spark SQL. Any suggestions on how to use a percentile function in Spark? Thanks, -- Tzahi File Data Engineer
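One possible way to express this in Spark SQL, sketched with made-up table and column names: percentile_approx is built in and is usually far cheaper than an exact percentile on large data:

```scala
val p95 = spark.sql(
  """SELECT company, percentile_approx(latency_ms, 0.95) AS p95_latency
    |FROM events
    |GROUP BY company""".stripMargin)
p95.show()
```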

[Spark Sql] Direct write on hive and s3 while executing a CTAS on spark sql

2019-10-24 Thread francexo83
Hi all, I'm using spark 2.4.0; my spark.sql.catalogImplementation is set to hive while spark.sql.warehouse.dir is set to a specific s3 bucket. I want to execute a CTAS statement in spark sql like the one below. *create table db_name.table_name as (select ..)* When writing, spark always uses

convert json string in spark sql

2019-10-16 Thread amit kumar singh
Hi Team, I have kafka messages where the json is coming as a string. How can I create a table after converting the json string to json using spark sql?
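A sketch of one way to do it with from_json; the schema and the names (rawDf, value, events) are assumptions, not taken from the thread:

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

val schema = new StructType().add("id", LongType).add("name", StringType)   // assumed JSON shape
val parsed = rawDf
  .select(from_json(col("value").cast("string"), schema).as("data"))        // parse the Kafka value string
  .select("data.*")
parsed.createOrReplaceTempView("events")
spark.sql("SELECT id, name FROM events").show()
```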

Re: Use our own metastore with Spark SQL

2019-10-14 Thread Zhu, Luke
everything Hadoop, you can also implement ExternalCatalog: https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala See https://jira.apache.org/jira/browse/SPARK-23443 for ongoing progress

Use our own metastore with Spark SQL

2019-10-14 Thread xweb
Is it possible to use our own metastore instead of Hive Metastore with Spark SQL? Can you please point me to some docs or code I can look at to get it done? We are moving away from everything Hadoop.

How to handle this use-case in spark-sql-streaming

2019-09-30 Thread Shyam P
Hi, I have scenario like below https://stackoverflow.com/questions/58134379/how-to-handle-backup-scenario-in-spark-structured-streaming-using-joins How to handle this use-case ( back-up scenario) in spark-structured-streaming? Any clues would be highly appreciated. Thanks, Shyam

How to query StructField's metadata in spark sql?

2019-09-05 Thread kyunam
Using SQL, is it possible to query a column's metadata? Thanks, Kyunam
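I'm not aware of a SQL-level function for this; a sketch of reading a column's metadata through the Dataset API instead (the column name is a placeholder):

```scala
val field = df.schema("my_col")        // StructField for the column
println(field.metadata.json)           // dumps the column's metadata as JSON
```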

Re: Can this use-case be handled with spark-sql streaming and cassandra?

2019-08-29 Thread Jörn Franke
1) this is not a use case, but a technical solution. Hence nobody can tell you if it makes sense or not 2) do an upsert in Cassandra. However keep in mind that the application submitting to the Kafka topic and the one consuming from the Kafka topic need to ensure that they process messages in

Re: Can this use-case be handled with spark-sql streaming and cassandra?

2019-08-29 Thread Aayush Ranaut
What exactly is your requirement?  Is the read before write mandatory? Are you maintaining states in Cassandra? Regards Prathmesh Ranaut https://linkedin.com/in/prathmeshranaut > On Aug 29, 2019, at 3:35 PM, Shyam P wrote: > > > thanks Aayush.     For every record I need to get the data

Re: Can this use-case be handled with spark-sql streaming and cassandra?

2019-08-29 Thread Shyam P
thanks Aayush. For every record I need to get the data from the cassandra table and update it? Else it may not update the existing record. What is this datastax-spark-connector? Is that not a "Cassandra connector library written for spark"? If not, how do we write it ourselves? Where and how to

Re: Can this use-case be handled with spark-sql streaming and cassandra?

2019-08-29 Thread Aayush Ranaut
Cassandra is upsert, you should be able to do what you need with a single statement unless you’re looking to maintain counters. I’m not sure if there is a Cassandra connector library written for spark streaming because we wrote one ourselves when we wanted to do the same. Regards Prathmesh

Can this use-case be handled with spark-sql streaming and cassandra?

2019-08-29 Thread Shyam P
Hi, I need to do a PoC for a business use-case. *Use case:* Need to update a record in a Cassandra table if it exists. Will spark streaming support comparing each record and updating the existing Cassandra record? For each record received from the kafka topic, if I want to check and compare each record

Re: Any advice how to do this usecase in spark sql ?

2019-08-13 Thread Jörn Franke
on your use case. > On 14.08.2019 at 05:08, Shyam P wrote: > > Hi, > Any advice how to do this in spark sql ? > > I have a scenario as below > > dataframe1 = loaded from an HDFS Parquet file. > > dataframe2 = read from a Kafka Stream. > > If column1

Any advice how to do this usecase in spark sql ?

2019-08-13 Thread Shyam P
Hi, Any advice on how to do this in spark sql? I have a scenario as below: dataframe1 = loaded from an HDFS Parquet file. dataframe2 = read from a Kafka Stream. If the column1 value of dataframe1 is in the columnX value of dataframe2, then I need to replace the column1 value of dataframe1

Re: Unable to run simple spark-sql

2019-06-21 Thread Raymond Honderdors
S > i.e.hdfs://xxx:8020/apps/hive/warehouse/ > For this the code ran fine. > > Thanks for the help, > -Nirmal > > From: Nirmal Kumar > Sent: 19 June 2019 11:51 > To: Raymond Honderdors > Cc: user > Subject: RE: Unable to run simple spark-sql > > Hi Raymond, &

RE: Unable to run simple spark-sql

2019-06-21 Thread Nirmal Kumar
filesystem. I created a new database and confirmed that the location was in HDFS i.e.hdfs://xxx:8020/apps/hive/warehouse/ For this the code ran fine. Thanks for the help, -Nirmal From: Nirmal Kumar Sent: 19 June 2019 11:51 To: Raymond Honderdors Cc: user Subject: RE: Unable to run simple spark-sql

RE: Unable to run simple spark-sql

2019-06-19 Thread Nirmal Kumar
directory of hive user (/home/hive/). Why is it referring the local file system and from where? Thanks, Nirmal From: Raymond Honderdors Sent: 19 June 2019 11:18 To: Nirmal Kumar Cc: user Subject: Re: Unable to run simple spark-sql Hi Nirmal, i came across the following article "

Re: Unable to run simple spark-sql

2019-06-18 Thread Raymond Honderdors
019 5:56:06 PM > To: Raymond Honderdors; Nirmal Kumar > Cc: user > Subject: RE: Unable to run simple spark-sql > > Hi Raymond, > > Permission on hdfs is 777 > drwxrwxrwx - impadmin hdfs 0 2019-06-13 16:09 > /home/hive/spark-warehouse > > > But it’

Re: Unable to run simple spark-sql

2019-06-18 Thread Nirmal Kumar
for Android<https://aka.ms/ghei36> From: Nirmal Kumar Sent: Tuesday, June 18, 2019 5:56:06 PM To: Raymond Honderdors; Nirmal Kumar Cc: user Subject: RE: Unable to run simple spark-sql Hi Raymond, Permission on hdfs is 777 drwxrwxrwx - impadmin hdfs 0 2

RE: Unable to run simple spark-sql

2019-06-18 Thread Nirmal Kumar
-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1' Thanks, -Nirmal From: Raymond Honderdors Sent: 18 June 2019 17:52 To: Nirmal Kumar Cc: user Subject: Re: Unable to run simple spark-sql Hi Can you check the permission of the user running spark O

Re: Unable to run simple spark-sql

2019-06-18 Thread Raymond Honderdors
Hi, Can you check the permission of the user running spark on the hdfs folder where it tries to create the table? On Tue, Jun 18, 2019, 15:05 Nirmal Kumar wrote: > Hi List, > > I tried running the following sample Java code using Spark2 version 2.0.0 > on YARN (HDP-2.5.0.0) > > public class

Unable to run simple spark-sql

2019-06-18 Thread Nirmal Kumar
Hi List, I tried running the following sample Java code using Spark2 version 2.0.0 on YARN (HDP-2.5.0.0) public class SparkSQLTest { public static void main(String[] args) { SparkSession sparkSession = SparkSession.builder().master("yarn") .config("spark.sql.warehouse.dir",
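The thread later resolves this by pointing the warehouse at HDFS rather than the local filesystem; a Scala sketch of that configuration (the host placeholder and table are taken from the thread, the rest is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("yarn")
  .config("spark.sql.warehouse.dir", "hdfs://xxx:8020/apps/hive/warehouse")  // HDFS location, not file://
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS testdb.employee_orc (id INT, name STRING) STORED AS ORC")
```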

Re: how to get spark-sql lineage

2019-05-16 Thread Arun Mahadevan
You can check out https://github.com/hortonworks-spark/spark-atlas-connector/ On Wed, 15 May 2019 at 19:44, lk_spark wrote: > hi,all: > When I use spark , if I run some SQL to do ETL how can I get > lineage info. I found that , CDH spark have some config about lineage : >

Re: how to get spark-sql lineage

2019-05-16 Thread Gabor Somogyi
Hi, spark.lineage.enabled is Cloudera specific and doesn't work with vanilla Spark. BR, G On Thu, May 16, 2019 at 4:44 AM lk_spark wrote: > hi,all: > When I use spark , if I run some SQL to do ETL how can I get > lineage info. I found that , CDH spark have some config about lineage :

how to get spark-sql lineage

2019-05-15 Thread lk_spark
Hi all, When I use spark, if I run some SQL to do ETL, how can I get lineage info? I found that CDH spark has some config about lineage: spark.lineage.enabled=true spark.lineage.log.dir=/var/log/spark2/lineage Do they also work for apache spark? 2019-05-16

IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] while using spark-sql-2.4.1v to read data from oracle

2019-05-08 Thread Shyam P
Hi, I have an oracle table in which a column's schema is: DATA_DATE DATE, with values like 31-MAR-02. I am trying to retrieve data from oracle using spark-sql-2.4.1 version. I tried to set the JdbcOptions as below: .option("lowerBound", "2002-03-31 00:00:00"); .option

Creating Hive Persistent view using Spark Sql defaults to Sequence File Format

2019-03-19 Thread arun rajesh
Hi All , I am using spark 2.2 in EMR cluster. I have a hive table in ORC format and I need to create a persistent view on top of this hive table. I am using spark sql to create the view. By default spark sql creates the view with LazySerde. How can I change the inputformat to use ORC ? PFA

[SPARK SQL] How to overwrite a Hive table with spark sql (SPARK2)

2019-03-12 Thread luby
Hi, All, I need to overwrite data in a Hive table and I use the following code to do so: df = sqlContext.sql(my-spark-sql-statement); df.count df.write.format("orc").mode("overwrite").saveAsTable("foo") // I also tried 'insertInto("foo") The "df

Re: Is there a way to validate the syntax of raw spark sql query?

2019-03-05 Thread kant kodali
t 10:23 PM kant kodali wrote: > >> Hi All, >> >> Is there a way to validate the syntax of raw spark SQL query? >> >> for example, I would like to know if there is any isValid API call spark >> provides? >> >> val query = "select * from table&

Re: Is there a way to validate the syntax of raw spark sql query?

2019-03-05 Thread Akshay Bhardwaj
> Hi All, > > Is there a way to validate the syntax of raw spark SQL query? > > for example, I would like to know if there is any isValid API call spark > provides? > > val query = "select * from table"if(isValid(query)) { > sparkSession.sql(query) } else {

Is there a way to validate the syntax of raw spark sql query?

2019-03-01 Thread kant kodali
Hi All, Is there a way to validate the syntax of a raw spark SQL query? For example, I would like to know if there is any isValid API call spark provides? val query = "select * from table"; if (isValid(query)) { sparkSession.sql(query) } else { log.error("Invalid Syn
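One hedged approach: ask Spark's SQL parser for a logical plan without executing anything. sessionState is an unstable/internal-leaning API, so treat this as a sketch rather than a supported isValid call; it checks syntax only, not table existence:

```scala
import scala.util.Try

def isValid(query: String): Boolean =
  Try(spark.sessionState.sqlParser.parsePlan(query)).isSuccess   // parse only, no execution

val query = "select * from table1"
if (isValid(query)) spark.sql(query).show() else println(s"Invalid syntax: $query")
```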

Re: Re: Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-28 Thread luby
Thank you so much. I tried your suggestion and it really works! From: "Ramandeep Singh Nanda" To: l...@china-inv.cn Cc: "Shahab Yunus" , "Tomas Bartalos" , "user @spark/'user @spark'/spark users/user@spark" Date: 2019/01/26 05:42 Subject: Re: Re: Ho

Re: Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-25 Thread Ramandeep Singh Nanda
tried the suggested approach and it works, but it requires to 'run' the > SQL statement first. > > I just want to parse the SQL statement without running it, so I can do > this in my laptop without connecting to our production environment. > > I tried to write a tool which uses th

Re: Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-25 Thread luby
bundled with SPARK SQL to extract names of the input tables and it works as expected. But I have a question: the parser generated by SqlBase.g4 only accepts 'select' statements with all keywords such as 'SELECT', 'FROM' and table names capitalized, e.g. it accepts 'SELECT * FROM FOO

Re: Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-24 Thread luby
Thanks all for your help. I'll try your suggestions. Thanks again :) From: "Shahab Yunus" To: "Ramandeep Singh Nanda" Cc: "Tomas Bartalos" , l...@china-inv.cn, "user @spark/'user @spark'/spark users/user@spark" Date: 2019/01/24 06:45 Subject: Re: Ho

Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-23 Thread Shahab Yunus
this info. Some details here: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-Dataset.html#queryExecution On Wed, Jan 23, 2019 at 5:35 PM Ramandeep Singh Nanda wrote: > Explain extended or explain would list the plan along with the tables. Not > aware of any statements that expl

Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-23 Thread Ramandeep Singh Nanda
:43 napísal(a): > >> Hi, All, >> >> We need to get all input tables of several SPARK SQL 'select' statements. >> >> We can get those information of Hive SQL statements by using 'explain >> dependency select....'. >> But I can't find the equivalent

Re: How to get all input tables of a SPARK SQL 'select' statement

2019-01-23 Thread Tomas Bartalos
This might help: show tables; On Wed, 23 Jan 2019 at 10:43, wrote: > Hi, All, > > We need to get all input tables of several SPARK SQL 'select' statements. > > We can get those information of Hive SQL statements by using 'explain > dependency select'. > But I can'

How to get all input tables of a SPARK SQL 'select' statement

2019-01-23 Thread luby
Hi, All, We need to get all input tables of several SPARK SQL 'select' statements. We can get that information for Hive SQL statements by using 'explain dependency select ...'. But I can't find the equivalent command for SPARK SQL. Does anyone know how to get this information of a SPARK SQL
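A sketch using Catalyst internals (Spark 2.x; these classes are internal and subject to change) that parses the statement without running it and collects the table names it references:

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

// Example statement; replace with the 'select' you want to inspect.
val plan = spark.sessionState.sqlParser.parsePlan(
  "SELECT a.x, b.y FROM db1.foo a JOIN db2.bar b ON a.id = b.id")
val inputTables = plan.collect { case r: UnresolvedRelation => r.tableName }
println(inputTables)   // e.g. List(db1.foo, db2.bar)
```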

Re: [SPARK SQL] Difference between 'Hive on spark' and Spark SQL

2018-12-20 Thread Jörn Franke
; hand: > a. Turn on 'Hive on spark' feature and run HQLs and > b. Run those query statements with spark SQL > > What is the difference between these options? > > Another question is: > There is a hive setting 'hive.optimize.ppd' to enable 'predicate pushdown' > query optimi

[SPARK SQL] Difference between 'Hive on spark' and Spark SQL

2018-12-19 Thread luby
want to improve the performance of these queries and have two options at hand: a. Turn on the 'Hive on spark' feature and run HQLs, and b. Run those query statements with spark SQL. What is the difference between these options? Another question is: There is a hive setting 'hive.optimize.ppd' to enable

Re: Java: pass parameters in spark sql query

2018-11-30 Thread 965
Date: 2018-11-29 7:55 To: "user" Subject: Java: pass parameters in spark sql query Hello there, I am trying to pass parameters in spark.sql query in Java code, the same as in this link https://forums.databricks.com/questions/115/how-do-i-pass-parame

Re: Java: pass parameters in spark sql query

2018-11-28 Thread Ramandeep Singh
That's string interpolation. You could create your own, for example :bind, and then do replaceAll to replace the named parameter. On Wed, Nov 28, 2018, 18:55 Mann Du wrote: > Hello there, > > I am trying to pass parameters in spark.sql query in Java code, the same > as in this link > >

Java: pass parameters in spark sql query

2018-11-28 Thread Mann Du
Hello there, I am trying to pass parameters in spark.sql query in Java code, the same as in this link https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html The link suggested to use 's' before 'select' as - val param = 100 spark.sql(s""" select * from
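A sketch of what the linked post suggests, in Scala: build the SQL text with string interpolation (in Java the rough equivalent would be String.format or plain concatenation); my_table and the column are placeholders:

```scala
val param = 100
// The s-interpolator substitutes the value into the SQL string before it is parsed.
val df = spark.sql(s"SELECT * FROM my_table WHERE amount > $param")
df.show()
```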

[Spark SQL]: Does Spark SQL 2.3+ support UDT?

2018-11-27 Thread Suny Tyagi
Class Test object. So will this work in Spark SQL after using the SQLUserDefinedType tag and extending the UserDefinedType class, given that UserDefinedType is private in Spark 2.0? I just want to know if UDT is supported in Spark 2.3+. If yes, what is best to use: UserDefinedType or UDTRegistration

Re: [Spark SQL]: Does Spark SQL 2.3+ support UDT?

2018-11-26 Thread Suny Tyagi
st UDFMethod(string name, int age){ > >Test ob = new Test(); > >ob.name = name; > >ob.age = age; > > } > > Sample Spark query- `Select *, UDFMethod(name, age) From SomeTable;` > > Now UDFMethod(name, age) will return Class Test object.

Re: How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

2018-10-03 Thread kathleen li
Not sure what you mean about "raw" Spark sql, but there is one parameter which will impact whether the optimizer chooses a broadcast join automatically or not: spark.sql.autoBroadcastJoinThreshold You can read the Spark doc about the above parameter setting and use explain to check whether your join uses broadcast

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

2018-10-03 Thread kant kodali
Hi All, How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2? Thanks
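A hedged sketch using the SQL-level broadcast hint (available since Spark 2.2, so it should apply to 2.3.x); the table names and join key are made up:

```scala
val joined = spark.sql(
  """SELECT /*+ BROADCAST(s) */ b.*, s.label
    |FROM big_table b JOIN small_table s ON b.key = s.key""".stripMargin)
joined.explain()   // the physical plan should show a BroadcastHashJoin on the hinted side
```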

Re: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

2018-09-28 Thread Thakrar, Jayesh
Not sure I get what you mean…. I ran the query that you had – and don’t get the same hash as you. From: Gokula Krishnan D Date: Friday, September 28, 2018 at 10:40 AM To: "Thakrar, Jayesh" Cc: user Subject: Re: [Spark SQL] why spark sql hash() are returns the same hash value thoug

Re: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

2018-09-28 Thread Gokula Krishnan D
4589)|hash(40004)| > > +---+---+ > > | 777096871|-1593820563| > > +---+---+ > > > > > > scala> > > > > *From: *Gokula Krishnan D > *Date: *Tuesday, September 25, 2018 at 8:57 PM > *To: *user

Re: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

2018-09-26 Thread Thakrar, Jayesh
Date: Tuesday, September 25, 2018 at 8:57 PM To: user Subject: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same Hello All, I am calculating the hash value of few columns and determining whether its an Insert/Delete/Update Record but found a s

[Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

2018-09-25 Thread Gokula Krishnan D
Hello All, I am calculating the hash value of a few columns and determining whether it's an Insert/Delete/Update record, but found a scenario which is a little weird since some of the records return the same hash value though the keys are totally different. For instance, scala> spark.sql("select

Re: How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread hemant singh
You can use the spark dataframe 'when'/'otherwise' clause to replace the SQL case statement. This piece will need to be calculated beforehand - 'select student_id from tbl_student where candidate_id = c.candidate_id and approval_id = 2 and academic_start_date is null'. Take the count of the above DF after

How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread Chetan Khatri
Dear Spark Users, I came across a slightly weird MSSQL query to replace with Spark and I have no clue how to do it in an efficient way with Scala + SparkSQL. Can someone please throw some light? I can create a view of the DataFrame and do it as *spark.sql* (query), but I would like to do it with Scala + Spark
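Following hemant's reply earlier in this thread, a sketch with when/otherwise; the column names are borrowed from the thread but studentDf and the exact predicate are only illustrative:

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// studentDf is a stand-in for the poster's tbl_student DataFrame.
val flagged = studentDf.withColumn(
  "needs_review",
  when(col("approval_id") === 2 && col("academic_start_date").isNull, lit(true))
    .otherwise(lit(false)))
flagged.show()
```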

Re: How do I generate current UTC timestamp in raw spark sql?

2018-08-28 Thread Nikita Goyal
ted the code. Regards, Nikita On Tue, Aug 28, 2018 at 2:34 PM kant kodali wrote: > Hi All, > > How do I generate current UTC timestamp using spark sql? > > When I do curent_timestamp() it is giving me local time. > > to_utc_timestamp(current_time(), ) takes timezone i

How do I generate current UTC timestamp in raw spark sql?

2018-08-28 Thread kant kodali
Hi All, How do I generate the current UTC timestamp using spark sql? When I do current_timestamp() it is giving me local time. to_utc_timestamp(current_time(), ) takes the timezone in the second parameter and I see no udf that can give me the current timezone. When I do spark.conf.set
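One hedged option (the config exists since Spark 2.2): set the session time zone to UTC so current_timestamp() is interpreted and rendered in UTC, instead of having to guess the local zone for to_utc_timestamp:

```scala
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT current_timestamp() AS utc_now").show(false)
```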

Re: Re: Re: spark sql data skew

2018-07-23 Thread Gourav Sengupta
https://docs.databricks.com/spark/latest/spark-sql/skew-join.html The above might help, in case you are using a join. On Mon, Jul 23, 2018 at 4:49 AM, 崔苗 wrote: > but how to get count(distinct userId) group by company from count(distinct > userId) group by company+x? > cou

is there a way to parse and modify raw spark sql query?

2018-06-05 Thread kant kodali
Hi All, is there a way to parse and modify raw spark sql query? For example, given the following query spark.sql("select hello from view") I want to modify the query or logical plan such that I can get the result equivalent to the below query. spark.sql("select foo, hello f

what defines dataset partition number in spark sql

2018-05-26 Thread 崔苗
Hi, I want to know, when I create a dataset by reading files from hdfs in spark sql, like: Dataset user = spark.read().format("json").load(filePath), what defines the partition number of the dataset? And what if the filePath is a directory instead of a single file? Why can't we get the
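A sketch for inspecting and influencing the read-side partition count; for file sources the split size is governed mainly by spark.sql.files.maxPartitionBytes (and openCostInBytes). The path below is a placeholder, and a directory works the same as a single file:

```scala
spark.conf.set("spark.sql.files.maxPartitionBytes", (32 * 1024 * 1024).toString)   // 32 MB target splits
val users = spark.read.format("json").load("hdfs:///data/users/")
println(users.rdd.getNumPartitions)
```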

Re: [Spark Streaming]: Does DStream workload run over Spark SQL engine?

2018-05-02 Thread Saisai Shao
xecution engine of Spark Streaming > (DStream API): Does Spark streaming jobs run over the Spark SQL engine? > > For example, if I change a configuration parameter related to Spark SQL > (like spark.sql.streaming.minBatchesToRetain or > spark.sql.objectHashAggregate.sortBased.fallbackTh

[Spark Streaming]: Does DStream workload run over Spark SQL engine?

2018-05-02 Thread Khaled Zaouk
Hi, I have a question regarding the execution engine of Spark Streaming (DStream API): Do Spark streaming jobs run over the Spark SQL engine? For example, if I change a configuration parameter related to Spark SQL (like spark.sql.streaming.minBatchesToRetain

Curious case of Spark SQL 2.3 - number of stages different for the same query ever?

2018-04-16 Thread Jacek Laskowski
with 1 vs 5 stages for the very same query plan (even if I changed the number of executors or number of cores or anything execution-related). So my question is: is it possible that Spark SQL could give a 1-stage execution plan and a 5-stage execution plan for the very same query? (I am not saying

Apache spark -2.1.0 question in Spark SQL

2018-04-03 Thread anbu
Please help me with the below error and suggest a different approach for the below data manipulation. Error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other
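This error usually means there is no implicit Encoder in scope for the Dataset's element type; a minimal sketch (Person is a made-up case class, df is assumed to have matching columns) of the two things the message asks for:

```scala
case class Person(name: String, age: Int)   // a Product type Spark can derive an encoder for

import spark.implicits._                    // brings the implicit encoders into scope
val people = df.as[Person]
people.show()
```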

Merge query using spark sql

2018-04-02 Thread Deepak Sharma
I am using spark to run a merge query in postgres sql. The way it is being done now is to save the data to be merged in postgres as temp tables, then run the merge queries in postgres using a java sql connection and statement. So basically this query runs in postgres. The queries are insert into source

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-20 Thread Serega Sheypak
Ok, this one works: .withColumn("hour", hour(from_unixtime(typedDataset.col("ts") / 1000))) 2018-03-20 22:43 GMT+01:00 Serega Sheypak : > Hi, any updates? Looks like some API inconsistency or bug..? > > 2018-03-17 13:09 GMT+01:00 Serega Sheypak

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-20 Thread Serega Sheypak
Hi, any updates? Looks like some API inconsistency or bug..? 2018-03-17 13:09 GMT+01:00 Serega Sheypak : > > Not sure why you are dividing by 1000. from_unixtime expects a long type > It expects seconds, I have milliseconds. > > > > 2018-03-12 6:16 GMT+01:00 vermanurag

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-17 Thread Serega Sheypak
> Not sure why you are dividing by 1000. from_unixtime expects a long type It expects seconds, I have milliseconds. 2018-03-12 6:16 GMT+01:00 vermanurag : > Not sure why you are dividing by 1000. from_unixtime expects a long type > which is time in milliseconds

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread vermanurag
Not sure why you are dividing by 1000. from_unixtime expects a long type which is time in milliseconds from reference date. The following should work: val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))

how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread Serega Sheypak
hi, desperately trying to extract the hour from unix seconds. year, month, dayofmonth functions work as expected; the hour function always returns 0. val ds = dataset .withColumn("year", year(to_date(from_unixtime(dataset.col("ts") / 1000)))) .withColumn("month",

"Too Large DataFrame" shuffle Fetch Failed exception in Spark SQL (SPARK-16753) (SPARK-9862)(SPARK-5928)(TAGs - Spark SQL, Intermediate Level, Debug)

2018-02-16 Thread Ashutosh Ranjan
Hi All, My spark Configuration is following. spark = SparkSession.builder.master(mesos_ip) \ .config('spark.executor.cores','3')\ .config('spark.executor.memory','8g')\ .config('spark.es.scroll.size','1')\ .config('spark.network.timeout','600s')\

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Liana Napalkova
unsubscribe DISCLAIMER: This message may contain confidential information. If you are not the intended recipient, please delete it and notify us immediately at the following address: le...@eurecat.org If the recipient of this message does not consent to the

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Subhash Sriram
fashion I > want to see how I can create a new Column using the raw sql. I am looking > at this reference https://docs.databricks.com/spark/latest/ > spark-sql/index.html and I am not seeing a way. > > Thanks! > > On Thu, Feb 1, 2018 at 4:01 AM, Jean Georges Perrin <j...@jgp

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread kant kodali
a similar fashion I want to see how I can create a new Column using the raw sql. I am looking at this reference https://docs.databricks.com/spark/latest/spark-sql/index.html and I am not seeing a way. Thanks! On Thu, Feb 1, 2018 at 4:01 AM, Jean Georges Perrin <j...@jgp.net> wrote: > Sur

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Jean Georges Perrin
Sure, use withColumn()... jg > On Feb 1, 2018, at 05:50, kant kodali wrote: > > Hi All, > > Is there any way to create a new timeuuid column of an existing dataframe > using raw sql? You can assume that there is a timeuuid udf function if that > helps. > > Thanks!

is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread kant kodali
Hi All, Is there any way to create a new timeuuid column of an existing dataframe using raw sql? You can assume that there is a timeuuid udf function if that helps. Thanks!
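A sketch combining the withColumn answer with a registered UDF so it also works from raw SQL; note that UUID.randomUUID() is only a stand-in here, since a real time-based (type 1) UUID generator would come from a separate library:

```scala
import java.util.UUID

spark.udf.register("timeuuid", () => UUID.randomUUID().toString)   // placeholder generator, not a true timeuuid
df.createOrReplaceTempView("events")
spark.sql("SELECT *, timeuuid() AS event_uuid FROM events").show(false)
```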

Re: Issue with Cast in Spark Sql

2018-01-30 Thread naresh Goud
onvert a string with decimal value to decimal in Spark Sql > and load it into Hive/Sql Server. > > In Hive instead of getting converted to decimal all my values are coming > as null. > > In Sql Server instead of getting decimal values are coming without > precision > > Can

Issue with Cast in Spark Sql

2018-01-30 Thread Arnav kumar
Hi Experts, I am trying to convert a string with a decimal value to decimal in Spark Sql and load it into Hive/Sql Server. In Hive, instead of getting converted to decimal, all my values are coming as null. In Sql Server, instead of getting decimal values, they are coming without precision. Can you please
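A hedged sketch of an explicit cast with precision and scale; one common cause of nulls after a decimal cast is values that do not fit the target precision, so spelling out DECIMAL(p, s) at least makes that visible. The table and column names are made up:

```scala
val converted = spark.sql(
  "SELECT CAST(amount_str AS DECIMAL(18, 4)) AS amount FROM staging_table")
converted.printSchema()
converted.show()
```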

Re: Streaming Analytics/BI tool to connect Spark SQL

2017-12-07 Thread Pierce Lamb
Hi Umar, While this answer is a bit dated, you may find it useful in diagnosing a store for Spark SQL tables: https://stackoverflow.com/a/39753976/3723346 I don't know much about Pentaho or Arcadia, but I assume many of the listed options have a JDBC or ODBC client. Hope this helps, Pierce

Streaming Analytics/BI tool to connect Spark SQL

2017-12-07 Thread umargeek
Hi All, We are currently looking for real-time streaming analytics on data stored as Spark SQL tables. Is there any external connectivity available to connect with BI tools (Pentaho/Arcadia)? Currently, we are storing data in hive tables but its response on the Arcadia dashboard is slow

Re: How to export the Spark SQL jobs from the HiveThriftServer2

2017-12-06 Thread wenxing zheng
> > Can anyone kindly advice how to dump the spark SQL jobs for audit? Just > like the one for the MapReduce jobs (https://hadoop.apache.org/ > docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html). > > Thanks again, > Wenxing >

How to export the Spark SQL jobs from the HiveThriftServer2

2017-12-05 Thread wenxing zheng
#rest-api), I still can't get the jobs for a given application with the endpoint: */applications/[app-id]/jobs* Can anyone kindly advice how to dump the spark SQL jobs for audit? Just like the one for the MapReduce jobs ( https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site
