Re: [Structured Streaming Query] Calculate Running Avg from Kafka feed using SQL query

2018-04-02 Thread Aakash Basu
-For Console Print--- query1 = avg \ .writeStream \ .format("console") \ .outputMode("complete") \ .start() query = aggregate_func \ .writeStream \ .format("console") \ .start() # .

Re: [Structured Streaming Query] Calculate Running Avg from Kafka feed using SQL query

2018-04-02 Thread Aakash Basu
ransformed_Stream_DF") > aggregate_func = spark.sql( > "select t.Col2 , (Select AVG(Col1) as Avg from transformed_Stream_DF) > as myAvg from transformed_Stream_DF t") # (Col2/(AVG(Col1)) as Col3)") > > # ---For Console Print--- > > query = aggregate_func \ > .writeStream \ > .format("console") \ > .start() > # .outputMode("complete") \ > # ---Console Print ends--- > > query.awaitTermination() > # /home/kafka/Downloads/spark-2.3.0-bin-hadoop2.7/bin/spark-submit > --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 > /home/aakashbasu/PycharmProjects/AllMyRnD/Kafka_Spark/Stream_Col_Oper_Spark.py > > > > > Thanks, > Aakash. >
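
A minimal PySpark sketch of the pattern in this thread, with placeholder broker, topic and column names (the original post's Kafka source options are not shown in the preview): read the topic, register the stream as a temp view, aggregate over it in SQL, and print with the complete output mode that streaming aggregations require.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, split

    # Needs the spark-sql-kafka package on the classpath, as in the
    # spark-submit command quoted above.
    spark = SparkSession.builder.appName("RunningAvgFromKafka").getOrCreate()

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
           .option("subscribe", "some_topic")                    # placeholder
           .load())

    # Kafka values arrive as bytes; cast and split into the two columns
    # the thread works with (Col1, Col2 are assumed names).
    parsed = (raw.selectExpr("CAST(value AS STRING) AS line")
              .select(split(col("line"), ",").getItem(0).cast("double").alias("Col1"),
                      split(col("line"), ",").getItem(1).alias("Col2")))

    parsed.createOrReplaceTempView("transformed_Stream_DF")

    # Aggregations on a stream must be written in "complete" (or "update") mode,
    # which is why the commented-out .outputMode("complete") above matters.
    running_avg = spark.sql("SELECT AVG(Col1) AS running_avg FROM transformed_Stream_DF")

    query = (running_avg.writeStream
             .format("console")
             .outputMode("complete")
             .start())

    query.awaitTermination()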

Merge query using spark sql

2018-04-02 Thread Deepak Sharma
I am using spark to run merge query in postgres sql. The way its being done now is save the data to be merged in postgres as temp tables. Now run the merge queries in postgres using java sql connection and statment . So basically this query runs in postgres. The queries are insert into source

[Structured Streaming Query] Calculate Running Avg from Kafka feed using SQL query

2018-04-02 Thread Aakash Basu
# (Col2/(AVG(Col1)) as Col3)") # ---For Console Print--- query = aggregate_func \ .writeStream \ .format("console") \ .start() # .outputMode("complete") \ # ---Console Print ends--- query.awaitTerm

Re: spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
>> StructField("someName",BooleanType,true), >> StructField("someName",LongType,true), >> StructField("someName",StringType,true), >> StructField("someName",StringType,true), >> StructField("someName",StringType,true), >> Struc

Re: spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
t;,LongType,true), > StructField("someName",StringType,true), > StructField("someName",StringType,true), > StructField("someName",StringType,true), > StructField("someName",StringType,true)) > > > The catalogString looks something like where SOME_TABLE_NAME is

Re: spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
ructField("someName",StringType,true), StructField("someName",StringType,true), StructField("someName",StringType,true)) The catalogString looks something like where SOME_TABLE_NAME is unique: struct, SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct, SOME_TABLE_NAME:struct,SOME_TAB

spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
I've been learning spark-sql and have been trying to export and import some of the generated schemas to edit them. I've been writing the schemas to strings like df1.schema.toString() and df.schema.catalogString But I've been having trouble loading the schemas created. Does anyo
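
One workable route for exporting and re-importing a schema is the JSON form rather than schema.toString() or catalogString, since a StructType can be rebuilt from it. A hedged PySpark sketch (the Scala equivalent is DataType.fromJson on df.schema.json); the sample columns and path are placeholders:

    import json

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "someValue", True)], ["id", "name", "flag"])

    # Export: schema.json() is a complete, parseable description of the schema.
    schema_as_text = df.schema.json()

    # Edit the JSON text by hand if needed, then rebuild the StructType from it.
    restored = StructType.fromJson(json.loads(schema_as_text))

    # Apply the restored schema when reading, e.g. a CSV source.
    df2 = spark.read.schema(restored).csv("/path/to/data.csv", header=True)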

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-20 Thread Serega Sheypak
Ok, this one works: .withColumn("hour", hour(from_unixtime(typedDataset.col("ts") / 1000))) 2018-03-20 22:43 GMT+01:00 Serega Sheypak : > Hi, any updates? Looks like some API inconsistency or bug..? > > 2018-03-17 13:09 GMT+01:00 Serega Sheypak : > >> > Not sure why you are dividing by 1000. f
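
The same fix in PySpark for reference: from_unixtime expects seconds, so the epoch-millisecond column (assumed to be called ts, as in the thread) is divided by 1000, and hour() is applied to the resulting timestamp rather than to a to_date() result, which carries no time-of-day component.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_unixtime, hour

    spark = SparkSession.builder.getOrCreate()
    # One example row with an epoch-millisecond timestamp.
    dataset = spark.createDataFrame([(1521582180000,)], ["ts"])

    # Divide milliseconds by 1000 before from_unixtime, then take the hour.
    dataset.withColumn("hour", hour(from_unixtime(col("ts") / 1000))).show()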

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-20 Thread Serega Sheypak
Hi, any updates? Looks like some API inconsistency or bug..? 2018-03-17 13:09 GMT+01:00 Serega Sheypak : > > Not sure why you are dividing by 1000. from_unixtime expects a long type > It expects seconds, I have milliseconds. > > > > 2018-03-12 6:16 GMT+01:00 vermanurag : > >> Not sure why you are

Re: [PySpark SQL] sql function to_date and to_timestamp return the same data type

2018-03-18 Thread Hyukjin Kwon
Mind if I ask a reproducer? seems returning timestamps fine: >>> from pyspark.sql.functions import * >>> spark.range(1).select(to_timestamp(current_timestamp())).printSchema() root |-- to_timestamp(current_timestamp()): timestamp (nullable = false) >>> spark.range(1).select(to_timestamp(current_
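
A quick side-by-side check of the two return types, along the lines of the reproducer above; to_date should come back as a date column and to_timestamp as a timestamp column:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import current_timestamp, to_date, to_timestamp

    spark = SparkSession.builder.getOrCreate()

    (spark.range(1)
     .select(to_date(current_timestamp()).alias("as_date"),
             to_timestamp(current_timestamp()).alias("as_timestamp"))
     .printSchema())
    # Expected: as_date prints as "date", as_timestamp as "timestamp".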

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-17 Thread Serega Sheypak
> Not sure why you are dividing by 1000. from_unixtime expects a long type It expects seconds, I have milliseconds. 2018-03-12 6:16 GMT+01:00 vermanurag : > Not sure why you are dividing by 1000. from_unixtime expects a long type > which is time in milliseconds from reference date. > > The foll

Re: [PySpark SQL] sql function to_date and to_timestamp return the same data type

2018-03-15 Thread Nicholas Sharkey
unsubscribe On Thu, Mar 15, 2018 at 8:00 PM, Alan Featherston Lago wrote: > I'm a pretty new user of spark and I've run into this issue with the > pyspark docs: > > The functions pyspark.sql.functions.to_date && > pyspark.sql.functions.to_timestamp > behave in the same way. As in both functions

[PySpark SQL] sql function to_date and to_timestamp return the same data type

2018-03-15 Thread Alan Featherston Lago
I'm a pretty new user of spark and I've run into this issue with the pyspark docs: The functions pyspark.sql.functions.to_date && pyspark.sql.functions.to_timestamp behave in the same way. As in both functions convert a Column of pyspark.sql.types.StringType or pyspark.sql.types.TimestampType into

Re: how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread vermanurag
Not sure why you are dividing by 1000. from_unixtime expects a long type which is time in milliseconds from reference date. The following should work: val ds = dataset.withColumn("hour",hour(from_unixtime(dataset.col("ts" -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com

spark sql get result time larger than compute Duration

2018-03-11 Thread wkhapy_1
get result 1.67s <http://apache-spark-user-list.1001560.n3.nabble.com/file/t3966/wpLV3.png> compute cost 0.2s <http://apache-spark-user-list.1001560.n3.nabble.com/file/t3966/Kl0VG.png> below is sql select event_date, dim ,concat_ws('|',collect_list(result)) result f

how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread Serega Sheypak
hi, desperately trying to extract hour from unix seconds year, month, dayofmonth functions work as expected. hour function always returns 0. val ds = dataset .withColumn("year", year(to_date(from_unixtime(dataset.col("ts") / 1000 .withColumn("month", month(to_date(from_unixtime(dataset.c

"Too Large DataFrame" shuffle Fetch Failed exception in Spark SQL (SPARK-16753) (SPARK-9862)(SPARK-5928)(TAGs - Spark SQL, Intermediate Level, Debug)

2018-02-16 Thread Ashutosh Ranjan
.shuffle.spill.compress', 'true')\ .config('spark.driver.memory','8g')\ .config('spark.cores.max','12')\ .config('spark.sql.shuffle.partitions','6000')\ .config('es.nodes',es_nodes)\ .config('es.port',es_por

Re: [spark-sql] Custom Query Execution listener via conf properties

2018-02-16 Thread Marcelo Vanzin
According to https://issues.apache.org/jira/browse/SPARK-19558 this feature was added in 2.3. On Fri, Feb 16, 2018 at 12:43 AM, kurian vs wrote: > Hi, > > I was trying to create a custom Query execution listener by extending the > org.apache.spark.sql.util.QueryExecutionListener class. My custom

[spark-sql] Custom Query Execution listener via conf properties

2018-02-16 Thread kurian vs
g an extra ExecutionListenerManager <https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-BaseSessionStateBuilder.html#listenerManager> .register(customListener) line. Is this assumption correct? - From which version of spark is this supported? (i'm using spark V 2.2.1) Ca

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Liana Napalkova
unsubscribe DISCLAIMER: This message may contain confidential information. If you are not the intended recipient, please delete it and notify us immediately at the following address: le...@eurecat.org If the recipient of this message does not consent to the use

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Subhash Sriram
umn() ? If so, thats not what I meant. I > meant creating a new column using raw sql. otherwords say I dont have a > dataframe I only have the view name from df.createOrReplaceView("table") > so I can do things like "select * from table" so in a similar fashion I > wan

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread kant kodali
Hi, Are you talking about df.withColumn() ? If so, thats not what I meant. I meant creating a new column using raw sql. otherwords say I dont have a dataframe I only have the view name from df.createOrReplaceView("table") so I can do things like "select * from table" so in

Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Jean Georges Perrin
Sure, use withColumn()... jg > On Feb 1, 2018, at 05:50, kant kodali wrote: > > Hi All, > > Is there any way to create a new timeuuid column of a existing dataframe > using raw sql? you can assume that there is a timeuuid udf function if that &g

is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread kant kodali
Hi All, Is there any way to create a new timeuuid column of a existing dataframe using raw sql? you can assume that there is a timeuuid udf function if that helps. Thanks!
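
A hedged sketch of one way to do this in PySpark: register a UDF under a SQL-visible name and call it straight from the SQL text against a temp view. The thread assumes a timeuuid function is available; Python's time-based uuid1 stands in for it here, and all names are placeholders.

    import uuid

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # Register a zero-argument UDF so raw SQL can call it by name.
    spark.udf.register("timeuuid", lambda: str(uuid.uuid1()), StringType())

    df.createOrReplaceTempView("events")

    # The new column is added purely in SQL, without df.withColumn().
    with_uuid = spark.sql("SELECT *, timeuuid() AS event_uuid FROM events")
    with_uuid.show(truncate=False)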

Re: Issue with Cast in Spark Sql

2018-01-30 Thread naresh Goud
specify the precision value as max precision value for column -1 in above case max precision is 5 (123456.*6*) so we should specify decimal(10,5) Thank you, Naresh On Tue, Jan 30, 2018 at 8:48 PM, Arnav kumar wrote: > Hi Experts > > I am trying to convert a string with deci
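
A small PySpark illustration of that point: the declared precision has to cover the digits before the decimal point as well, otherwise the cast overflows and the value comes back null. The column name and the (12,5) type are illustrative only.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from pyspark.sql.types import DecimalType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("123456.6",)], ["amount_str"])

    df.select(
        # decimal(5,2) cannot hold 123456.6, so this column comes back null.
        col("amount_str").cast(DecimalType(5, 2)).alias("too_narrow"),
        # decimal(12,5) has room for the integer digits plus the scale.
        col("amount_str").cast(DecimalType(12, 5)).alias("wide_enough"),
    ).show()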

why groupByKey still shuffle if SQL does "Distribute By" on same columns ?

2018-01-30 Thread Dibyendu Bhattacharya
Hi, I am trying something like this.. val sesDS: Dataset[XXX] = hiveContext.sql(select).as[XXX] The select statement is something like this : "select * from sometable DISTRIBUTE by col1, col2, col3" Then comes groupByKey... val gpbyDS = sesDS .groupByKey(x => (x.col1, x.col2, x.col3))

Issue with Cast in Spark Sql

2018-01-30 Thread Arnav kumar
Hi Experts I am trying to convert a string with decimal value to decimal in Spark Sql and load it into Hive/Sql Server. In Hive instead of getting converted to decimal all my values are coming as null. In Sql Server instead of getting decimal values are coming without precision Can you please

Kafka deserialization to Structured Streaming SQL - Encoders.bean result doesn't match itself?

2018-01-25 Thread Iain Cundy
Hi All I'm trying to move from MapWithState to Structured Streaming v2.2.1, but I've run into a problem. To convert from Kafka data with a binary (protobuf) value to SQL I'm taking the dataset from readStream and doing Dataset s = dataset.selectExpr("timestamp&q

Spark SQL bucket pruning support

2018-01-22 Thread Joe Wang
hub.com/apache/spark/pull/12300>, and the logic in the BucketedReadSuite to verify that pruned buckets are empty is currently commented out <https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala#L114> . Thanks, Joe

Does Spark and Hive use Same SQL parser : ANTLR

2018-01-18 Thread Pralabh Kumar
Hi Does hive and spark uses same SQL parser provided by ANTLR . Did they generate the same logical plan . Please help on the same. Regards Pralabh Kumar

Re: Broken SQL Visualization?

2018-01-15 Thread Wenchen Fan
message > From: Tomasz Gawęda > Date: 1/15/18 2:07 PM (GMT-08:00) > To: d...@spark.apache.org, user@spark.apache.org > Subject: Broken SQL Visualization? > > Hi, > > today I have updated my test cluster to current Spark master, after that > my SQL Vis

Re: Broken SQL Visualization?

2018-01-15 Thread Ted Yu
Did you include any picture ? Looks like the picture didn't go thru. Please use third party site.  Thanks Original message From: Tomasz Gawęda Date: 1/15/18 2:07 PM (GMT-08:00) To: d...@spark.apache.org, user@spark.apache.org Subject: Broken SQL Visualization? Hi, to

Regression in Spark SQL UI Tab in Spark 2.2.1

2018-01-11 Thread Yuval Itzchakov
Hi, I've recently installed Spark 2.2.1, and it seems like the SQL tab isn't getting updated at all, although the "Jobs" tab gets updated with new incoming jobs, the SQL tab remains empty, all the time. I was wondering if anyone noticed such regression in 2.2.1? --

Re: [Spark SQL] How to run a custom meta query for `ANALYZE TABLE`

2018-01-02 Thread Jörn Franke
Hi, No this is not possible with the current data source API. However, there is a new data source API v2 on its way - maybe it will support it. Alternatively, you can have a config option to calculate meta data after an insert. However, could you please explain more for which dB your datasour

[Spark SQL] How to run a custom meta query for `ANALYZE TABLE`

2018-01-02 Thread Jason Heo
Hi, I'm working on integrating Spark and a custom data source. Most things go well with nice Spark Data Source APIs (Thanks to well designed APIs) But, one thing I couldn't resolve is that how to execute custom meta query for `ANALYZE TABLE` The custom data source I'm currently working on has a

[Spark SQL]: Dataset can not map into Dataset in java

2017-12-07 Thread Himasha de Silva
Hi, I'm trying to map a Dataset that read from csv files into a Dataset. But it gives some errors. Can anyone please help me to figure it out? Dataset t_en_data = session.read().option("header","true") .option("inferSchema","true") .csv("J:\\csv_path\\T_EN"); Dataset mappedDatas

[Spark SQL]: Dataset can not map into Dataset in java

2017-12-07 Thread Himasha de Silva
ext.scala:2062) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Data

Re: sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ?

2017-12-07 Thread khathiravan raj maadhaven
es the session/context by default created and available: * sparkSession.sql(**"select value from table")* while the following would look for create one & run the query (which I believe is extra overhead): *df.sqlContext().sql(**"select value from table")* Regards Raj On We
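
For reference, the pattern being recommended, as a minimal PySpark sketch with placeholder names: register the DataFrame as a temp view once and run SQL through the session. In Spark 2.x the Dataset's sqlContext wraps that same underlying session, so the difference is mainly one of convenience rather than a separate context being created.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

    # Expose the DataFrame to SQL once...
    df.createOrReplaceTempView("events")

    # ...then query it through the session.
    spark.sql("SELECT value FROM events").show()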

Re: Streaming Analytics/BI tool to connect Spark SQL

2017-12-07 Thread Pierce Lamb
Hi Umar, While this answer is a bit dated, you make find it useful in diagnosing a store for Spark SQL tables: https://stackoverflow.com/a/39753976/3723346 I don't know much about Pentaho or Arcadia, but I assume many of the listed options have a JDBC or ODBC client. Hope this helps, P

Streaming Analytics/BI tool to connect Spark SQL

2017-12-07 Thread umargeek
Hi All, We are currently looking for real-time streaming analytics of data stored as Spark SQL tables is there any external connectivity available to connect with BI tools(Pentaho/Arcadia). currently, we are storing data into the hive tables but its response on the Arcadia dashboard is slow

sparkSession.sql("sql query") vs df.sqlContext().sql(this.query) ?

2017-12-06 Thread kant kodali
tSchema(); *Dataset resultSet = df.sqlContext().sql(* *"select value from table"); //sparkSession.sql(this.query);*StreamingQuery streamingQuery = resultSet .writeStream() .trigger(Trigger.ProcessingTime(1000)) .format("console") .start(); v

Re: How to export the Spark SQL jobs from the HiveThriftServer2

2017-12-06 Thread wenxing zheng
ication id > and the attempt ID of the thrift server. But with the REST api described on > the page (https://spark.apache.org/docs/latest/monitoring.html#rest-api), > I still can't get the jobs for a given application with the endpoint: > */applications/[app-id]/jobs* > > Can any

How to export the Spark SQL jobs from the HiveThriftServer2

2017-12-05 Thread wenxing zheng
#rest-api), I still can't get the jobs for a given application with the endpoint: */applications/[app-id]/jobs* Can anyone kindly advice how to dump the spark SQL jobs for audit? Just like the one for the MapReduce jobs ( https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn

[Spark SQL]: DataFrame schema resulting in NullPointerException

2017-11-19 Thread Chitral Verma
Hey, I'm working on this use case that involves converting DStreams to Dataframes after some transformations. I've simplified my code into the following snippet so as to reproduce the error. Also, I've mentioned below my environment settings. *Environment:* Spark Version: 2.2.0 Java: 1.8 Executi

Re: Spark SQL - Truncate Day / Hour

2017-11-13 Thread Eike von Seggern
Hi, you can truncate datetimes like this (in pyspark), e.g. to 5 minutes: import pyspark.sql.functions as F df.select((F.floor(F.col('myDateColumn').cast('long') / 300) * 300).cast('timestamp')) Best, Eike David Hodefi wrote on Mon., 13 Nov. 2017 at 12:27: > I am familiar with those fun
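
The same trick generalised into a small helper (a sketch, not a built-in function): floor the timestamp's epoch value to truncate to any interval in seconds, e.g. 3600 for the hour and 86400 for the day asked about in this thread. Day truncation done this way is relative to UTC.

    import pyspark.sql.functions as F

    def truncate_to(ts_col, interval_seconds):
        """Floor a timestamp column to the given interval, in seconds."""
        epoch = F.col(ts_col).cast("long")
        return (F.floor(epoch / interval_seconds) * interval_seconds).cast("timestamp")

    # Usage, assuming a DataFrame `df` with a timestamp column `myDateColumn`:
    # df.select(truncate_to("myDateColumn", 3600).alias("by_hour"),
    #           truncate_to("myDateColumn", 86400).alias("by_day"))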

Re: Spark SQL - Truncate Day / Hour

2017-11-13 Thread David Hodefi
I am familiar with those functions, none of them is actually truncating a date. We can use those methods to help implement truncate method. I think truncating a day/ hour should be as simple as "truncate(...,"DD") or truncate(...,"HH") ". On Thu, Nov 9, 2017 at 8:23 PM, Gaspar Muñoz wrote: > T

Re: Can we pass the Calcite streaming sql queries to spark sql?

2017-11-09 Thread Tathagata Das
I dont think so. Calcite's SQL is an extension of standard SQL (keywords like STREAM, etc.) which we dont support; we just support regular SQL, so queries like "SELECT STREAM " will not work. On Thu, Nov 9, 2017 at 11:50 AM, kant kodali wrote: > Can we pass the Cal

Can we pass the Calcite streaming sql queries to spark sql?

2017-11-09 Thread kant kodali
Can we pass the Calcite streaming sql queries to spark sql? https://calcite.apache.org/docs/stream.html#references

Re: Spark SQL - Truncate Day / Hour

2017-11-09 Thread Gaspar Muñoz
There are functions for day (called dayOfMonth and dayOfYear) and hour (called hour). You can view them here: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions Example: import org.apache.spark.sql.functions._ val df = df.select(hour($"myDateColumn"), dayOfMo

Spark SQL - Truncate Day / Hour

2017-11-09 Thread David Hodefi
I would like to truncate date to his day or hour. currently it is only possible to truncate MONTH or YEAR. 1.How can achieve that? 2.Is there any pull request about this issue? 3.If there is not any open pull request about this issue, what are the implications that I should be aware of when coding

A pyspark sql query

2017-11-06 Thread paulgureghian
are the min,max, and mean functions correct ? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsu

spark sql truncate function

2017-10-30 Thread David Hodeffi
I saw that it is possible to truncate date function with MM or YY but it is not possible to truncate by WEEK ,HOUR, MINUTE. Am I right? Is there any objection to support it or it is just not implemented yet. Thanks David Confidentiality: This communication and any attachments are intended for

Re: Orc predicate pushdown with Spark Sql

2017-10-27 Thread Siva Gudavalli
I found a workaround, when I create Hive Table using Spark “saveAsTable”, I see filters being pushed down. -> other approaches I tried where filters are not pushed down Is, 1) when I create Hive Table upfront and load orc into it using Spark SQL 2) when I create orc files using spark SQL

Does spark sql has timezone support?

2017-10-25 Thread kant kodali
Hi All, Does spark sql has timezone support? Thanks, kant

Re: Orc predicate pushdown with Spark Sql

2017-10-24 Thread Jörn Franke
Well the meta information is in the file so I am not surprised that it reads the file, but it should not read all the content, which is probably also not happening. > On 24. Oct 2017, at 18:16, Siva Gudavalli > wrote: > > > Hello, > > I have an update here. >

Re: Orc predicate pushdown with Spark Sql

2017-10-24 Thread Siva Gudavalli
Hello, I have an update here.  spark SQL is pushing predicates down, if I load the orc files in spark Context and Is not the same when I try to read hive Table directly.please let me know if i am missing something here. Is this supported in spark  ?  when I load the files in spark Context
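
For anyone reproducing this, the relevant switch when reading the ORC files directly is spark.sql.orc.filterPushdown, which is off by default in these Spark versions; a hedged sketch with a placeholder path and filter, where explain() shows whether the predicate was pushed:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # ORC filter pushdown is not enabled by default; turn it on explicitly.
    spark.conf.set("spark.sql.orc.filterPushdown", "true")

    df = spark.read.orc("/path/to/orc/files").filter("some_col = 'some_value'")
    df.explain(True)  # look for pushed filters in the physical plan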

Orc predicate pushdown with Spark Sql

2017-10-23 Thread Siva Gudavalli
Hello, I am working with Spark SQL to query Hive Managed Table (in Orc Format) I have my data organized by partitions and asked to set indexes for each 50,000 Rows by setting ('orc.row.index.stride'='5') lets say -> after evaluating partition there are around 50

Re: [Spark SQL] Missing data in Elastisearch when writing data with elasticsearch-spark connector

2017-10-09 Thread ayan guha
Have you raised it in ES connector github as issues? In my past experience (with hadoop connector with Pig), they respond pretty quickly. On Tue, Oct 10, 2017 at 12:36 AM, sixers wrote: > ### Issue description > > We have an issue with data consistency when storing data in Elasticsearch > using

[Spark SQL] Missing data in Elastisearch when writing data with elasticsearch-spark connector

2017-10-09 Thread sixers
### Issue description We have an issue with data consistency when storing data in Elasticsearch using Spark and elasticsearch-spark connector. Job finishes successfully, but when we compare the original data (stored in S3), with the data stored in ES, some documents are not present in Elasticsearc

[SPARK-SQL] Spark Persist slower than non-persist call.

2017-09-28 Thread sfbayeng
My settings are: Running Spark 2.1 on 3 node YARN cluster with 160 GB. Dynamic allocation turned on. spark.executor.memory=6G, spark.executor.cores=6 First, I am reading hive tables: orders(329MB) and lineitems(1.43GB) and doing left outer join. Next, I apply 7 different fil

How to know what are possible operations spark raw sql can support?

2017-09-21 Thread kant kodali
How to know what are all possible operations spark raw sql can support? Is there any document ? Thanks!

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-17 Thread Arun Khetarpal
Ping. I did some digging around in the code base - I see that this is not present currently. Just looking for an acknowledgement Regards, Arun > On 15-Sep-2017, at 8:43 PM, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if spark sql has GRANT and REVOKE state

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-16 Thread Jörn Franke
It depends on the permissions the user has on the local file system or HDFS, so there is no need to have grant/revoke. > On 15. Sep 2017, at 17:13, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if spark sql has GRANT and REVOKE statements available? > Is anyone

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-16 Thread Akhil Das
I guess no. I came across a test case where they are marked as Unsupported, you can see it here. <https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala#L1120> However, the one running inside Databricks has support fo

[SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-15 Thread Arun Khetarpal
Hi - Wanted to understand if spark sql has GRANT and REVOKE statements available? Is anyone working on making that available? Regards, Arun - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

[SPARK-SQL] Spark Persist slower than non-persist calls

2017-09-01 Thread sfbayeng
My settings are: Running Spark 2.1 on 3 node YARN cluster with 160 GB. Dynamic allocation turned on. spark.executor.memory=6G, spark.executor.cores=6 First, I am reading hive tables: orders(329MB) and lineitems(1.43GB) and doing left outer join. Next, I apply 7 different filter conditions based on

[SPARK-SQL] Spark Persist slower than non-persist calls

2017-08-31 Thread saurabh raval
Spark 2.1 My settings are: Running Spark 2.1 on 3 node YARN cluster with 160 GB. Dynamic allocation turned on. spark.executor.memory=6G, spark.executor.cores=6 First, I am reading hive tables: orders(329MB) and lineitems(1.43GB) and doing left outer join.Next, I apply 7 different filter condition

Re: Spark SQL vs HiveQL

2017-08-28 Thread Michael Artz
Thanks for responding BUT I would not be reading from a file if it was Hive. I'm comparing Hive LLAP from a hive table vs Spark SQL from a file. That is the question. Thanks On Mon, Aug 28, 2017 at 1:58 PM, Imran Rajjad wrote: > If reading directly from file then Spark SQL should

Re: Spark SQL vs HiveQL

2017-08-28 Thread Imran Rajjad
If reading directly from file then Spark SQL should be your choice On Mon, Aug 28, 2017 at 10:25 PM Michael Artz wrote: > Just to be clear, I'm referring to having Spark reading from a file, not > from a Hive table. And it will have tungsten engine off heap serialization > after

Re: Spark SQL vs HiveQL

2017-08-28 Thread Michael Artz
t; There isn't any good source to answer the question if Hive as an > SQL-On-Hadoop engine just as fast as Spark SQL now? I just want to know if > there has been a comparison done lately for HiveQL vs Spark SQL on Spark > versions 2.1 or later. I have a large ETL process, with many

Spark SQL vs HiveQL

2017-08-28 Thread Michael Artz
Hi, There isn't any good source to answer the question if Hive as an SQL-On-Hadoop engine just as fast as Spark SQL now? I just want to know if there has been a comparison done lately for HiveQL vs Spark SQL on Spark versions 2.1 or later. I have a large ETL process, with many table join

Re: Does Spark SQL uses Calcite?

2017-08-20 Thread kant kodali
Hi Jules, I am looking to connect to Spark via JDBC so I can run Spark SQL queries via JDBC but not use SPARK SQL to connect to other JDBC sources. Thanks! On Sat, Aug 19, 2017 at 5:54 PM, Jules Damji wrote: > Try this link to see how you may connect https://docs.databricks.com/ >

Re: Does Spark SQL uses Calcite?

2017-08-19 Thread Jules Damji
Try this link to see how you may connect https://docs.databricks.com/spark/latest/data-sources/sql-databases.html Cheers Jules Sent from my iPhone Pardon the dumb thumb typos :) > On Aug 19, 2017, at 5:27 PM, kant kodali wrote: > > Hi Russell, > > I went throug

Re: Does Spark SQL uses Calcite?

2017-08-19 Thread kant kodali
Hi Russell, I went through this https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-thrift-server.html and I am still a bit confused on what hive is doing in here ? Is there any example I can look at on how to talk to Spark using Spark SQL JDBC driver alone and not hive ? Thanks

Re: Does Spark SQL uses Calcite?

2017-08-12 Thread Russell Spitzer
You don't have to go through hive. It's just spark sql. The application is just a forked hive thrift server. On Fri, Aug 11, 2017 at 8:53 PM kant kodali wrote: > @Ryan it looks like if I enable thrift server I need to go through hive. I > was talking more about having JDBC con

Re: Does Spark SQL uses Calcite?

2017-08-11 Thread kant kodali
@Ryan it looks like if I enable thrift server I need to go through hive. I was talking more about having JDBC connector for Spark SQL itself other words not going through hive. On Fri, Aug 11, 2017 at 6:50 PM, kant kodali wrote: > @Ryan Does it work with Spark SQL 2.1.1? > > On Fr

Re: Does Spark SQL uses Calcite?

2017-08-11 Thread kant kodali
@Ryan Does it work with Spark SQL 2.1.1? On Fri, Aug 11, 2017 at 12:53 AM, Ryan wrote: > the thrift server is a jdbc server, Kanth > > On Fri, Aug 11, 2017 at 2:51 PM, wrote: > >> I also wonder why there isn't a jdbc connector for spark sql? >> >> Sent from

Re: SQL specific documentation for recent Spark releases

2017-08-11 Thread Reynold Xin
This PR should help you in the next release: https://github.com/apache/spark/pull/18702 On Thu, Aug 10, 2017 at 7:46 PM, Stephen Boesch wrote: > > The correct link is https://docs.databricks.com/ > spark/latest/spark-sql/index.html . > > This link does have the core syntax

Re: Write only one output file in Spark SQL

2017-08-11 Thread Chetan Khatri
ary table and end up having less files instead of more files > with zero bytes. > > I am using spark sql query of hive insert overwite not the write method on > dataframe as it is not supported in 1.6 version of spark for kerberos > cluster. > > > On Fri, Aug 11, 2017 at 12:23 PM,

Re: Write only one output file in Spark SQL

2017-08-11 Thread KhajaAsmath Mohammed
with zero bytes. I am using spark sql query of hive insert overwite not the write method on dataframe as it is not supported in 1.6 version of spark for kerberos cluster. On Fri, Aug 11, 2017 at 12:23 PM, Lukas Bradley wrote: > Please show the write() call, and the results in HDFS. What

Re: Write only one output file in Spark SQL

2017-08-11 Thread Lukas Bradley
RWRITE TABLE blab.pyspark_dpprq SELECT * FROM > tempRaw') > > > > > On Fri, Aug 11, 2017 at 11:00 AM, Daniel van der Ende < > daniel.vandere...@gmail.com> wrote: > >> Hi Asmath, >> >> Could you share the code you're running? >> >> Daniel &g

Re: Write only one output file in Spark SQL

2017-08-11 Thread KhajaAsmath Mohammed
running? > > Daniel > > On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed, > wrote: > >> Hi, >> >> >> >> I am using spark sql to write data back to hdfs and it is resulting in >> multiple output files. >> >> >> >> I tried

Re: Write only one output file in Spark SQL

2017-08-11 Thread Daniel van der Ende
Hi Asmath, Could you share the code you're running? Daniel On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed, wrote: > Hi, > > > > I am using spark sql to write data back to hdfs and it is resulting in > multiple output files. > > > > I tried changing number

Write only one output file in Spark SQL

2017-08-11 Thread KhajaAsmath Mohammed
Hi, I am using spark sql to write data back to hdfs and it is resulting in multiple output files. I tried changing number spark.sql.shuffle.partitions=1 but it resulted in very slow performance. Also tried coalesce and repartition still the same issue. any suggestions? Thanks, Asmath
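
The usual way to force a single output file is to collapse the data to one partition just before the write, instead of dropping spark.sql.shuffle.partitions for the whole job; a sketch with a placeholder table and path (one partition means one task writes everything, so this only suits modest result sizes):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.sql("SELECT * FROM some_table")  # placeholder query

    # coalesce(1) narrows only the final stage to one partition, so upstream
    # stages keep their parallelism and a single data file is written.
    df.coalesce(1).write.mode("overwrite").parquet("/path/to/output")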

Re: Does Spark SQL uses Calcite?

2017-08-11 Thread Ryan
the thrift server is a jdbc server, Kanth On Fri, Aug 11, 2017 at 2:51 PM, wrote: > I also wonder why there isn't a jdbc connector for spark sql? > > Sent from my iPhone > > On Aug 10, 2017, at 2:45 PM, Jules Damji wrote: > > Yes, it's more used in Hive tha

Re: Does Spark SQL uses Calcite?

2017-08-10 Thread kanth909
I also wonder why there isn't a jdbc connector for spark sql? Sent from my iPhone > On Aug 10, 2017, at 2:45 PM, Jules Damji wrote: > > Yes, it's more used in Hive than Spark > > Sent from my iPhone > Pardon the dumb thumb typos :) > >> On Aug 10, 2017,

Re: SQL specific documentation for recent Spark releases

2017-08-10 Thread Stephen Boesch
The correct link is https://docs.databricks.com/spark/latest/spark-sql/index.html . This link does have the core syntax such as the BNF for the DDL and DML and SELECT. It does *not *have a reference for date / string / numeric functions: is there any such reference at this point? It is not

Re: SQL specific documentation for recent Spark releases

2017-08-10 Thread Jules Damji
I refer to docs.databricks.com/Spark/latest/Spark-sql/index.html. Cheers Jules Sent from my iPhone Pardon the dumb thumb typos :) > On Aug 10, 2017, at 1:46 PM, Stephen Boesch wrote: > > > While the DataFrame/DataSets are useful in many circumstances they are > cumbersome

Re: Does Spark SQL uses Calcite?

2017-08-10 Thread Jules Damji
I see a calcite dependency in Spark I wonder where Calcite is being >> used? >> >>> On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu >>> wrote: >>> Spark SQL doesn't use Calcite >>> >>>> On Thu, Aug 10, 2017 at 3:14 P

Re: Does Spark SQL uses Calcite?

2017-08-10 Thread Sathish Kumaran Vairavelu
I think it is for hive dependency. On Thu, Aug 10, 2017 at 4:14 PM kant kodali wrote: > Since I see a calcite dependency in Spark I wonder where Calcite is being > used? > > On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu < > vsathishkuma...@gmail.com> wrote: &

Re: Does Spark SQL uses Calcite?

2017-08-10 Thread kant kodali
Since I see a calcite dependency in Spark I wonder where Calcite is being used? On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu < vsathishkuma...@gmail.com> wrote: > Spark SQL doesn't use Calcite > > On Thu, Aug 10, 2017 at 3:14 PM kant kodali wrote: > >>

SQL specific documentation for recent Spark releases

2017-08-10 Thread Stephen Boesch
While the DataFrame/DataSets are useful in many circumstances they are cumbersome for many types of complex sql queries. Is there an up to date *SQL* reference - i.e. not DataFrame DSL operations - for version 2.2? An example of what is not clear: what constructs are supported within

Re: Does Spark SQL uses Calcite?

2017-08-10 Thread Sathish Kumaran Vairavelu
Spark SQL doesn't use Calcite On Thu, Aug 10, 2017 at 3:14 PM kant kodali wrote: > Hi All, > > Does Spark SQL uses Calcite? If so, what for? I thought the Spark SQL has > catalyst which would generate its own logical plans, physical plans and > other optimizations. > > Thanks, > Kant >

Does Spark SQL uses Calcite?

2017-08-10 Thread kant kodali
Hi All, Does Spark SQL uses Calcite? If so, what for? I thought the Spark SQL has catalyst which would generate its own logical plans, physical plans and other optimizations. Thanks, Kant

[Spark SQL Lack of ForEach Sink in Python]: Is there anyway to use a ForEach sink in a Python application?

2017-08-01 Thread Denis Li
I am trying to use PySpark to read a Kafka stream and then write it to Redis. However, PySpark does not have support for a ForEach sink. So, I am thinking of reading the Kafka stream into a DataFrame in Python and then sending that DataFrame into a Scala application to be written to Redis. Is there

how to get the key in Map with SQL

2017-07-30 Thread ??????????
Hi all, I have a table looks like: +-+-+ |A|B| |0|[Map(1->a),1]| |0|[Map(1->b),2]| I want to pickup the key and value in Map. My code looks like df.select($"B._2".alias("X"),$"B._1.key".alias("Y")).show The output is |X|Y| |1|null| |2|null| Would you like tell me how to
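
A sketch of one way to get at the map entries, reconstructing the layout above (column B as a struct whose first field is a map): explode the map into key/value rows. B._1.key comes back null because, on a map column, the dot syntax looks up the literal key "key", which is not present.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.getOrCreate()

    # Approximate reconstruction of the table in the post.
    df = spark.createDataFrame(
        [(0, ({"1": "a"}, 1)), (0, ({"1": "b"}, 2))],
        ["A", "B"],
    )

    # explode() on a map yields one row per entry, with `key` and `value` columns.
    df.select(col("B._2").alias("X"), explode(col("B._1"))).show()
    # X=1 pairs with key=1, value=a; X=2 with key=1, value=b.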

Re: some Ideas on expressing Spark SQL using JSON

2017-07-30 Thread Gourav Sengupta
t; >> Because sparks dsl partially supports compile time type safety. E.g. the >> compiler will notify you that a sql function was misspelled when using the >> dsl opposed to the plain sql string which is only parsed at runtime. >> Sathish Kumaran Vairavelu schrieb am Di. 25. &

Re: Complex types projection handling with Spark 2 SQL and Parquet

2017-07-27 Thread Patrick
Hi , I am having the same issue. Has any one found solution to this. When i convert the nested JSON to parquet. I dont see the projection working correctly. It still reads all the nested structure columns. Parquet does support nested column projection. Does Spark 2 SQL provide the column

Re: some Ideas on expressing Spark SQL using JSON

2017-07-26 Thread Sathish Kumaran Vairavelu
Agreed. For the same reason dataframes / dataset which is another DSL used in Spark On Wed, Jul 26, 2017 at 1:00 AM Georg Heiler wrote: > Because sparks dsl partially supports compile time type safety. E.g. the > compiler will notify you that a sql function was misspelled when using the
