Spark SQL (Pyspark) - Parallel processing of multiple datasets

2017-04-16 Thread Amol Patil
Hi All, I'm writing a generic pyspark program to process multiple datasets using Spark SQL, for example Traffic Data, Crime Data, Weather Data. Each dataset will be in CSV format and its size may vary from 1 GB to 10 GB. Each dataset will be available at a different timeframe (weekly, monthly, quarte

Hive Context and SQL Context interoperability

2017-04-13 Thread Deepak Sharma
Hi All, I have registered temp tables using both the hive context and the sql context. Now when I try to join these 2 temp tables, one of the tables complains about not being found. Is there any setting or option so that the tables in these 2 different contexts are visible to each other? -- Thanks Deepak
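
Temp tables are generally scoped to the context that registered them, so the usual workaround is to build both DataFrames from a single HiveContext; a minimal sketch with placeholder paths and column names:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // Register both temp tables through the same context so they share one catalog.
    hc.read.parquet("/data/a").registerTempTable("t1")
    hc.read.parquet("/data/b").registerTempTable("t2")
    val joined = hc.sql("SELECT * FROM t1 JOIN t2 ON t1.key = t2.key")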

Re: [Spark-SQL] : Incremental load in Pyspark

2017-04-11 Thread Matt Deaver
It's pretty simple, really: run your processing job as often as you want during the week; then, when loading into the base table, apply a window function partitioned by the primary key(s) and ordered by the updated-time column, then delete the existing rows with those PKs and load that data. On Tue, Ap
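
A minimal sketch of that dedup step (Scala shown; the PySpark API is parallel), assuming a staged DataFrame with hypothetical columns pk (primary key) and updated_at:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    // Keep only the most recent version of each primary key from the staged data.
    val w = Window.partitionBy("pk").orderBy(col("updated_at").desc)
    val latest = staged
      .withColumn("rn", row_number().over(w))
      .filter(col("rn") === 1)
      .drop("rn")
    // `latest` can then replace or merge into the matching rows of the base table.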

Re: [Spark-SQL] : Incremental load in Pyspark

2017-04-11 Thread Vamsi Makkena
Hi Matt, Thanks for your reply. I will get updates regularly but I want to load the updated data once a week. A staging table may solve this issue, but I'm looking for how the row-updated time should be included in the query. Thanks On Tue, Apr 11, 2017 at 2:59 PM Matt Deaver wrote: > Do you have up

Re: [Spark-SQL] : Incremental load in Pyspark

2017-04-11 Thread Matt Deaver
Do you have updates coming in on your data flow? If so, you will need a staging table and a merge process into your Teradata tables. If you do not have updated rows, i.e. your Teradata tables are append-only, you can process data and insert (bulk load) into Teradata. I don't have experience doing th

[Spark-SQL] : Incremental load in Pyspark

2017-04-11 Thread Vamsi Makkena
I am reading the data from Oracle tables and flat files (a new Excel file every week) and writing it to Teradata weekly using Pyspark. The initial run will load all the data to Teradata. But in the later runs I just want to read the new records from Oracle and the flat files and want to append it

Spark (SQL / Structured Streaming) Cassandra - PreparedStatement

2017-04-11 Thread Bastien DINE
Hi everyone, I'm using Spark Structured Streaming for machine learning purposes in real time, and I want to store predictions in my Cassandra cluster. Since I am in a streaming context, executing the same request multiple times per second, one mandatory optimization is to use PreparedStatement

Re: Apache Drill vs Spark SQL

2017-04-07 Thread Pierce Lamb
eds. I'm not sure if the above link also answers your second question, but there are two graph databases listed that connect to Spark as well. Hope this helps, Pierce On Thu, Apr 6, 2017 at 10:34 PM, kant kodali wrote: > Hi All, > > I am very impressed with the work done on Spar

Apache Drill vs Spark SQL

2017-04-06 Thread kant kodali
Hi All, I am very impressed with the work done on Spark SQL, however when I have to pick something to serve real-time queries I am in a dilemma for the following reasons. 1. Even though Spark SQL has logical plans, physical plans and runtime code generation and all that, it still doesn't

Re: map transform on array in spark sql

2017-04-04 Thread Michael Armbrust
If you can find the name of the struct field from the schema you can just do: df.select($"arrayField.a") Selecting a field from an array returns an array with that field selected from each element. On Mon, Apr 3, 2017 at 8:18 PM, Koert Kuipers wrote: > i have a DataFrame where one column has t
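
For reference, a small sketch of that pattern (df and arrayField are placeholder names):

    // arrayField has type array<struct<a: typeA, b: typeB>>; selecting a nested field of an
    // array of structs yields an array of just that field (array<typeA>).
    import spark.implicits._   // already in scope in spark-shell
    val onlyA = df.select($"arrayField.a")
    onlyA.printSchema()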

map transform on array in spark sql

2017-04-03 Thread Koert Kuipers
I have a DataFrame where one column has type: ArrayType(StructType(Seq( StructField("a", typeA, nullableA), StructField("b", typeB, nullableB) ))) I would like to map over this array to pick the first element in the struct, so the result should be an ArrayType(typeA, nullableA). I realize I ca

Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

2017-04-02 Thread Sathish Kumaran Vairavelu
Kumaran Vairavelu < > vsathishkuma...@gmail.com> wrote: > > Hi Everyone, > > I have complex SQL with approx 2000 lines of code and works with 50+ > tables with 50+ left joins and transformations. All the tables are fully > cached in Memory with sufficient storage memory and w

Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

2017-03-30 Thread Sathish Kumaran Vairavelu
xecution plans in memory to avoid > such scenarios > > On Fri, Mar 31, 2017 at 1:25 PM, Sathish Kumaran Vairavelu < > vsathishkuma...@gmail.com> wrote: > > Hi Everyone, > > I have complex SQL with approx 2000 lines of code and works with 50+ > tables with 50+ left joi

Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

2017-03-30 Thread Sathish Kumaran Vairavelu
h scenarios > > On Fri, Mar 31, 2017 at 1:25 PM, Sathish Kumaran Vairavelu < > vsathishkuma...@gmail.com> wrote: > > Hi Everyone, > > I have complex SQL with approx 2000 lines of code and works with 50+ > tables with 50+ left joins and transformations. All the tables are

Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

2017-03-30 Thread ayan guha
I think there is an option of pinning execution plans in memory to avoid such scenarios On Fri, Mar 31, 2017 at 1:25 PM, Sathish Kumaran Vairavelu < vsathishkuma...@gmail.com> wrote: > Hi Everyone, > > I have complex SQL with approx 2000 lines of code and works with 50+ &

Spark SQL 2.1 Complex SQL - Query Planning Issue

2017-03-30 Thread Sathish Kumaran Vairavelu
Hi Everyone, I have a complex SQL query with approx 2000 lines of code that works with 50+ tables with 50+ left joins and transformations. All the tables are fully cached in memory with sufficient storage memory and working memory. The issue is that after the query is launched for execution, the query

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread vaquar khan
ause another shuffle. >> So I am not sure if it is a smart way. >> >> Yong >> >> -- >> *From:* shyla deshpande >> *Sent:* Wednesday, March 29, 2017 12:33 PM >> *To:* user >> *Subject:* Re: Spark SQL, dataframe join questions. >> >&g

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Vidya Sujeet
-- > *From:* shyla deshpande > *Sent:* Wednesday, March 29, 2017 12:33 PM > *To:* user > *Subject:* Re: Spark SQL, dataframe join questions. > > > > On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande > wrote: > >> Following are my questions. Thank you

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Yong Zhang
owing join COULD cause another shuffle. So I am not sure if it is a smart way. Yong From: shyla deshpande Sent: Wednesday, March 29, 2017 12:33 PM To: user Subject: Re: Spark SQL, dataframe join questions. On Tue, Mar 28, 2017 at 2:57 PM, shyla desh

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote: > Following are my questions. Thank you. > > 1. When joining dataframes, is it a good idea to repartition on the key column > that is used in the join, or > is the optimizer too smart, so forget it. > > 2. In an RDD join, wherever possible we do
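
Two common options, sketched with placeholder table names: broadcast the small side so the big side is not shuffled at all, or repartition both sides on the join key up front (which mainly pays off when that partitioning is reused by several downstream operations):

    import org.apache.spark.sql.functions.broadcast

    // If one side is small enough, broadcast it and avoid shuffling the large side.
    val j1 = bigDf.join(broadcast(smallDf), Seq("key"))

    // Otherwise an explicit repartition just moves the shuffle earlier in the plan.
    val j2 = bigDf.repartition(bigDf("key"))
      .join(otherDf.repartition(otherDf("key")), Seq("key"))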

Re: Groupby is faster in Impala than Spark SQL - any suggestions

2017-03-28 Thread Ryan
GB of data >> Table2: 96 GB of data >> >> Same query in Impala is taking around 20 miniutes and it took almost 3 >> hours to run in spark sql. >> >> I have added repartition to dataframe, persist as memory and disk still >> response is very bad. any sugge

Re: Groupby is faster in Impala than Spark SQL - any suggestions

2017-03-28 Thread Ryan
ng on requirement where i need to join two tables and do group > by to get max value on some fileds. > > Table1: 10 GB of data > Table2: 96 GB of data > > Same query in Impala is taking around 20 miniutes and it took almost 3 > hours to run in spark sql. > > I have added

Groupby is faster in Impala than Spark SQL - any suggestions

2017-03-28 Thread KhajaAsmath Mohammed
Hi, I am working on a requirement where I need to join two tables and do a group by to get the max value on some fields. Table1: 10 GB of data Table2: 96 GB of data The same query in Impala takes around 20 minutes and it took almost 3 hours to run in Spark SQL. I have added repartition to the dataframe
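
A bare-bones sketch of that query shape in the DataFrame API (column names are placeholders), mainly useful for comparing the physical plan against what Impala does:

    import org.apache.spark.sql.functions.max

    val result = table1
      .join(table2, Seq("join_key"))
      .groupBy("group_col")
      .agg(max("value_col").alias("max_value"))
    result.explain()   // check the join strategy and where the shuffles happen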

Re: [Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB

2017-03-19 Thread Eyal Zituny
the > number of the constant pool issue, it has not been merged yet. > > Regards, > Kazuaki Ishizaki > > > > From:elevy > To:user@spark.apache.org > Date:2017/03/18 17:14 > Subject:[Spark SQL & Core]: RDD to Dataset 1500 columns

Re: [Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB

2017-03-18 Thread Kazuaki Ishizaki
From: elevy To: user@spark.apache.org Date: 2017/03/18 17:14 Subject:[Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB Hello all, I am using the Spark 2.1.0 release, I am trying to load BigTable CSV file

[Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB

2017-03-18 Thread elevy
ing : org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:893) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:950)

Spark SQL Skip and Log bad records

2017-03-15 Thread Aviral Agarwal
Hi guys, Is there a way to skip some bad records and log them when using DataFrame API ? Thanks and Regards, Aviral Agarwal
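
If the bad records come from parsing the source files themselves, one built-in option is permissive reading with a corrupt-record column; a sketch with placeholder paths and fields, not the only approach:

    import org.apache.spark.sql.types._

    // Declare the expected fields plus the corrupt-record column up front.
    val schema = new StructType()
      .add("id", LongType)
      .add("payload", StringType)
      .add("_corrupt_record", StringType)

    val raw = spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/path/to/input")

    val bad  = raw.filter(raw("_corrupt_record").isNotNull)   // log or persist the rejects
    val good = raw.filter(raw("_corrupt_record").isNull).drop("_corrupt_record")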

[Spark Streaming][Spark SQL] Design suggestions needed for sessionization

2017-03-10 Thread Ramkumar Venkataraman
people do sessionization in spark 1.6 would also help (couldn't find anything that helped me) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Spark-SQL-Design-suggestions-needed-for-sessionization-tp28480.html Sent from the Apache Spark User Li

Re: spark-sql use case beginner question

2017-03-09 Thread Subhash Sriram
We have a similar use case. We use the DataFrame API to cache data out of Hive tables, and then run pretty complex scripts on them. You can register your Hive UDFs to be used within Spark SQL statements if you want. Something like this: sqlContext.sql("CREATE TEMPORARY FUNCTION as '&
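
Filling in the shape of that statement with hypothetical names (the UDF jar has to be on the classpath, e.g. passed via --jars):

    sqlContext.sql("CREATE TEMPORARY FUNCTION my_upper AS 'com.example.hive.udf.MyUpper'")
    sqlContext.sql("SELECT my_upper(name) FROM some_table").show()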

Re: spark-sql use case beginner question

2017-03-08 Thread nancy henry
Okay, what is the difference between setting hive.execution.engine=spark and running the script through hivecontext.sql? On Mar 9, 2017 8:52 AM, "ayan guha" wrote: > Hi > > Subject to your version of Hive & Spark, you may want to set > hive.execution.engine=spark as beeline comman

Re: spark-sql use case beginner question

2017-03-08 Thread ayan guha
Hi Subject to your version of Hive & Spark, you may want to set hive.execution.engine=spark as beeline command line parameter, assuming you are running hive scripts using beeline command line (which is suggested practice for security purposes). On Thu, Mar 9, 2017 at 2:09 PM, nancy henry wrote

spark-sql use case beginner question

2017-03-08 Thread nancy henry
Hi Team, basically we have all data as Hive tables and have been processing it till now in Hive on MR. Now that we have hivecontext, which can run Hive queries on Spark, we are making all these complex Hive scripts run using a hivecontext.sql(sc.textfile(hivescript)) kind of approach, i.e. basically running

Re: spark-sql use case beginner question

2017-03-08 Thread nancy henry
Hi Team, basically we have all data as Hive tables and have been processing it till now in Hive on MR. Now that we have hivecontext, which can run Hive queries on Spark, we are making all these complex Hive scripts run using a hivecontext.sql(sc.textfile(hivescript)) kind of approach, i.e. basically running

Re: Spark SQL table authority control?

2017-02-26 Thread yuyong . zhai
https://issues.apache.org/jira/browse/SPARK-8321 翟玉勇 (Data Architecture, ELEME Inc.) Email: yuyong.z...@ele.me<mailto:zhen@ele.me> | Mobile: 15221559674 http://ele.me<http://ele.me/> 饿了么 Original message From: 李斌松 To: user Sent: Sunday, Feb 26, 2017 11:50 Subject: Spark SQL table authority control? Through the JDB

Re: Disable Spark SQL Optimizations for unit tests

2017-02-26 Thread Stefan Ackermann
if (castToInts.contains(c)) { dfIn(c).cast(IntegerType) } else { dfIn(c) } } dfIn.select(columns: _*) } As I consequently applied this to other similar functions the unit tests went down from 60 to 18 minutes. Another way to break SQL optimizations was to just s

Spark SQL table authority control?

2017-02-25 Thread 李斌松
Through a JDBC connection to the Spark thriftserver I execute Hive SQL. In Hive on Spark you can control permissions by extending a hook that checks the table read/write permissions; what is the corresponding extension point for Spark on Hive?

Re: Spark SQL : Join operation failure

2017-02-23 Thread neil90
OUTER JOIN y ON field1=field2") joined_df.persist(StorageLevel.MEMORY_AND_DISK_ONLY) joined_df.write.save("/user/data/output") -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-operation-failure-tp28414p28422.html Sent from the A

Re: Spark SQL : Join operation failure

2017-02-22 Thread Yong Zhang
From: jatinpreet Sent: Wednesday, February 22, 2017 1:11 AM To: user@spark.apache.org Subject: Spark SQL : Join operation failure Hi, I am having a hard time running an outer join operation on two parquet datasets. The dataset size is large, ~500GB, with a lot of columns in tune of

Spark SQL : Join operation failure

2017-02-21 Thread jatinpreet
(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ) I would appreciate if someone can help me out on this. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-operation-failure-tp28414.html Sent from the Apache Spark User

Re: Efficient Spark-Sql queries when only nth Column changes

2017-02-19 Thread Patrick
t;).cache > > df_base.registerTempTable("df_base") > > val df1 = sqlContext.sql("select col1, col2, count(*) from df_base group > by col1, col2") > > val df2 = // similar logic > > Yong > -- > *From:* Patrick > *Sent:*
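
A sketch of that approach end to end, with placeholder table and column names, assuming the grouped columns have union-compatible types:

    // Compute the shared projection once and cache it.
    val dfBase = sqlContext.sql("SELECT col1, col2, col3, col4, col5 FROM parquet_table").cache()
    dfBase.registerTempTable("df_base")

    // Only the grouped column changes, so derive each aggregate from the cached base and union.
    val changing = Seq("col2", "col3", "col4", "col5")
    val result = changing
      .map(c => sqlContext.sql(s"SELECT col1, $c AS grp, count(*) AS cnt FROM df_base GROUP BY col1, $c"))
      .reduce(_ unionAll _)   // .union on Spark 2.x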

Re: Serialization error - sql UDF related

2017-02-18 Thread Yong Zhang
d Boolean, which are serializable by default. So you can change the definition to function, instead of method, which should work. Yong From: Darshan Pandya Sent: Friday, February 17, 2017 10:36 PM To: user Subject: Serialization error - sql UDF related Hello,
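
A sketch of that fix: capture the logic in a function value (a plain serializable closure) and wrap it with udf; the body below is placeholder logic, not the original getNewColumnName:

    import org.apache.spark.sql.functions.{col, lit, udf}

    val getNewColumnName: (String, Boolean) => String =
      (name, upper) => if (upper) name.toUpperCase else name.toLowerCase

    val correctColNameUDF = udf(getNewColumnName)
    val charReference = thinLong
      .select("char_name_id", "char_name")
      .withColumn("columnNameInDimTable", correctColNameUDF(col("char_name"), lit(false)))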

Re: Efficient Spark-Sql queries when only nth Column changes

2017-02-18 Thread Yong Zhang
_ From: Patrick Sent: Saturday, February 18, 2017 4:23 PM To: user Subject: Efficient Spark-Sql queries when only nth Column changes Hi, I have read 5 columns from parquet into data frame. My queries on the parquet table is of below type: val df1 = sqlContext.sql(select col1,co

Re: Efficient Spark-Sql queries when only nth Column changes

2017-02-18 Thread ayan guha
e to union the results from df1 to df4 into a single df. > > > So basically, only the second column is changing, Is there any efficient > way to write the above queries in Spark-Sql instead of writing 4 different > queries(OR in loop) and doing union to get the result. > > > Thanks > > > > > > -- Best Regards, Ayan Guha

Efficient Spark-Sql queries when only nth Column changes

2017-02-18 Thread Patrick
the above queries in Spark-Sql instead of writing 4 different queries(OR in loop) and doing union to get the result. Thanks

Re: Query data in subdirectories in Hive Partitions using Spark SQL

2017-02-18 Thread Jon Gregg
Spark has partition discovery if your data is laid out in a parquet-friendly directory structure: http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery You can also use wildcards to get subdirectories (I'm using spark 1.6 here) >> data2 = sqlContext.rea
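
A small sketch of both options with placeholder paths:

    // Point at the table root and let partition discovery derive columns from key=value dirs.
    val all = sqlContext.read.parquet("/data/events")   // e.g. /data/events/date=2017-02-18/...

    // Or glob below the root; the basePath option keeps the partition columns in the result.
    val feb = sqlContext.read
      .option("basePath", "/data/events")
      .parquet("/data/events/date=2017-02-*")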

Re: Serialization error - sql UDF related

2017-02-17 Thread vaquar khan
Hi Darshan, When you get an org.apache.spark.SparkException: Task not serializable exception, it means that you are using a reference to an instance of a non-serializable class inside a transformation. Hope the following link will help. https://databricks.gitbooks.io/databricks-spark-knowledge-base/cont

Serialization error - sql UDF related

2017-02-17 Thread Darshan Pandya
Hello, I am getting the famous serialization exception on running some code as below, val correctColNameUDF = udf(getNewColumnName(_: String, false: Boolean): String); val charReference: DataFrame = thinLong.select("char_name_id", "char_name").withColumn("columnNameInDimTable", correctColNameUDF(

Re: Query data in subdirectories in Hive Partitions using Spark SQL

2017-02-17 Thread Yan Facai
Hi, Abdelfatah, How do you read these files? spark.read.parquet or spark.sql? Could you show some code? On Wed, Feb 15, 2017 at 8:47 PM, Ahmed Kamal Abdelfatah < ahmed.abdelfa...@careem.com> wrote: > Hi folks, > > > > How can I force spark sql to recursively get data sto

Query data in subdirectories in Hive Partitions using Spark SQL

2017-02-15 Thread Ahmed Kamal Abdelfatah
Hi folks, How can I force spark sql to recursively get data stored in parquet format from subdirectories? In Hive, I could achieve this by setting a few Hive configs: set hive.input.dir.recursive=true; set hive.mapred.supports.subdirectories=true; set hive.supports.subdirectories=true; set

Re: How to get a spark sql statement execution duration?

2017-02-15 Thread ??????????
You can find the duration time in the web UI, such as http://xxx:8080 . It depends on your setting. About the shell, I do not know how to check the time. ---Original--- From: "Jacek Laskowski" Date: 2017/2/8 04:14:58 To: "Mars Xu"; Cc: "user"; Subject: Re: How to get a

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Sam Elamin
, Aseem Bansal wrote: > Sorry if I trivialized the example. It is the same kind of file and > sometimes it could have "a", sometimes "b", sometimes both. I just don't > know. That is what I meant by missing columns. > > It would be good if I read any of the

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Aseem Bansal
Sorry if I trivialized the example. It is the same kind of file and sometimes it could have "a", sometimes "b", sometimes both. I just don't know. That is what I meant by missing columns. It would be good if I read any of the JSON and if I do spark sql and it gave me

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Sam Elamin
I may be missing something super obvious here but can't you combine them into a single dataframe? Left join perhaps? Try writing it in sql, "select a from json1 and b from json2", then run explain to give you a hint on how to do it in code. Regards Sam On Tue, 14 Feb 2017 at 14:3

Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Aseem Bansal
Say I have two files containing single rows json1.json {"a": 1} json2.json {"b": 2} I read in this json file using spark's API into a dataframe one at a time. So I have Dataset json1DF and Dataset json2DF If I run "select a, b from __THIS__" in a SQLTransformer then I will get an exception a
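
One way to make both columns resolvable regardless of which file they came from is to read with an explicit schema that is the union of the expected fields; fields missing from a given file simply come back as null. A sketch (the types are assumptions):

    import org.apache.spark.sql.types._

    val schema = new StructType()
      .add("a", LongType, nullable = true)
      .add("b", LongType, nullable = true)

    val json1DF = spark.read.schema(schema).json("json1.json")   // b is null here
    val json2DF = spark.read.schema(schema).json("json2.json")   // a is null here
    json1DF.select("a", "b").show()   // "select a, b from __THIS__" now resolves for either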

Disable Spark SQL Optimizations for unit tests

2017-02-11 Thread Stefan Ackermann
Hi, Can the Spark SQL Optimizations be disabled somehow? In our project we started 4 weeks ago to write scala / spark / dataframe code. We currently have only around 10% of the planned project scope, and we are already waiting 10 (Spark 2.1.0, everything cached) to 30 (Spark 1.6, nothing cached

SQL warehouse dir

2017-02-10 Thread Joseph Naegele
Hi all, I've read the docs for Spark SQL 2.1.0 but I'm still having issues with the warehouse and related details. I'm not using Hive proper, so my hive-site.xml consists only of: javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=/mnt/data/spark/metastore_db;create
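
For reference, in 2.x the warehouse location is taken from spark.sql.warehouse.dir on the session (hive-site.xml then only needs the metastore connection); a sketch with a placeholder path:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.sql.warehouse.dir", "/mnt/data/spark/warehouse")
      .enableHiveSupport()   // only if the Derby metastore above should be used
      .getOrCreate()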

Re: [Spark-SQL] Hive support is required to select over the following tables

2017-02-08 Thread Egor Pahomov
Just guessing here, but have you built your spark with "-Phive"? By the way, which version of Zeppelin? 2017-02-08 5:13 GMT-08:00 Daniel Haviv: > Hi, > I'm using Spark 2.1.0 on Zeppelin. > > I can successfully create a table but when I try to select from it I fail: > spark.sql("create table foo
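
Another thing to check besides the -Phive build: the session itself has to be created with Hive support (or, depending on the setup, spark.sql.catalogImplementation=hive); a minimal sketch:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("zeppelin-hive")     // placeholder
      .enableHiveSupport()          // requires a Spark build with the Hive classes
      .getOrCreate()
    spark.sql("select * from foo").show()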

[Spark 2.1.0] Spark SQL return correct count, but NULL on all fields

2017-02-08 Thread Babak Alipour
Hi everyone, I'm using Spark with HiveSupport enabled, the data is stored in parquet format in a fixed location. I just downloaded Spark 2.1.0 and it broke Spark-SQL queries. I can do count(*) and it returns the correct count, but all columns show as "NULL". It worked fine on 1.

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread Everett Anderson
On Wed, Feb 8, 2017 at 1:14 PM, ayan guha wrote: > Will a sql solution will be acceptable? > I'm very curious to see how it'd be done in raw SQL if you're up for it! I think the 2 programmatic solutions so far are viable, though, too. (By the way, thanks everyone for

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread ayan guha
Will a sql solution be acceptable? On Thu, 9 Feb 2017 at 4:01 am, Xiaomeng Wan wrote: > You could also try pivot. > > On 7 February 2017 at 16:13, Everett Anderson > wrote: > > > > On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust > wrote: > > I think t

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread Xiaomeng Wan
You could also try pivot. On 7 February 2017 at 16:13, Everett Anderson wrote: > > > On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust > wrote: > >> I think the fastest way is likely to use a combination of conditionals >> (when / otherwise), first (ignoring nulls), while grouping by the id. >>

[Spark-SQL] Hive support is required to select over the following tables

2017-02-08 Thread Daniel Haviv
Hi, I'm using Spark 2.1.0 on Zeppelin. I can successfully create a table but when I try to select from it I fail: spark.sql("create table foo (name string)") res0: org.apache.spark.sql.DataFrame = [] spark.sql("select * from foo") org.apache.spark.sql.AnalysisException: Hive support is required

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust wrote: > I think the fastest way is likely to use a combination of conditionals > (when / otherwise), first (ignoring nulls), while grouping by the id. > This should get the answer with only a single shuffle. > > Here is an example >

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Michael Armbrust
I think the fastest way is likely to use a combination of conditionals (when / otherwise), first (ignoring nulls), while grouping by the id. This should get the answer with only a single shuffle. Here is an example
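
A sketch of that combination against the table from the original post (Spark 2.x), assuming the priorities to flatten out are 1..3:

    import org.apache.spark.sql.functions.{col, first, when}

    val denormalized = df
      .groupBy("id", "name", "extra")
      .agg(
        first(when(col("priority") === 1, col("data")), ignoreNulls = true).as("data1"),
        first(when(col("priority") === 2, col("data")), ignoreNulls = true).as("data2"),
        first(when(col("priority") === 3, col("data")), ignoreNulls = true).as("data3"))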

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Jacek Laskowski
Hi Everett, That's pretty much what I'd do. Can't think of a way to beat your solution. Why do you "feel vaguely uneasy about it"? I'd also check out the execution plan (with explain) to see how it's gonna work at runtime. I may have seen groupBy + join be better than window (there were more exch

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
On Tue, Feb 7, 2017 at 12:50 PM, Jacek Laskowski wrote: > Hi, > > Could groupBy and withColumn or UDAF work perhaps? I think window could > help here too. > This seems to work, but I do feel vaguely uneasy about it. :) // First add a 'rank' column which is priority order just in case priorities

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Jacek Laskowski
Hi, Could groupBy and withColumn or UDAF work perhaps? I think window could help here too. Jacek On 7 Feb 2017 8:02 p.m., "Everett Anderson" wrote: > Hi, > > I'm trying to un-explode or denormalize a table like > > +---++-+--++ > |id |name|extra|data |priority| > +---+

Re: How to get a spark sql statement execution duration?

2017-02-07 Thread Jacek Laskowski
On 7 Feb 2017 4:17 a.m., "Mars Xu" wrote: Hello All, Some spark sqls will produce one or more jobs, I have 2 questions, 1, How the cc.sql(“sql statement”) divided into one or more jobs ? It's an implementation detail. You can have zero or more jobs for a si

Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
Hi, I'm trying to un-explode or denormalize a table like
|id |name|extra|data  |priority|
|1  |Fred|8    |value1|1       |
|1  |Fred|8    |value8|2       |
|1  |Fred|8    |value5|3       |
|2  |Amy |9    |value3|1       |
|2  |Amy

How to get a spark sql statement execution duration?

2017-02-06 Thread Mars Xu
Hello All, Some Spark SQL statements will produce one or more jobs. I have 2 questions: 1. How is cc.sql("sql statement") divided into one or more jobs? 2. When I execute a Spark SQL query in the spark-shell client, how do I get the execution time (Spark 2.1.0)? if a sql query
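
For the second question, a simple way in the shell is to force the query and measure around it (the SQL tab of the web UI also reports per-query duration); the query below is a placeholder:

    val start = System.nanoTime()
    spark.sql("SELECT count(*) FROM some_table").collect()   // force execution
    println(s"took ${(System.nanoTime() - start) / 1e9} s")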

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread KhajaAsmath Mohammed
; >> On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" >> wrote: >> >>> I dont think so, i was able to insert overwrite other created tables in >>> hive using spark sql. The only problem I am facing is, spark is not able >>> to recognize hive view n

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread Xiao Li
te other created tables in >> hive using spark sql. The only problem I am facing is, spark is not able >> to recognize hive view name. Very strange but not sure where I am doing >> wrong in this. >> >> On Mon, Feb 6, 2017 at 11:03 AM, Jon Gregg wrote: >> >>>

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread KhajaAsmath Mohammed
n, Feb 6, 2017 at 11:25 AM, vaquar khan wrote: > Did you try MSCK REPAIR TABLE ? > > Regards, > Vaquar Khan > > On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" > wrote: > >> I dont think so, i was able to insert overwrite other created tables in >> h

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread vaquar khan
Did you try MSCK REPAIR TABLE ? Regards, Vaquar Khan On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" wrote: > I dont think so, i was able to insert overwrite other created tables in > hive using spark sql. The only problem I am facing is, spark is not able > to recog

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread KhajaAsmath Mohammed
I don't think so, I was able to insert overwrite other created tables in hive using spark sql. The only problem I am facing is that spark is not able to recognize the hive view name. Very strange, but not sure what I am doing wrong here. On Mon, Feb 6, 2017 at 11:03 AM, Jon Gregg wrote: > Confirm

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread Jon Gregg
asm...@gmail.com> wrote: > Hi Khan, > > It didn't work in my case. used below code. View is already present in > Hive but I cant read that in spark sql. Throwing exception that table not > found > > sqlCtx.refreshTable("schema.hive_view") > > > Thanks, >

Re: Cannot read Hive Views in Spark SQL

2017-02-05 Thread KhajaAsmath Mohammed
Hi Khan, It didn't work in my case. I used the below code. The view is already present in Hive but I can't read it in spark sql. It throws an exception that the table is not found. sqlCtx.refreshTable("schema.hive_view") Thanks, Asmath On Sun, Feb 5, 2017 at 7:56 PM, vaquar khan wrote: > H

Re: Cannot read Hive Views in Spark SQL

2017-02-05 Thread vaquar khan
Hi Ashmath, Try refresh table // spark is an existing SparkSession spark.catalog.refreshTable("my_table") http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing Regards, Vaquar khan On Sun, Feb 5, 2017 at 7:19 PM, KhajaAsmath Mohammed &l

Cannot read Hive Views in Spark SQL

2017-02-05 Thread KhajaAsmath Mohammed
Hi, I have a hive view which is basically a set of select statements on some tables. I want to read the hive view and use the hive builtin functions available in spark sql. I am not able to read that hive view in spark sql but can retrieve the data in the hive shell. Can't spark access hive views? T

Re: Is it okay to run Hive Java UDFS in Spark-sql. Anybody's still doing it?

2017-02-02 Thread Jörn Franke
's to run on spark-sql it will > make a performance difference??? Is anybody here actually doing it, > converting Hive UDF's to run on Spark-sql? > > What would be your approach if asked to make a Hive Java UDFS project run on > spark-sql > > Would you run the sa

Is it okay to run Hive Java UDFS in Spark-sql. Anybody's still doing it?

2017-02-02 Thread Alex
Hi Team, Do you really think that if we make Hive Java UDFs run on spark-sql it will make a performance difference??? Is anybody here actually doing it, converting Hive UDFs to run on Spark-sql? What would be your approach if asked to make a Hive Java UDFS project run on spark-sql? Wo

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Alex
ther type depending on what is the type of > the original value? > Kr > > > > On 1 Feb 2017 5:56 am, "Alex" wrote: > > Hi , > > > we have Java Hive UDFS which are working perfectly fine in Hive > > SO for Better performance we are migrating the sam

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Marco Mistroni
for Better performance we are migrating the same To Spark-sql SO these jar files we are giving --jars argument to spark-sql and defining temporary functions to make it to run on spark-sql there is this particular Java UDF which is working fine on hive But when ran on spark-sql it is giving the err

Hive Java UDF running on spark-sql issue

2017-01-31 Thread Alex
Hi, we have Java Hive UDFs which are working perfectly fine in Hive. So for better performance we are migrating the same to Spark-sql. We are passing these jar files to spark-sql with the --jars argument and defining temporary functions to make them run on spark-sql. There is this particular Java UDF

Re: Do both code snippets below do the same thing? I had to refactor code to fit in spark-sql

2017-01-31 Thread Alex
Guys! Please Reply On Tue, Jan 31, 2017 at 12:31 PM, Alex wrote: > public Object get(Object name) { > int pos = getPos((String) name); > if (pos < 0) > return null; > String f = "string"; > Object obj

alternatives for long to longwritable typecasting in spark sql

2017-01-30 Thread Alex
Hi Guys, Please let me know if there are any other ways to typecast, as the code below throws an error: unable to typecast java.lang.Long to LongWritable, and the same for Double and for Text, in spark-sql. The piece of code below is from a Hive UDF which I am trying to run in spark-sql: public Object get(Object name

Do both code snippets below do the same thing? I had to refactor code to fit in spark-sql

2017-01-30 Thread Alex
public Object get(Object name) { int pos = getPos((String) name); if (pos < 0) return null; String f = "string"; Object obj = list.get(pos); Object result = null; if (obj == null)

Re: Tableau BI on Spark SQL

2017-01-30 Thread Todd Nist
rnal in-memory >> representation outside of Spark (can also exist on disk if memory is too >> small) and then use it within Tableau. Accessing directly the database is >> not so efficient. >> Additionally use always the newest version of tableau.. >> >>

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
esentation outside of Spark (can also exist on disk if memory is too >> small) and then use it within Tableau. Accessing directly the database is >> not so efficient. >> Additionally use always the newest version of tableau.. >> >>> On 30 Jan 2017, at 21:57, M

Re: Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
30 Jan 2017, at 21:57, Mich Talebzadeh > wrote: > > Hi, > > Has anyone tried using Tableau on Spark SQL? > > Specifically how does Tableau handle in-memory capabilities of Spark. > > As I understand Tableau uses its own propriety SQL against say Oracle. > That is wel

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
so efficient. Additionally use always the newest version of tableau.. > On 30 Jan 2017, at 21:57, Mich Talebzadeh wrote: > > Hi, > > Has anyone tried using Tableau on Spark SQL? > > Specifically how does Tableau handle in-memory capabilities of Spark. > > As I unde

Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
Hi, Has anyone tried using Tableau on Spark SQL? Specifically, how does Tableau handle the in-memory capabilities of Spark? As I understand it, Tableau uses its own proprietary SQL against, say, Oracle. That is well established. So for each product Tableau will try to use its own version of SQL against that

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
Hi All, If I modify the code as below, the Hive UDF works in spark-sql but it gives different results. Please let me know the difference between the two code versions below. 1) public Object get(Object name) { int pos = getPos((String)name); if(pos<0) return n

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
bj).get(); > case "string" : return ((Text)obj).toString(); > default : return obj; > } > } > > Still it throws an error saying java.lang.Long can't be converted > to org.apache.hadoop.hive.serde2.io.DoubleWritable > > > > it's working fin

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
t; Hi, > > Could you show us the whole code to reproduce that? > > // maropu > > On Wed, Jan 25, 2017 at 12:02 AM, Deepak Sharma > wrote: > >> Can you try writing the UDF directly in spark and register it with spark >> sql or hive context ? >> Or do you want

Complex types handling with spark SQL and parquet

2017-01-28 Thread Antoine HOM
Hello everybody, I have been trying to use complex types (stored in parquet) with spark SQL and ended up having an issue that I can't seem to be able to solve cleanly. I was hoping, through this mail, to get some insights from the community, maybe I'm just missing something obvious in t

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
Hi I will do a little more testing and will let you know. It did not work with INT and Number types, for sure. While writing, everything is fine :) On Fri, Jan 27, 2017 at 1:04 PM, Takeshi Yamamuro wrote: > How about this? > https://github.com/apache/spark/blob/master/sql/core/ >

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread Takeshi Yamamuro
How about this? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala#L729 Or, how about using Double or something instead of Numeric? // maropu On Fri, Jan 27, 2017 at 10:25 AM, ayan guha wrote: > Okay, it is working with varchar colu
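
One workaround in the spirit of that suggestion is to push a CAST down to Oracle through a subquery passed as the table, so the JDBC metadata reports an explicit precision/scale instead of a bare NUMBER; everything below is a placeholder sketch:

    val props = new java.util.Properties()
    props.setProperty("user", "scott")
    props.setProperty("password", "tiger")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")

    val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/service"
    val query   = "(SELECT id, CAST(amount AS NUMBER(18,2)) AS amount FROM some_table) t"
    val df = spark.read.jdbc(jdbcUrl, query, props)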

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
(url=url,table=table,properties={"user" > :user,"password":password,"driver":driver}) > > > Still the issue persists. > > On Fri, Jan 27, 2017 at 11:19 AM, Takeshi Yamamuro > wrote: > >> Hi, >> >> I think you got this error

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
> I think you got this error because you used `NUMERIC` types in your schema > (https://github.com/apache/spark/blob/master/sql/core/ > src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L32). So, > IIUC avoiding the type is a workaround. > > // maropu > > > On Fri,
