Re: How to get a spark sql statement implement duration ?

2017-02-07 Thread Jacek Laskowski
ngle structured query (query DSL or SQL). 2, When I execute a spark sql query in the spark-shell client, how do I get the execution time (Spark 2.1.0)? If a sql query produced 3 jobs, in my opinion the execution time is the sum of the 3 jobs' durations. Yes. What's the question then? Jacek

Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
Hi, I'm trying to un-explode or denormalize a table like +---++-+--++ |id |name|extra|data |priority| +---++-+--++ |1 |Fred|8|value1|1 | |1 |Fred|8|value8|2 | |1 |Fred|8|value5|3 | |2 |Amy |9|value3|1 | |2 |Amy
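
A sketch of one way to collapse such a table back to one row per id, assuming Spark 1.6+ and the column names from the sample above (df is the exploded DataFrame):

    import org.apache.spark.sql.functions.first
    // Pivot on priority so the per-priority data values land in separate columns.
    val wide = df.groupBy("id", "name", "extra")
      .pivot("priority", Seq(1, 2, 3))
      .agg(first("data"))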

How to get a spark sql statement implement duration ?

2017-02-06 Thread Mars Xu
Hello All, some spark sqls will produce one or more jobs. I have 2 questions: 1, How is the cc.sql("sql statement") call divided into one or more jobs? 2, When I execute a spark sql query in the spark-shell client, how do I get the execution time (Spark 2.1.0)? if a sql query
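
A minimal sketch of the timing approach discussed in this thread, runnable from the spark-shell (Spark 2.x, session available as spark; the table name is illustrative):

    // Wall-clock timing of a single query, covering all jobs it triggers.
    val start = System.nanoTime()
    spark.sql("SELECT count(*) FROM some_table").show()
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"Query took $elapsedMs ms")

Spark 2.1 also added spark.time { ... } for the same purpose, and the SQL tab of the web UI reports per-query duration without summing job times by hand.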

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread KhajaAsmath Mohammed
>> On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" >> wrote: >> >>> I don't think so, I was able to insert overwrite other created tables in >>> hive using spark sql. The only problem I am facing is, spark is not able >>> to recognize hive view n

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread Xiao Li
te other created tables in >> hive using spark sql. The only problem I am facing is, spark is not able >> to recognize the hive view name. Very strange but not sure where I am going >> wrong in this. >> >> On Mon, Feb 6, 2017 at 11:03 AM, Jon Gregg wrote: >> >>>

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread KhajaAsmath Mohammed
n, Feb 6, 2017 at 11:25 AM, vaquar khan wrote: > Did you try MSCK REPAIR TABLE ? > > Regards, > Vaquar Khan > > On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" > wrote: > >> I don't think so, I was able to insert overwrite other created tables in >> h

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread vaquar khan
Did you try MSCK REPAIR TABLE ? Regards, Vaquar Khan On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed" wrote: > I don't think so, I was able to insert overwrite other created tables in > hive using spark sql. The only problem I am facing is, spark is not able > to recog

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread KhajaAsmath Mohammed
I don't think so, I was able to insert overwrite other created tables in hive using spark sql. The only problem I am facing is that spark is not able to recognize the hive view name. Very strange, but not sure where I am going wrong in this. On Mon, Feb 6, 2017 at 11:03 AM, Jon Gregg wrote: > Confirm

Re: Cannot read Hive Views in Spark SQL

2017-02-06 Thread Jon Gregg
asm...@gmail.com> wrote: > Hi Khan, > > It didn't work in my case. I used the below code. The view is already present in > Hive but I can't read it in spark sql. It throws an exception that the table was not > found. > > sqlCtx.refreshTable("schema.hive_view") > > > Thanks, >

Re: Cannot read Hive Views in Spark SQL

2017-02-05 Thread KhajaAsmath Mohammed
Hi Khan, It didn't work in my case. I used the below code. The view is already present in Hive but I can't read it in spark sql. It throws an exception that the table was not found: sqlCtx.refreshTable("schema.hive_view") Thanks, Asmath On Sun, Feb 5, 2017 at 7:56 PM, vaquar khan wrote: > H

Re: Cannot read Hive Views in Spark SQL

2017-02-05 Thread vaquar khan
> mdkhajaasm...@gmail.com> wrote: > Hi, > > I have a hive view which is basically a set of select statements on some > tables. I want to read the hive view and use the hive builtin functions > available in spark sql. > > I am not able to read that hive view in spark sql but can retrieve

Cannot read Hive Views in Spark SQL

2017-02-05 Thread KhajaAsmath Mohammed
Hi, I have a hive view which is basically a set of select statements on some tables. I want to read the hive view and use the hive builtin functions available in spark sql. I am not able to read that hive view in spark sql but can retrieve the data in the hive shell. Can't spark access hive views? T
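
For reference, a minimal sketch of reading a metastore view (Spark 1.x; the schema and view names are illustrative). Views exist only in the metastore, so the query has to go through a Hive-enabled context rather than a plain SQLContext:

    import org.apache.spark.sql.hive.HiveContext
    val hiveCtx = new HiveContext(sc)
    hiveCtx.sql("SELECT * FROM some_schema.some_view").show()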

Re: Is it okay to run Hive Java UDFS in Spark-sql. Anybody's still doing it?

2017-02-02 Thread Jörn Franke
's to run on spark-sql it will > make a performance difference??? Is anybody here actually doing it.. > converting Hive UDF's to run on Spark-sql.. > > What would be your approach if asked to make a Hive Java UDFs project run on > spark-sql > > Would you run the sa

Is it okay to run Hive Java UDFS in Spark-sql. Anybody's still doing it?

2017-02-02 Thread Alex
Hi Team, Do you really think if we make Hive Java UDF's run on spark-sql it will make a performance difference??? Is anybody here actually doing it.. converting Hive UDF's to run on Spark-sql.. What would be your approach if asked to make a Hive Java UDFs project run on spark-sql? Wo

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Alex
ther type depending on what is the type of > the original value? > Kr > > > > On 1 Feb 2017 5:56 am, "Alex" wrote: > > Hi, > > > we have Java Hive UDFs which are working perfectly fine in Hive > > So for better performance we are migrating the sam

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Marco Mistroni
for better performance we are migrating the same to Spark-sql. So we are passing these jar files via the --jars argument to spark-sql and defining temporary functions to make them run on spark-sql. There is this particular Java UDF which is working fine on Hive, but when run on spark-sql it is giving the err

Hive Java UDF running on spark-sql issue

2017-01-31 Thread Alex
Hi, we have Java Hive UDFs which are working perfectly fine in Hive. So for better performance we are migrating the same to Spark-sql. So we are passing these jar files via the --jars argument to spark-sql and defining temporary functions to make them run on spark-sql. There is this particular Java UDF
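
A sketch of the registration flow being described here, with a placeholder jar path and class name:

    import org.apache.spark.sql.hive.HiveContext
    val hc = new HiveContext(sc)
    hc.sql("ADD JAR /path/to/hive-udfs.jar")
    hc.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF'")
    hc.sql("SELECT my_udf(name) FROM some_table").show()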

Re: does both below code do the same thing? I had to refactor code to fit in spark-sql

2017-01-31 Thread Alex
Guys! Please Reply On Tue, Jan 31, 2017 at 12:31 PM, Alex wrote: > public Object get(Object name) { > int pos = getPos((String) name); > if (pos < 0) > return null; > String f = "string"; > Object obj

alternatives for long to longwritable typecasting in spark sql

2017-01-30 Thread Alex
Hi Guys, please let me know if there are any other ways to typecast, as the below is throwing an error: unable to typecast java.lang.Long to LongWritable, and the same for Double and for Text, in spark-sql. The below piece of code is from a hive udf which I am trying to run in spark-sql: public Object get(Object name
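
One hedged way around this, sketched below: unwrap the incoming object by runtime type instead of casting unconditionally, since Hive hands the UDF Hadoop Writables while Spark SQL may hand it plain java.lang boxed values:

    import org.apache.hadoop.hive.serde2.io.DoubleWritable
    import org.apache.hadoop.io.{LongWritable, Text}

    def unwrap(obj: Any): Any = obj match {
      case w: LongWritable     => w.get()            // Hive path
      case w: DoubleWritable   => w.get()
      case t: Text             => t.toString
      case l: java.lang.Long   => l.longValue()      // Spark SQL may pass the raw boxed type
      case d: java.lang.Double => d.doubleValue()
      case other               => other
    }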

does both below code do the same thing? I had to refactor code to fit in spark-sql

2017-01-30 Thread Alex
public Object get(Object name) { int pos = getPos((String) name); if (pos < 0) return null; String f = "string"; Object obj = list.get(pos); Object result = null; if (obj == null)

Re: Tableau BI on Spark SQL

2017-01-30 Thread Todd Nist
rnal in-memory >> representation outside of Spark (it can also exist on disk if memory is too >> small) and then use it within Tableau. Accessing the database directly is >> not so efficient. >> Additionally, always use the newest version of Tableau. >> >>

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
esentation outside of Spark (it can also exist on disk if memory is too >> small) and then use it within Tableau. Accessing the database directly is >> not so efficient. >> Additionally, always use the newest version of Tableau. >> >>> On 30 Jan 2017, at 21:57, M

Re: Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
30 Jan 2017, at 21:57, Mich Talebzadeh > wrote: > > Hi, > > Has anyone tried using Tableau on Spark SQL? > > Specifically, how does Tableau handle the in-memory capabilities of Spark? > > As I understand it, Tableau uses its own proprietary SQL against, say, Oracle. > That is wel

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
so efficient. Additionally, always use the newest version of Tableau. > On 30 Jan 2017, at 21:57, Mich Talebzadeh wrote: > > Hi, > > Has anyone tried using Tableau on Spark SQL? > > Specifically, how does Tableau handle the in-memory capabilities of Spark? > > As I unde

Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
Hi, Has anyone tried using Tableau on Spark SQL? Specifically, how does Tableau handle the in-memory capabilities of Spark? As I understand it, Tableau uses its own proprietary SQL against, say, Oracle. That is well established. So for each product Tableau will try to use its own version of SQL against that

Re: help!!!----issue with spark-sql type cast form long to longwritable

2017-01-30 Thread Alex
Hi All, If I modify the code as below, the hive UDF is working in spark-sql but it is giving different results. Please let me know the difference between the two codes below. 1) public Object get(Object name) { int pos = getPos((String)name); if (pos < 0) return n

Re: help!!!----issue with spark-sql type cast form long to longwritable

2017-01-30 Thread Alex
bj).get(); > case "string" : return ((Text)obj).toString(); > default : return obj; > } > } > > Still it throws an error saying java.lang.Long can't be converted > to org.apache.hadoop.hive.serde2.io.DoubleWritable > > > > it's working fin

Re: help!!!----issue with spark-sql type cast form long to longwritable

2017-01-30 Thread Alex
> Hi, > > Could you show us the whole code to reproduce that? > > // maropu > > On Wed, Jan 25, 2017 at 12:02 AM, Deepak Sharma > wrote: > >> Can you try writing the UDF directly in spark and register it with the spark >> sql or hive context? >> Or do you want

Complex types handling with spark SQL and parquet

2017-01-28 Thread Antoine HOM
Hello everybody, I have been trying to use complex types (stored in parquet) with spark SQL and ended up with an issue that I can't seem to solve cleanly. I was hoping, through this mail, to get some insights from the community; maybe I'm just missing something obvious in t

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala#L729 > Or, how about using Double or something instead of Numeric? > > // maropu > > On Fri, Jan 27, 2017 at 10:25 AM, ayan guha wrote: > >> Okay, it is working with varchar columns only. Is there any way to >> work

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread Takeshi Yamamuro
How about this? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala#L729 Or, how about using Double or something instead of Numeric? // maropu On Fri, Jan 27, 2017 at 10:25 AM, ayan guha wrote: > Okay, it is working with varchar colu

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
(url=url,table=table,properties={"user" > :user,"password":password,"driver":driver}) > > > Still the issue persists. > > On Fri, Jan 27, 2017 at 11:19 AM, Takeshi Yamamuro > wrote: > >> Hi, >> >> I think you got this error

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
> I think you got this error because you used `NUMERIC` types in your schema > (https://github.com/apache/spark/blob/master/sql/core/ > src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L32). So, > IIUC avoiding the type is a workaround. > > // maropu > > > On Fri,

Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread Takeshi Yamamuro
Hi, I think you got this error because you used `NUMERIC` types in your schema ( https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L32). So, IIUC avoiding the type is a workaround. // maropu On Fri, Jan 27, 2017 at 8:18 AM, ayan

Oracle JDBC - Spark SQL - Key Not Found: Scale

2017-01-26 Thread ayan guha
Hi, I am facing the exact issue with Oracle/Exadata as mentioned here. Any idea? I could not figure it out, so I am sending it to this group hoping someone has seen it (and solved it). Spark Version: 1.6 pyspark command: pyspark --driver-cla
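
A workaround sketch for the missing-scale NUMBER problem, with illustrative connection details and table/column names: push an explicit CAST into the dbtable subquery so an unbounded Oracle NUMBER never reaches Spark's schema inference:

    val df = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:oracle:thin:@//host:1521/service",
      "dbtable"  -> "(SELECT id, CAST(amount AS NUMBER(19,4)) AS amount FROM accounts)",
      "user"     -> "db_user",
      "password" -> "db_password",
      "driver"   -> "oracle.jdbc.OracleDriver"
    )).load()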

Re: Spark SQL DataFrame to Kafka Topic

2017-01-24 Thread ayan guha
docs.databricks.com/_static/notebooks/structured-streaming-kafka.html > > http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html > > On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar > wrote: > > Hi Team , > > Sorry if this question alr

Re: Spark SQL DataFrame to Kafka Topic

2017-01-24 Thread Koert Kuipers
ming-pro >>> gramming-guide.html >>> >>> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar >>> wrote: >>> >>>> Hi Team , >>>> >>>> Sorry if this question already asked in this forum.. >>>> >>>&

Re: help!!!----issue with spark-sql type cast form long to longwritable

2017-01-24 Thread Takeshi Yamamuro
Hi, Could you show us the whole code to reproduce that? // maropu On Wed, Jan 25, 2017 at 12:02 AM, Deepak Sharma wrote: > Can you try writing the UDF directly in spark and register it with spark > sql or hive context ? > Or do you want to reuse the existing UDF jar for hive

Re: help!!!----issue with spark-sql type cast form long to longwritable

2017-01-24 Thread Deepak Sharma
Can you try writing the UDF directly in spark and register it with spark sql or hive context ? Or do you want to reuse the existing UDF jar for hive in spark ? Thanks Deepak On Jan 24, 2017 5:29 PM, "Sirisha Cheruvu" wrote: > Hi Team, > > I am trying to keep below cod

help!!!----issue with spark-sql type cast form long to longwritable

2017-01-24 Thread Sirisha Cheruvu
op.hive.serde2.io.DoubleWritable. It's working fine on Hive but throwing an error on spark-sql. I am importing the below packages. import java.util.*; import org.apache.hadoop.hive.serde2.objectinspector.*; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.ha

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
write the output of Spark SQL to local and HDFS paths using Scala code. Code :- scala> val result = sqlContext.sql("select empno, name from emp"); scala> result.show(); If I give the command result.show() then it will print the output to the console. I need to redirect the ou

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
output a folder on > HDFS; you can use "result.write.csv(foldPath)". > > > > -- > > Hi, > Can anyone please let us know how to write the output of Spark SQL > to > local and HDFS paths using Scala code. > > *Code :-* > > scala> val r

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
Because the number of reducers will not be one, it will output a folder on HDFS; you can use "result.write.csv(foldPath)". -- Hi, Can anyone please let us know how to write the output of Spark SQL to local and HDFS paths using Scala code. Code :- scala> val res

Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, Can anyone please let us know how to write the output of Spark SQL to local and HDFS paths using Scala code. *Code :-* scala> val result = sqlContext.sql("select empno, name from emp"); scala> result.show(); If I give the command result.show() then it will print t
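
A sketch of both targets (Spark 1.6, so CSV goes through the spark-csv package, assumed to be on the classpath; paths are illustrative):

    val result = sqlContext.sql("select empno, name from emp")
    // HDFS: writes a directory with one part file per partition.
    result.write.format("com.databricks.spark.csv").save("hdfs:///user/out/emp_csv")
    // Local path: small results can be collected to the driver and written directly.
    import java.io.PrintWriter
    val pw = new PrintWriter("/tmp/emp.txt")
    result.collect().foreach(row => pw.println(row.mkString(",")))
    pw.close()

In Spark 2.x the HDFS write is simply result.write.csv(path).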

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Sirisha Cheruvu
> import org.apache.spark.sql.hive.HiveContext >> val hc = new org.apache.spark.sql.hive.HiveContext(sc) ; >> hc.sql("add jar /home/cloudera/Downloads/genudnvl2.jar"); >> hc.sql("create temporary function nexr_nvl2 as ' >> com.nexr.platform.hive.udf.GenericUDFNVL2'"); >> hc.sql(

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Sirisha Cheruvu
exr.platform.hive.udf.GenericUDFNVL2'"); >> hc.sql("select nexr_nvl2(name,let,ret) from testtab5").show; >> System.exit(0); >> >> >> On Jan 17, 2017 2:01 PM, "Sirisha Cheruvu" wrote: >> >>> Hi >>> >>> Has anybody tested and tried a generic udf with an object inspector >>> implementation which successfully ran on both hive and spark-sql? >>> >>> Please share the GitHub link or source code file >>> >>> Thanks in advance >>> Sirisha >>> >> > > > -- > Thanks > Deepak > www.bigdatabig.com > www.keosha.net >

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Deepak Sharma
"); > hc.sql("select nexr_nvl2(name,let,ret) from testtab5").show; > System.exit(0); > > > On Jan 17, 2017 2:01 PM, "Sirisha Cheruvu" wrote: > >> Hi >> >> Has anybody tested and tried a generic udf with an object inspector >> implementaio

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Sirisha Cheruvu
> Hi > > Has anybody tested and tried a generic udf with an object inspector > implementation which successfully ran on both hive and spark-sql? > > Please share the GitHub link or source code file > > Thanks in advance > Sirisha >

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Deepak Sharma
On the sqlContext or hiveSqlContext, you can register the function as a udf as below: *hiveSqlContext.udf.register("func_name", func(_: String))* Thanks Deepak On Wed, Jan 18, 2017 at 8:45 AM, Sirisha Cheruvu wrote: > Hey > > Can you send me the source code of a hive java udf which
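
A fuller sketch of that native-registration route (Spark 1.6; the function body is an illustrative NVL2-like stand-in, with the names taken from this thread), which sidesteps Hive's ObjectInspector machinery entirely:

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    hc.udf.register("nexr_nvl2_native", (v: String, a: String, b: String) =>
      if (v != null) a else b)   // NVL2 semantics as a plain Scala function
    hc.sql("SELECT nexr_nvl2_native(name, let, ret) FROM testtab5").show()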

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Sirisha Cheruvu
Hey, can you send me the source code of a hive java udf which worked in spark sql, and how you registered the function in spark? On Jan 17, 2017 2:01 PM, "Sirisha Cheruvu" wrote: Hi, has anybody tested and tried a generic udf with an object inspector implementation which successfully ran on bot

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
nd all corresponding PRs. *spark.sql.hive.convertMetastoreParquet false* *spark.sql.hive.metastorePartitionPruning true* *I had set the above properties from *SPARK-6910 & PRs. > > Yong > > > -- > *From:* Raju Bairishetti > *Sent:* Tuesday, January 17, 2017 3:00
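
A sketch of setting those two properties programmatically before any table is touched (Spark 1.6.x; the table and partition column names are illustrative):

    import org.apache.spark.sql.hive.HiveContext
    val hc = new HiveContext(sc)
    hc.setConf("spark.sql.hive.convertMetastoreParquet", "false")
    hc.setConf("spark.sql.hive.metastorePartitionPruning", "true")
    // With pruning enabled, the partition filter should reach the metastore
    // call instead of the driver listing every partition of the table.
    hc.sql("SELECT * FROM logs WHERE dt = '2017-01-17'").explain(true)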

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Yong Zhang
From: Raju Bairishetti Sent: Tuesday, January 17, 2017 3:00 AM To: user @spark Subject: Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided Had a high level look into the code. Seems getHiveQlPartitions method

Re: need a hive generic udf which also works on spark sql

2017-01-17 Thread Takeshi Yamamuro
Hi, AFAIK, you could use Hive GenericUDF stuff in spark without much effort. If you'd like to check test suites about that, you'd better visit HiveUDFSuite. https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala I

need a hive generic udf which also works on spark sql

2017-01-17 Thread Sirisha Cheruvu
Hi, has anybody tested and tried a generic udf with an object inspector implementation which successfully ran on both hive and spark-sql? Please share the GitHub link or source code file. Thanks in advance, Sirisha

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
: > Waiting for suggestions/help on this... > > On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti > wrote: > >> Hello, >> >>Spark sql is generating the query plan with all partitions' information >> even if we apply filters on partitions in the qu

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-15 Thread Raju Bairishetti
Waiting for suggestions/help on this... On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti wrote: > Hello, > > Spark sql is generating the query plan with all partitions' information even > if we apply filters on partitions in the query. Due to this, the spark > driver/hi

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Tathagata Das
notebooks/structured-streaming-kafka.html >> http://spark.apache.org/docs/latest/structured-streaming-pro >> gramming-guide.html >> >> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar >> wrote: >> >>> Hi Team , >>> >>> Sorry if this question

Re: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Xin Wu
>> *From:* Xin Wu [mailto:xwu0...@gmail.com] >> *Sent:* 13 janvier 2017 12:43 >> *To:* Nicolas Tallineau >> *Cc:* user@spark.apache.org >> *Subject:* Re: [Spark SQL - Scala] TestHive not working in Spark 2 >> >> >> >> I used the following:

Re: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Xin Wu
ier 2017 12:43 > *To:* Nicolas Tallineau > *Cc:* user@spark.apache.org > *Subject:* Re: [Spark SQL - Scala] TestHive not working in Spark 2 > > > > I used the following: > > > val testHive = new org.apache.spark.sql.hive.test.TestHiveContext(sc, > *false*) > > val

RE: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Nicolas Tallineau
nvier 2017 12:43 To: Nicolas Tallineau Cc: user@spark.apache.org Subject: Re: [Spark SQL - Scala] TestHive not working in Spark 2 I used the following: val testHive = new org.apache.spark.sql.hive.test.TestHiveContext(sc, false) val hiveClient = testHive.sessionState.metadataHive hiveClient.

Re: [Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Xin Wu
ForkMain$ForkError: java.lang.NullPointerException: null > > at org.apache.spark.sql.hive.test.TestHiveSparkSession. > getHiveFile(TestHive.scala:190) > > at org.apache.spark.sql.hive.test.TestHiveSparkSession.org > $apache$spark$sql

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Koert Kuipers
.org/docs/latest/structured-streaming- > programming-guide.html > > On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar > wrote: > >> Hi Team , >> >> Sorry if this question already asked in this forum.. >> >> Can we ingest data to Apache Kafka Topic f

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Peyman Mohajerian
question already asked in this forum.. > > Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ?? > > Here is my Code which Reads Parquet File : > > *val sqlContext = new org.apache.spark.sql.SQLContext(sc);* > > *val df = sqlContext.

[Spark SQL - Scala] TestHive not working in Spark 2

2017-01-13 Thread Nicolas Tallineau
t.ForkMain$ForkError: java.lang.NullPointerException: null at org.apache.spark.sql.hive.test.TestHiveSparkSession.getHiveFile(TestHive.scala:190) at org.apache.spark.sql.hive.test.TestHiveSparkSession.org$apache$spark$sql$hive$test$TestHiveSparkSession$$quoteHiveFile(TestHive.scala:196

Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Senthil Kumar
Hi Team, sorry if this question was already asked in this forum. Can we ingest data to an Apache Kafka topic from a Spark SQL DataFrame? Here is my code which reads a Parquet file: *val sqlContext = new org.apache.spark.sql.SQLContext(sc);* *val df = sqlContext.read.parquet("
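
Before a built-in Kafka sink existed, the usual pattern was one producer per partition. A hedged sketch, assuming df is the DataFrame read above, kafka-clients is on the classpath, and the broker address and topic name are illustrative:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    df.toJSON.foreachPartition { rows =>
      val props = new Properties()
      props.put("bootstrap.servers", "broker:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      rows.foreach(json => producer.send(new ProducerRecord[String, String]("my_topic", json)))
      producer.close()
    }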

Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-10 Thread Raju Bairishetti
Hello, Spark sql is generating the query plan with all partitions' information even if we apply filters on partitions in the query. Due to this, the spark driver/hive metastore is hitting OOM, as each table has lots of partitions. We can confirm from the hive audit logs that it tries to

Re: Spark SQL 1.6.3 ORDER BY and partitions

2017-01-09 Thread Yong Zhang
17 1:14 PM To: 'user' Subject: Spark SQL 1.6.3 ORDER BY and partitions I have two separate but similar issues that I've narrowed down to a pretty good level of detail. I'm using Spark 1.6.3, particularly Spark SQL. I'm concerned with a single dataset for now, although the detail

Spark SQL 1.6.3 ORDER BY and partitions

2017-01-06 Thread Joseph Naegele
I have two separate but similar issues that I've narrowed down to a pretty good level of detail. I'm using Spark 1.6.3, particularly Spark SQL. I'm concerned with a single dataset for now, although the details apply to other, larger datasets. I'll call it "table"

Re: Spark SQL - Applying transformation on a struct inside an array

2017-01-05 Thread Olivier Girardot
of the underlying schema. It's "relatively" straightforward for complex types like struct<...> to apply an arbitrary UDF on the column and replace the data "inside" the struct; however, I'm struggling to make it work for complex types containing arrays along the way
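
A sketch of the UDF route being described, assuming an illustrative items: array<struct<name:string, qty:bigint>> column: the structs arrive inside the UDF as Rows, and returning a Seq of case classes lets Spark re-derive the struct schema:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{col, udf}

    case class Item(name: String, qty: Long)
    val bumpQty = udf { items: Seq[Row] =>
      items.map(r => Item(r.getAs[String]("name"), r.getAs[Long]("qty") + 1))
    }
    val updated = df.withColumn("items", bumpQty(col("items")))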

Re: spark sql in Cloudera package

2017-01-04 Thread Sean Owen
all the grunt work to make it compatible with the (slightly different) Hive version in CDH, just to provide roughly the same functionality, yeah. On Wed, Jan 4, 2017 at 3:17 PM Mich Talebzadeh wrote: > Sounds like Cloudera do not supply the shell for spark-sql but only > spark-shell > > is t

spark sql in Cloudera package

2017-01-04 Thread Mich Talebzadeh
Sounds like Cloudera do not supply the shell for spark-sql but only spark-shell. Is that correct? I appreciate that one can use spark-shell; however, it sounds like spark-sql is excluded in favour of Impala? Cheers, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Migrate spark sql to rdd for better performance

2017-01-03 Thread geoHeil
I optimized a spark sql script but have come to the conclusion that the sql api is not ideal, as the tasks which are generated are slow and require too much shuffling. So the script should be converted to rdd. http://stackoverflow.com/q/41445571/2587904 How can I formulate this more efficient

Re: [Spark SQL] Task failed while writing rows

2016-12-25 Thread Timur Shenkao
been reported before, >> e.g. https://issues.apache.org/jira/browse/HDFS-770 which is *very* old. >> >> >> >> If you’re pretty sure Spark couldn’t be responsible for issues at this >> level I’ll stick to the Hadoop mailing list. >> >> >> >>

Spark-SQL 1.6.2 w/Hive UDF @Description

2016-12-23 Thread Lavelle, Shawn
Hello Spark Users, I have a Hive UDF that I'm trying to use with Spark-SQL. It's showing up a bit awkwardly: I can load it into the Hive Thrift Server with a "Create function..." query against the hive context. I can then use the UDF in queries. However, a "

Re: SPARK -SQL Understanding BroadcastNestedLoopJoin and number of partitions

2016-12-21 Thread David Hodeffi
Do you know who I can talk to about this code? I am really curious to know why there is a join and why the number of partitions for the join is the sum of both of them; I expected the number of partitions to be the same as the streamed table, or worst case multiplied. Sent from my iPhone On

SPARK -SQL Understanding BroadcastNestedLoopJoin and number of partitions

2016-12-21 Thread David Hodeffi
I have two dataframes which I am joining: a small and a big dataframe. The optimizer suggests using BroadcastNestedLoopJoin. The number of partitions for the big DataFrame is 200, while the small DataFrame has 5 partitions. The joined dataframe results in 205 partitions (joined.rdd.partitions.size),
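
A reproduction sketch of the observation (Spark 1.6; whether BroadcastNestedLoopJoin is actually chosen depends on the join type and broadcast thresholds, so a non-equi outer join is used here to push the planner toward it):

    val big = sqlContext.range(0L, 1000000L).repartition(200)
    val small = sqlContext.range(0L, 10L).repartition(5)
    val joined = big.join(small, big("id") > small("id"), "left_outer")
    joined.explain()                      // look for BroadcastNestedLoopJoin
    println(joined.rdd.partitions.size)   // this thread reports left + right = 205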

Re: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Michael Stratton
, but rare, cases have been reported before, > e.g. https://issues.apache.org/jira/browse/HDFS-770 which is *very* old. > > > > If you’re pretty sure Spark couldn’t be responsible for issues at this > level I’ll stick to the Hadoop mailing list. > > > > Thanks > >

RE: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Joseph Naegele
. Thanks --- Joe Naegele Grier Forensics From: Michael Stratton [mailto:michael.strat...@komodohealth.com] Sent: Monday, December 19, 2016 10:00 AM To: Joseph Naegele Cc: user Subject: Re: [Spark SQL] Task failed while writing rows It seems like an issue w/ Hadoop. What do you get when

Re: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Michael Stratton
lot better w/ Hive it can be a pain. On Sun, Dec 18, 2016 at 5:49 PM, Joseph Naegele wrote: > Hi all, > > I'm having trouble with a relatively simple Spark SQL job. I'm using Spark > 1.6.3. I have a dataset of around 500M rows (average 128 bytes per record). > It's

Re: Spark SQL Syntax

2016-12-19 Thread A Shaikh
6 at 14:00, Ramesh Krishnan wrote: > What is the version of spark you are using? If it is less than 2.0, > consider using the Dataset API to validate compile-time checks on syntax. > > Thanks, > Ramesh > > On Mon, Dec 19, 2016 at 6:36 PM, A Shaikh wrote: > >> HI,

Spark SQL Syntax

2016-12-19 Thread A Shaikh
Hi, I keep getting invalid Spark SQL syntax, especially for date/timestamp manipulation. What's the best way to test that SQL syntax for a Spark DataFrame is valid? Any online site to test or run a demo SQL? Thanks, Afzal
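
One low-friction way to check date/timestamp expressions is to evaluate them against no table at all in the shell; a sketch for a Spark 2.x session (the literals are illustrative):

    spark.sql("SELECT to_date('2016-12-19') AS d, current_timestamp() AS ts").show()
    spark.sql("SELECT date_add(to_date('2016-12-19'), 7) AS next_week").show()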

[Spark SQL] Task failed while writing rows

2016-12-18 Thread Joseph Naegele
Hi all, I'm having trouble with a relatively simple Spark SQL job. I'm using Spark 1.6.3. I have a dataset of around 500M rows (average 128 bytes per record). Its current compressed size is around 13 GB, but my problem started when it was much smaller, maybe 5 GB. This datas

Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Exactly what I was looking for. Thank you so much!! On Tue, Dec 13, 2016 at 6:15 PM Michael Armbrust wrote: > Yes > > > https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/4464261896877850/2840265927289860/latest.html > > On Tue, Dec

Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Michael Armbrust
Yes https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/4464261896877850/2840265927289860/latest.html On Tue, Dec 13, 2016 at 10:43 AM, Ninad Shringarpure wrote: > > Hi Team, > > Does Spark 2.0 support non-primitive types in collect

Fwd: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Hi Team, Does Spark 2.0 support non-primitive types in collect_list for inserting nested collections? Would appreciate any references or samples. Thanks, Ninad
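
The answer in this thread is yes; a sketch of the pattern the linked notebook demonstrates, with illustrative column names: wrap the columns in struct() and collect_list produces an array of structs per group:

    import org.apache.spark.sql.functions.{collect_list, struct}
    val nested = df.groupBy("id")
      .agg(collect_list(struct("name", "score")).as("items"))
    nested.printSchema()   // items: array<struct<name:string,score:...>>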

Dynamic spark sql

2016-12-12 Thread geoHeil
Hi, I am curious how to dynamically generate spark sql in the scala api. http://stackoverflow.com/q/41102347/2587904 From this list val columnsFactor = Seq("bar", "baz") I want to generate multiple withColumn statements dfWithNewLabels.withColumn("
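
A sketch of generating those withColumn calls by folding over the list (the derived "label" column and the per-column logic are illustrative):

    import org.apache.spark.sql.functions.{col, lit}
    val columnsFactor = Seq("bar", "baz")
    val result = columnsFactor.foldLeft(dfWithNewLabels) { (df, c) =>
      df.withColumn(s"${c}_flag", col("label") === lit(c))
    }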

[Spark SQL]: Dataset Encoder equivalent for pre 1.6.0 releases?

2016-12-06 Thread Denis Papathanasiou
I have a case class named "Computed", and I'd like to be able to encode all the Row objects in the DataFrame like this: def myEncoder (df: DataFrame): Dataset[Computed] = df.as(Encoders.bean(classOf[Computed])) This works just fine with the latest version of spark, but I'm forced to use version

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread Hyukjin Kwon
unctions will require me to pass a schema and >>>>> that can be a little tricky for us, but the code below doesn't require me to >>>>> pass a schema at all. >>>>> >>>>> import org.apache.spark.sql._ >>>>> val rdd = df2.rdd.map { ca

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread kant kodali
> spark.read.json(rdd).show() >>>> >>>> >>>> On Tue, Nov 22, 2016 at 2:42 PM, Michael Armbrust < >>>> mich...@databricks.com> wrote: >>>> >>>>> The first release candidate should be coming out this week. You can >

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread Hyukjin Kwon
.com> wrote: >>> >>>> The first release candidate should be coming out this week. You can >>>> subscribe to the dev list if you want to follow the release schedule. >>>> >>>> On Mon, Nov 21, 2016 at 9:34 PM, kant kodali >>>> wr

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread kant kodali
if you want to follow the release schedule. >>> >>> On Mon, Nov 21, 2016 at 9:34 PM, kant kodali wrote: >>> >>>> Hi Michael, >>>> >>>> I only see spark 2.0.2 which is what I am using currently. Any idea on >>>> when

Re: Spark sql generated dynamically

2016-12-02 Thread Georg Heiler
Are you sure? I think this is a column-wise and not a row-wise operation. ayan guha wrote on Fri, 2 Dec 2016 at 15:17: > You are looking for window functions. > On 2 Dec 2016 22:33, "Georg Heiler" wrote: > > Hi, > > how can I perform a group-wise operation in spark more elegantly? Possibly > dy

Re: Spark sql generated dynamically

2016-12-02 Thread ayan guha
You are looking for window functions. On 2 Dec 2016 22:33, "Georg Heiler" wrote: > Hi, > > how can I perform a group-wise operation in spark more elegantly? Possibly > dynamically generate SQL? Or would you suggest a custom UDAF? > http://stackoverflow.com/q/40930003/2587904 > > Kind regards, > Geo

Spark sql generated dynamically

2016-12-02 Thread Georg Heiler
Hi, how can I perform a group-wise operation in spark more elegantly? Possibly dynamically generate SQL? Or would you suggest a custom UDAF? http://stackoverflow.com/q/40930003/2587904 Kind regards, Georg
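
A sketch of the window-function suggestion that follows in this thread, with illustrative column names: the group-wise value is computed without collapsing the rows:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.avg
    val byGroup = Window.partitionBy("group")
    val withGroupAvg = df.withColumn("group_avg", avg("value").over(byGroup))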

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-02 Thread Vinayak Joshi5
spark" Date: 02/12/2016 05:50 AM Subject:Re: Spark 2.x Pyspark Spark SQL createDataframe Error Hello Vinayak, As I understand it, Spark creates a Derby metastore database in the current location, in the metastore_db subdirectory, whenever you first use an SQL context. This datab

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Michal Šenkýř
essController.doPrivileged(Native Method) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown > Source) > at org.apache.derby.impl.serv

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Vinayak Joshi5
pl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) Regards, Vinayak Joshi From: Vinayak Joshi5/India/IBM@IBMIN To: "user.spark" Date: 01/12/2016 10:53 PM Subject:Spark 2.x Pyspar

Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Vinayak Joshi5
With a local spark instance built with hive support (-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver), the following script/sequence works in Pyspark without any error against 1.6.x, but fails with 2.x. people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"]) peopl

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-28 Thread Michael Armbrust
I am using currently. Any idea on >>> when 2.1 will be released? >>> >>> Thanks, >>> kant >>> >>> On Mon, Nov 21, 2016 at 5:12 PM, Michael Armbrust < >>> mich...@databricks.com> wrote: >>> >>>> In Spark 2.1 we'

Re: time to run Spark SQL query

2016-11-28 Thread ayan guha
They should take the same time if everything else is constant. On 28 Nov 2016 23:41, "Hitesh Goyal" wrote: > Hi team, I am using spark SQL for accessing the amazon S3 bucket data. > > If I run a sql query by using normal SQL syntax like below > > 1) DataFrame d=sqlConte
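
A quick sketch of how to verify that claim: both formulations go through the same Catalyst optimizer, so their physical plans (and hence run times) should match. Table and column names are illustrative:

    val viaSql = sqlContext.sql("SELECT name FROM people WHERE age > 21")
    val viaApi = sqlContext.table("people").where("age > 21").select("name")
    viaSql.explain()
    viaApi.explain()   // expect essentially identical physical plans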
