Re: SparkSQL performance

2014-10-31 Thread Du Li
From: Soumya Simanta soumya.sima...@gmail.com Date: Friday, October 31, 2014 at 4:04 PM To: user@spark.apache.org Subject: SparkSQL performance I was really surprised to see the results

Re: SparkSQL performance

2014-10-31 Thread Soumya Simanta
I agree. My personal experience with Spark core is that it performs really well once you tune it properly. As far as I understand, SparkSQL under the hood performs many of these optimizations (order of Spark operations) and uses a more efficient storage format. Is this assumption correct? Has anyone

Re: SparkSQL: Nested Query error

2014-10-30 Thread SK
(deviceRDD).count(). The count comes out to be 1, but there are many UIDs in tusers that are not in device - so the result is not correct. I would like to know the right way to frame this query in SparkSQL. thanks

SparkSQL + Hive Cached Table Exception

2014-10-30 Thread Jean-Pascal Billaud
Hi, While testing SparkSQL on top of our Hive metastore, I am getting some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD table. Basically, I have a table mtable partitioned by some date field in hive and below is the scala code I am running in spark-shell: val sqlContext

Re: SparkSQL + Hive Cached Table Exception

2014-10-30 Thread Michael Armbrust
Hmmm, this looks like a bug. Can you file a JIRA? On Thu, Oct 30, 2014 at 4:04 PM, Jean-Pascal Billaud j...@tellapart.com wrote: Hi, While testing SparkSQL on top of our Hive metastore, I am getting some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD table. Basically

SparkSQL: Nested Query error

2014-10-29 Thread SK
in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Nested-Query-error-tp17691.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr

Re: SparkSQL: Nested Query error

2014-10-29 Thread Sanjiv Mittal
tusers WHERE tusers.u_uid NOT IN (SELECT d_uid FROM device)) But that resulted in a compilation error. What is the right way to frame the above query in Spark SQL? thanks
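The NOT IN subquery above does not parse in SparkSQL 1.1. A common workaround, shown here as a sketch only (it assumes a Spark 1.1 shell with a HiveContext named hiveContext and the tusers/device tables from this thread already registered), is to rewrite it as a LEFT OUTER JOIN with an IS NULL filter:

```scala
// Sketch: anti-join rewrite of "u_uid NOT IN (SELECT d_uid FROM device)".
// Assumes tables tusers(u_uid) and device(d_uid) are already registered.
val missing = hiveContext.sql("""
  SELECT t.u_uid
  FROM tusers t
  LEFT OUTER JOIN device d ON t.u_uid = d.d_uid
  WHERE d.d_uid IS NULL
""")
missing.count()  // UIDs in tusers that are absent from device
```

Note that the semantics differ from NOT IN when d_uid contains NULLs; for finding missing UIDs, the anti-join form is usually what is wanted.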

SparkSql OutOfMemoryError

2014-10-28 Thread Zhanfeng Huo
Hi, friends: I use Spark (1.1) SQL to operate on data in hive-0.12, and the job fails when data is large. So how to tune it? spark-defaults.conf: spark.shuffle.consolidateFiles true spark.shuffle.manager SORT spark.akka.threads 4 spark.sql.inMemoryColumnarStorage.compressed

Re: SparkSql OutOfMemoryError

2014-10-28 Thread Yanbo Liang
Try to increase the driver memory. 2014-10-28 17:33 GMT+08:00 Zhanfeng Huo huozhanf...@gmail.com: Hi, friends: I use Spark (1.1) SQL to operate on data in hive-0.12, and the job fails when data is large. So how to tune it? spark-defaults.conf: spark.shuffle.consolidateFiles true

Re: RDD to Multiple Tables SparkSQL

2014-10-28 Thread critikaled
mean by extract could you direct me to api or code sample. thanks and regards, critikaled.

Re: SparkSQL display wrong result

2014-10-27 Thread Cheng Lian
Would you mind sharing the DDLs of all involved tables? What format are these tables stored in? Is this issue specific to this query? I guess Hive, Shark and Spark SQL all read from the same HDFS dataset? On 10/27/14 3:45 PM, lyf刘钰帆 wrote: Hi, I am using SparkSQL 1.1.0 with cdh 4.6.0 recently

Re: 答复: SparkSQL display wrong result

2014-10-27 Thread Cheng Lian
LOCAL INPATH '/home/data/testFolder/qrytblB.txt' INTO TABLE tblB; From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: October 27, 2014 16:48 To: lyf刘钰帆; user@spark.apache.org Subject: Re: SparkSQL display wrong result Would you mind sharing the DDLs of all involved tables? What format

Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread ankits

Re: Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread Aniket Bhatnagar

Re: Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread Michael Armbrust
offer any advantages (e.g. does it have built-in support for caching?) over rolling my own solution for this use case? Thanks!

Re: Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread Sadhan Sood

Re: Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread Michael Armbrust

Re: Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread Sadhan Sood
rolling my own solution for this use case? Thanks!

Re: Is SparkSQL + JDBC server a good approach for caching?

2014-10-24 Thread Michael Armbrust
like it only supports loading data from files, but I want to query tables stored in memory only via JDBC. Is that possible?

SparkSQL and columnar data

2014-10-23 Thread Marius Soutier
Hi guys, another question: what’s the approach to working with column-oriented data, i.e. data with more than 1000 columns. Using Parquet for this should be fine, but how well does SparkSQL handle the big amount of columns? Is there a limit? Should we use standard Spark instead? Thanks

SparkSQL , best way to divide data into partitions?

2014-10-22 Thread raymond
Hi, I have a json file that can be loaded by sqlcontext.jsonfile into a table, but this table is not partitioned. I wish to transform this table into a partitioned table, say on field "date" etc. What will be the best approach to do this? It seems in hive this is usually
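One approach that works on Spark 1.1, sketched below, is to register the JSON file as a temporary table and insert one static partition at a time with HiveQL. It assumes a HiveContext is available; the table and column names (staging, events, id, value, date) are hypothetical placeholders, not from the original message:

```scala
// Sketch: partition a JSON-backed table by writing into a partitioned Hive table.
// Table and column names are placeholders.
val raw = hiveContext.jsonFile("data.json")
raw.registerTempTable("staging")
hiveContext.sql("""
  INSERT OVERWRITE TABLE events PARTITION (date='2014-10-22')
  SELECT id, value FROM staging WHERE date = '2014-10-22'
""")
```

On 1.1, inserting one partition per statement (looping over the distinct dates) is the pragmatic route, since dynamic-partition inserts were not available yet.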

Re: RDD to Multiple Tables SparkSQL

2014-10-21 Thread Olivier Girardot

[SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Hi! The RANK function is available in hive since version 0.11. When trying to use it in SparkSQL, I'm getting the following exception (full stacktrace below): java.lang.ClassCastException: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$RankBuffer cannot be cast

Re: [SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Michael Armbrust
No, analytic and window functions do not work yet. On Tue, Oct 21, 2014 at 3:00 AM, Pierre B pierre.borckm...@realimpactanalytics.com wrote: Hi! The RANK function is available in hive since version 0.11. When trying to use it in SparkSQL, I'm getting the following exception (full

Re: SparkSQL - TreeNodeException for unresolved attributes

2014-10-21 Thread Terry Siu
at 12:22 PM To: Michael Armbrust mich...@databricks.com Cc: user@spark.apache.org Subject: Re: SparkSQL - TreeNodeException for unresolved attributes Hi Michael, Thanks again for the reply

Re: [SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B

RDD to Multiple Tables SparkSQL

2014-10-20 Thread critikaled

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-20 Thread Terry Siu
Hi Yin, Sorry for the delay, but I’ll try the code change when I get a chance, but Michael’s initial response did solve my problem. In the meantime, I’m hitting another issue with SparkSQL which I will probably post another message if I can’t figure a workaround. Thanks, -Terry From: Yin

SparkSQL - TreeNodeException for unresolved attributes

2014-10-20 Thread Terry Siu
with GROUP BY to write back out to a Hive rollup table that has two partitions. This task is an effort to simulate the unsupported GROUPING SETS functionality in SparkSQL. In my first attempt, I got really close using SchemaRDD.groupBy until I realized that SchemaRDD.insertTo API does not support

Re: SparkSQL - TreeNodeException for unresolved attributes

2014-10-20 Thread Michael Armbrust
partitions. This task is an effort to simulate the unsupported GROUPING SETS functionality in SparkSQL. In my first attempt, I got really close using SchemaRDD.groupBy until I realized that SchemaRDD.insertTo API does not support partitioned tables yet. This prompted my second attempt to pass

Re: SparkSQL - TreeNodeException for unresolved attributes

2014-10-20 Thread Terry Siu
terry@smartfocus.com Cc: user@spark.apache.org Subject: Re: SparkSQL - TreeNodeException for unresolved attributes Have you tried this on master? There were several problems with resolution of complex queries

Re: SparkSQL: set hive.metastore.warehouse.dir in CLI doesn't work

2014-10-16 Thread Cheng Lian
The warehouse location needs to be specified before the HiveContext initialization; you can set it via: ./bin/spark-sql --hiveconf hive.metastore.warehouse.dir=/home/spark/hive/warehouse On 10/15/14 8:55 PM, Hao Ren wrote: Hi, The following query in sparkSQL 1.1.0 CLI doesn't work

Re: [SparkSQL] Convert JavaSchemaRDD to SchemaRDD

2014-10-16 Thread Cheng Lian

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-16 Thread Yin Huai
To: Terry Siu terry@smartfocus.com Cc: Michael Armbrust mich...@databricks.com, user@spark.apache.org Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet Hello Terry, How many columns does pqt_rdt_snappy have? Thanks, Yin On Tue, Oct

Re: [SparkSQL] Convert JavaSchemaRDD to SchemaRDD

2014-10-16 Thread Earthson
I'm trying to provide an API interface to Java users. I need to accept their JavaSchemaRDDs and convert them to SchemaRDD for Scala users.

[SparkSQL] Convert JavaSchemaRDD to SchemaRDD

2014-10-15 Thread Earthson
that: Is it a good idea for me to *use catalyst as DSL's execution engine?* I am trying to build a DSL, and I want to confirm this.

SparkSQL: set hive.metastore.warehouse.dir in CLI doesn't work

2014-10-15 Thread Hao Ren
Hi, The following query in sparkSQL 1.1.0 CLI doesn't work. *SET hive.metastore.warehouse.dir=/home/spark/hive/warehouse ; create table test as select v1.*, v2.card_type, v2.card_upgrade_time_black, v2.card_upgrade_time_gold from customer v1 left join customer_loyalty v2 on v1.account_id = v2

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-15 Thread Terry Siu
...@databricks.com, user@spark.apache.org Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet Hello Terry, How many columns does pqt_rdt_snappy have? Thanks, Yin On Tue, Oct 14, 2014 at 11:52 AM

Re: SparkSQL: StringType for numeric comparison

2014-10-14 Thread invkrh

Re: SparkSQL: select syntax

2014-10-14 Thread Hao Ren
retype all the 19 columns' names when querying with select. This feature exists in Hive, but in SparkSql it gives an exception. Any ideas? Thx Hao

Re: SparkSQL: select syntax

2014-10-14 Thread Gen
you'd write this in HiveQL, and then try doing that with HiveContext./ In fact, there are more problems than that. SparkSQL will keep all (15+5=20) columns in the final table, if I remember well. Therefore, joining two tables that have columns with the same name will cause duplicate columns

Re: SparkSQL: select syntax

2014-10-14 Thread Hao Ren
Thank you, Gen. I will give hiveContext a try. =)

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-14 Thread Terry Siu
user@spark.apache.org Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet There are some known bugs with the parquet serde and Spark 1.1. You can try setting spark.sql.hive.convertMetastoreParquet=true

Re: SparkSQL: StringType for numeric comparison

2014-10-14 Thread Michael Armbrust
to form the table schema. As for me, StringType is enough, why do we need others? Hao

Re: Does SparkSQL work with custom defined SerDe?

2014-10-14 Thread Chen Song
Looks like it may be related to https://issues.apache.org/jira/browse/SPARK-3807. I will build from branch 1.1 to see if the issue is resolved. Chen On Tue, Oct 14, 2014 at 10:33 AM, Chen Song chen.song...@gmail.com wrote: Sorry for bringing this out again, as I have no clue what could have

Re: How to patch sparkSQL on EC2?

2014-10-14 Thread Christos Kozanitis Christos Kozanitis
for sparkSQL (for version 1.1.0) and I am trying to deploy my new jar files (one for catalyst and one for sql/core) on ec2. My approach was to create a new spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar that merged the contents of the old one with the contents of my new jar files and I propagated

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-14 Thread Yin Huai
...@databricks.com Date: Monday, October 13, 2014 at 5:05 PM To: Terry Siu terry@smartfocus.com Cc: user@spark.apache.org Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet There are some known bugs with the parquet serde and Spark 1.1. You can try

Re: SparkSQL on Hive error

2014-10-13 Thread Kevin Paul
Thanks Michael, your patch works for me :) Regards, Kelvin Paul On Fri, Oct 3, 2014 at 3:52 PM, Michael Armbrust mich...@databricks.com wrote: Are you running master? There was briefly a regression here that is hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635. On Fri,

Setting SparkSQL configuration

2014-10-13 Thread Kevin Paul
Hi all, I tried to set the configurations spark.sql.inMemoryColumnarStorage.compressed and spark.sql.inMemoryColumnarStorage.batchSize in spark.executor.extraJavaOptions but it does not work; my spark.executor.extraJavaOptions contains -Dspark.sql.inMemoryColumnarStorage.compressed=true

Re: Setting SparkSQL configuration

2014-10-13 Thread Cheng Lian
Currently Spark SQL doesn't support reading SQL-specific configurations via system properties. But for HiveContext, you can put them in hive-site.xml. On 10/13/14 4:28 PM, Kevin Paul wrote: Hi all, I tried to set the configuration spark.sql.inMemoryColumnarStorage.compressed, and
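As an alternative to hive-site.xml, these options can also be set at runtime with a SET statement once the context exists; a minimal sketch, assuming a Spark 1.1 shell:

```scala
// Sketch: set SQL-specific options through the context rather than JVM properties.
sqlContext.sql("SET spark.sql.inMemoryColumnarStorage.compressed=true")
sqlContext.sql("SET spark.sql.inMemoryColumnarStorage.batchSize=10000")
```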

Re: SparkSQL LEFT JOIN problem

2014-10-13 Thread invkrh
is #65279 (or U+FEFF). As a result, the first field has a leading #65279 char. When querying, I just used account_id, so SparkSQL cannot find the given field in the AST, while the one in the AST is #65279account_id. So the solution is to convert the input file to UTF-8 Unicode (*without* BOM), which will remove
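The offending character is the UTF-8 byte order mark (U+FEFF). Besides re-encoding the file, the marker can also be stripped in code before the header line is parsed; a small self-contained sketch (the sample header is illustrative):

```scala
// Strip a leading UTF-8 BOM (U+FEFF) from a header line before splitting fields.
def stripBom(s: String): String = s.stripPrefix("\uFEFF")

val header = "\uFEFFaccount_id,Birthday,preferstore"
val fields = stripBom(header).split(",")
println(fields(0))  // account_id
```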

Re: Nested Query using SparkSQL 1.1.0

2014-10-13 Thread Yin Huai
Hi Shahab, Can you try to use HiveContext? It should work in 1.1. For SQLContext, this issue was not fixed in 1.1 and you need to use the master branch at the moment. Thanks, Yin On Sun, Oct 12, 2014 at 5:20 PM, shahab shahab.mok...@gmail.com wrote: Hi, Apparently it is possible to query

Re: Nested Query using SparkSQL 1.1.0

2014-10-13 Thread shahab
Thanks Yin. I tried HiveQL and it solved that problem. But now I have a second query requirement: since you are the main developer behind the JSON-Spark integration (I saw your presentation on YouTube, Easy JSON Data Manipulation in Spark), is it possible to perform aggregation-style queries, for

Re: Nested Query using SparkSQL 1.1.0

2014-10-13 Thread Yin Huai
Hi Shahab, Do you mean queries with group by and aggregation functions? Once you register the json dataset as a table, you can write queries like querying a regular table. You can join it with other tables and do aggregations. Is it what you were asking for? If not, can you give me a more

SparkSQL: StringType for numeric comparison

2014-10-13 Thread invkrh
Hi, I am using SparkSQL 1.1.0. Actually, I have a table as following: root |-- account_id: string (nullable = false) |-- Birthday: string (nullable = true) |-- preferstore: string (nullable = true) |-- registstore: string (nullable = true) |-- gender: string (nullable = true

SparkSQL: select syntax

2014-10-13 Thread invkrh
Hi all, A quick question on SparkSql *SELECT* syntax. Does it support queries like: *SELECT t1.*, t2.d, t2.e FROM t1 LEFT JOIN t2 on t1.a = t2.a* It always ends with the exception: *Exception in thread main java.lang.RuntimeException: [2.12] failure: string literal expected SELECT t1.*, t2.d

Re: SparkSQL: StringType for numeric comparison

2014-10-13 Thread Michael Armbrust
...@gmail.com wrote: Hi, I am using SparkSQL 1.1.0. Actually, I have a table as following: root |-- account_id: string (nullable = false) |-- Birthday: string (nullable = true) |-- preferstore: string (nullable = true) |-- registstore: string (nullable = true) |-- gender: string

SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-13 Thread Terry Siu
defined. Does this error look familiar to anyone? Could my usage of SparkSQL with Hive be incorrect or is support with Hive/Parquet/partitioning still buggy at this point in Spark 1.1.0? Thanks, -Terry

Does SparkSQL work with custom defined SerDe?

2014-10-13 Thread Chen Song
In Hive, the table was created with custom SerDe, in the following way. row format serde abc.ProtobufSerDe with serdeproperties (serialization.class= abc.protobuf.generated.LogA$log_a) When I start spark-sql shell, I always got the following exception, even for a simple query. select user from

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-13 Thread Michael Armbrust
, pqt_segcust_snappy, has 21 columns and two partitions defined. Does this error look familiar to anyone? Could my usage of SparkSQL with Hive be incorrect or is support with Hive/Parquet/partitioning still buggy at this point in Spark 1.1.0? Thanks, -Terry

Nested Query using SparkSQL 1.1.0

2014-10-12 Thread shahab
Hi, Apparently it is possible to query nested json using spark SQL, but, mainly due to lack of proper documentation/examples, I did not manage to make it work. I would appreciate it if you could point me to any example or help with this issue. Here is my code: val anotherPeopleRDD =

Re: How to do broadcast join in SparkSQL

2014-10-11 Thread Jianshi Huang
It works fine, thanks for the help Michael. Liancheng also told me a trick: using a subquery with LIMIT n. It works in the latest 1.2.0. BTW, it looks like the broadcast optimization won't be recognized if I do a left join instead of an inner join. Is that true? How can I make it work for left joins?
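The LIMIT-n subquery trick mentioned above can be written as below; a sketch only, with hypothetical fact/dim table and column names, since whether the planner broadcasts the subquery depends on the Spark version and its size estimates:

```scala
// Sketch: the LIMIT bounds the dimension side so the planner can treat it as
// small enough to broadcast. fact/dim and their columns are placeholders.
hiveContext.sql("""
  SELECT f.id, d.name
  FROM fact f
  JOIN (SELECT * FROM dim LIMIT 100) d ON f.dim_id = d.id
""")
```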

Re: Blog post: An Absolutely Unofficial Way to Connect Tableau to SparkSQL (Spark 1.1)

2014-10-11 Thread Matei Zaharia
Very cool Denny, thanks for sharing this! Matei On Oct 11, 2014, at 9:46 AM, Denny Lee denny.g@gmail.com wrote: https://www.concur.com/blog/en-us/connect-tableau-to-sparksql If you're wondering how to connect Tableau to SparkSQL - here are the steps to connect Tableau to SparkSQL

How to patch sparkSQL on EC2?

2014-10-10 Thread Christos Kozanitis Christos Kozanitis
Hi I have written a few extensions for sparkSQL (for version 1.1.0) and I am trying to deploy my new jar files (one for catalyst and one for sql/core) on ec2. My approach was to create a new spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar that merged the contents of the old one

SparkSQL LEFT JOIN problem

2014-10-10 Thread invkrh
Hi, I am exploring SparkSQL 1.1.0, I have a problem on LEFT JOIN. Here is the request: select * from customer left join profile on customer.account_id = profile.account_id The two tables' schema are shown as following: // Table: customer root |-- account_id: string (nullable = false

Re: SparkSQL LEFT JOIN problem

2014-10-10 Thread Liquan Pei
Hi Can you try select birthday from customer left join profile on customer.account_id = profile.account_id to see if the problems remains on your entire data? Thanks, Liquan On Fri, Oct 10, 2014 at 8:20 AM, invkrh inv...@gmail.com wrote: Hi, I am exploring SparkSQL 1.1.0, I have a problem

Re: How to do broadcast join in SparkSQL

2014-10-08 Thread Jianshi Huang
Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged into master? I cannot find spark.sql.hints.broadcastTables in latest master, but it's in the following patch. https://github.com/apache/spark/commit/76ca4341036b95f71763f631049fdae033990ab5 Jianshi On Mon, Sep 29,

Re: How to do broadcast join in SparkSQL

2014-10-08 Thread Jianshi Huang
Ok, currently there's cost-based optimization however Parquet statistics is not implemented... What's the good way if I want to join a big fact table with several tiny dimension tables in Spark SQL (1.1)? I wish we can allow user hint for the join. Jianshi On Wed, Oct 8, 2014 at 2:18 PM,

Re: How to do broadcast join in SparkSQL

2014-10-08 Thread Michael Armbrust
Thanks for the input. We purposefully made sure that the config option did not make it into a release as it is not something that we are willing to support long term. That said we'll try and make this easier in the future either through hints or better support for statistics. In this particular

Re: sparksql connect remote hive cluster

2014-10-08 Thread Patrick Wendell
/00ab46fa4d6711e4afb70003ff41ebbf/part-3 not sure if some of the ports are not open or it needs access to additional things. thanks,

HiveServer1 and SparkSQL

2014-10-07 Thread deenar.toraskar
Hi Shark supported both the HiveServer1 and HiveServer2 thrift interfaces (using $ bin/shark -service sharkserver[1 or 2]). SparkSQL seems to support only HiveServer2. I was wondering what is involved to add support for HiveServer1. Is this something straightforward to do that I can embark

Re: [SparkSQL] Function parity with Shark?

2014-10-06 Thread Yana Kadiyska
I have created https://issues.apache.org/jira/browse/SPARK-3814 https://issues.apache.org/jira/browse/SPARK-3815 Will probably try my hand at 3814, seems like a good place to get started... On Fri, Oct 3, 2014 at 3:06 PM, Michael Armbrust mich...@databricks.com wrote: Thanks for digging in!

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-06 Thread tian zhang
established the ground work and direction for Spark Cassandra connectors and we have been happy seeing the results. With the Spark 1.1.0 and SparkSQL release, it's time to take Calliope to the logical next level, also paving the way for much more advanced functionality to come. Yesterday we released

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-04 Thread Rohit Rai
started this journey and laid the path for the Spark + Cassandra stack. We established the ground work and direction for Spark Cassandra connectors and we have been happy seeing the results. With the Spark 1.1.0 and SparkSQL release, it's time to take Calliope http://tuplejump.github.io/calliope

SparkSQL on Hive error

2014-10-03 Thread Kevin Paul
Hi all, I tried to launch my application with spark-submit, the command I use is: bin/spark-submit --class ${MY_CLASS} --jars ${MY_JARS} --master local myApplicationJar.jar I've buillt spark with SPARK_HIVE=true, and was able to start HiveContext, and was able to run command like,

Re: SparkSQL on Hive error

2014-10-03 Thread Michael Armbrust
Are you running master? There was briefly a regression here that is hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635. On Fri, Oct 3, 2014 at 1:43 AM, Kevin Paul kevinpaulap...@gmail.com wrote: Hi all, I tried to launch my application with spark-submit, the command I use

Re: SparkSQL on Hive error

2014-10-03 Thread Cheng Lian
Also make sure to call hiveContext.sql within the same thread where hiveContext is created, because Hive uses a thread-local variable to initialize the Driver.conf. On 10/3/14 4:52 PM, Michael Armbrust wrote: Are you running master? There was briefly a regression here that is hopefully

Re: [SparkSQL] Function parity with Shark?

2014-10-03 Thread Yana Kadiyska
Thanks -- it does appear that I misdiagnosed a bit: case works generally but it doesn't seem to like the bit operation, which does not seem to work (type of bit_field in Hive is bigint): Error: java.lang.RuntimeException: Unsupported language features in query: select (case when bit_field 1=1

[ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread Rohit Rai
Hi All, A year ago we started this journey and laid the path for the Spark + Cassandra stack. We established the ground work and direction for Spark Cassandra connectors and we have been happy seeing the results. With the Spark 1.1.0 and SparkSQL release, it's time to take Calliope http

Re: [SparkSQL] Function parity with Shark?

2014-10-03 Thread Michael Armbrust
Thanks for digging in! These both look like they should have JIRAs. On Fri, Oct 3, 2014 at 8:14 AM, Yana Kadiyska yana.kadiy...@gmail.com wrote: Thanks -- it does appear that I misdiagnosed a bit: case works generally but it doesn't seem to like the bit operation, which does not seem to work

[SparkSQL] Function parity with Shark?

2014-10-02 Thread Yana Kadiyska
Hi, in an effort to migrate off of Shark I recently tried the Thrift JDBC server that comes with Spark 1.1.0. However I observed that conditional functions do not work (I tried 'case' and 'coalesce') some string functions like 'concat' also did not work. Is there a list of what's missing or a

Re: SparkSQL DataType mappings

2014-10-02 Thread Costin Leau
Hi Yin, Thanks for the reply. I've found the section as well, a couple of days ago and managed to integrate es-hadoop with Spark SQL [1] Cheers, [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html On 10/2/14 6:32 PM, Yin Huai wrote: Hi Costin, I am answering

SparkSQL DataType mappings

2014-09-30 Thread Costin Leau
Hi, I'm working on supporting SchemaRDD in Elasticsearch Hadoop [1] but I'm having some issues with the SQL API, in particular in what the DataTypes translate to. 1. A SchemaRDD is composed of a Row and StructType - I'm using the latter to decompose a Row into primitives. I'm not clear

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-30 Thread Yin Huai

How to read just specified columns from parquet file using SparkSQL.

2014-09-30 Thread mykidong
Hi, I am new to SparkSQL. I want to read the specified columns from the parquet, not all the columns defined in the parquet file. For instance, the schema of the parquet file would look like this: { type: record, name: ElectricPowerUsage, namespace: jcascalog.parquet.example, fields
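With SparkSQL it is usually enough to project only the wanted columns in the query: the Parquet scan prunes columns that are not referenced. A sketch, assuming a Spark 1.1 shell; the path and column names are hypothetical stand-ins for fields of the ElectricPowerUsage record:

```scala
// Sketch: only the projected columns are read from the Parquet files.
// Path and column names are placeholders.
val usage = sqlContext.parquetFile("hdfs:///power_usage.parquet")
usage.registerTempTable("power_usage")
val slim = sqlContext.sql("SELECT houseId, usageKwh FROM power_usage")
```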

Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread vdiwakar.malladi
Hello, I'm exploring SparkSQL and I'm facing an issue while using queries. Any help on this is appreciated. I have the following schema once loaded as RDD. root |-- data: array (nullable = true) ||-- element: struct (containsNull = false) |||-- age: integer (nullable = true

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread Cheng Lian
In your case, the table has only one row, whose content is "data", which is an array. You need something like SELECT data[0].name FROM json_table to access the name field. On 9/29/14 11:08 PM, vdiwakar.malladi wrote: Hello, I'm exploring SparkSQL and I'm facing an issue while using
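Put together, the fix from this thread looks as follows; a sketch assuming a Spark 1.1 shell and a JSON file (hypothetical name) whose top-level data field is an array of structs with a name attribute:

```scala
// Sketch: index into the array before addressing the struct's fields.
val people = sqlContext.jsonFile("people.json")
people.registerTempTable("json_table")
val names = sqlContext.sql("SELECT data[0].name FROM json_table")
```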

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread vdiwakar.malladi
].name FROM people where data[0].age =13* Am I missing something? I'm trying to understand how the RDD is stored. Thanks in advance.

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread Yin Huai

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread Akhil Das

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread Akhil Das
FROM people where data[0].age =13* Am I missing something? I'm trying to understand how the RDD is stored. Thanks in advance.

Re: Unresolved attributes: SparkSQL on the schemaRDD

2014-09-29 Thread vdiwakar.malladi
I'm using the latest version, i.e. Spark 1.1.0. Thanks.

How to do broadcast join in SparkSQL

2014-09-28 Thread Jianshi Huang
I cannot find it in the documentation. And I have a dozen dimension tables to (left) join... Cheers, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog: http://huangjs.github.com/

Re: How to do broadcast join in SparkSQL

2014-09-28 Thread Ted Yu
Have you looked at SPARK-1800 ? e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala Cheers On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: I cannot find it in the documentation. And I have a dozen dimension tables to (left) join... Cheers,

Re: How to do broadcast join in SparkSQL

2014-09-28 Thread Jianshi Huang
Yes, it looks like it can only be controlled by the parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird to me. How am I supposed to know the exact size of a table in bytes? Letting me specify the preferred join algorithm would be better, I think. Jianshi On Sun, Sep 28, 2014 at 11:57 PM, Ted Yu
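For reference, the threshold mentioned above is set in bytes; a minimal sketch, assuming a Spark 1.1 shell (10 MB shown as an arbitrary value):

```scala
// Sketch: tables whose estimated size falls below this many bytes become
// the broadcast side of a join.
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=10485760")
```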

Re: SparkSQL: map type MatchError when inserting into Hive table

2014-09-28 Thread Du Li
It turned out a bug in my code. In the select clause the list of fields is misaligned with the schema of the target table. As a consequence the map data couldn’t be cast to some other type in the schema. Thanks anyway. On 9/26/14, 8:08 PM, Cheng Lian lian.cs@gmail.com wrote: Would you mind

Re: SparkSQL Thriftserver in Mesos

2014-09-26 Thread Cheng Lian
:16 AM, John Omernik wrote: I am running the Thrift server in SparkSQL, and running it on the node I compiled spark on. When I run it, tasks only work if they landed on that node, other executors started on nodes I didn't compile spark on (and thus don't have the compile directory) fail. Should

SparkSQL: map type MatchError when inserting into Hive table

2014-09-26 Thread Du Li
Hi, I was loading data into a partitioned table on Spark 1.1.0 beeline-thriftserver. The table has complex data types such as map<string, string> and array<map<string, string>>. The query is like "insert overwrite table a partition (…) select …" and the select clause worked if run separately. However,

Re: SparkSQL: map type MatchError when inserting into Hive table

2014-09-26 Thread Du Li
It might be a problem when inserting into a partitioned table. It worked fine when the target table was unpartitioned. Can you confirm this? Thanks, Du On 9/26/14, 4:48 PM, Du Li l...@yahoo-inc.com.INVALID wrote: Hi, I was loading data into a partitioned table on Spark 1.1.0

Re: SparkSQL: map type MatchError when inserting into Hive table

2014-09-26 Thread Cheng Lian
Would you mind providing the DDL of this partitioned table together with the query you tried? The stacktrace suggests that the query was trying to cast a map into something else, which is not supported in Spark SQL. And I doubt whether Hive supports casting a complex type to some other type.
