From: Soumya Simanta soumya.sima...@gmail.com
Date: Friday, October 31, 2014 at 4:04 PM
To: user@spark.apache.org
Subject: SparkSQL performance
I was really surprised to see the results
I agree. My personal experience with Spark core is that it performs really
well once you tune it properly.
As far as I understand, SparkSQL under the hood performs many of these
optimizations (order of Spark operations) and uses a more efficient storage
format. Is this assumption correct?
Has anyone
(deviceRDD).count(). The count comes out to be 1, but
there are many UIDs in tusers that are not in device - so the result is
not correct.
I would like to know the right way to frame this query in SparkSQL.
thanks
Hi,
While testing SparkSQL on top of our Hive metastore, I am getting
some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD
table.
Basically, I have a table mtable partitioned by some date field in hive
and below is the scala code I am running in spark-shell:
val sqlContext
Hmmm, this looks like a bug. Can you file a JIRA?
On Thu, Oct 30, 2014 at 4:04 PM, Jean-Pascal Billaud j...@tellapart.com
wrote:
Hi,
While testing SparkSQL on top of our Hive metastore, I am getting
some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD
table.
Basically
tusers
WHERE tusers.u_uid NOT IN (SELECT d_uid FROM device))
But that resulted in a compilation error.
What is the right way to frame the above query in Spark SQL?
thanks
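For reference, IN/NOT IN subqueries were not supported by the Spark SQL parser in 1.1, which is the usual cause of this error. A minimal sketch of the common rewrite, reusing the column names above (note the semantics differ from NOT IN when d_uid can be NULL):
SELECT tusers.u_uid
FROM tusers LEFT OUTER JOIN device ON tusers.u_uid = device.d_uid
WHERE device.d_uid IS NULL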
Hi, friends:
I use Spark SQL (Spark 1.1) to operate on data in Hive 0.12, and the job fails when
the data is large. How should I tune it?
spark-defaults.conf:
spark.shuffle.consolidateFiles true
spark.shuffle.manager SORT
spark.akka.threads 4
spark.sql.inMemoryColumnarStorage.compressed
Try to increase the driver memory.
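As a sketch only (the 4g value is just an example), the driver memory can be raised through the same spark-defaults.conf shown above, or with --driver-memory on spark-submit:
spark.driver.memory 4g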
2014-10-28 17:33 GMT+08:00 Zhanfeng Huo huozhanf...@gmail.com:
Hi, friends:
I use Spark SQL (Spark 1.1) to operate on data in Hive 0.12, and the job fails
when the data is large. How should I tune it?
spark-defaults.conf:
spark.shuffle.consolidateFiles true
mean by
'extract'? Could you direct me to an API or a code sample?
thanks and regards,
critikaled.
Would you mind sharing the DDLs of all involved tables? What format are
these tables stored in? Is this issue specific to this query? I guess
Hive, Shark and Spark SQL all read from the same HDFS dataset?
On 10/27/14 3:45 PM, lyf刘钰帆 wrote:
Hi,
I have been using SparkSQL 1.1.0 with CDH 4.6.0 recently.
LOCAL INPATH '/home/data/testFolder/qrytblB.txt' INTO TABLE
tblB;
*From:* Cheng Lian [mailto:lian.cs@gmail.com]
*Sent:* October 27, 2014, 16:48
*To:* lyf刘钰帆; user@spark.apache.org
*Subject:* Re: SparkSQL display wrong result
Would you mind sharing the DDLs of all involved tables? What format
offer any advantages (e.g. does it have built-in
support for caching?) over rolling my own solution for this use case?
Thanks!
like it only supports loading
data from files, but I want to query tables stored only in memory via JDBC.
Is that possible?
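For reference, a minimal sketch of how an in-memory table can be exposed through the Thrift/JDBC server (Spark 1.1 assumed, table name hypothetical): statements sent over the JDBC connection, e.g. from beeline, run in the server's shared context, so a cached table stays available to later queries.
-- issued over the JDBC connection (e.g. via beeline); "logs" is a hypothetical table
CACHE TABLE logs;
SELECT count(*) FROM logs;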
Hi guys,
another question: what’s the approach to working with column-oriented data,
i.e. data with more than 1000 columns. Using Parquet for this should be fine,
but how well does SparkSQL handle such a large number of columns? Is there a limit?
Should we use standard Spark instead?
Thanks
Hi
I have a JSON file that can be loaded by sqlContext.jsonFile into a
table, but this table is not partitioned.
Then I wish to transform this table into a partitioned table, say on a
field “date”. What would be the best approach to do this? It seems that in Hive
this is usually
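A minimal sketch of one way this could be done with a HiveContext (the table and column names are hypothetical, the partitioned table is assumed to already exist in Hive, and as far as I recall Spark 1.1 only supports static partition values here):
// Load the JSON file and expose it to SQL (Spark 1.1 API assumed).
val events = hiveContext.jsonFile("events.json")
events.registerTempTable("events_json")
// Write one partition at a time into the pre-created partitioned Hive table.
hiveContext.sql(
  "INSERT OVERWRITE TABLE events_part PARTITION (date='2014-10-01') " +
  "SELECT col1, col2 FROM events_json WHERE date = '2014-10-01'")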
Hi!
The RANK function is available in hive since version 0.11.
When trying to use it in SparkSQL, I'm getting the following exception (full
stacktrace below):
java.lang.ClassCastException:
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$RankBuffer cannot be
cast
No, analytic and window functions do not work yet.
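In the meantime, a sketch of one way to approximate it outside SQL, on a plain RDD of (group, value) pairs (the names are mine, and this produces row_number-style ranks; true RANK tie handling would need extra logic):
// rows: RDD[(String, Double)] of (groupKey, value); a hypothetical input.
val ranked = rows.groupByKey().flatMap { case (key, values) =>
  values.toSeq.sortBy(-_)            // order each group by descending value
    .zipWithIndex
    .map { case (value, idx) => (key, value, idx + 1) }   // rank starts at 1
}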
On Tue, Oct 21, 2014 at 3:00 AM, Pierre B
pierre.borckm...@realimpactanalytics.com wrote:
Hi!
The RANK function is available in hive since version 0.11.
When trying to use it in SparkSQL, I'm getting the following exception
(full
at 12:22 PM
To: Michael Armbrust mich...@databricks.com
Cc: user@spark.apache.org
Subject: Re: SparkSQL - TreeNodeException for unresolved attributes
Hi Michael,
Thanks again for the reply
Hi Yin,
Sorry for the delay. I'll try the code change when I get a chance, but
Michael's initial response did solve my problem. In the meantime, I'm hitting
another issue with SparkSQL, which I will probably post in another message if I
can't figure out a workaround.
Thanks,
-Terry
From: Yin
with GROUP BY to write back out
to a Hive rollup table that has two partitions. This task is an effort to
simulate the unsupported GROUPING SETS functionality in SparkSQL.
In my first attempt, I got really close using SchemaRDD.groupBy until I
realized that the SchemaRDD.insertInto API does not support
partitions. This task is
an effort to simulate the unsupported GROUPING SETS functionality in
SparkSQL.
In my first attempt, I got really close using SchemaRDD.groupBy until I
realized that the SchemaRDD.insertInto API does not support partitioned tables
yet. This prompted my second attempt to pass
@smartfocus.com
Cc: user@spark.apache.org
Subject: Re: SparkSQL - TreeNodeException for unresolved attributes
Have you tried this on master? There were several problems with resolution of
complex queries
The warehouse location needs to be specified before the |HiveContext|
initialization; you can set it via:
./bin/spark-sql --hiveconf
hive.metastore.warehouse.dir=/home/spark/hive/warehouse
On 10/15/14 8:55 PM, Hao Ren wrote:
Hi,
The following query in sparkSQL 1.1.0 CLI doesn't work
this.
To: Terry Siu terry@smartfocus.com
Cc: Michael Armbrust mich...@databricks.com, user@spark.apache.org
Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet
Hello Terry,
How many columns does pqt_rdt_snappy have?
Thanks,
Yin
On Tue, Oct
I'm trying to provide an API to Java users, and I need to accept their
JavaSchemaRDDs and convert them to SchemaRDDs for Scala users.
that: Is it a good idea for me to *use Catalyst as the
DSL's execution engine?*
I am trying to build a DSL, and I want to confirm this.
Hi,
The following query in sparkSQL 1.1.0 CLI doesn't work.
*SET hive.metastore.warehouse.dir=/home/spark/hive/warehouse
;
create table test as
select v1.*, v2.card_type, v2.card_upgrade_time_black,
v2.card_upgrade_time_gold
from customer v1 left join customer_loyalty v2
on v1.account_id = v2
...@databricks.com,
user@spark.apache.org
Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet
Hello Terry,
How many columns does pqt_rdt_snappy have?
Thanks,
Yin
On Tue, Oct 14, 2014 at 11:52 AM
retype all 19 column names when querying with
select. This feature exists in Hive.
But in SparkSQL, it gives an exception.
Any ideas? Thx
Hao
you'd write this in HiveQL, and then try doing that with
HiveContext.
In fact, there are more problems than that. SparkSQL will keep all
(15+5=20) columns in the final table, if I remember well. Therefore,
joining two tables which have columns with the same name will cause
duplicate column
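A minimal sketch of that suggestion (the t1/t2 query is the one from this thread; only the context setup is mine):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
hiveContext.sql(
  "SELECT t1.*, t2.d, t2.e FROM t1 LEFT JOIN t2 ON t1.a = t2.a"
).collect().foreach(println)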
Thank you, Gen.
I will give hiveContext a try. =)
@spark.apache.org
Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet
There are some known bugs with the Parquet SerDe and Spark 1.1.
You can try setting spark.sql.hive.convertMetastoreParquet=true
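A sketch of setting that flag from a Scala program or shell (assuming a HiveContext; it can also be set with a SQL SET statement):
// Use Spark SQL's native Parquet support instead of the Hive Parquet SerDe.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")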
to form the table schema. As for me, StringType is
enough; why do we need the others?
Hao
Looks like it may be related to
https://issues.apache.org/jira/browse/SPARK-3807.
I will build from branch 1.1 to see if the issue is resolved.
Chen
On Tue, Oct 14, 2014 at 10:33 AM, Chen Song chen.song...@gmail.com wrote:
Sorry for bringing this up again, as I have no clue what could have
for sparkSQL (for version 1.1.0) and I am
trying to deploy my new jar files (one for catalyst and one for sql/core) on
ec2.
My approach was to create a new
spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar that merged the contents of
the old one with the contents of my new jar files and I propagated
...@databricks.com
Date: Monday, October 13, 2014 at 5:05 PM
To: Terry Siu terry@smartfocus.com
Cc: user@spark.apache.org
Subject: Re: SparkSQL IndexOutOfBoundsException when reading from Parquet
There are some known bugs with the Parquet SerDe and Spark 1.1.
You can try
Thanks Michael, your patch works for me :)
Regards,
Kelvin Paul
On Fri, Oct 3, 2014 at 3:52 PM, Michael Armbrust mich...@databricks.com
wrote:
Are you running master? There was briefly a regression here that is
hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635.
On Fri,
Hi all, I tried to set the configuration
spark.sql.inMemoryColumnarStorage.compressed,
and spark.sql.inMemoryColumnarStorage.batchSize in
spark.executor.extraJavaOptions
but it does not work, my spark.executor.extraJavaOptions contains
-Dspark.sql.inMemoryColumnarStorage.compressed=true
Currently Spark SQL doesn’t support reading SQL specific configurations
via system properties. But for |HiveContext|, you can put them in
|hive-site.xml|.
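Another option, as a sketch (Spark 1.1 assumed; the values below are only examples), is to set these programmatically on the context instead of through JVM system properties:
hiveContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
hiveContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000")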
On 10/13/14 4:28 PM, Kevin Paul wrote:
Hi all, I tried to set the configuration
spark.sql.inMemoryColumnarStorage.compressed, and
is #65279 (or U+FEFF).
As a result, the first field has a leading #65279 char. When querying, I
just used account_id, so SparkSQL cannot find the given field in the AST, while
the one in the AST is #65279account_id.
So the solution is to convert the input file to UTF-8 Unicode (*without* BOM),
which will remove
Hi Shahab,
Can you try to use HiveContext? It should work in 1.1. For SQLContext,
this issue was not fixed in 1.1 and you need to use the master branch at the
moment.
Thanks,
Yin
On Sun, Oct 12, 2014 at 5:20 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
Apparently it is possible to query
Thanks Yin. I tried HiveQL and it solved that problem. But now I have a
second query requirement:
Since you are the main developer behind the JSON-Spark integration (I saw your
presentation on YouTube, "Easy JSON Data Manipulation in Spark"), is it
possible to perform aggregation-type queries,
for
Hi Shahab,
Do you mean queries with group by and aggregation functions? Once you
register the JSON dataset as a table, you can write queries against it like a
regular table. You can join it with other tables and do aggregations. Is it
what you were asking for? If not, can you give me a more
Hi,
I am using SparkSQL 1.1.0.
Actually, I have a table as following:
root
|-- account_id: string (nullable = false)
|-- Birthday: string (nullable = true)
|-- preferstore: string (nullable = true)
|-- registstore: string (nullable = true)
|-- gender: string (nullable = true
Hi all,
A quick question on SparkSql *SELECT* syntax.
Does it support queries like:
*SELECT t1.*, t2.d, t2.e FROM t1 LEFT JOIN t2 on t1.a = t2.a*
It always ends with the exception:
*Exception in thread main java.lang.RuntimeException: [2.12] failure:
string literal expected
SELECT t1.*, t2.d
...@gmail.com wrote:
Hi,
I am using SparkSQL 1.1.0.
Actually, I have a table as following:
root
|-- account_id: string (nullable = false)
|-- Birthday: string (nullable = true)
|-- preferstore: string (nullable = true)
|-- registstore: string (nullable = true)
|-- gender: string
defined. Does this error look familiar to anyone? Could my usage of
SparkSQL with Hive be incorrect or is support with Hive/Parquet/partitioning
still buggy at this point in Spark 1.1.0?
Thanks,
-Terry
In Hive, the table was created with a custom SerDe, in the following way.
row format serde abc.ProtobufSerDe
with serdeproperties (serialization.class=
abc.protobuf.generated.LogA$log_a)
When I start the spark-sql shell, I always get the following exception, even
for a simple query.
select user from
, pqt_segcust_snappy, has 21 columns
and two partitions defined. Does this error look familiar to anyone? Could
my usage of SparkSQL with Hive be incorrect or is support with
Hive/Parquet/partitioning still buggy at this point in Spark 1.1.0?
Thanks,
-Terry
Hi,
Apparently it is possible to query nested JSON using Spark SQL, but,
mainly due to a lack of proper documentation/examples, I did not manage to
make it work. I would appreciate it if you could point me to any example or
help with this issue.
Here is my code:
val anotherPeopleRDD =
It works fine, thanks for the help Michael.
Liancheng also told me a trick, using a subquery with LIMIT n. It works in
the latest 1.2.0.
BTW, looks like the broadcast optimization won't be recognized if I do a
left join instead of an inner join. Is that true? How can I make it work for
left joins?
Very cool Denny, thanks for sharing this!
Matei
On Oct 11, 2014, at 9:46 AM, Denny Lee denny.g@gmail.com wrote:
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
If you're wondering how to connect Tableau to SparkSQL - here are the steps
to connect Tableau to SparkSQL
Hi
I have written a few extensions for sparkSQL (for version 1.1.0) and I am
trying to deploy my new jar files (one for catalyst and one for sql/core) on
ec2.
My approach was to create a new spark/lib/spark-assembly-1.1.0-hadoop1.0.4.jar
that merged the contents of the old one
Hi,
I am exploring SparkSQL 1.1.0, I have a problem on LEFT JOIN.
Here is the request:
select * from customer left join profile on customer.account_id =
profile.account_id
The two tables' schema are shown as following:
// Table: customer
root
|-- account_id: string (nullable = false
Hi
Can you try
select birthday from customer left join profile on customer.account_id =
profile.account_id
to see if the problem remains on your entire data?
Thanks,
Liquan
On Fri, Oct 10, 2014 at 8:20 AM, invkrh inv...@gmail.com wrote:
Hi,
I am exploring SparkSQL 1.1.0, I have a problem
Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged
into master?
I cannot find spark.sql.hints.broadcastTables in latest master, but it's in
the following patch.
https://github.com/apache/spark/commit/76ca4341036b95f71763f631049fdae033990ab5
Jianshi
On Mon, Sep 29,
Ok, currently there's cost-based optimization; however, Parquet statistics are
not implemented...
What's a good way to join a big fact table with several tiny
dimension tables in Spark SQL (1.1)?
I wish we could allow a user hint for the join.
Jianshi
On Wed, Oct 8, 2014 at 2:18 PM,
Thanks for the input. We purposefully made sure that the config option did
not make it into a release as it is not something that we are willing to
support long term. That said, we'll try to make this easier in the future,
either through hints or better support for statistics.
In this particular
/00ab46fa4d6711e4afb70003ff41ebbf/part-3
not sure if some of the ports are not open or it needs access to additional
things.
thanks,
Hi
Shark supported both the HiveServer1 and HiveServer2 thrift interfaces
(using $ bin/shark -service sharkserver[1 or 2]).
SparkSQL seems to support only HiveServer2. I was wondering what is involved
in adding support for HiveServer1. Is this something straightforward to do that
I can embark
I have created
https://issues.apache.org/jira/browse/SPARK-3814
https://issues.apache.org/jira/browse/SPARK-3815
Will probably try my hand at 3814, seems like a good place to get started...
On Fri, Oct 3, 2014 at 3:06 PM, Michael Armbrust mich...@databricks.com
wrote:
Thanks for digging in!
established the ground work and direction for Spark Cassandra
connectors and we have been happy seeing the results.
With Spark 1.1.0 and the SparkSQL release, it's time to take Calliope to the
logical next level, also paving the way for much more advanced functionality to
come.
Yesterday we released
started this journey and laid the path for Spark +
Cassandra stack. We established the ground work and direction for Spark
Cassandra connectors and we have been happy seeing the results.
With Spark 1.1.0 and the SparkSQL release, it's time to take Calliope
http://tuplejump.github.io/calliope
Hi all, I tried to launch my application with spark-submit, the command I
use is:
bin/spark-submit --class ${MY_CLASS} --jars ${MY_JARS} --master local
myApplicationJar.jar
I've built Spark with SPARK_HIVE=true, and was able to start HiveContext,
and was able to run commands like,
Are you running master? There was briefly a regression here that is
hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635.
On Fri, Oct 3, 2014 at 1:43 AM, Kevin Paul kevinpaulap...@gmail.com wrote:
Hi all, I tried to launch my application with spark-submit, the command I
use
Also make sure to call |hiveContext.sql| within the same thread where
|hiveContext| is created, because Hive uses a thread-local variable to
initialize the |Driver.conf|.
On 10/3/14 4:52 PM, Michael Armbrust wrote:
Are you running master? There was briefly a regression here that is
hopefully
Thanks -- it does appear that I misdiagnosed a bit: case works generally
but it doesn't seem to like the bit operation, which does not seem to work
(type of bit_field in Hive is bigint):
Error: java.lang.RuntimeException:
Unsupported language features in query: select (case when bit_field & 1=1
Hi All,
A year ago we started this journey and laid the path for the Spark + Cassandra
stack. We established the ground work and direction for Spark Cassandra
connectors and we have been happy seeing the results.
With Spark 1.1.0 and the SparkSQL release, it's time to take Calliope
http
Thanks for digging in! These both look like they should have JIRAs.
On Fri, Oct 3, 2014 at 8:14 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Thanks -- it does appear that I misdiagnosed a bit: case works generally
but it doesn't seem to like the bit operation, which does not seem to work
Hi, in an effort to migrate off of Shark I recently tried the Thrift JDBC
server that comes with Spark 1.1.0.
However, I observed that conditional functions do not work (I tried 'case'
and 'coalesce');
some string functions like 'concat' also did not work.
Is there a list of what's missing, or a
Hi Yin,
Thanks for the reply. I found the section as well a couple of days ago and managed to integrate es-hadoop with Spark
SQL [1]
Cheers,
[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html
On 10/2/14 6:32 PM, Yin Huai wrote:
Hi Costin,
I am answering
Hi,
I'm working on supporting SchemaRDD in Elasticsearch Hadoop [1], but I'm having some issues with the SQL API, in
particular with what the DataTypes translate to.
1. A SchemaRDD is composed of a Row and a StructType - I'm using the latter to decompose a Row into primitives. I'm not
clear
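For what it's worth, a minimal sketch of that decomposition with the Spark 1.1 Scala types (the helper name is mine):
import org.apache.spark.sql._
// Pair each field declared in the schema with the value at the same ordinal in the row.
def decompose(row: Row, schema: StructType): Seq[(String, Any)] =
  schema.fields.zipWithIndex.map { case (field, i) => (field.name, row(i)) }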
Hi,
I am new to SparkSQL.
I want to read only specific columns from the Parquet file, not all the columns
defined in it.
For instance, the schema of the parquet file would look like this:
{
type: record,
name: ElectricPowerUsage,
namespace: jcascalog.parquet.example,
fields
Hello,
I'm exploring SparkSQL and I'm facing an issue while using queries. Any
help on this is appreciated.
I have the following schema once loaded as RDD.
root
|-- data: array (nullable = true)
||-- element: struct (containsNull = false)
|||-- age: integer (nullable = true
In your case, the table has only one row, whose contents is “data”,
which is an array. You need something like |SELECT data[0].name FROM
json_table| to access the |name| field.
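A minimal sketch of the whole flow (the file path and table name are assumptions on my part; Spark 1.1 API):
val people = sqlContext.jsonFile("people.json")
people.registerTempTable("json_table")
// Index into the array-of-structs column, as suggested above.
sqlContext.sql("SELECT data[0].name FROM json_table").collect().foreach(println)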
On 9/29/14 11:08 PM, vdiwakar.malladi wrote:
Hello,
I'm exploring SparkSQL and I'm facing issue while using
].name FROM people where data[0].age =13*
Am I missing something? I'm trying to understand how the RDD is stored.
Thanks in advance.
I'm using the latest version i.e. Spark 1.1.0
Thanks.
I cannot find it in the documentation. And I have a dozen dimension tables
to (left) join...
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
Have you looked at SPARK-1800 ?
e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
Cheers
On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
I cannot find it in the documentation. And I have a dozen dimension tables
to (left) join...
Cheers,
Yes, looks like it can only be controlled by the
parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird
to me.
How am I supposed to know the exact size of a table in bytes? Letting me specify the
preferred join algorithm would be better, I think.
Jianshi
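A sketch of the knob being discussed (Spark 1.1; the threshold is in bytes and the value below is only an example):
// Tables whose estimated size is below this threshold are broadcast in joins.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)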
On Sun, Sep 28, 2014 at 11:57 PM, Ted Yu
It turned out to be a bug in my code. In the select clause, the list of fields was
misaligned with the schema of the target table. As a consequence, the map
data couldn't be cast to some other type in the schema.
Thanks anyway.
On 9/26/14, 8:08 PM, Cheng Lian lian.cs@gmail.com wrote:
Would you mind
:16 AM, John Omernik wrote:
I am running the Thrift server in SparkSQL, and running it on the node
I compiled Spark on. When I run it, tasks only work if they landed on
that node; other executors started on nodes I didn't compile Spark on
(and thus don't have the compile directory) fail. Should
Hi,
I was loading data into a partitioned table on Spark 1.1.0
beeline-thriftserver. The table has complex data types such as map<string,
string> and array<map<string,string>>. The query is like "insert overwrite
table a partition (…) select …" and the select clause worked if run
separately. However,
It might be a problem when inserting into a partitioned table. It worked
fine when the target table was unpartitioned.
Can you confirm this?
Thanks,
Du
On 9/26/14, 4:48 PM, Du Li l...@yahoo-inc.com.INVALID wrote:
Hi,
I was loading data into a partitioned table on Spark 1.1.0
Would you mind providing the DDL of this partitioned table together
with the query you tried? The stacktrace suggests that the query was
trying to cast a map into something else, which is not supported in
Spark SQL. And I doubt whether Hive supports casting a complex type to
some other type.