Hello Arthur,
You can do aggregations in SQL. How did you create LINEITEM?
Thanks,
Yin
On Thu, Oct 23, 2014 at 8:54 AM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi,
I got a $TreeNodeException, and have a few questions:
Q1) How should I do aggregation in Spark? Can I use
The implicit conversion function mentioned by Hao is createSchemaRDD in
SQLContext/HiveContext.
You can import it by doing
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Or new org.apache.spark.sql.hive.HiveContext(sc) for HiveContext
import sqlContext.createSchemaRDD
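For example, with the implicit conversion imported, an RDD of a case class can be used where a SchemaRDD is expected (a minimal sketch; the Person class and data are made up for illustration):
case class Person(name: String, age: Int)
val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))
// The imported implicit createSchemaRDD converts RDD[Person] into a SchemaRDD here.
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 26").collect()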
On Wed, Oct
Is there any specific issues you are facing?
Thanks,
Yin
On Tue, Oct 21, 2014 at 4:00 PM, tridib tridib.sama...@live.com wrote:
Any help? or comments?
that may help.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val schemaRDD = hiveContext.jsonFile(...)
schemaRDD.registerTempTable("jsonTable")
hiveContext.sql("SELECT CAST(columnName as DATE) FROM jsonTable")
Thanks,
Yin
On Tue, Oct 21, 2014 at 8:00 PM, Yin Huai huaiyin
Hi Tridib,
For the second approach, can you attach the complete stack trace?
Thanks,
Yin
On Mon, Oct 20, 2014 at 8:24 PM, Michael Armbrust mich...@databricks.com
wrote:
I think you are running into a bug that will be fixed by this PR:
https://github.com/apache/spark/pull/2850
On Mon, Oct
https://issues.apache.org/jira/browse/SPARK-4003
You can check PR https://github.com/apache/spark/pull/2850 .
Thanks,
Daoyuan
*From:* Yin Huai [mailto:huaiyin@gmail.com]
*Sent:* Tuesday, October 21, 2014 10:00 AM
*To:* Michael Armbrust
*Cc:* tridib; u...@spark.incubator.apache.org
*Subject:* Re: spark
by the 2 partition
columns, coll_def_id and seg_def_id. Output shows 29 rows, but that looks
like it’s just counting the rows in the console output. Let me know if you
need more information.
Thanks
-Terry
From: Yin Huai huaiyin@gmail.com
Date: Tuesday, October 14, 2014 at 6:29 PM
Hello Terry,
How many columns does pqt_rdt_snappy have?
Thanks,
Yin
On Tue, Oct 14, 2014 at 11:52 AM, Terry Siu terry@smartfocus.com
wrote:
Hi Michael,
That worked for me. At least I’m now further than I was. Thanks for the
tip!
-Terry
From: Michael Armbrust
Question 1: Please check
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#hive-tables.
Question 2:
One workaround is to re-write it. You can use LEFT SEMI JOIN to implement
the subquery with EXISTS and use LEFT OUTER JOIN + IS NULL to implement the
subquery with NOT EXISTS.
SELECT
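(The example above is truncated in the archive. A hedged sketch of the two rewrites, assuming a HiveContext and hypothetical tables t1 and t2:)
// EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id) becomes a LEFT SEMI JOIN:
hiveContext.sql("SELECT t1.id FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id")
// NOT EXISTS becomes a LEFT OUTER JOIN plus an IS NULL filter:
hiveContext.sql("SELECT t1.id FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL")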
If you are using HiveContext, it should work in 1.1.
Thanks,
Yin
On Mon, Oct 13, 2014 at 5:08 AM, shahab shahab.mok...@gmail.com wrote:
Hello,
Given the following structure, is it possible to query, e.g., session[0].id?
In general, is it possible to query an Array of Struct in JSON RDDs?
Hi Shahab,
Can you try to use HiveContext? It should work in 1.1. For SQLContext,
this issue was not fixed in 1.1 and you need to use the master branch at the
moment.
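For example, something like the following should work with a HiveContext (a sketch; the file path is hypothetical and the session field is assumed from the question):
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val schemaRDD = hiveContext.jsonFile("sessions.json") // hypothetical path
schemaRDD.registerTempTable("jsonTable")
// Index into the array and read a field of the struct.
hiveContext.sql("SELECT session[0].id FROM jsonTable")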
Thanks,
Yin
On Sun, Oct 12, 2014 at 5:20 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
Apparently it is possible to query
Seems the reason that you got wrong results was caused by timezone.
The time in java.sql.Timestamp(long time) means milliseconds since January
1, 1970, 00:00:00 *GMT*. A negative number is the number of milliseconds
before January 1, 1970, 00:00:00 *GMT*.
However, in ts='1970-01-01 00:00:00',
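(The message is truncated here. A small illustration of the GMT-based epoch, as a sketch; java.sql.Timestamp.toString renders the instant in the JVM's default timezone, so the printed value shifts with the timezone:)
import java.sql.Timestamp
import java.util.TimeZone
// 0L is the epoch: January 1, 1970, 00:00:00 GMT.
TimeZone.setDefault(TimeZone.getTimeZone("GMT"))
println(new Timestamp(0L)) // 1970-01-01 00:00:00.0
TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
println(new Timestamp(0L)) // 1969-12-31 16:00:00.0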
on YouTube, "Easy JSON Data Manipulation in Spark"), is it
possible to perform aggregation-style queries,
for example counting the number of attributes (considering that attributes in
the schema is presented as an array), or any other type of aggregation?
best,
/Shahab
On Mon, Oct 13, 2014 at 4:01 PM, Yin Huai
,ts#3], MapPartitionsRDD[22] at mapPartitions at
basicOperators.scala:208
scala> s.collect
res5: Array[org.apache.spark.sql.Row] = Array()
Mohammed
*From:* Yin Huai [mailto:huaiyin@gmail.com]
*Sent:* Monday, October 13, 2014 7:19 AM
*To:* Mohammed Guller
*Cc:* Cheng, Hao
Hi Tamas,
Can you try to set mapred.map.tasks and see if it works?
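With a HiveContext, the property can be set through a SQL SET statement (a sketch; the value 10 is arbitrary):
hiveContext.sql("SET mapred.map.tasks=10")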
Thanks,
Yin
On Thu, Oct 2, 2014 at 10:33 AM, Tamas Jambor jambo...@gmail.com wrote:
That would work - I normally use hive queries through spark sql, I
have not seen something like that there.
On Thu, Oct 2, 2014 at 3:13
I think this problem has been fixed after the 1.1 release. Can you try the
master branch?
On Mon, Sep 29, 2014 at 10:06 PM, vdiwakar.malladi
vdiwakar.mall...@gmail.com wrote:
I'm using the latest version i.e. Spark 1.1.0
Thanks.
What version of Spark did you use? Can you try the master branch?
On Mon, Sep 29, 2014 at 1:52 PM, vdiwakar.malladi
vdiwakar.mall...@gmail.com wrote:
Thanks for your prompt response.
On a further note, I'm getting the exception while executing the query.
SELECT data[0].name FROM people
Hi Gaurav,
Can you put hive-site.xml in conf/ and try again?
Thanks,
Yin
On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:
Hi ,
I have been using the Spark shell to execute all SQL. I am connecting to
Cassandra, converting the data to JSON, and then running queries on it,
Hello Andy,
Will our JSON support in Spark SQL help your case? If your JSON files store
one JSON object per line, you can use SQLContext.jsonFile to load it. If
you want to pre-process these files, once you have an RDD[String] (one
JSON object per String), you can use SQLContext.jsonRDD. In
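(The message is truncated here. A minimal sketch of both entry points, with a hypothetical path:)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// One JSON object per line: load the file directly.
val direct = sqlContext.jsonFile("events.json")
// Or pre-process first, then hand an RDD[String] (one JSON object per String) to jsonRDD.
val cleaned = sc.textFile("events.json").map(_.trim).filter(_.nonEmpty)
val preprocessed = sqlContext.jsonRDD(cleaned)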
Seems https://issues.apache.org/jira/browse/HIVE-5474 is related?
On Tue, Sep 16, 2014 at 4:49 AM, Cheng, Hao hao.ch...@intel.com wrote:
Thank you for pasting the steps. I will look at this and hopefully come out
with a solution soon.
-Original Message-
From: linkpatrickliu
I meant it may be a Hive bug since we also call Hive's drop table
internally.
On Tue, Sep 16, 2014 at 1:44 PM, Yin Huai huaiyin@gmail.com wrote:
Seems https://issues.apache.org/jira/browse/HIVE-5474 is related?
On Tue, Sep 16, 2014 at 4:49 AM, Cheng, Hao hao.ch...@intel.com wrote:
Thank
1.0.1 does not have support for outer joins (added in 1.1). Your query
should be fine in 1.1.
On Mon, Sep 15, 2014 at 5:35 AM, Yanbo Liang yanboha...@gmail.com wrote:
Spark SQL supports SQL and HiveQL, which use SQLContext and
HiveContext respectively.
As far as I know, SQLContext of Spark
Can you try sbt/sbt clean first?
On Sat, Sep 13, 2014 at 4:29 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. [error] File name too long
It is not clear which file(s) loadfiles was loading.
Is the filename in an earlier part of the output?
Cheers
On Sat, Sep 13, 2014 at 10:58 AM, kkptninja
What is the schema of table?
On Thu, Sep 11, 2014 at 4:30 PM, jamborta jambo...@gmail.com wrote:
Thanks. This was actually using HiveContext.
1.0.1 does not have support for outer joins (added in 1.1). Can you try
the 1.1 branch?
On Wed, Sep 10, 2014 at 9:28 PM, boyingk...@163.com boyingk...@163.com
wrote:
Hi Michael,
I think Arthur.hk.chan arthur.hk.c...@gmail.com isn't here now. I can
show something:
1) My Spark version is 1.0.1
Hello Igor,
Although Decimal is supported, Hive 0.12 does not support user-definable
precision and scale (it was introduced in Hive 0.13).
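For reference, the 0.13-style syntax that Hive 0.12 rejects looks like this (a sketch with hypothetical tables):
// Works with Hive 0.13+, fails with Hive 0.12:
hiveContext.sql("CREATE TABLE amounts (price DECIMAL(10, 2))")
// Hive 0.12 only accepts plain DECIMAL with default precision and scale:
hiveContext.sql("CREATE TABLE amounts_default (price DECIMAL)")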
Thanks,
Yin
On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor igor.zit...@citi.com wrote:
Hi All,
New to spark and using Spark 1.0.2 and hive 0.12.
If
,
In all three options, when I try to create a temporary function I get a
ClassNotFoundException. What would be the issue here?
Thanks and Regards,
Sankar S.
On Saturday, 23 August 2014, 0:53, Yin Huai huaiyin@gmail.com
wrote:
Hello Sankar,
Add JAR in SQL is not supported at the moment
Hello Du,
Can you check if there is a dir named metastore in the place where you launch
your program? If so, can you delete it and try again?
Also, can you try HiveContext? LocalHiveContext is deprecated.
Thanks,
Yin
On Mon, Aug 25, 2014 at 6:33 PM, Du Li l...@yahoo-inc.com.invalid wrote:
Hi,
I
Hi Sankar,
You need to create an external table in order to specify the location of
data (i.e. using CREATE EXTERNAL TABLE user1 LOCATION). You can take
a look at this page
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
for the create external table command as well.
I get the same error.
Please help me to find the root cause.
Thanks and Regards,
Sankar S.
On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com
wrote:
Hi Sankar,
You need to create an external table in order to specify
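(A sketch of such a statement; the column list, row format, and location are hypothetical:)
hiveContext.sql("""
  CREATE EXTERNAL TABLE user1 (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/data/user1'
""")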
I have not profiled this part. But, I think one possible cause is
allocating an array for every inner struct for every row (every struct
value is represented by a Spark SQL row). I will play with it later and see
what I find.
On Tue, Aug 19, 2014 at 9:01 PM, Evan Chan velvia.git...@gmail.com
If you want to filter by table name, you can use
hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))
Seems making functionRegistry transient can fix the error.
On Wed, Aug 20, 2014 at 8:53 PM, Vida Ha v...@databricks.com wrote:
Hi,
I doubt the broadcast variable is your
PR is https://github.com/apache/spark/pull/2074.
--
From: Yin Huai huaiyin@gmail.com
Sent: 8/20/2014 10:56 PM
To: Vida Ha v...@databricks.com
Cc: tianyi tia...@asiainfo.com; Fengyun RAO raofeng...@gmail.com;
user@spark.apache.org
Subject: Re: Got
Hi Rafeeq,
I think the following part triggered the bug
https://issues.apache.org/jira/browse/SPARK-2908.
[{"href":null,"rel":"me"}]
It has been fixed. Can you try Spark master and see if the error gets
resolved?
Thanks,
Yin
On Mon, Aug 11, 2014 at 3:53 AM, rafeeq s rafeeq.ec...@gmail.com wrote:
Seems https://issues.apache.org/jira/browse/SPARK-2846 is the jira tracking
this issue.
On Mon, Aug 18, 2014 at 6:26 PM, cesararevalo ce...@zephyrhealthinc.com
wrote:
Thanks, Zhan for the follow up.
But, do you know how I am supposed to set that table name on the jobConf? I
don't have
Hi,
The SQLParser used by SQLContext is pretty limited. Instead, can you try
HiveContext?
Thanks,
Yin
On Tue, Aug 19, 2014 at 7:57 AM, wan...@testbird.com wan...@testbird.com
wrote:
sql: SELECT app_id, COUNT(DISTINCT app_id, macaddr) cut FROM object GROUP
BY app_id
*Error Log*
14/08/19
not able to switch to a database other than the default one, for
Yarn-client mode, it works fine.
Thanks!
Jenny
On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:
Hi Jenny,
Have you copied hive-site.xml to spark/conf directory? If not, can you
put it in conf/ and try
Hi Silvio,
You can insert into a static partition via SQL statement. Dynamic
partitioning is not supported at the moment.
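For example, a static-partition insert looks like the following (a sketch; the table, columns, and partition value are made up):
hiveContext.sql("""
  INSERT INTO TABLE events PARTITION (dt = '2014-08-13')
  SELECT id, payload FROM staging_events
""")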
Thanks,
Yin
On Wed, Aug 13, 2014 at 2:03 PM, Michael Armbrust mich...@databricks.com
wrote:
This is not supported at the moment. There are no concrete plans at the
.svl.ibm.com:8080</value>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
</property>
</configuration>
On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai huaiyin
Hi Jenny,
How's your metastore configured for both Hive and Spark SQL? Which
metastore mode are you using (based on
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
)?
Thanks,
Yin
On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao linlin200...@gmail.com wrote:
you can
If the link to PR 1819 is broken, here is the one:
https://github.com/apache/spark/pull/1819.
On Sun, Aug 10, 2014 at 5:56 PM, Eric Friedman eric.d.fried...@gmail.com
wrote:
Thanks Michael, I can try that too.
I know you guys aren't in sales/marketing (thank G-d), but given all the
hoopla
Hi Brad,
It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908
to track it. It will be fixed soon.
Thanks,
Yin
On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller bmill...@eecs.berkeley.edu
wrote:
Hi All,
I'm having a bit of trouble with nested data structures in pyspark
will have a better story to handle NullType columns (
https://issues.apache.org/jira/browse/SPARK-2695). But, we still will not
expose NullType to users.
On Thu, Aug 7, 2014 at 1:41 PM, Brad Miller bmill...@eecs.berkeley.edu
wrote:
Thanks Yin!
best,
-Brad
On Thu, Aug 7, 2014 at 1:39 PM, Yin
The PR is https://github.com/apache/spark/pull/1840.
On Thu, Aug 7, 2014 at 1:48 PM, Yin Huai yh...@databricks.com wrote:
Actually, the issue is if values of a field are always null (or this field
is missing), we cannot figure out the data type. So, we use NullType (it is
an internal data
Yes, SPARK-2376 has been fixed in master. Can you give it a try?
Also, for inferSchema, because Python is dynamically typed, I agree with
Davies to provide a way to scan a subset (or the entirety) of the dataset to
figure out the proper schema. We will take a look at it.
Thanks,
Yin
On Tue, Aug 5, 2014 at
I tried jsonRDD(...).printSchema() and it worked. Seems the problem is when
we take the data back to the Python side, SchemaRDD#javaToPython failed on
your cases. I have created https://issues.apache.org/jira/browse/SPARK-2875
to track it.
Thanks,
Yin
On Tue, Aug 5, 2014 at 9:20 PM, Brad
I have created https://issues.apache.org/jira/browse/SPARK-2775 to track it.
On Thu, Jul 31, 2014 at 11:47 AM, Budde, Adam bu...@amazon.com wrote:
I still see the same “Unresolved attributes” error when using hql +
backticks.
Here’s a code snippet that replicates this behavior:
val
Hi Sarath,
I will try to reproduce the problem.
Thanks,
Yin
On Wed, Jul 23, 2014 at 11:32 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Hi Michael,
Sorry for the delayed response.
I'm using Spark 1.0.1 (pre-built version for hadoop 1). I'm running spark
programs on
Hi Sarath,
Have you tried the current branch 1.0? If not, can you give it a try and
see if the problem can be resolved?
Thanks,
Yin
On Thu, Jul 24, 2014 at 11:17 AM, Yin Huai yh...@databricks.com wrote:
Hi Sarath,
I will try to reproduce the problem.
Thanks,
Yin
On Wed, Jul 23
Yes, https://issues.apache.org/jira/browse/SPARK-2576 is used to track it.
On Wed, Jul 23, 2014 at 9:11 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
Do we have a JIRA issue to track this? I think I've run into a similar
issue.
On Wed, Jul 23, 2014 at 1:12 AM, Yin Huai yh
On Tue, Jul 22, 2014 at 12:53 AM, Victor Sheng victorsheng...@gmail.com
wrote:
Hi, Yin Huai
I tested again with your snippet code.
It works well in Spark 1.0.1.
Here is my code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
case class Record(data_date: String, mobile
Hi Victor,
Instead of importing sqlContext.createSchemaRDD, can you explicitly call
sqlContext.createSchemaRDD(rdd) to create a SchemaRDD?
For example,
You have a case class Record.
case class Record(data_date: String, mobile: String, create_time: String)
Then, you create a RDD[Record] and
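(The example is truncated here; a sketch of the rest, continuing from the Record class above with a hypothetical input file:)
val rdd = sc.textFile("records.csv").map(_.split(",")).map(r => Record(r(0), r(1), r(2)))
// Explicit call instead of relying on the imported implicit conversion:
val schemaRDD = sqlContext.createSchemaRDD(rdd)
schemaRDD.registerTempTable("records")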
Instead of using union, can you try
sqlContext.parquetFile("/user/hive/warehouse/xxx_parquet.db").registerAsTable("parquetTable")?
Then,
var all = sql("select some_id, some_type, some_time from parquetTable")
  .map(line => (line(0), (line(1).toString, line(2).toString.substring(0, 19))))
Thanks,
Yin
Can you attach your code?
Thanks,
Yin
On Sat, Jul 19, 2014 at 4:10 PM, chutium teng@gmail.com wrote:
160G of Parquet files (ca. 30 files, snappy compressed, made by Cloudera
Impala), ca. 30 full table scans, taking 3-5 columns out, then some normal
Scala operations like substring, groupBy,
Hi Srinivas,
Seems the query you used is val results = sqlContext.sql("select type from
table1"). However, table1 does not have a field called type. The schema of
table1 is defined as the class definition of your case class Record (i.e. ID,
name, score, and school are fields of your table1). Can you
Hi,
queryPlan.baseLogicalPlan is not the plan used for execution. Actually,
the baseLogicalPlan
of a SchemaRDD (queryPlan in your case) is just the parsed plan (the parsed
plan will be analyzed, and then optimized. Finally, a physical plan will be
created). The plan shows up after you execute val
Hi Subacini,
Just want to follow up on this issue. SPARK-2339 has been merged into the
master and 1.0 branch.
Thanks,
Yin
On Tue, Jul 1, 2014 at 2:00 PM, Yin Huai huaiyin@gmail.com wrote:
Seems it is a bug. I have opened
https://issues.apache.org/jira/browse/SPARK-2339 to track
Seems it is a bug. I have opened
https://issues.apache.org/jira/browse/SPARK-2339 to track it.
Thank you for reporting it.
Yin
On Tue, Jul 1, 2014 at 12:06 PM, Subacini B subac...@gmail.com wrote:
Hi All,
Running this join query
sql("SELECT * FROM A_TABLE A JOIN B_TABLE B WHERE
Hi Durin,
I guess that blank lines caused the problem (like Aaron said). Right now,
jsonFile does not skip faulty lines. Can you first use sc.textFile to load
the file as RDD[String] and then use filter to filter out those blank lines
(code snippet can be found below)?
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
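// A sketch of the rest of the described approach (the file path is hypothetical):
val nonBlank = sc.textFile("data.json").filter(_.trim.nonEmpty)
val schemaRDD = sqlContext.jsonRDD(nonBlank)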
Hello Lars,
Can you check the value of hive.security.authenticator.manager in
hive-site.xml? I guess the value is
org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator. This class was
introduced in hive 0.13, but Spark SQL is based on hive 0.12 right now. Can
you change the value of