Hi, can you describe a little bit how the ThriftServer crashed, or the steps to
reproduce it? It's probably a bug in the ThriftServer.
Thanks,
From: guoqing0...@yahoo.com.hk [mailto:guoqing0...@yahoo.com.hk]
Sent: Friday, April 24, 2015 9:55 AM
To: Arush Kharbanda
Cc: user
Subject: Re: Re: problem wit
Can you print out the physical plan?
EXPLAIN SELECT xxx…
From: luohui20...@sina.com [mailto:luohui20...@sina.com]
Sent: Monday, May 4, 2015 9:08 PM
To: Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining 2 tables.
Hi Olivier,
Spark 1.3.1, with Java 1.8.0_45,
and attached 2 pics.
Or, have you ever tried a broadcast join?
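In case it helps, a minimal sketch of nudging Spark SQL (1.3.x) into a broadcast (map-side) join; the table names are placeholders and the 100MB threshold is only illustrative:

// A broadcast join is chosen automatically when one side's estimated
// size is below this threshold (in bytes).
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)
sqlContext.sql("SELECT * FROM big_table b JOIN small_table s ON b.key = s.key")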
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Tuesday, May 5, 2015 8:33 AM
To: luohui20...@sina.com; Olivier Girardot; user
Subject: RE: Re: Re: sparksql running slow while joining 2 tables.
Can you print out the physical plan?
EXPLAIN SELECT xxx
I assume you’re using the DataFrame API within your application.
sql("SELECT…").explain(true)
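For completeness, a small sketch of both ways to inspect the plan (table names are placeholders):

// Via SQL:
sqlContext.sql("EXPLAIN EXTENDED SELECT * FROM t1 JOIN t2 ON t1.key = t2.key").collect().foreach(println)
// Via the DataFrame API; passing true also prints the logical plans:
sqlContext.sql("SELECT * FROM t1 JOIN t2 ON t1.key = t2.key").explain(true)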
From: Wang, Daoyuan
Sent: Tuesday, May 5, 2015 10:16 AM
To: luohui20...@sina.com; Cheng, Hao; Olivier Girardot; user
Subject: RE: Re: RE: Re: Re: sparksql running slow while joining 2 tables.
You can use
, Hao; Wang, Daoyuan; Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining 2 tables.
Hi guys,
attached the pics of the physical plan and logs. Thanks.
Thanks & best regards!
罗辉 San.Luo
- Original Message -
From: "Cheng, Hao"
You probably can try something like:
val df = sqlContext.sql("select c1, sum(c2) from T1, T2 where T1.key=T2.key
group by c1")
df.cache() // cache the result, but execution is lazy: nothing is materialized yet
df.registerTempTable("my_result")
sqlContext.sql("select * from my_result where c1=1").collect // the cache is built here and reused afterwards
Spark SQL just takes JDBC as a new data source, the same way we support loading
data from a .csv or .json file.
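For illustration, a minimal sketch of loading a table through the JDBC data source with the Spark 1.3-era API; the URL and table name are placeholders:

val jdbcDF = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:mysql://localhost:3306/test?user=u&password=p",
  "dbtable" -> "student_info"))
jdbcDF.registerTempTable("student_info")
sqlContext.sql("SELECT count(*) FROM student_info").show()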
From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 2:30 PM
To: User
Subject: What's the advantage features of Spark SQL(JDBC)
Hi All,
Comparing direct
Yes.
From: Yi Zhang [mailto:zhangy...@yahoo.com]
Sent: Friday, May 15, 2015 2:51 PM
To: Cheng, Hao; User
Subject: Re: What's the advantage features of Spark SQL(JDBC)
@Hao,
As you said, there is no special advantage for JDBC; it just provides a unified
API to support different data sources.
Forgot to import the implicit functions/classes?
import sqlContext.implicits._
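A minimal sketch of the usual pattern (Spark 1.3-era API; Person is a made-up case class); without the import, .toDF() won't compile:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Person(name: String, age: Int)
// The implicit conversion supplies .toDF() on an RDD of case classes.
val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25))).toDF()
people.registerTempTable("people")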
From: Rajdeep Dua [mailto:rajdeep@gmail.com]
Sent: Monday, May 18, 2015 8:08 AM
To: user@spark.apache.org
Subject: InferredSchema Example in Spark-SQL
Hi All,
Was trying the Inferred Schema Spark example
http://sp
Typo? Should be .toDF(), not .toRD()
From: Ram Sriharsha [mailto:sriharsha@gmail.com]
Sent: Monday, May 18, 2015 8:31 AM
To: Rajdeep Dua
Cc: user
Subject: Re: InferredSchema Example in Spark-SQL
You mean toDF()? (toDF converts the RDD to a DataFrame, in this case inferring
the schema from the c
And if you want to use the SQL CLI (based on catalyst) as it works in Shark,
you can also check out https://github.com/amplab/shark/pull/337 :)
This preview version doesn't require Hive to be set up in the cluster.
(Don't forget to also put hive-site.xml under SHARK_HOME/conf.)
Cheng Hao
List.fill(300)(Foo("c", 3))
sparkContext.makeRDD(rows).registerAsTable("foo")
sql("select k,count(*) from foo group by k").collect
res1: Array[org.apache.spark.sql.Row] = Array([b,200], [a,100], [c,300])
Cheng Hao
From: Pei-Lun Lee [mailto:pl...@appier.com]
Sent: Wedne
I couldn't reproduce the issue with the latest master, but I found another bug
while running this.
https://github.com/apache/spark/pull/1475
Can you give more details about your env?
-Original Message-
From: JiajiaJing [mailto:jj.jing0...@gmail.com]
Sent: Friday, July 18, 2014 8:48 AM
To: u..
u...@spark.incubator.apache.org
Subject: RE: Hive From Spark
Hi Cheng Hao,
Thank you very much for your reply.
Basically, the program runs on Spark 1.0.0 and Hive 0.12.0 .
Some setup of the environment is done by running "SPARK_HIVE=true sbt/sbt
assembly/assembly", including t
This is a very interesting problem. SparkSQL supports the non-equi join, but it
is very inefficient on large tables.
One possible solution is to partition both tables, with (cast(ds as bigint) / 240)
as the partition key; then, within each partition of dataset1, you
probably can writ
Actually it's just a pseudo algorithm I described; you can do it with the Spark
API. Hope the algorithm is helpful.
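To make the pseudo algorithm concrete, here is a hedged sketch using the core RDD API; Event and the sample data are made up, and 240 is the bucket width in seconds:

case class Event(ts: Long, payload: String)
val dataset1 = sc.parallelize(Seq(Event(100L, "a"), Event(500L, "b")))
val dataset2 = sc.parallelize(Seq(Event(120L, "x"), Event(900L, "y")))

def bucket(e: Event): Long = e.ts / 240
// Key the left side by its bucket.
val left = dataset1.map(e => (bucket(e), e))
// Emit each right-side event into its own bucket and both neighbours,
// so near-in-time pairs always meet in at least one bucket.
val right = dataset2.flatMap(e => Seq(-1L, 0L, 1L).map(d => (bucket(e) + d, e)))
// Join per bucket, then keep only the pairs truly within the time window.
val nearPairs = left.join(right)
  .filter { case (_, (a, b)) => math.abs(a.ts - b.ts) <= 240 }
  .values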
-Original Message-
From: durga [mailto:durgak...@gmail.com]
Sent: Tuesday, July 22, 2014 11:56 AM
To: u...@spark.incubator.apache.org
Subject: RE: Joining by timestamp.
Hi Chen,
Durga, you can start from the documents
http://spark.apache.org/docs/latest/quick-start.html
http://spark.apache.org/docs/latest/programming-guide.html
-Original Message-
From: durga [mailto:durgak...@gmail.com]
Sent: Tuesday, July 22, 2014 12:45 PM
To: u...@spark.incubator.apache.o
In your code snippet, "sample" is actually a SchemaRDD, and SchemaRDD actually
binds a certain SQLContext in runtime, I don't think we can manipulate/share
the SchemaRDD across SQLContext Instances.
-Original Message-
From: Kevin Jung [mailto:itsjb.j...@samsung.com]
Sent: Tuesday, July
I ran this before; actually the hive-site.xml works this way for me (the
tricky part happens in the new HiveConf(classOf[SessionState])). Can you double-check
that hive-site.xml can be loaded from the classpath? It is supposed to appear in the
root of the classpath.
-Original Message-
From: nik
Probably you need to update the SQL to something like "SELECT * FROM student_info
where id >= ? and id <= ?".
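For reference, a hedged sketch of JdbcRDD usage; the connection string, bounds, and row mapper are placeholders:

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

val rdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "user", "password"),
  "SELECT * FROM student_info WHERE id >= ? AND id <= ?", // both '?' placeholders are required
  1, 1000, 10, // lower bound, upper bound, number of partitions
  rs => (rs.getInt("id"), rs.getString("name")))
rdd.count()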
-Original Message-
From: srinivas [mailto:kusamsrini...@gmail.com]
Sent: Thursday, July 31, 2014 6:55 AM
To: u...@spark.incubator.apache.org
Subject: Data from Mysql using JdbcRDD
Hi,
I am
From the log, I noticed the "substr" was added on July 15th; the 1.0.1 release
should be earlier than that. The community is now working on releasing 1.1.0,
and some performance improvements have also been added. Probably you can try
that for your benchmark.
Cheng Hao
I couldn't reproduce the exception; probably it's been fixed in the latest code.
From: Vishal Vibhandik [mailto:vishal.vibhan...@gmail.com]
Sent: Thursday, August 14, 2014 11:17 AM
To: user@spark.apache.org
Subject: Spark SQL Stackoverflow error
Hi,
I tried running the sample sql code JavaSparkSQL bu
Currently SparkSQL doesn't support row format/SerDe in CTAS. The workaround
is to create the table first.
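A hedged sketch of the workaround, assuming a HiveContext named hiveContext; the table, columns, and row format are just examples:

// Create the table with the desired row format first...
hiveContext.sql("""CREATE TABLE dst (key INT, value STRING)
                   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'""")
// ...then populate it with a plain INSERT ... SELECT instead of CTAS.
hiveContext.sql("INSERT OVERWRITE TABLE dst SELECT key, value FROM src")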
-Original Message-
From: centerqi hu [mailto:cente...@gmail.com]
Sent: Tuesday, September 02, 2014 3:35 PM
To: user@spark.apache.org
Subject: Unsupported language features in query
[mailto:cente...@gmail.com]
Sent: Tuesday, September 02, 2014 3:46 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Unsupported language features in query
Thanks Cheng Hao
Is there a way to obtain the list of Hive statements that Spark supports?
Thanks
2014-09-02 15:39 GMT+08:00 Cheng, Hao :
> Curren
, neither of those SQL dialects supports Date, only Timestamp.
Cheng Hao
From: Benjamin Zaitlen [mailto:quasi...@gmail.com]
Sent: Friday, September 05, 2014 5:37 AM
To: user@spark.apache.org
Subject: TimeStamp selection with SparkSQL
I may have missed this but is it possible to select on datetime in a
Hive can launch another job with a strategy to merge the small files; probably
we can also do that in a future release.
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, September 05, 2014 8:59 AM
To: DanteSama
Cc: u...@spark.incubator.apache.org
Subject: Re: SchemaRDD - Parqu
I copied the 3 datanucleus jars (datanucleus-api-jdo-3.2.1.jar,
datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar) into the lib/ folder
manually, and it works for me.
From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 11:28 AM
To: alexandria1101
Cc: u...@spark.incu
What's your Spark / Hadoop version? And also the hive-site.xml? Most cases
like that are caused by a Hadoop client jar that is incompatible with the Hadoop cluster.
-Original Message-
From: linkpatrickliu [mailto:linkpatrick...@live.com]
Sent: Monday, September 15, 2014 2:35 PM
To: u...@spark.incubat
The Hadoop client jar should be assembled into the uber-jar, but (I suspect)
it's probably not compatible with your Hadoop cluster.
Can you also paste the Spark uber-jar name? It will usually be under the path
lib/spark-assembly-1.1.0-xxx-hadoopxxx.jar.
-Original Message-
From: linkpatrick
Sorry, I am not able to reproduce that.
Can you try adding the following entries to hive-site.xml? I know they have
default values, but let's make them explicit.
hive.server2.thrift.port
hive.server2.thrift.bind.host
hive.server2.authentication (NONE, KERBEROS, LDAP, PAM or CUSTOM)
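For reference, a sketch of how those entries look in hive-site.xml; the values shown are just the usual defaults, so adjust them to your setup:

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>localhost</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>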
-Origi
Thank you for pasting the steps. I will look at this and hopefully come up with a
solution soon.
-Original Message-
From: linkpatrickliu [mailto:linkpatrick...@live.com]
Sent: Tuesday, September 16, 2014 3:17 PM
To: u...@spark.incubator.apache.org
Subject: RE: SparkSQL 1.1 hang when "DROP"
https://github.com/apache/spark/pull/2241), not sure if you can wait for this.
☺
From: Yin Huai [mailto:huaiyin@gmail.com]
Sent: Wednesday, September 17, 2014 1:50 AM
To: Cheng, Hao
Cc: linkpatrickliu; u...@spark.incubator.apache.org
Subject: Re: SparkSQL 1.1 hang when "DROP" or "L
the HiveDriver will always get a null value
when retrieving HiveConf.
Cheng Hao
From: Du Li [mailto:l...@yahoo-inc.com.INVALID]
Sent: Thursday, September 18, 2014 7:51 AM
To: user@spark.apache.org; d...@spark.apache.org
Subject: problem with HiveContext inside Actor
Hi,
Wonder anybody
case class T(a: String, ts: java.sql.Timestamp)
import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion
val data = sc.parallelize(1 :: 2 :: Nil).map(i => T(i.toString, new java.sql.Timestamp(i)))
data.registerTempTable("x")
val s = sqlContext.sql("select a from x where ts >= '1970-01-01 00:00:00';")
s.collect
output:
res1: Array[org.apache.spark.sql.Row] = Array([1], [2])
Seems like a bug in JavaSQLContext.getSchema(), which doesn't enumerate all of
the data types supported by Catalyst.
From: Ge, Yao (Y.) [mailto:y...@ford.com]
Sent: Sunday, October 19, 2014 11:44 PM
To: Wang, Daoyuan; user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp
sc
You needn't do anything, the implicit conversion should do this for you.
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L103
https://github.com/apache/spark/blob/2ac40da3f9fa6d45a59bb45b41606f1931ac5e81/sql/catalyst/src/main/scala/org/apac
Not sure how the k-d tree is used in MLlib, but keep in mind that a
SchemaRDD is just a normal RDD.
Cheng Hao
From: sanath kumar [mailto:sanath1...@gmail.com]
Sent: Wednesday, October 22, 2014 12:58 PM
To: user@spark.apache.org
Subject: spark sql query optimization , and decision tree building
Hi all
Can you paste the hive-site.xml? Most of the time when I meet this exception,
it's because the JDBC driver for the Hive metastore is not set correctly, or the
wrong driver classes are included in the assembly jar.
By default, the assembly jar contains derby.jar, which is the embedded
Derby JDBC driver.
From: Jac
both.
Sorry if I missed some discussion of Hive upgrading.
Cheng Hao
Hi, all, I noticed that when compiling SparkSQL with the profile "hive-0.13.1",
it fetches Hive version 0.13.1a under the groupId
"org.spark-project.hive". What's the difference from the one under
"org.apache.hive", and where can I get the source code for re-compiling?
Thanks,
Cheng Hao
Which version are you using? I can reproduce that in the latest code, but with a
different exception.
I've filed a bug, https://issues.apache.org/jira/browse/SPARK-4263; can you
also add some information there?
Thanks,
Cheng Hao
-Original Message-
From: Kevin Paul [mailto:kevinp
Can you try a query like "SELECT timestamp, CAST(timestamp as string) FROM logs
LIMIT 5"? I guess you probably ran into the timestamp precision or the timezone
shifting problem.
(And it's not mandatory, but you'd better change the field name from
"timestamp" to something else, as "timestamp" is t
Are all of your join keys the same? And I guess the join types are all "left"
joins; https://github.com/apache/spark/pull/3362 is probably what you need.
Also, SparkSQL doesn't support multiway joins (or multiway broadcast joins)
currently; https://github.com/apache/spark/pull/3270 should be an
Spark SQL doesn't handle DISTINCT well currently; in particular, for the case
you described, it will lead all of the data to fall onto a single node and be
kept in memory only.
The dev community actually has solutions for this; it probably will be solved
after the release of Spark 1.2.
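As a stop-gap, one known rewrite (not from this thread) is to express DISTINCT as a GROUP BY, which can aggregate partially on each node; the table and column names are placeholders:

// Returns the same rows as: SELECT DISTINCT c1, c2 FROM t
sqlContext.sql("SELECT c1, c2 FROM t GROUP BY c1, c2")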
-Original
From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Thursday, November 27, 2014 10:24 PM
To: Cheng, Hao
Cc: user
Subject: Re: Auto BroadcastJoin optimization failed in latest Spark
Hi Hao,
I'm using inner join as Broadcast join didn't work for left joins (thanks for
the lin
/pull/3595 )
b. It expects the function return type to be immutable.Seq[XX] for List,
immutable.Map[X, X] for Map, scala.Product for Struct, and only Array[Byte] for
binary. Array[_] is not supported.
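A minimal sketch of a UDF whose return type follows rule b., returning immutable.Seq rather than Array; the UDF and the docs table are made up:

import scala.collection.immutable

// immutable.Seq maps to a Catalyst array column; Array[_] would not work here.
sqlContext.registerFunction("splitWords", (s: String) => immutable.Seq(s.split(" "): _*))
// assuming a table docs(text STRING) is already registered:
sqlContext.sql("SELECT splitWords(text) FROM docs").collect()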
Cheng Hao
From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Thursday, December 4
You can try to write your own Relation with filter push down or use the
ParquetRelation2 for workaround.
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala)
Cheng Hao
-Original Message-
From: Jerry Raj [mailto:jerry
It works exactly like Create Table As Select (CTAS) in Hive.
Cheng Hao
From: Anas Mosaad [mailto:anas.mos...@incorta.com]
Sent: Wednesday, December 10, 2014 11:59 AM
To: Michael Armbrust
Cc: Manoj Samel; user@spark.apache.org
Subject: Re: Can HiveContext be used without using Hive?
In that
As the error log shows, you may need to register it as:
sqlContext.registerFunction("toHour", toHour _)
The "_" means you are passing the function as a parameter, not invoking it.
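A small made-up example to illustrate; toHour and the events table are hypothetical:

// `toHour _` eta-expands the method into a function value; `toHour(...)` would invoke it.
def toHour(ts: java.sql.Timestamp): Int = ts.toString.substring(11, 13).toInt
sqlContext.registerFunction("toHour", toHour _)
// assuming a table events(ts TIMESTAMP) is already registered:
sqlContext.sql("SELECT toHour(ts) FROM events").collect()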
Cheng Hao
From: Xuelin Cao [mailto:xuelin...@yahoo.com.INVALID]
Sent: Monday, December 15, 2014 5:28 PM
To: User
Hi Lam, I can confirm this is a bug with the latest master, and I filed a JIRA
issue for it:
https://issues.apache.org/jira/browse/SPARK-4944
Hope to come up with a solution soon.
Cheng Hao
From: Jerry Lam [mailto:chiling...@gmail.com]
Sent: Wednesday, December 24, 2014 4:26 AM
To: user
I am wondering if we can provide a more friendly API, rather than a configuration,
for this purpose. What do you think, Patrick?
Cheng Hao
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:22 PM
To: Shao, Saisai
Cc: user@spark.apache.org
multiple parquet files
for the API sqlContext.parquetFile, we need to think about how to support multiple
paths in some other way.
Cheng Hao
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Thursday, December 25, 2014 1:01 PM
To: Daniel Siegmann
Cc: user@spark.apache.org
Subject: Re: Escape
Can you paste the error log?
From: Dai, Kevin [mailto:yun...@ebay.com]
Sent: Monday, January 5, 2015 6:29 PM
To: user@spark.apache.org
Subject: Implement customized Join for SparkSQL
Hi, All
Suppose I want to join two tables A and B as follows:
Select * from A join B on A.id = B.id
A is a file
The log showed it failed in parsing, so the typo stuff shouldn't be the root
cause. But I couldn't reproduce that with the master branch.
I did the test as follows:
sbt/sbt -Phadoop-2.3.0 -Phadoop-2.3 -Phive -Phive-0.13.1 hive/console
scala> sql("SELECT user_id FROM actions where conversion_aciton_id
Hi, BB
Ideally you can do a query like: select key, value.percent from
mytable_data lateral view explode(audiences) f as key, value limit 3;
But there is a bug in HiveContext:
https://issues.apache.org/jira/browse/SPARK-5237
I am working on it now; hopefully I'll have a patch soon.
Cheng
The Data Source API probably works for this purpose.
It supports column pruning and predicate push-down:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
Examples can also be found in the unit tests:
https://github.com/apache/sp
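A rough sketch of such a relation against the Spark 1.3-style API; MyRelation and its contents are hypothetical:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}

case class MyRelation(sqlContext: SQLContext, schema: StructType)
    extends BaseRelation with PrunedFilteredScan {
  // Catalyst hands over the pruned column list and the push-down-able
  // filters; return rows containing only the requested columns.
  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    // ... read only requiredColumns from the source, applying filters early ...
    sqlContext.sparkContext.emptyRDD[Row]
  }
}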
Wow, glad to know that it works well. And sorry, the JIRA is another issue,
which is not the same case as here.
From: Bagmeet Behera [mailto:bagme...@gmail.com]
Sent: Saturday, January 17, 2015 12:47 AM
To: Cheng, Hao
Subject: Re: using hiveContext to select a nested Map-data-type from an
It seems the netty jar in use has an incompatible method signature. Can you
check whether there are different versions of the netty jar in your classpath?
From: Walrus theCat [mailto:walrusthe...@gmail.com]
Sent: Sunday, January 18, 2015 3:37 PM
To: user@spark.apache.org
Subject: Re: SparkSQL 1.2.0 sources A