You can use
EXPLAIN EXTENDED SELECT ….
From: luohui20...@sina.com [mailto:luohui20...@sina.com]
Sent: Tuesday, May 05, 2015 9:52 AM
To: Cheng, Hao; Olivier Girardot; user
Subject: Re: RE: Re: Re: sparksql running slow while joining_2_tables.
As far as I know, broadcast join is automatically enabled.
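For reference, a minimal sketch (not from this thread itself) of the settings that govern automatic broadcast joins in Spark SQL 1.3; the table name and threshold below are placeholders, and a HiveContext named sqlContext is assumed:

```scala
// Tables smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default in 1.3)
// are broadcast automatically; here it is raised to roughly 100 MB.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)

// Automatic broadcasting relies on table size statistics; for Hive tables they can
// be refreshed with ANALYZE TABLE ("small_table" is a made-up name).
sqlContext.sql("ANALYZE TABLE small_table COMPUTE STATISTICS noscan")
```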
Can you print out the physical plan?
EXPLAIN SELECT xxx…
From: luohui20...@sina.com [mailto:luohui20...@sina.com]
Sent: Monday, May 4, 2015 9:08 PM
To: Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining 2 tables.
hi Olivier
spark 1.3.1, with java 1.8.0_45
and I've added 2 pics
guys, just to confirm: sparksql supports the hive feature 'view'; is that the
LateralView one in the hive language manual?
thanks
Thanks & Best regards!
罗辉 San.Luo
to confirm: sparksql supports the hive feature 'view'; is that the LateralView one
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
in the hive language manual?
thanks
Thanks & Best regards!
罗辉 San.Luo
I assume you’re using the DataFrame API within your application.
sql(“SELECT…”).explain(true)
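A small sketch of that usage, assuming a SQLContext named sqlContext and two registered tables t1 and t2 (placeholder names); explain(true) prints the logical, analyzed and physical plans:

```scala
val joined = sqlContext.sql(
  "SELECT t1.id, t2.value FROM t1 JOIN t2 ON t1.id = t2.id")

// Look for a BroadcastHashJoin operator in the physical plan; a ShuffledHashJoin
// instead means the broadcast join did not kick in.
joined.explain(true)
```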
From: Wang, Daoyuan
Sent: Tuesday, May 5, 2015 10:16 AM
To: luohui20...@sina.com; Cheng, Hao; Olivier Girardot; user
Subject: RE: Re: RE: Re: Re: sparksql running slow while joining_2_tables.
You can use
You are looking for LATERAL VIEW explode
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
in HiveQL.
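For concreteness, a hedged sketch of LATERAL VIEW explode issued through a HiveContext (the table and column names are made up); a plain SQLContext does not parse this HiveQL syntax:

```scala
val hc = new org.apache.spark.sql.hive.HiveContext(sc)

// Explode an array column "tags" into one row per tag.
val exploded = hc.sql(
  """SELECT page, tag
    |FROM pages
    |LATERAL VIEW explode(tags) t AS tag""".stripMargin)
exploded.show()
```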
On Mon, May 4, 2015 at 7:49 AM, Giovanni Paolo Gibilisco gibb...@gmail.com
wrote:
Hi, I'm trying to parse log files generated by Spark using SparkSQL
Or, have you ever tried broadcast join?
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Tuesday, May 5, 2015 8:33 AM
To: luohui20...@sina.com; Olivier Girardot; user
Subject: RE: Re: Re: sparksql running slow while joining 2 tables.
Can you print out the physical plan?
EXPLAIN SELECT xxx
Thanks & Best regards!
罗辉 San.Luo
- Original Message -
From: Cheng, Hao hao.ch...@intel.com
To: Cheng, Hao hao.ch...@intel.com, luohui20...@sina.com, Olivier Girardot
ssab...@gmail.com, user user@spark.apache.org
Subject: RE: Re: Re: sparksql running
The issue is solved. There was a problem in my hive codebase. Once that was
fixed, -Phive-provided spark is working fine against my hive jars.
On 27 April 2015 at 08:00, Manku Timma manku.tim...@gmail.com wrote:
Made some progress on this. Adding hive jars to the system classpath is
needed.
Hi,
I am trying to answer a simple query with SparkSQL over a Parquet file.
When I execute the query several times, the first run takes about 2s
while the later runs take 0.1s.
Looking at the log file, it seems the later runs don't load the data
from disk. However, I didn't enable
Isn't it already available on the driver UI (which runs on port 4040)?
Thanks
Best Regards
On Mon, Apr 27, 2015 at 9:55 AM, Wenlei Xie wenlei@gmail.com wrote:
Hi,
I am wondering how we should understand the running time of SparkSQL
queries? For example, the physical query plan and the running
storage. Note that if caching is done by Spark it may be transient.
On 28 Apr 2015 08:00, Wenlei Xie wenlei@gmail.com wrote:
Hi,
I am trying to answer a simple query with SparkSQL over a Parquet file.
When I execute the query several times, the first run takes about 2s
while the later run
Made some progress on this. Adding hive jars to the system classpath is
needed. But it looks like it needs to be towards the end of the system
classpath. Manually adding the hive classpath into
Client.populateHadoopClasspath solved the issue. But a new issue has come
up. It looks like some hive
Hi,
I am wondering how we should understand the running time of SparkSQL
queries? For example, the physical query plan and the running time on each
stage? Is there any guide covering this?
Thank you!
Best,
Wenlei
Using Object[] in Java just works :).
On Fri, Apr 24, 2015 at 4:56 PM, Wenlei Xie wenlei@gmail.com wrote:
Hi,
I am wondering if there is any way to create a Row in SparkSQL 1.2 in Java
by using a List? It looks like
ArrayList<Object> something;
Row.create(something)
will create a row
Setting SPARK_CLASSPATH is triggering other errors. Not working.
On 25 April 2015 at 09:16, Manku Timma manku.tim...@gmail.com wrote:
Actually found the culprit. The JavaSerializerInstance.deserialize is
called with a classloader (of type MutableURLClassLoader) which has access
to all the
Actually found the culprit. The JavaSerializerInstance.deserialize is
called with a classloader (of type MutableURLClassLoader) which has access
to all the hive classes. But internally it triggers a call to loadClass but
with the default classloader. Below is the stacktrace (line numbers in the
Hi,
I am wondering if there is any way to create a Row in SparkSQL 1.2 in Java
by using a List? It looks like
ArrayList<Object> something;
Row.create(something)
will create a row with a single column (and the single column contains the
array)
Best,
Wenlei
:18 GMT+02:00 Michael Armbrust mich...@databricks.com:
There is a cost to converting from JavaBeans to Rows and this code path
has not been optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column
I see. Now try a slightly tricky approach: add the hive jar to the
SPARK_CLASSPATH (in the conf/spark-env.sh file on all machines) and make sure
that the jar is available on all the machines in the cluster at the same path.
Thanks
Best Regards
On Wed, Apr 22, 2015 at 11:24 AM, Manku Timma
has not been optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are not taking advantage of either.
I am curious to know what goes in your
optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are not taking advantage of either.
I am curious to know what goes in your filter function
Akhil, Thanks for the suggestions.
I tried out sc.addJar, --jars, --conf spark.executor.extraClassPath and
none of them helped. I added stuff into compute-classpath.sh. That did not
change anything. I checked the classpath of the running executor and made
sure that the hive jars are in that dir.
optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are not taking advantage of either.
I am curious to know what goes in your filter function
wondering why there is such a big gap in performance if it is just a
filter. Internally, the relation files are mapped to a JavaBean. Could this
different data representation (JavaBeans vs SparkSQL internal representation)
lead to such a difference? Is there anything I could do to make the
performance
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are not taking advantage of either.
I am curious to know what goes in your filter function, as you are not
using a filter on the SQL side.
Best
Ayan
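To make the pruning/pushdown point concrete, a rough sketch under assumed table and column names ("events", "status", "id"): a filter expressed on the SQL/DataFrame side is visible to the optimizer, while a filter function over materialized objects is not:

```scala
import org.apache.spark.sql.functions.col

// Filter and projection visible to the optimizer: predicate pushdown and
// column pruning can apply.
val fast = sqlContext.table("events").where(col("status") === "ERROR").select("id")

// Opaque lambda over already-materialized rows: every row is converted first
// and nothing is pushed down.
val slow = sqlContext.table("events").rdd.filter(row => row.getString(1) == "ERROR")
```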
On 21 Apr 2015 08:05, Renato Marroquín Mogrovejo
There is a cost to converting from JavaBeans to Rows and this code path has
not been optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you
representation (JavaBeans vs SparkSQL internal representation)
could lead to such a difference? Is there anything I could do to make the
performance get closer to the hard-coded option?
Thanks in advance for any suggestions or ideas.
Renato M.
Can you try sc.addJar("/path/to/your/hive/jar")? I think it will resolve it.
Thanks
Best Regards
On Mon, Apr 20, 2015 at 12:26 PM, Manku Timma manku.tim...@gmail.com
wrote:
Akhil,
But the first case of creating HiveConf on the executor works fine (map
case). Only the second case fails. I was
I am using spark-1.3 with hadoop-provided and hive-provided and hive-0.13.1
profiles. I am running a simple spark job on a yarn cluster by adding all
hadoop2 and hive13 jars to the spark classpaths.
If I remove hive-provided while building spark, I don't face any issue.
But with hive-provided
Looks like a missing jar; try printing the classpath and make sure the hive
jar is present.
Thanks
Best Regards
On Mon, Apr 20, 2015 at 11:52 AM, Manku Timma manku.tim...@gmail.com
wrote:
I am using spark-1.3 with hadoop-provided and hive-provided and
hive-0.13.1 profiles. I am running a
Akhil,
But the first case of creating HiveConf on the executor works fine (map
case). Only the second case fails. I was suspecting some foul play with
classloaders.
On 20 April 2015 at 12:20, Akhil Das ak...@sigmoidanalytics.com wrote:
Looks like a missing jar, try to print the classpath and
Using Spark 1.2.0. Tried to register an RDD and got:
scala.MatchError: class java.util.Date (of class java.lang.Class)
I see it was resolved in https://issues.apache.org/jira/browse/SPARK-2562
(included in 1.2.0)
Anyone encountered this issue?
Thanks,
Lior
Here's a code example:
public class DateSparkSQLExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("test").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    List<SomeObject> itemsList =
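A sketch of the usual workaround (an assumption on my side, not something confirmed in this thread): Spark SQL's schema inference understands java.sql.Timestamp (and java.sql.Date in newer releases) but not java.util.Date, so the field should use the java.sql type. The class and table names here are hypothetical:

```scala
import java.sql.Timestamp

case class Event(name: String, createdAt: Timestamp)   // java.sql type instead of java.util.Date

import sqlContext.createSchemaRDD   // Spark 1.2-style implicit RDD -> SchemaRDD conversion
val events = sc.parallelize(Seq(Event("a", new Timestamp(System.currentTimeMillis))))
events.registerTempTable("events")
```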
...@quantium.com.au
Cc: user@spark.apache.org
Subject: Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0
Can you provide the JDBC connector jar version? Possibly the full JAR name
: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0
Can you provide your spark version?
Thanks,
Daoyuan
From: Nathan McCarthy [mailto:nathan.mccar...@quantium.com.au]
Sent: Wednesday, April 15, 2015 1:57 PM
To: Nathan McCarthy; user@spark.apache.org
Subject: Re: SparkSQL JDBC
Hi Spark users,
Trying to upgrade to Spark1.2 and running into the following
seeing some very slow queries and wondering if someone can point me in the
right direction for debugging. My Spark UI shows a job with duration 15s
(see attached screenshot). Which would be great but client side
nathan.mccar...@quantium.com.au
Date: Wednesday, 15 April 2015 11:49 pm
To: Wang, Daoyuan daoyuan.w...@intel.com, user@spark.apache.org
Subject: RE: SparkSQL
nathan.mccar...@quantium.com.au
Date: Wednesday, 15 April 2015 1:57 pm
To: user@spark.apache.org
Subject: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0
Hi guys,
Trying to use a Spark SQL context's .load("jdbc", …) method to create a
DF from a JDBC data
Can you provide your spark version?
Thanks,
Daoyuan
From: Nathan McCarthy [mailto:nathan.mccar...@quantium.com.au]
Sent: Wednesday, April 15, 2015 1:57 PM
To: Nathan McCarthy; user@spark.apache.org
Subject: Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0
Just an update
, Paolo Platter paolo.plat...@agilelab.it
wrote:
Hi all,
is there anyone using SparkSQL + Parquet who has made a benchmark
about storing parquet files on HDFS or on CFS (Cassandra File System)?
Which storage can improve the performance of SparkSQL + Parquet?
Thanks
Paolo
Hi guys,
Trying to use a Spark SQL context's .load("jdbc", …) method to create a DF from
a JDBC data source. All seems to work well locally (master = local[*]), however
as soon as we try and run on YARN we have problems.
We seem to be running into problems with the class path and loading up the
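For reference, a hedged sketch of the 1.3 jdbc data source call being discussed; the URL, table name and driver below are placeholders. The JDBC driver jar has to be visible to both the driver and the executors (for example shipped with --jars), which is exactly where YARN setups tend to differ from local mode:

```scala
val df = sqlContext.load("jdbc", Map(
  "url"     -> "jdbc:postgresql://dbhost:5432/mydb?user=me&password=secret",
  "dbtable" -> "public.orders",
  "driver"  -> "org.postgresql.Driver"))
df.printSchema()
```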
Subject: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0
Hi guys,
Trying to use a Spark SQL context's .load("jdbc", …) method to create a DF from
a JDBC data source. All seems to work well locally (master = local[*]), however
as soon as we try and run on YARN we have problems.
We
, join, and then apply a new schema on the result RDD. This approach
works; at least all tasks were finished, while the DF/SQL approach doesn't.
Any idea?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/The-differentce-between-SparkSql-DataFram-join-and-Rdd-join-tp22407.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Hi,
I have written a scala object which can query the messages I am
receiving from Kafka.
Now I have to show the results on some webpage or dashboard that can auto-refresh
with new results. Any pointers on how I can do that?
Thanks,
Mukund
Hi all,
is there anyone using SparkSQL + Parquet who has made a benchmark about
storing parquet files on HDFS or on CFS (Cassandra File System)?
Which storage can improve the performance of SparkSQL + Parquet?
Thanks
Paolo
Michael, thanks for the response; I'm looking forward to trying 1.3.1.
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, April 03, 2015 6:52 AM
To: Haopu Wang
Cc: user
Subject: Re: [SparkSQL 1.3.0] Cannot resolve column name SUM('p.q)
among (k
It failed to find the class org.apache.spark.sql.catalyst.ScalaReflection
in the Spark SQL library. Make sure it's on the classpath and that the version
is correct, too.
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Hi Everyone,
I am getting the following error while registering a table using the Scala IDE.
Please let me know how to resolve this error. I am using Spark 1.2.1
import sqlContext.createSchemaRDD
val empFile = sc.textFile("/tmp/emp.csv", 4)
  .map(_.split(","))
Hi, I want to rename an aggregation field using the DataFrame API. The
aggregation is done on a nested field, but I got the exception below.
Do you see the same issue, and is there any workaround? Thank you very much!
==
Exception in thread main org.apache.spark.sql.AnalysisException:
Cannot resolve
This is actually a problem with our use of Scala's reflection library.
Unfortunately you need to load Spark SQL using the primordial classloader,
otherwise you run into this problem. If anyone from the scala side can
hint how we can tell scala.reflect which classloader to use when creating
the
This is tracked by these JIRAs:
https://issues.apache.org/jira/browse/SPARK-5947
https://issues.apache.org/jira/browse/SPARK-5948
From: denny.g@gmail.com
Date: Wed, 1 Apr 2015 04:35:08 +
Subject: Creating Partitioned Parquet Tables via SparkSQL
To: user@spark.apache.org
Creating
: Wed, 1 Apr 2015 04:35:08 +
Subject: Creating Partitioned Parquet Tables via SparkSQL
To: user@spark.apache.org
Creating Parquet tables via .saveAsTable is great, but I was wondering if
there was an equivalent way to create partitioned parquet tables.
Thanks!
...@centurylink.com
wrote:
I am trying to integrate SparkSQL with a BI tool. My requirement is to
query a Hive table very frequently from the BI tool.
Is there a way to cache the Hive Table permanently in SparkSQL? I don't
want to read the Hive table and cache it every time the query
I am trying to integrate SparkSQL with a BI tool. My requirement is to query a
Hive table very frequently from the BI tool.
Is there a way to cache the Hive Table permanently in SparkSQL? I don't want
to read the Hive table and cache it every time the query is submitted from the BI
tool.
Thanks
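One common approach (an assumption on my part, not something stated in this thread) is to keep a single long-running Spark application (for example the thrift server's shared context) and cache the table once there; the cache only lives as long as that application. The table name below is a placeholder:

```scala
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
hc.cacheTable("sales_fact")                       // subsequent queries hit the in-memory copy
hc.sql("SELECT count(*) FROM sales_fact").show()
```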
You can use the HiveContext instead of SQLContext, which should support all of
HiveQL, including lateral view explode.
SQLContext does not support that yet.
BTW, nice coding format in the email.
Yong
Date: Tue, 31 Mar 2015 18:18:19 -0400
Subject: Re: SparkSql - java.util.NoSuchElementException
I am accessing ElasticSearch via elasticsearch-hadoop and attempting to
expose it via SparkSQL. I am using spark 1.2.1, the latest supported by
elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" %
"2.1.0.BUILD-SNAPSHOT" of elasticsearch-hadoop. I'm
encountering an issue when I
at 3:26 PM, Todd Nist tsind...@gmail.com wrote:
I am accessing ElasticSearch via elasticsearch-hadoop and attempting
to expose it via SparkSQL. I am using spark 1.2.1, the latest supported by
elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" %
"2.1.0.BUILD-SNAPSHOT" of elasticsearch
I'm using 1.0.4
Thanks,
--
Pei-Lun
On Fri, Mar 27, 2015 at 2:32 PM, Cheng Lian lian.cs@gmail.com wrote:
Hm, which version of Hadoop are you using? Actually there should also be
a _metadata file together with _common_metadata. I was using Hadoop 2.4.1
btw. I'm not sure whether Hadoop
Thanks for the information. Verified that the _common_metadata and
_metadata files are missing in this case when using Hadoop 1.0.4. Would
you mind opening a JIRA for this?
Cheng
On 3/27/15 2:40 PM, Pei-Lun Lee wrote:
I'm using 1.0.4
Thanks,
--
Pei-Lun
On Fri, Mar 27, 2015 at 2:32 PM, Cheng
JIRA ticket created at:
https://issues.apache.org/jira/browse/SPARK-6581
Thanks,
--
Pei-Lun
On Fri, Mar 27, 2015 at 7:03 PM, Cheng Lian lian.cs@gmail.com wrote:
Thanks for the information. Verified that the _common_metadata and
_metadata file are missing in this case when using Hadoop
, 2015 1:44 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Registering custom UDAFs with HiveContext in SparkSQL,
how?
Thanks Hao,
But my question concerns UDAFs (user defined aggregation functions), not
UDTFs (user defined table functions).
I would appreciate it if you could point me
I couldn't reproduce this with the following spark-shell snippet:
scala> import sqlContext.implicits._
scala> Seq((1, 2)).toDF("a", "b")
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
The _common_metadata file is
Hi Cheng,
on my computer, executing res0.save("xxx",
org.apache.spark.sql.SaveMode.Overwrite) produces:
peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
total 32
-rwxrwxrwx 1 peilunlee staff   0 Mar 27 11:29 _SUCCESS*
-rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29
Dear all,
I am trying to upgrade spark from 1.2 to 1.3 and switch the existing API
from creating SchemaRDD to DataFrame.
After testing, I noticed that the following behavior has changed:
```
import java.sql.Date
import com.bridgewell.SparkTestUtils
import org.apache.spark.rdd.RDD
import
/spark/pull/3247
From: shahab [mailto:shahab.mok...@gmail.com]
Sent: Wednesday, March 11, 2015 1:44 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Registering custom UDAFs with HiveContext in SparkSQL,
how?
Thanks Hao,
But my question concerns UDAF (user defined
Hi,
I have a DataFrame object and I want to do various types of aggregations like
count, sum, variance, stddev, etc.
DataFrame has a DSL to do simple aggregations like count and sum.
What about variance and stddev?
Thank you for any suggestions!
I would do sum of squares. This would allow you to keep a running value as an
associative operation (in an aggregator) and then calculate the variance /
std deviation after the fact.
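A back-of-the-envelope sketch of that idea with the 1.3 DataFrame DSL, assuming a DataFrame named df with a numeric Double column "x" (both names are placeholders):

```scala
import org.apache.spark.sql.functions.{col, count, sum}

val row = df.agg(
  count(col("x")),            // n
  sum(col("x")),              // sum of x
  sum(col("x") * col("x"))    // sum of x^2
).first()

val n        = row.getLong(0).toDouble
val mean     = row.getDouble(1) / n
val ex2      = row.getDouble(2) / n
val variance = ex2 - mean * mean   // population variance; rescale by n / (n - 1) for the sample estimate
val stddev   = math.sqrt(variance)
```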
On Wed, Mar 25, 2015 at 10:28 PM, Haopu Wang hw...@qilinsoft.com wrote:
Hi,
I have a DataFrame object and I
Hi,
When I save a parquet file with SaveMode.Overwrite, it never generates
_common_metadata, whether it overwrites an existing dir or not.
Is this expected behavior?
And what is the benefit of _common_metadata? Will reading perform better
when it is present?
Thanks,
--
Pei-Lun
Perhaps this email reference may be able to help from a DataFrame
perspective:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E
On Wed, Mar 25, 2015 at 7:29 PM Haopu Wang hw...@qilinsoft.com wrote:
Awesome, yep - I have seen the warnings on UDTs, happy to keep up with the
API changes :). Would this be a reasonable PR to toss up despite the API
instability, or would you prefer it to wait?
Thanks
-Pat
On Tue, Mar 24, 2015 at 7:44 PM, Michael Armbrust mich...@databricks.com
wrote:
I'll
I'll caution that the UDTs are not a stable public interface yet. We'd
like to do this someday, but currently this feature is mostly for MLlib as
we have not finalized the API.
Having an ordering could be useful, but I'll add that currently UDTs
actually exist in serialized form so the ordering
Hey all,
Currently looking into UDTs and I was wondering if it is reasonable to add
the ability to define an Ordering (or if this is possible, then how)?
Currently it will throw an error when non-Native types are used.
Thanks!
-Pat
UDAFs with HiveContext in SparkSQL, how?
Thanks Hao,
But my question concerns UDAFs (user defined aggregation functions), not
UDTFs (user defined table functions).
I would appreciate it if you could point me to a starting point for UDAF
development in Spark.
Thanks
Shahab
On Tuesday, March
...@yahoo.com wrote:
I have a complex SparkSQL query of the nature
select a.a, b.b, c.c from a,b,c where a.x = b.x and b.y = c.y
How do I convert this efficiently into a scala query of
a.join(b,..,..)
and so on. Can anyone help me with this? If my question needs more
clarification, please
I have a complex SparkSQL query of the nature
select a.a, b.b, c.c from a,b,c where a.x = b.x and b.y = c.y
How do I convert this efficiently into a scala query of
a.join(b,..,..)
and so on. Can anyone help me with this? If my question needs more
clarification, please let me know.
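A sketch of the equivalent DataFrame joins, assuming a, b and c are DataFrames whose column names match the SQL aliases above:

```scala
// Join a to b on x, then the result to c on y, keeping only the selected columns.
val result = a
  .join(b, a("x") === b("x"))
  .join(c, b("y") === c("y"))
  .select(a("a"), b("b"), c("c"))
```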
hi all:
I have a spark on yarn cluster (spark-1.3.0, hadoop-2.2.0) with hive-0.12.0 and
tachyon-0.6.1,
and now I start the SparkSQL thriftserver with start-thriftserver.sh, and use
beeline to connect to the thriftserver according to the spark documentation.
My question is: how to cache a table with a specified
; user@spark.apache.org
Subject: Re: configure number of cached partition in memory on SparkSQL
Hi Judy,
In the case of HadoopRDD and NewHadoopRDD, partition number is actually decided
by the InputFormat used. And spark.sql.inMemoryColumnarStorage.batchSize is not
related to partition number
JIRA and PR for first issue:
https://issues.apache.org/jira/browse/SPARK-6408
https://github.com/apache/spark/pull/5087
On Thu, Mar 19, 2015 at 12:20 PM, Pei-Lun Lee pl...@appier.com wrote:
Hi,
I am trying the jdbc data source in spark sql 1.3.0 and found some issues.
First, the syntax where
Hi:
I need to count some game player events in the game.
Such as: how many players stay in game scene 1 (Save the
Princess from a Dragon),
how much money they have paid in the last 5 min,
how many players pay money to go through this scene,
and much more
hey guys,
In my understanding, SparkSQL only supports JDBC connections through the hive thrift
server; is this correct?
Thanks
Yes, I have been using Spark SQL from the outset. I haven't found any other
server for Spark SQL JDBC connectivity.
On Wed, Mar 18, 2015 at 5:50 PM, sequoiadb mailing-list-r...@sequoiadb.com
wrote:
hey guys,
In my understanding, SparkSQL only supports JDBC connections through the hive
thrift
Yes
On 3/18/15 8:20 PM, sequoiadb wrote:
hey guys,
In my understanding, SparkSQL only supports JDBC connections through the hive thrift
server; is this correct?
Thanks
Hi,
I am trying the jdbc data source in spark sql 1.3.0 and found some issues.
First, the syntax where str_col='value' gives an error for both
postgresql and mysql:
psql> create table foo(id int primary key, name text, age int);
bash$ SPARK_CLASSPATH=postgresql-9.4-1201-jdbc41.jar
Hallo,
Depending on your needs, a search technology such as SolrCloud or
ElasticSearch makes more sense. If you go for the Cassandra solution you
can use the lucene text indexer...
I am not sure if hive or sparksql are very suitable for text. However, if
you do not need text search then feel free
Hi:
I need to migrate a Log Analysis System from mysql + some C++ real-time
compute framework to the Hadoop ecosystem.
Now that I want to build a data warehouse, I don't know which one is the right
choice: Cassandra? HIVE? Or just SparkSQL?
There are few benchmarks for these systems.
My
Hi Judy,
In the case of HadoopRDD and NewHadoopRDD, the partition number is
actually decided by the InputFormat used. And
spark.sql.inMemoryColumnarStorage.batchSize is not related to the
partition number; it controls the in-memory columnar batch size within a
single partition.
Also, what
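A hedged sketch of the two knobs being distinguished here (the table name and the numbers are placeholders):

```scala
// Controls the columnar batch size *within* each cached partition, not the partition count.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000")

// The number of cached partitions follows the DataFrame's own partitioning,
// so repartition before caching if you want to change it.
val df = sqlContext.table("my_table").repartition(64)
df.registerTempTable("my_table_cached")
sqlContext.cacheTable("my_table_cached")
```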
)
--
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Wednesday, March 11, 2015 8:25 AM
To: Haopu Wang; user; d...@spark.apache.org
Subject: RE: [SparkSQL] Reuse HiveContext to different Hive warehouse?
I am not so sure if Hive supports changing the metastore after
)
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Wednesday, March 11, 2015 8:25 AM
To: Haopu Wang; user; d...@spark.apache.org
Subject: RE: [SparkSQL] Reuse HiveContext to different Hive warehouse?
I am not so sure if Hive supports changing the metastore after it has been
initialized, I guess
Hi,
I need to develop a couple of UDAFs and use them in SparkSQL. While UDFs
can be registered as functions in HiveContext, I could not find any
documentation on how UDAFs can be registered in the HiveContext. So far
what I have found is to make a JAR file out of the developed UDAF class
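One possible route (my assumption, not confirmed in this thread): package the Hive UDAF in a jar, ship it, and register it as a temporary function through the HiveContext. The jar path, function name and class name below are hypothetical:

```scala
// Make the UDAF class visible to the executors.
sc.addJar("/path/to/my-udafs.jar")

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.sql("CREATE TEMPORARY FUNCTION my_udaf AS 'com.example.hive.MyUDAF'")
hc.sql("SELECT customer_id, my_udaf(amount) FROM transactions GROUP BY customer_id").show()
```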
I'm using Spark 1.3.0 RC3 build with Hive support.
In Spark Shell, I want to reuse the HiveContext instance with different
warehouse locations. Below are the steps for my test (assume I have
loaded a file into table src).
==
15/03/10 18:22:59 INFO SparkILoop: Created sql context (with
any one know how to deploy a custom UDAF jar file in SparkSQL?
Hi,
Does anyone know how to deploy a custom UDAF jar file in SparkSQL? Where
should I put the jar file so SparkSQL can pick it up and make it accessible for
SparkSQL applications?
I do not use spark-shell; instead I want to use
: Tuesday, March 10, 2015 5:44 PM
To: user@spark.apache.org
Subject: Registering custom UDAFs with HiveContext in SparkSQL, how?
Hi,
I need to develop a couple of UDAFs and use them in SparkSQL. While UDFs can
be registered as functions in HiveContext, I could not find any documentation
on how UDAFs
Hi,
Does anyone know how to deploy a custom UDAF jar file in SparkSQL? Where
should I put the jar file so SparkSQL can pick it up and make it accessible
for SparkSQL applications?
I do not use spark-shell; instead I want to use it in a spark application.
best,
/Shahab
Sent: Tuesday, March 10, 2015 5:44 PM
To: user@spark.apache.org
Subject: Registering custom UDAFs with HiveContext in SparkSQL, how?
Hi,
I need to develop a couple of UDAFs and use them
/pull/3247
From: shahab [mailto:shahab.mok...@gmail.com]
Sent: Wednesday, March 11, 2015 1:44 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Registering custom UDAFs with HiveContext in SparkSQL, how?
Thanks Hao,
But my question concerns UDAFs (user defined aggregation functions), not UDTFs