Thanks all for your replies!
I tested both approaches: registering the temp table then executing SQL vs.
saving to an HDFS filepath directly. The problem with the second approach is
that I am inserting data into a Hive table, so if I create a new partition
with this method, the Hive metadata is not updated and the new partition is
not visible to Hive.
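For anyone hitting the same thing: a minimal sketch of registering such a
directory with the metastore, assuming a HiveContext; the table name,
partition value, and path are placeholders:

  // Tell the Hive metastore about a partition written straight to HDFS.
  sqlContext.sql(
    "ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (dt='2015-12-02') " +
    "LOCATION '/test/table/dt=2015-12-02'")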
you might also coalesce to 1 (or some small number) before writing to avoid
creating a lot of files in that partition if you know that there is not a
ton of data.
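A minimal sketch of that suggestion, assuming df is the DataFrame being
written and dt is the partition column:

  import org.apache.spark.sql.SaveMode

  // Collapse to one task so each Hive partition gets a single output file.
  df.coalesce(1)
    .write.mode(SaveMode.Append)
    .partitionBy("dt")
    .save("/test/table")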
On Wed, Dec 2, 2015 at 12:59 AM, Rishi Mishra wrote:
> As long as all your data is being inserted by Spark ,
As long as all your data is being inserted by Spark, and hence using the same
hash partitioner, what Fengdong mentioned should work.
On Wed, Dec 2, 2015 at 9:32 AM, Fengdong Yu
wrote:
> Hi
> you can try:
>
> if your table is under location "/test/table/" on HDFS
> and has
Do you want to load multiple tables by using SQL? JdbcRelation currently can
only load a single table; it doesn't accept SQL as the loading command.
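One workaround I'm aware of (not a SQL-loading API as such): the jdbc source
accepts a parenthesized, aliased subquery wherever a table name is expected,
via the dbtable option. The URL, driver, and query below are placeholders:

  val df = sqlContext.read.format("jdbc").options(Map(
    "url"     -> "jdbc:postgresql://host:5432/db",
    "driver"  -> "org.postgresql.Driver",
    // The subquery runs on the database side and must be aliased.
    "dbtable" -> "(SELECT a.id, b.name FROM a JOIN b ON a.id = b.id) t"
  )).load()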
On Wed, Dec 2, 2015 at 4:33 PM, censj wrote:
> hi Fengdong Yu:
> I want to use sqlContext.read.format('jdbc').options( ... ).load()
I don't think there's an API for that, but I think it would be reasonable and
helpful for ETL.
As a workaround, you can first register your DataFrame as a temp table, and
then use SQL to insert into the static partition.
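A sketch of that workaround, with hypothetical table and partition names:

  // Register the DataFrame, then insert through SQL so the Hive metastore
  // learns about the static partition.
  df.registerTempTable("staging")
  sqlContext.sql(
    "INSERT INTO TABLE target_table PARTITION (dt='2015-12-02') " +
    "SELECT * FROM staging")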
On Wed, Dec 2, 2015 at 10:50 AM, Isabelle Phan wrote:
> Hello,
>
> Is there
Hi
you can try:
if your table is under location "/test/table/" on HDFS
and has partitions:
"/test/table/dt=2012"
"/test/table/dt=2013"
df.write.mode(SaveMode.Append).partitionBy("dt").save("/test/table")
> On Dec 2, 2015, at 10:50 AM, Isabelle Phan wrote:
>
>
hi guys, when I am trying to connect hive with spark-sql, I got a problem
like below:
[root@master spark]# bin/spark-shell --master local[4]
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system
Have you seen this thread?
http://search-hadoop.com/m/q3RTtCoKmv14Hd1H1=Re+Spark+Hive+max+key+length+is+767+bytes
On Thu, Nov 26, 2015 at 5:26 AM, wrote:
> hi guys,
>
> when I am trying to connect hive with spark-sql, I got a problem like
> below:
>
>
> [root@master
'smallint', ''],
['day', 'smallint', '']]
In SparkSQL: hc.sql("DESCRIBE pub.inventory_daily").collect()
[Row(col_name=u'effective_date', data_type=u'string', comment=u''),
Row(col_name=u'listing_skey', data_type=u'int', comment=u''),
Row(col_name=u'car_durable_key', data_type=u'int'
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:
Sort [net_site#50 ASC,device#6 ASC], true
 Exchange (RangePartitioning 200)
  Project [net_site#50,device#6,total_count#105L,adblock_count#106L,noanalytics_count#107L,unique_nk_count#109L]
   HashOuterJoin
You can also put the C* seed node
> address in the spark-defaults.conf file under the SPARK_HOME/conf
> directory. Then you don’t need to manually SET it for each Beeline session.
>
>
>
> Mohammed
>
>
>
> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
> *Sent:* Thu
Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
> *Sent:* Thursday, November 12, 2015 9:12 AM
> *To:* Mohammed Guller
> *Cc:* user
> *Subject:* Re: Cassandra via SparkSQL/Hive JDBC
>
>
>
> Mohammed,
>
>
>
> That is great. It looks like a perfect scenario. Would
e:
>
>> Did you mean Hive or Spark SQL JDBC/ODBC server?
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
>> *Sent:* Thursday, November 12, 2015 9:12 AM
>> *To:* Mohammed Guller
>> *Cc:*
Did you mean Hive or Spark SQL JDBC/ODBC server?
Mohammed
From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 9:12 AM
To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC
Mohammed,
That is great. It looks like a perfect scenario. Would
deck that I
> presented at the Cassandra Summit 2015:
>
> http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark
>
>
>
>
>
> Mohammed
>
>
>
> *From:* Bryan [mailto:bryan.jeff...@gmail.com]
> *Sent:* Tuesday, November 10, 2015 7:42
PM, Mohammed Guller <moham...@glassbeam.com
>> > wrote:
>>
>>> Did you mean Hive or Spark SQL JDBC/ODBC server?
>>>
>>>
>>>
>>> Mohammed
>>>
>>>
>>>
>>> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
to manually SET it for each Beeline session.
Mohammed
From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 10:26 AM
To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC
Answer: In beeline run the following: SET
spark.cassandra.connection.host
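The same property can also be set when the application (or the Thrift
server) is started; a sketch, with a placeholder seed address:

  import org.apache.spark.{SparkConf, SparkContext}

  // spark.cassandra.connection.host is read by the spark-cassandra-connector.
  val conf = new SparkConf()
    .set("spark.cassandra.connection.host", "10.0.0.1")
  val sc = new SparkContext(conf)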
Any reason that Spark Cassandra connector won't work for you?
Yong
To: bryan.jeff...@gmail.com; user@spark.apache.org
From: bryan.jeff...@gmail.com
Subject: RE: Cassandra via SparkSQL/Hive JDBC
Date: Tue, 10 Nov 2015 22:42:13 -0500
Anyone have thoughts or a similar use-case for SparkSQL
presented at the Cassandra Summit 2015:
http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark
Mohammed
From: Bryan [mailto:bryan.jeff...@gmail.com]
Sent: Tuesday, November 10, 2015 7:42 PM
To: Bryan Jeffrey; user
Subject: RE: Cassandra via SparkSQL/Hive JDBC
Anyone have
Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?
Regards,
Bryan Jeffrey
-----Original Message-----
From: "Bryan Jeffrey" <bryan.jeff...@gmail.com>
Sent: 11/4/2015 11:16 AM
To: "user" <user@spark.apache.org>
Subject: Cassandra via SparkSQL
Baghino <
stefano.bagh...@radicalbit.io> wrote:
> Hi Mustafa,
>
> are you trying to run geospatial queries on the PostGIS DB with SparkSQL?
> Correct me if I'm wrong, but I think SparkSQL itself would need to support
> the geospatial extensions in order for this to work.
>
>
Hi Folks,
I am trying to connect from the Spark shell to a PostGIS database. Simply
put, PostGIS is a *spatial* extension for PostgreSQL that adds support for
*geometry* types.
Although the JDBC connection from Spark works well with PostgreSQL, it does
not with a database on the same server, which supports
Hi Mustafa,
are you trying to run geospatial queries on the PostGIS DB with SparkSQL?
Correct me if I'm wrong, but I think SparkSQL itself would need to support
the geospatial extensions in order for this to work.
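If the blocker is only the geometry column, one possible workaround is to
cast it to text on the Postgres side, inside the JDBC subquery; ST_AsText is
a PostGIS function, and the table/column names here are assumptions:

  // Convert geometry to WKT text before Spark ever sees the column.
  val df = sqlContext.read.format("jdbc").options(Map(
    "url"     -> "jdbc:postgresql://host:5432/gisdb",
    "dbtable" -> "(SELECT id, ST_AsText(geom) AS geom_wkt FROM places) t"
  )).load()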
On Wed, Nov 4, 2015 at 1:46 PM, Mustafa Elbehery <elbeherymust...@gmail.com>
Today you have to do an explicit conversion. I'd really like to open up a
public UDT interface as part of Spark Datasets (SPARK-) that would
allow you to register custom classes with conversions, but this won't
happen till Spark 1.7 likely.
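A minimal sketch of the explicit conversion Michael describes, assuming
Joda-Time on the classpath, a hypothetical pair of case classes, and events
being an RDD[Event]:

  import java.sql.Timestamp
  import org.joda.time.DateTime

  case class Event(name: String, at: DateTime)      // application-side type
  case class EventRow(name: String, at: Timestamp)  // Hive-compatible type

  // Convert explicitly before building the DataFrame.
  val rows = events.map(e => EventRow(e.name, new Timestamp(e.at.getMillis)))
  val df = sqlContext.createDataFrame(rows)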
On Mon, Nov 2, 2015 at 8:40 PM, Bryan Jeffrey
All,
I have an object with Joda DateTime fields. I would prefer to continue to use
the DateTime in my application. When I am inserting into Hive I need to
cast to a Timestamp field (DateTime is not supported). I added an implicit
conversion from DateTime to Timestamp - but it does not appear to be
It's super cheap. It's just a hashtable stored on the driver. Yes, you can
have more than one name for the same DF.
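A quick illustration of both points, for whatever DataFrame df you already
have:

  // Registration is metadata-only; nothing is computed or copied.
  df.registerTempTable("events")
  df.registerTempTable("events_alias")
  sqlContext.sql("SELECT count(*) FROM events_alias")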
On Wed, Oct 28, 2015 at 6:17 PM, Anfernee Xu wrote:
> Hi,
>
> I just want to understand the cost of DataFrame.registerTempTable(String),
> is it just a
' condition in SparkSQL
From: anfernee...@gmail.com
To: user@spark.apache.org
Hi,
I have a pretty large data set (2M entities) in my RDD. The data has already
been partitioned by a specific key, and the key has a range (long type). Now
I want to create a bunch of key buckets; for example, the key has range
Hi,
I have a pretty large data set (2M entities) in my RDD. The data has already
been partitioned by a specific key, and the key has a range (long type). Now
I want to create a bunch of key buckets; for example, if the key has range
1 -> 100,
I will break the whole range into the buckets below:
1
QL partition by this virtual column?
>
> In this case, the full dataset will be just scanned once.
>
> Yong
>
> --
> Date: Thu, 29 Oct 2015 10:51:53 -0700
> Subject: RDD's filter() or using 'where' condition in SparkSQL
> From: anfernee...@gmail.c
can do whatever analytic function
you want.
Yong
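For concreteness, a sketch of that virtual-column idea; the bucket width of
10 is an arbitrary assumption:

  import org.apache.spark.sql.functions._

  // Derive a bucket id from the long key, then aggregate per bucket in one
  // scan instead of running a separate filter per range.
  val bucketed = df.withColumn("bucket", (col("key") / lit(10L)).cast("long"))
  bucketed.groupBy("bucket").count().show()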
Date: Thu, 29 Oct 2015 12:53:35 -0700
Subject: Re: RDD's filter() or using 'where' condition in SparkSQL
From: anfernee...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org
Thanks Yong for your response.
Let me see if I can understand what
Hi,
I just want to understand the cost of DataFrame.registerTempTable(String):
is it just a trivial operation (like creating an object reference) in the
master (driver) JVM? And can I have multiple tables with different names
referencing the same DataFrame?
Thanks
--
--Anfernee
Hi,
I've a partitioned table in Hive (Avro) that I can query fine from the Hive
CLI.
When using SparkSQL, I'm able to query some of the partitions, but I get an
exception on some of the partitions.
The query is:
sqlContext.sql("select * from myTable where source='http' and date =
20150812&q
Hi Anand, can you paste the table creation statement? I'd like to reproduce
that locally first, and BTW, which version are you using?
Hao
From: Anand Nalya [mailto:anand.na...@gmail.com]
Sent: Tuesday, October 27, 2015 11:35 PM
To: spark users
Subject: SparkSQL on hive error
Hi,
I've
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-driver-memory-problems-while-doing-Cross-Validation-do-not-occur-with-1-4-1-td25076.html
>
>
>
Hi, Lloyd,
Are both runs cold or warm? Memory/cache hits or misses could be a big
factor if your application is IO-intensive. You need to monitor your system
to understand where your bottleneck is.
Good luck,
Xiao Li
Thanks for reporting: https://issues.apache.org/jira/browse/SPARK-11032
You can probably work around this by aliasing the count and just doing a
filter on that value afterwards.
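A sketch of that workaround (column names are assumed from the sample
'people' table):

  // Alias the aggregate, then filter on the alias rather than nesting
  // count(1) inside an outer query.
  val counts = sqlContext.sql(
    "SELECT name, count(1) AS cnt FROM people GROUP BY name")
  counts.filter("cnt > 1").show()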
On Thu, Oct 8, 2015 at 8:47 PM, Jeff Thompson <
jeffreykeatingthomp...@gmail.com> wrote:
> After upgrading from 1.4.1
It's purely for estimation, when guessing whether it's safe to do a broadcast
join. We picked a random number that we thought was larger than the common
case (it's better to overestimate to avoid OOM).
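For anyone who wants to bound that decision explicitly, the threshold is
configurable; the value below is arbitrary:

  // Tables estimated below this size (in bytes) may be broadcast in joins;
  // set it to -1 to disable broadcast joins entirely.
  sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
    (10 * 1024 * 1024).toString)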
On Wed, Oct 7, 2015 at 10:11 PM, vivek bhaskar wrote:
> I want to understand
After upgrading from 1.4.1 to 1.5.1 I found some of my spark SQL queries no
longer worked. Seems to be related to using count(1) or count(*) in a
nested query. I can reproduce the issue in a pyspark shell with the sample
code below. The ‘people’ table is from spark-1.5.1-bin-hadoop2.4/
-dev +user
1). Is that the reason why it's always slow in the first run? Or are there
> any other reasons? Apparently it loads data to memory every time so it
> shouldn't be something to do with disk read should it?
>
You are probably seeing the effect of the JVM's JIT. The first run is
I want to understand what the default size for a given datatype is used for.
The following link mentions that it's for internal size estimation.
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DataType.html
The above behavior is also reflected in the code, where the default value
seems to be used
Thanks for the suggestion. The output from EXPLAIN is indeed equivalent in
both sparkSQL and via the Thrift server. I did some more testing. The
source of the performance difference is in the way I was triggering the
sparkSQL query. I was using .count() instead of .collect(). When I use
'12345';
>
> When I submit the query via beeline & the JDBC thrift server it returns in
> 35s
> When I submit the exact same query using sparkSQL from a pyspark shell
> (sqlContext.sql("SELECT * FROM ")) it returns in 3s.
>
> Both times are obtained from the s
Hi,
I'm running a simple SQL query over a ~700 million row table of the form:
SELECT * FROM my_table WHERE id = '12345';
When I submit the query via beeline & the JDBC thrift server it returns in
35s
When I submit the exact same query using sparkSQL from a pyspark shell
(sqlContext.sql("
Once you convert your data to a dataframe (look at spark-csv), try
df.write.partitionBy("yyyy", "mm").save("...").
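Expanded into a runnable sketch; com.databricks.spark.csv is the spark-csv
format name, while the paths and the header option are assumptions about the
input:

  // Read the CSV with a header row, then write partitioned by year/month.
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load("hdfs:///path/to/input.csv")
  df.write.partitionBy("yyyy", "mm").save("hdfs:///path/to/output")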
On Thu, Oct 1, 2015 at 4:11 PM, haridass saisriram <
haridass.saisri...@gmail.com> wrote:
> Hi,
>
> I am trying to find a simple example to read a data file on HDFS. The
> file
Hi,
I am trying to find a simple example to read a data file on HDFS. The
file has the following format
a, b, c, yyyy, mm
a1,b1,c1,2015,09
a2,b2,c2,2014,08
I would like to read this file and store it in HDFS partitioned by year and
month, something like this:
/path/to/hdfs/yyyy/mm
I want to
[error] val sqlContext = new org.apache.spark.sql.SQLContext(sc)
[error] ^
[error] two errors found
[error] (compile:compile) Compilation failed*
So Spark SQL is not part of the Spark core package? I have no issues when
testing my code in spark-shell. Thanks for the help!
--
Best regards!
Lin,Cui
t/src/main/scala/TestMain.scala:19: object sql is
> not a member of package org.apache.spark
> [error] val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> [error] ^
> [error] two errors found
> [error] (compile:compile) Compilati
org.apache.spark.sql.SQLContext;
>>> [error] ^
>>> [error] /data/workspace/test/src/main/scala/TestMain.scala:19: object sql
>>> is not a member of package org.apache.spark
>>> [error] val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>
^
>> [error] /data/workspace/test/src/main/scala/TestMain.scala:19: object sql is
>> not a member of package org.apache.spark
>> [error] val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>> [error] ^
&g
r versions of Spark, and for the operations that are
>> still not supported, it's pretty straightforward to define your own
>> UserDefinedFunctions in either Scala or Java (I don't know about other
>> languages).
>> On Sep 11, 2015 10:26 PM, "liam" <liaml...@gmail
.scala:132)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719)
at com.eaglepeaks.engine.SparkEngine.main(SparkEngine.java:114)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/UDAF-and-UDT-with-
-spark-user-list.1001560.n3.nabble.com/UDAF-and-UDT-with-SparkSQL-1-5-0-tp24670.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional
or Java (I don't know about other
> languages).
> On Sep 11, 2015 10:26 PM, "liam" <liaml...@gmail.com> wrote:
>
>> Hi,
>>
>> Imagine this: the value of one column is a substring of another
>> column. When using Oracle, I have many ways to do the query, like the
>> following statement, but how to do it in SparkSQL, since there is no
>> "concat(), instr(), locate()..."
>>
>>
>> select * from table t where t.a like '%'||t.b||'%';
>>
>>
>> Thanks.
>>
>>
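Two hedged ways to express that predicate: with a HiveContext, Hive's string
functions such as concat are generally available, and failing that a UDF can
be registered:

  // Option 1: Hive's concat inside LIKE.
  sqlContext.sql("SELECT * FROM t WHERE t.a LIKE concat('%', t.b, '%')")

  // Option 2: a registered UDF doing the substring test directly.
  sqlContext.udf.register("contains_str",
    (a: String, b: String) => a != null && b != null && a.contains(b))
  sqlContext.sql("SELECT * FROM t WHERE contains_str(t.a, t.b)")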
select * from pokes; command, it will be OK ~, I cannot understand~
).
On Sep 11, 2015 10:26 PM, "liam" <liaml...@gmail.com> wrote:
> Hi,
>
> Imagine this: the value of one column is a substring of another
> column. When using Oracle, I have many ways to do the query, like the
> following statement, but how to do it in SparkSQL s
Is there an alternative for this?
is data.schema.asNullable. What's the real reason for this?
Why not simply use the existing schema nullable flags?
Hi, on our production environment we have a unique problem related to Spark
SQL, and I wonder if anyone can give me some idea of the best way to
handle this.
Our production Hadoop cluster is IBM BigInsight Version 3, which comes with
Hadoop 2.2.0 and Hive 0.12.
Right now, we build spark
Hi,
I just wonder if there's any way that I can get some sample data (10-20
rows) out of Spark's Hive using Node.js.
Submitting a Spark job to show 20 rows of data in a web page is not good for
me.
I've set up Spark Thrift Server as shown in Spark Doc. The server works
because I can use *beeline*
http://spark.apache.org/docs/latest/sql-programming-guide.html
Regards
Muhammad
On Thu, Aug 20, 2015 at 5:46 AM, Dawid Wysakowicz
wysakowicz.da...@gmail.com wrote:
Hi,
I would like to dip into SparkSQL. Get to know better the
architecture, good practices, some internals. Could you advise me some
started is the Spark SQL Guide from Apache
http://spark.apache.org/docs/latest/sql-programming-guide.html
Regards
Muhammad
On Thu, Aug 20, 2015 at 5:46 AM, Dawid Wysakowicz
wysakowicz.da...@gmail.com wrote:
Hi,
I would like to dip into SparkSQL. Get to know better the
architecture, good
...@gmail.com wrote:
Hi,
I would like to dip into SparkSQL. Get to know better the architecture,
good practices, some internals. Could you advise me some materials on this
matter?
Regards
Dawid
Hi,
I would like to dip into SparkSQL. Get to know better the architecture,
good practices, some internals. Could you advise me some materials on this
matter?
Regards
Dawid
Hi Dawid,
The best place to get started is the Spark SQL Guide from Apache:
http://spark.apache.org/docs/latest/sql-programming-guide.html
Regards
Muhammad
On Thu, Aug 20, 2015 at 5:46 AM, Dawid Wysakowicz
wysakowicz.da...@gmail.com wrote:
Hi,
I would like to dip into SparkSQL. Get to know
Hi,
I might have a stupid question about sparksql's implementation of joins on
non-equality conditions, for instance condition1 OR condition2.
In fact, Hive doesn't support such joins, as it is very difficult to express
such conditions as a map/reduce job. However, sparksql supports such an
operation, so I would like to know how spark implements it.
Hi,
The issue only seems to happen when trying to access Spark via the SparkSQL
Thrift Server interface.
Does anyone know a fix?
james
From: Wu, Walt Disney <james.c...@disney.com>
Date: Friday, August 7, 2015 at 12:40 PM
To: user@spark.apache.org
Hi,
I am using Spark SQL to run some queries on a set of avro data. Somehow I am
getting this error
0: jdbc:hive2://n7-z01-0a2a1453> select count(*) from flume_test;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task
3 in stage 26.0 failed 4 times, most recent
user@spark.apache.org
user@spark.apache.org
Subject: SparkSQL: remove jar added by add jar command from dependencies
Hi,
I am using Spark SQL to run some queries on a set of avro data. Somehow I am
getting this error
0: jdbc:hive2://n7-z01-0a2a1453
Did anybody try to convert HiveQL queries to SparkSQL? If so, would you
share the experience, pros and cons, please? Thank you.
On Thu, Jul 30, 2015 at 10:37 AM, Bigdata techguy bigdatatech...@gmail.com
wrote:
Thanks Jorn for the response and for the pointer questions to Hive
optimization tips
is
happening, using compression, using the best data types for join columns,
denormalizing, etc. I am using Hive version 0.13.
The idea behind this POC is to find the strengths of SparkSQL over HiveQL
and identify the use cases where SparkSQL can perform better than HiveQL
other than
Hi All,
I have a fairly complex HiveQL data processing job which I am trying to
convert to SparkSQL to improve performance. Below is what it does.
Select around 100 columns including Aggregates
From a FACT_TABLE
Joined to the summary of the same FACT_TABLE
Joined to 2 smaller DIMENSION tables
trying to
convert to SparkSQL to improve performance. Below is what it does.
Select around 100 columns including Aggregates
From a FACT_TABLE
Joined to the summary of the same FACT_TABLE
Joined to 2 smaller DIMENSION tables.
The data processing currently takes around an hour to complete
Hi,
I am using Spark 1.3 (CDH 5.4.4). What's the recipe for setting a minimum
output file size when writing out from SparkSQL? So far, I have tried:
--x-
import sqlContext.implicits._
sc.hadoopConfiguration.setBoolean("fs.hdfs.impl.disable.cache", true)
sc.hadoopConfiguration.setLong
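As far as I know there is no direct minimum-file-size setting; the usual
lever is the partition count just before the write. A sketch against the
Spark 1.3 DataFrame API, with an arbitrary partition count:

  // Fewer partitions => fewer, larger output files. coalesce avoids a full
  // shuffle; use repartition instead if the data is badly skewed.
  df.coalesce(4).saveAsParquetFile("/path/to/output")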
Hi All,
I have data in MongoDB (a few TBs) which I want to migrate to HDFS to do
complex query analysis on it. Queries like AND queries involve
multiple fields.
So my question is: in which format should I store the data in HDFS so
that processing will be fast for such kinds of queries?
Can you provide an example of an AND query? If you do just look-ups you
should try HBase/Phoenix; otherwise you can try ORC with storage index
and/or compression, but this depends on what your queries look like.
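If ORC turns out to be the right fit, writing it from Spark is short; a
sketch, assuming a HiveContext and Spark 1.4+:

  // ORC stripe-level min/max indexes are written automatically, so readers
  // that push down filters benefit without extra work.
  df.write.format("orc").save("/path/to/orc_output")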
On Wed, Jul 22, 2015 at 2:48 PM, Jeetendra Gangele gangele...@gmail.com
wrote:
Hi,
I do not think you can put all your queries into the row key without
duplicating the data for each query. However, this would be more of a last
resort.
Have you checked out Phoenix for HBase? It might suit your needs. It
makes things much simpler, because it provides SQL on top of HBase.
Nevertheless,
The queries will be something like:
1. How many users visited a 1 BHK flat in the last hour in a given area
2. How many visitors for flats in a given area
3. List all users who bought a given property in the last 30 days
Further, it may get more complex, involving multiple parameters in the query.
The
Parquet
Mohammed
From: Jeetendra Gangele [mailto:gangele...@gmail.com]
Sent: Wednesday, July 22, 2015 5:48 AM
To: user
Subject: Need help in SparkSQL
HI All,
I have data in MongoDB (a few TBs) which I want to migrate to HDFS to do
complex query analysis on this data. Queries like AND queries
if
there is any difference
At 2015-07-15 08:10:44, ogoh oke...@gmail.com wrote:
Hello,
I am using SparkSQL along with ThriftServer so that we can access using
Hive
queries.
With Spark 1.3.1, I can register UDF function. But, Spark 1.4.0 doesn't
work
for that. The jar of the udf is same.
Below is logs:
I
Hello,
I am using SparkSQL along with the Thrift Server so that we can access it
using Hive queries.
With Spark 1.3.1, I can register a UDF. But Spark 1.4.0 doesn't work
for that. The jar of the UDF is the same.
Below are the logs:
I appreciate any advice.
== With Spark 1.4
Beeline version 1.4.0
, Jul 14, 2015 at 8:46 PM, prosp4300 prosp4...@163.com wrote:
What's the result of 'list jar' in both 1.3.1 and 1.4.0? Please check if
there is any difference.
At 2015-07-15 08:10:44, ogoh oke...@gmail.com wrote:
Hello,
I am using SparkSQL along with ThriftServer so that we can access using
Hi, all
I am using Spark 1.4, and find some SQL is not supported,
especially subqueries: e.g. a subquery in the select items,
in the where clause, or in predicate conditions.
So I want to know whether Spark supports subqueries, or whether I am using
Spark SQL the wrong way. If subqueries are not supported, is there a plan
In JIRA, it says it is in progress:
https://issues.apache.org/jira/browse/SPARK-4226
On Mon, Jul 13, 2015 at 11:10 PM, Louis Hust louis.h...@gmail.com wrote:
Hi, all
I am using spark 1.4, and find some sql is not support,
especially the subquery, such as subquery in select items,
in where clause,
to query the table. The one
you are looking for is df.printSchema()
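That is, a metastore-only lookup along these lines:

  // Reads the schema from the metastore; no table scan involved.
  val df = sqlContext.table("my_table")
  df.printSchema()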
On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com
wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been
answered, but when I do a 'describe table' from SparkSQL CLI it seems
, Jerrick Hoang jerrickho...@gmail.com
wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been
answered, but when I do a 'describe table' from SparkSQL CLI it seems to
try looking at all records at the table (which takes a really long time for
big table) instead
Which Spark release do you use?
Cheers
On Sun, Jul 12, 2015 at 5:03 PM, Jerrick Hoang jerrickho...@gmail.com
wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been
answered, but when I do a 'describe table' from SparkSQL CLI it seems to
try looking at all
a 'describe table' from SparkSQL CLI it seems to
try looking at all records at the table (which takes a really long time for
big table) instead of just giving me the metadata of the table. Would
appreciate if someone can give me some pointers, thanks!
--
Best Regards,
Ayan Guha
it will try to query the table. The one
you are looking for is df.printSchema()
On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com
wrote:
Hi all,
I'm new to Spark and this question may be trivial or has already been
answered, but when I do a 'describe table' from SparkSQL CLI
Hi all,
I'm new to Spark and this question may be trivial or has already been
answered, but when I do a 'describe table' from SparkSQL CLI it seems to
try looking at all records at the table (which takes a really long time for
big table) instead of just giving me the metadata of the table. Would
Hi,
I have a very simple setup of SparkSQL connecting to a Postgres DB and I'm
trying to get a DataFrame from a table, with the DataFrame split into a
number X of partitions (let's say 2). The code would be the following:
Map<String, String> options = new HashMap<String, String>();
options.put("url", DB_URL
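For a numeric split into X partitions, the JDBC source takes the standard
partitioning options; shown in Scala for brevity, with a placeholder column
name and bounds:

  // Spark issues one query per partition, splitting [lowerBound, upperBound]
  // on partitionColumn into numPartitions ranges.
  val df = sqlContext.read.format("jdbc").options(Map(
    "url"             -> DB_URL,
    "dbtable"         -> "my_table",
    "partitionColumn" -> "id",
    "lowerBound"      -> "1",
    "upperBound"      -> "1000000",
    "numPartitions"   -> "2"
  )).load()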
Hi folks, I just re-wrote a query from using UNION ALL to use with rollup
and I'm seeing some unexpected behavior. I'll open a JIRA if needed but
wanted to check if this is user error. Here is my code:
case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 50).map(i => KeyValue(i,
Can you please post the result of show()?
On 10 Jul 2015 01:00, Yana Kadiyska yana.kadiy...@gmail.com wrote:
Hi folks, I just re-wrote a query from using UNION ALL to use with
rollup and I'm seeing some unexpected behavior. I'll open a JIRA if needed
but wanted to check if this is user error. Here
+---+---+---+
|cnt|_c1|grp|
+---+---+---+
| 1| 31| 0|
| 1| 31| 1|
| 1| 4| 0|
| 1| 4| 1|
| 1| 42| 0|
| 1| 42| 1|
| 1| 15| 0|
| 1| 15| 1|
| 1| 26| 0|
| 1| 26| 1|
| 1| 37| 0|
| 1| 10| 0|
| 1| 37| 1|
| 1| 10| 1|
| 1| 48| 0|
| 1| 21| 0|
| 1| 48| 1|
| 1| 21| 1|
|
Never mind, I’ve created the jira issue at
https://issues.apache.org/jira/browse/SPARK-8972.
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Friday, July 10, 2015 9:15 AM
To: yana.kadiy...@gmail.com; ayan guha
Cc: user
Subject: RE: [SparkSQL] Incorrect ROLLUP results
Yes, this is a bug, do