Hello,
I'm using the Spark Thrift server and I'm searching for the best-performing
solution for querying a hot set of data. I'm processing records with a nested
structure, containing subtypes and arrays; one record takes up several KB.
I tried to make some improvement with a cache table:
cache table event_jan_01 as select * from events where day_registered = 20190102;
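
A minimal sketch of this pattern, assuming the events table and day_registered
column from the question above and a SQLContext in scope; CACHE LAZY TABLE
defers materialization until the first query touches the cache, which can help
when the hot set is large:

  // Hedged sketch: cache one hot day of a wide, nested table.
  // "events" and "day_registered" come from the question; everything else
  // is illustrative. LAZY avoids paying the full scan cost up front.
  sqlContext.sql(
    """CACHE LAZY TABLE event_jan_01 AS
      |SELECT * FROM events WHERE day_registered = 20190102""".stripMargin)

  // The first query materializes the cache; later queries read the
  // in-memory columnar data.
  sqlContext.sql("SELECT count(*) FROM event_jan_01").show()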
, and will be
cached individually.
Yong
From: Taotao.Li <charles.up...@gmail.com>
Sent: Sunday, November 20, 2016 6:18 AM
To: Rabin Banerjee
Cc: Yong Zhang; user; Mich Talebzadeh; Tathagata Das
Subject: Re: Will spark cache table once even if I call read/cache on the same
table multiple times
From: Rabin Banerjee <dev.rabin.baner...@gmail.com>
Sent: Friday, November 18, 2016 10:36 AM
To: user; Mich Talebzadeh; Tathagata Das
Subject: Will spark cache table once even if I call read/cache on the same
table multiple times
Hi All,
I am working on a project where the code is divided into multiple reusable
modules, and I am not able to understand Spark persist/cache in that context.
My question is: will Spark cache a table once even if I call read/cache on the
same table multiple times?
Sample Code ::
TableReader
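
The sample code is elided above, so here is a hedged, self-contained sketch of
the scenario; "people" is a hypothetical table name, and the behavior shown
(the second cacheTable call being a no-op because the cache manager already
holds the table) matches how cacheTable is documented to behave, though
details can vary by version:

  // Hedged sketch: two reusable "modules" each read and cache the same table.
  import org.apache.spark.sql.SQLContext

  def moduleA(sqlContext: SQLContext): Long = {
    sqlContext.cacheTable("people")        // first call: registers the cache
    sqlContext.table("people").count()     // materializes it
  }

  def moduleB(sqlContext: SQLContext): Long = {
    sqlContext.cacheTable("people")        // second call: no-op, already cached
    sqlContext.table("people").count()     // served from the in-memory cache
  }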
But I did not find any way to check if Spark is using User Memory or not.
Please let me know if we can verify the scenario.
Thanks,
Yogesh
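
As far as I know there is no public API that reports User Memory (the slice
outside spark.memory.fraction) directly, because that region is unmanaged and
untracked. A hedged sketch of what can be inspected from the driver in Spark
1.6, using getExecutorStorageStatus, which only covers the storage side:

  // Hedged sketch: print storage memory per executor (Spark 1.6 API,
  // deprecated in later versions). User Memory itself is not tracked, so at
  // best it can be inferred as heap minus what the unified manager reports.
  sc.getExecutorStorageStatus.foreach { status =>
    println(s"${status.blockManagerId.host}: storage used " +
      s"${status.memUsed} of ${status.maxMem} bytes")
  }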
Oh, thanks. Makes sense to me.
Best,
Sun.
fightf...@163.com
From: Takeshi Yamamuro
Date: 2016-02-04 16:01
To: fightf...@163.com
CC: user
Subject: Re: Re: About cache table performance in spark sql
Hi,
Parquet data are column-wise and highly compressed, so the size of deserialized
rows
Hi,
Thanks a lot for your explanation. I know that the slow processing is mainly
caused by GC pressure, and I understood this difference just from your advice.
I gave each executor 6GB of memory and tried to cache the table.
I had 3 executors, and finally I can see some info in the Spark job UI
storage, like the following:
does not have enough heap.
Thanks,
Prabhu Joseph
Hi,
I want to make sure that the cache table would indeed accelerate SQL queries.
Here is one of my use cases:
Impala table size: 24.59GB, no partitions, with about 1 billion+ rows.
I use sqlContext.sql to run queries over this table, and try the cache and
uncache commands to see
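
A hedged sketch of that measurement loop; "big_table" and the count(*) query
are placeholders, since the actual table and queries are not named in the
message:

  // Hedged sketch: time the same query cold and cached. Note that
  // CACHE TABLE is eager, so the caching cost is paid at the CACHE statement.
  def timed[T](label: String)(body: => T): T = {
    val start = System.nanoTime()
    val result = body
    println(f"$label: ${(System.nanoTime() - start) / 1e9}%.1f s")
    result
  }

  timed("cold")   { sqlContext.sql("SELECT count(*) FROM big_table").collect() }
  sqlContext.sql("CACHE TABLE big_table")     // materializes the cache now
  timed("cached") { sqlContext.sql("SELECT count(*) FROM big_table").collect() }
  sqlContext.sql("UNCACHE TABLE big_table")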
Hi all,
I'm connected to the Thrift server using beeline on Spark 1.6.
I used: cache table tbl as select * from table1
I see table1 in storage memory and I can use it. But when I reconnect, I can't
query it anymore.
I get: Error: org.apache.spark.sql.AnalysisException: Table not found: table1
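
A likely explanation, offered as an assumption rather than a confirmed
diagnosis: cache table ... as select creates a temporary table, and in Spark
1.6 the Thrift server gives each connection its own session, so session-scoped
temporary tables disappear on reconnect. Two things to try: start the server
with --conf spark.sql.hive.thriftServer.singleSession=true so all connections
share one session, or cache the permanent table itself (cache table table1),
since the cache of a permanent table outlives individual sessions while the
server runs.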
why "cache table a as select * from b" will do shuffle,and create 2 stages.
example:
table "ods_pay_consume" is from "KafkaUtils.createDirectStream"
hiveContext.sql("cache table dwd_pay_consume as select * from
ods_pay_consume"
why "cache table a as select * from b" will do shuffle,and create 2 stages.
example:
table "ods_pay_consume" is from "KafkaUtils.createDirectStream"
hiveContext.sql("cache table dwd_pay_consume as select * from
ods_pay_consume"
Hi all,
Do you know if there is an option to specify how many replicas we want when
caching a table in memory in the SparkSQL Thrift server? I have not seen any
option so far, but I assumed there is one, as the Storage section of the UI
shows that there is 1 x replica of your
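
The SQL-level cache table command does not expose a replication setting as far
as I know; a hedged alternative at the DataFrame level, using the replicated
storage levels the core API does provide (hot_table is a hypothetical name):

  // Hedged sketch: cache with 2 in-memory replicas via the DataFrame API.
  import org.apache.spark.storage.StorageLevel

  val df = sqlContext.table("hot_table")
  df.persist(StorageLevel.MEMORY_ONLY_2)    // _2 levels keep a second replica
  df.count()                                // materialize the cache
  df.registerTempTable("hot_table_cached")  // expose it back to SQL queries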
Hi all:
I have a Spark on YARN cluster (spark-1.3.0, hadoop-2.2.0) with hive-0.12.0
and tachyon-0.6.1. I start the SparkSQL Thrift server with
start-thriftserver.sh and use beeline to connect to it, following the Spark
documentation.
My question is: how to cache a table with specified
...@spark.apache.org
Cc: user@spark.apache.org
Subject: Re: HiveContext: cache table not supported for partitioned table?
Hi,
In Spark 1.1 HiveContext, I ran a create partitioned table command followed by
a cache table command and got a java.sql.SQLSyntaxErrorException: Table/View
'PARTITIONS' does not exist. But cache table worked fine if the table is not a
partitioned table.
Can anybody confirm that cache
Cache table works with partitioned tables.
I guess you're experimenting with a default local metastore, and the
metastore_db directory doesn't exist in the first place. In this case, none of
the metastore tables/views exist at first, and they will throw the error
message you saw when the PARTITIONS
Built in (and Thrift Server)
My query is only selecting one STRING column from the data, but only
returning data based on other columns.
Types:
col1 = STRING
col2 = STRING
col3 = STRING
col4 = Partition Field (TYPE STRING)
Queries:
cache table table1;
--Run some other queries on other data
select col1 from table1
where col2 = 'foo' and col3 = 'bar' and col4 = 'foobar' and col1 is not null
limit 100
Fairly simple query. When I
I am using Spark's Thrift server to connect to Hive and JDBC to issue queries.
Is there a way to cache a table in Spark by using a JDBC call?
Thanks,
Ken
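
In principle yes: cache table is just a SQL statement, so it can be sent over
the same connection as any query. A hedged sketch against the Thrift server,
assuming the Hive JDBC driver is on the classpath and with host, port, and
table name as placeholders:

  // Hedged sketch: issue CACHE TABLE through the Hive JDBC driver.
  import java.sql.DriverManager

  val conn = DriverManager.getConnection(
    "jdbc:hive2://localhost:10000/default", "user", "")
  val stmt = conn.createStatement()
  stmt.execute("CACHE TABLE table1")        // the cache lives in the server
  val rs = stmt.executeQuery("SELECT count(*) FROM table1") // served from memory
  while (rs.next()) println(rs.getLong(1))
  conn.close()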