42 HARRODS
HARRODS LTD CD 4610 4 HARRODS
HARRODS LTD CD 4636 13 HARRODS
HARRODS LTD CD 5916 28 HARRODS
HARRODS LTD CD 4628 111 HARRODS
cheers
Dr Mich Talebzadeh
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
*FAILED: ParseException line 6:7 Failed to recogni
d Stage-1_0: 0(+1)/1
2016-07-31 10:48:35,780 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 10.10 seconds
OK
2015-12-15 HARRODS LTD CD 4636 10.95 1
Time taken: 46.546 seconds, Fetched: 1 row(s)
Dr Mich Talebzadeh
Hi,
You can download the pdf from here
<https://talebzadehmich.files.wordpress.com/2016/08/hive_on_spark_only.pdf>
HTH
Dr Mich Talebzadeh
/delta_100_100_
drwxr-xr-x - hduser supergroup 0 2016-07-29 21:20
/user/hive/warehouse/accounts.db/payees/delta_101_101_
Spark fails reading this table. What options do I have here?
Interestingly, Hive running on the Spark engine works.
Rather than queuing it
hive> alter table payees COMPACT 'major';
Compaction enqueued.
OK
Thanks
Dr Mich Talebzadeh
Thanks Gopal.
I am on Spark 1.6.1 and getting the following error
scala> var conn = LlapContext.newInstance(sc, hs2_url);
:28: error: not found: value LlapContext
var conn = LlapContext.newInstance(sc, hs2_url);
Dr Mich Talebzadeh
Thanks Alan.
One crude solution would be to copy data from the ACID table to a simple
table and present that table to Spark to see the data.
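A minimal sketch of that crude copy, assuming ORC and a hypothetical name for
the plain table:

-- materialise the ACID table as a plain, non-transactional copy
CREATE TABLE payees_flat STORED AS ORC AS SELECT * FROM payees;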
This is basically a Spark optimiser issue, not the engine itself.
My Hive runs on Spark query engine and all works fine there.
HTH
Dr Mich Talebzadeh
to extend it
beyond 1024 rows to include the whole column in the table?
VQE (vectorized query execution) would be very useful, especially with ORC,
as it basically means that one can process the whole column separately,
thus improving the performance of the query.
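For reference, a sketch of the session settings that switch vectorized query
execution on; rows are processed in batches of 1024:

-- enable vectorized execution (map side, and reduce side where supported)
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;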
HTH
Dr Mich Talebzadeh
Do you know the existing table schema? The new table schema will be based
on that table without partitioning?
Dr Mich Talebzadeh
CREATE EXTERNAL TABLE sales5 AS SELECT * FROM SALES;
FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot
create external table
Dr Mich Talebzadeh
Yes, but that essentially copies the metadata and leaves the partition there
with no data. It is just an image copy; it won't help in this case.
Dr Mich Talebzadeh
: bigint, quantity_sold:
decimal(10,0), amount_sold: decimal(10,0)]
scala> s2.write.mode("overwrite").parquet("/data/stg/newtable/sales5")
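If needed, the Parquet files written above could be presented back to Hive as
an external table; a sketch assuming the sales column list quoted elsewhere in
the thread:

-- hypothetical external table over the Parquet directory written by Spark
CREATE EXTERNAL TABLE sales5_ext (
`prod_id` bigint,
`cust_id` bigint,
`time_id` timestamp,
`channel_id` bigint,
`promo_id` bigint,
`quantity_sold` decimal(10,0),
`amount_sold` decimal(10,0))
STORED AS PARQUET
LOCATION '/data/stg/newtable/sales5';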
HTH
Dr Mich Talebzadeh
You won't have this problem if you use Spark as the execution engine; that
handles concurrency OK.
Dr Mich Talebzadeh
You won't have this problem if you use Spark as the execution engine! This
setup handles concurrency, but Hive on Spark is not part of the Hortonworks distro.
HTH
Dr Mich Talebzadeh
Great. In that case they can try it, and I am pretty sure that if they get
stuck they can come and ask you for expert advice, since Hortonworks does not
support Hive on Spark, and I know that.
Dr Mich Talebzadeh
I wonder whether hive.support.concurrency is set to true, with ZooKeeper
running, and hive.lock.manager set to
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.
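For reference, a sketch of those settings (the ZooKeeper quorum hosts are
hypothetical; these normally live in hive-site.xml):

set hive.support.concurrency=true;
-- hypothetical quorum hosts
set hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181;
set hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager;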
HTH
Dr Mich Talebzadeh
-- I am pretty sure that they will support it because the Spark option is
supported
Hortonworks supports Spark but not Hive on Spark. Their official distro is
Hive on Tez + LLAP.
Not sure where you get your information from though
I got it from Hortonworks and I know that
Dr Mich Talebzadeh
created without those two columns and of course will not
be partitioned.
HTH
Dr Mich Talebzadeh
`prod_id` bigint,
`cust_id` bigint,
`time_id` timestamp,
`channel_id` bigint,
`promo_id` bigint,
`quantity_sold` decimal(10,0),
`amount_sold` decimal(10,0))
PARTITIONED BY (`year` int, `month` int)
Which is not that useful
HTH
Dr Mich Talebzadeh
basically Hive thrift server and without it
would not exist
- Without Hive and HiveContext there would not be Spark SQL
I am a fan of Spark and use it extensively. However, you have to consider
the use case when talking about a product.
HTH
Dr Mich Talebzadeh
see them
netstat -plten|egrep 'Local|10000|9083'
HTH
Dr Mich Talebzadeh
what command did you use to start hiveserver2?
$HIVE_HOME/bin/hiveserver2 &
is the port 10010 used?
netstat -alnp|egrep 'Local|10010'
HTH
Dr Mich Talebzadeh
Which version of Hive is it?
Can you log in to Hive via the Hive CLI?
$HIVE_HOME/bin/hive
Logging initialized using configuration in
file:/usr/lib/hive/conf/hive-log4j2.properties
hive>
HTH
Dr Mich Talebzadeh
which is a Data Warehouse.
HTH
Dr Mich Talebzadeh
...I do not agree with you...
Yeah right. I am so upset. Was waiting for your nod
LOL
Dr Mich Talebzadeh
binding is of type
[org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in
file:/usr/lib/hive/conf/hive-log4j2.properties
hive>
Dr Mich Talebzadeh
copy $HIVE_HOME/conf/hive-log4j2.properties.template to
$HIVE_HOME/conf/hive-log4j2.properties
Change the values in that file to WARN etc. The relevant line to change is:
property.hive.log.level = INFO
HTH
Dr Mich Talebzadeh
hive --orcfiledump --rowindex
HTH
Dr Mich Talebzadeh
Hi,
You are partitioning by month and bucketing by date or day?
If that is the case you only have 30-31 hash partitions (buckets) for
each month?
HTH
Dr Mich Talebzadeh
potentially many
(definitely not known until we encounter them all) and if you want to
spread them evenly (after all that is what hash partitioning is all about)
then I think day of the month makes more sense.
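As a sketch (table and columns hypothetical), that layout would be declared
along these lines:

CREATE TABLE sales_bucketed (
txn_id bigint,
day_of_month int,
amount decimal(10,2))
PARTITIONED BY (month int)
-- one bucket per day of the month
CLUSTERED BY (day_of_month) INTO 31 BUCKETS
STORED AS ORC;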
HTH
Dr Mich Talebzadeh
Hi Rahul,
I don't believe you can drop a particular bucket in Hive
HTH
Dr Mich Talebzadeh
+-------+-------------+
| col1  | col2        |
+-------+-------------+
| 1     | London      |
| 2     | NY          |
| 3     | California  |
| 4     | Dehli       |
+-------+-------------+
So the rows are there.
Let me go to Hive again now
hive> select * from testme;
OK
1 London
2 NY
3 California
4 Dehli
hive> analyze tabl
there are issues with locks not being released even when the transaction is
aborted; there are still entries in hive_locks.
I ended up deleting the row from the hive_locks table manually. Not ideal,
but you know the lock should not be there as the table has been dropped.
HTH
Dr Mich Talebzadeh
Hi Igor,
I don't think so. Well I never raised one!
HTH
Dr Mich Talebzadeh
Has the table got data in it?
Can you create a new table WITHOUT serialization.null.format, INSERT/SELECT
from old to new, drop the old table and rename the new one to the old name?
If the data is already there then the setting will apply to new rows only.
That may be acceptable.
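A minimal sketch of that rebuild, with hypothetical table names:

CREATE TABLE t_new LIKE t_old;
ALTER TABLE t_new SET SERDEPROPERTIES ('serialization.null.format'='');
-- rewrites all rows with the new null format
INSERT INTO TABLE t_new SELECT * FROM t_old;
DROP TABLE t_old;
ALTER TABLE t_new RENAME TO t_old;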
HTH
Dr Mich Talebzadeh
Has the underlying table to be updated been defined as transactional?
Can you give the update example?
Dr Mich Talebzadeh
;)
Insert into Hive table
sqltext = """
INSERT INTO TABLE dummy
SELECT
ID
, CLUSTERED
, SCATTERED
, RANDOMISED
, RANDOM_STRING
, SMALL_VC
, PADDING
FROM tmp
"""
HiveContext.sql(sqltext)
hm. Watching paint dry :)
Dr Mich Talebzadeh
Are you using a vendor distro or in-house build?
Dr Mich Talebzadeh
out first
time and sync Hive table with Sybase IQ table real time. You will need SRS
SP 204 or above to make this work.
Talk to your DBA if they can get SRS SP from Sybase for this purpose. I
have done it many times. I think it is stable enough for this purpose.
HTH
Dr Mich Talebzadeh
Hi,
What is your current RDBMS and are these SQL the ones used in RDBMS?
Have you tried them on Hive?
HTH
Dr Mich Talebzadeh
timestamp AND mv.acqnum=t2.acqnum
> INNER JOIN table1 t1 on mv.acqnum=t1.deal_number
> where t1.deal_number=mv.acqnum;
>
> OUTPUT:
>
> " FAILED: ParseException line 1:221 missing EOF at 'FROM' near 'bgps' "
shed
Status: Finished successfully in 24.11 seconds
OK
3325000
hive> drop view v_dummy2;
OK
HTH
Dr Mich Talebzadeh
Sounds like there are a number of issues with the Hive metastore on
Postgres; there have been several reports of this.
HTH
Dr Mich Talebzadeh
You don't really want to mess around with the schema.
This is what I have in Oracle 12c schema for TBLS. The same as yours
But this is Oracle, a serious database :)
HTH
Dr Mich Talebzadeh
r set). *char(n)* and *varchar(n)* allocate
*n* bytes of storage.
What character set are you using for your server/database?
Dr Mich Talebzadeh
Try this
hive.limit.optimize.fetch.max
- Default Value: 50000
- Added In: Hive 0.8.0
Maximum number of rows allowed for a smaller subset of data for simple
LIMIT, if it is a fetch query. Insert queries are not restricted by this
limit.
HTH
Dr Mich Talebzadeh
when you start hive on spark do you set any parameters for the submitted
job (or read them from init file)?
set spark.master=yarn;
set spark.deploy.mode=client;
set spark.executor.memory=3g;
set spark.driver.memory=3g;
set spark.executor.instances=2;
set spark.ui.port=;
Dr Mich Talebzadeh
engine.
HTH
Dr Mich Talebzadeh
into Hive table.
There are other ways of using JDBC say through Spark etc.
HTH
Dr Mich Talebzadeh
my experience.
HTH
Dr Mich Talebzadeh
;
set spark.ui.port=;
HTH
Dr Mich Talebzadeh
type, then you possibly can sort it out.
Loads of times I have seen people waiting for a vendor's fix that
could have been sorted out in a fraction of the time, because they could not
be bothered to DIY.
We are vendor agnostic and so far so good.
HTH
Dr Mich Talebzadeh
Sorry, on YARN only, but I gather it should work with Mesos; I don't think
that comes into it.
The issue is the compatibility of the Spark assembly library with Hive.
HTH
Dr Mich Talebzadeh
Hi Vijay,
What is the use case for UPSERT in Hive? The functionality does not exist,
but there are other solutions.
Are we talking about a set of dimension tables with primary keys that need
to be updated (existing rows) or inserted (new rows)?
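For illustration, a common pre-MERGE pattern rebuilds the dimension from a
staging table; a sketch with hypothetical names and a single key column:

INSERT OVERWRITE TABLE dim
SELECT s.id, s.attr FROM staging s   -- updated and brand-new rows
UNION ALL
SELECT d.id, d.attr FROM dim d       -- existing rows untouched by staging
LEFT OUTER JOIN staging s ON d.id = s.id
WHERE s.id IS NULL;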
HTH
Dr Mich Talebzadeh
es much like Hive. HiveContext in
Spark is mapping here to HiveQL
var sqltext = ""
sqltext = """
SELECT rs.Month, rs.SalesChannel, round(TotalSales,2) As Sales, ....
FROM
(
SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel,
SUM(t_s.AMOUN
that I
suggested earlier on may serve better.
HTH
Dr Mich Talebzadeh
external table.
This seems to be OK.
The other option is to add only the new rows since last time, with an
INSERT INTO ... WHERE the rows do not exist in the target table.
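A sketch of that anti-join insert, with hypothetical table names and key:

INSERT INTO TABLE target
SELECT s.* FROM source s
LEFT OUTER JOIN target t ON s.id = t.id
WHERE t.id IS NULL;   -- append only the rows not already present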
Any other suggestions?
Thanks
Dr Mich Talebzadeh
partition
Year, Months, Days etc.
I thought about bucketing the partitions but one needs to balance the
housekeeping with the number of buckets within each partition. So I did not
bother.
Cheers
Dr Mich Talebzadeh
compaction happens, nothing can be done to make
Spark read the data.
If my assumptions are incorrect, I stand corrected.
Regards
Dr Mich Talebzadeh
I have not encountered this case before.
However, you can create a temporary table in Hive, put all writes into it,
read the rows as needed, and finally append the data from the temporary table
to ORC once the reads are done.
HTH
Dr Mich Talebzadeh
until compaction takes place, which cannot be forced. I don't know
whether there is a way to enforce quick compaction.
Thanks
Dr Mich Talebzadeh
alter table payees compact 'minor';
Compaction enqueued.
OK
It queues the compaction, but is there no way I can force it to run
immediately?
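For monitoring, SHOW COMPACTIONS lists the queue; a sketch, including the
settings that must be enabled on the metastore side for queued compactions
to actually run:

SHOW COMPACTIONS;   -- initiated / working / ready for cleaning
-- metastore-side: start the initiator and at least one worker
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;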
Dr Mich Talebzadeh
Hi,
Is there any documentation on proposed new Apache Hive releases that are
going to offer an in-memory database (IMDB) in the form of LLAP, or built
on LLAP?
I would love to see something like SAP ASE IMDB or the Oracle 12c in-memory
offering with Hive as well.
Regards,
Dr Mich Talebzadeh
part is interesting. The primary use case for this capability is to
accelerate the analytics part of mixed OLTP and analytical workloads by
eliminating the need for most of the analytics indexes. This speeds up the
analytical queries by a huge amount.
HTH
Dr Mich Talebzadeh
one is going to gain by having HBase as the
Hive metastore? I trust that we can still use our existing schemas on
Oracle.
HTH
Dr Mich Talebzadeh
e now relying on HDFS itself plus HBase as well for
persistent storage. So the situation might change.
HTH
Dr Mich Talebzadeh
Hive 2.0.1
Subversion git://reznor-mbp-2.local/Users/sergey/git/hivegit -r
e3cfeebcefe9a19c5055afdcbb00646908340694
Compiled by sergey on Tue May 3 21:03:11 PDT 2016
>From source with checksum 5a49522e4b572555dbbe5dd4773bc7c2
Dr Mich Talebzadeh
just to
> optimize the query, before even running it.
>
> I guess another advantage is that using a RDBMS as metastore makes it a
> SPOF, unless you setup replication etc. while, HBase would give HA for free.
>
> On Mon, Oct 24, 2016 at 9:06 AM, Mich Talebzadeh <
Does this work OK through the Hive CLI?
Dr Mich Talebzadeh
l is a non-starter"
They already do, and pay more if they have to. We will stick with Hive
metadata on Oracle, with the schema on SSD.
HTH
Dr Mich Talebzadeh
tables and views in your schema; you only need it once, and the schema will
be populated by the hive user whose details you have specified in
hive-site.xml.
HTH
Dr Mich Talebzadeh
not touch
these system tables but things are not generally that simple.
Are you using HBase as the Hive metastore now?
Dr Mich Talebzadeh
Have you checked running the SQL with
EXPLAIN EXTENDED SELECT ..
and posted the results?
In general your compact index will not work.
HTH
Dr Mich Talebzadeh
Hive table in ORC format
and partition it by date or Dtstamp = '2016-10-27'.
Then you can do a periodic INSERT/SELECT from the external table to the ORC
table. In that case you will utilise the storage index in Hive.
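A sketch of that periodic load, with hypothetical table and column names and
dynamic partitioning assumed:

set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE prices_orc PARTITION (dtstamp)
SELECT ticker, price, dtstamp
FROM prices_external
WHERE dtstamp = '2016-10-27';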
HTH
Dr Mich Talebzadeh
Enjoy the festive season.
Regards,
Dr Mich Talebzadeh
I can hear and see plenty of fireworks in this foggy London tonight :)
Dr Mich Talebzadeh
models.
The next generation of Hive with LLAP as an in-memory
database (not to be confused with LLAP as the execution engine) is
extremely interesting. I am looking forward to asking about that and a host
of other things :)
cheers
Dr Mich Talebzadeh
Alternatively, use Sqoop to read the RDBMS table and create and import the
data into a Hive table. You need the JDBC driver JAR for the relevant RDBMS.
HTH
Dr Mich Talebzadeh
Hive on Spark engine only works with Spark 1.3.1.
Dr Mich Talebzadeh
e etc
through ALTER table like below
ALTER TABLE ${DATABASE}.EXTERNALMARKETDATA set location
'hdfs://rhes564:9000/data/prices/${TODAY}';
HTH
Dr Mich Talebzadeh
goes directly to Parquet files.
There is an impact to the business.
My suggestion is that if they want performant reads they should use Spark
SQL on Hive; it will always get the same values as stored by Hive.
Dr Mich Talebzadeh
.filter.columns"="ID",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )
"""
HiveContext.sql(sqltext)
sqltext = """
INSERT INTO TABLE test.dummy2
SELECT
confirms this please?
Thanks
Dr Mich Talebzadeh
STRING.
What is the thread view on this?
Thanks
Dr Mich Talebzadeh
in HDFS compared to
STRING columns?
Cheers
Dr Mich Talebzadeh
opposed to
String make any difference in terms of storage efficiency?
Regards
Dr Mich Talebzadeh
Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
Compliance. Otherwise they seem to be practically the same as String types.
HTH
Dr Mich Talebzadeh
Hi,
Has there been any study of how much compressing Hive Parquet tables with
snappy reduces storage space or simply the table size in quantitative terms?
Thanks
Dr Mich Talebzadeh
JOIN is by default an inner join, as in Oracle or Sybase.
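For illustration (tables hypothetical), the two forms below are equivalent:

SELECT a.id FROM t1 a JOIN t2 b ON a.id = b.id;        -- bare JOIN
SELECT a.id FROM t1 a INNER JOIN t2 b ON a.id = b.id;  -- explicit INNER JOIN, same result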
HTH
Dr Mich Talebzadeh
Hi,
I have not tried this but someone mentioned that it is possible to use
Sqoop to get data from one Impala/Hive table in one cluster to another?
The clusters are in different zones. This is to test the cluster. Has
anyone done such a thing?
Thanks
Dr Mich Talebzadeh
This is not really a test, is it?
Dr Mich Talebzadeh
Regardless, there is no point using Sqoop for such a purpose; it is not
really designed for it :)
Dr Mich Talebzadeh
op.db/t/00_0_copy_1544
So I was wondering, what are the best ways of compacting these files? Is
there any detriment when the number of these files grows very high, such as
1000s of them?
Thanks
Dr Mich Talebzadeh
Thanks Kapil.
Does this mean that one can have both Kerberos and LDAP (with SSL) and use
either?
Cheers,
Mich
Dr Mich Talebzadeh
So it translates to either LDAP or Kerberos; we cannot enable both for the
same HiveServer2. SSL is independent, so the supported configurations are as below.
1. Anonymous authentication (w/ or w/o SSL)
2. LDAP authentication (w/ or w/o SSL)
3. Kerberos
Cheers
Dr Mich Talebzadeh