to Hive on Spark, or do they apply equally to Hive on
MapReduce as well? In other words, is this a general issue with the Hive optimizer (case
HIVE-9044)?
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/prof
default that runs on port 1
HTH
Dr Mich Talebzadeh
Please send a brief message to user-unsubscr...@hive.apache.org to unsubscribe,
as described here <https://hive.apache.org/mailing_lists.html>
HTH
Dr Mich Talebzadeh
Nice one Shaw
hive> set hive.execution.engine;
hive.execution.engine=mr
Thanks
Dr Mich Talebzadeh
can switch the engines
set hive.execution.engine=tez;
Thanks
Dr Mich Talebzadeh
ut the cluster. It must be using some
clever algorithm to do so.
Cheers
Dr Mich Talebzadeh
later but it will be very useful to remove thriftserver, if we can. "
Cheers,
Mich
Dr Mich Talebzadeh
I guess that is what DAG adds up to with Tez
Dr Mich Talebzadeh
thanks Marcin.
What is your guesstimate on the order of "faster", please?
Cheers
Dr Mich Talebzadeh
to Hive on MR.
One experiment is worth hundreds of opinions
Dr Mich Talebzadeh
I suggest that you try it for yourself then
Dr Mich Talebzadeh
memory
computing.
As usual, your mileage may vary.
HTH
Dr Mich Talebzadeh
.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.1 seconds, Fetched: 44 row(s)
HTH
Dr Mich Talebzadeh
compared to Hive or not? Will it keep the data in memory for reuse or not?
6. What I don't understand is what makes Tez and LLAP more efficient
compared to Spark!
Cheers
Dr Mich Talebzadeh
14.38
ORC 202.33317.77
Still I would use Spark if I had a choice and I agree that on VLT (very
large tables), the limitation in available memory may be the overriding
factor in using Spark.
HTH
Dr Mich Talebzadeh
umulative CPU: 721.83 sec HDFS Read:
400442823 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 12 minutes 1 seconds 830 msec
OK
1
*Time taken: 239.532 seconds, Fetched: 1 row(s)*
I leave it to you guys to guess which one is better :)
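To make the MapReduce figures above easier to read, here is a small Python check. It is an illustration only, not output from Hive. It converts the "12 minutes 1 seconds 830 msec" summary into seconds, confirms that this matches the 721.83 sec cumulative CPU, and divides by the 239.532 s wall-clock time. The result is a rough average of how many CPU cores were busy at once. That ratio is an approximation: it says nothing about how evenly the work was spread across tasks.

```python
# Figures copied from the Hive on MapReduce run quoted above.
CUMULATIVE_CPU_SEC = 721.83   # "Cumulative CPU: 721.83 sec"
WALL_CLOCK_SEC = 239.532      # "Time taken: 239.532 seconds"


def to_seconds(minutes: int, seconds: int, msec: int) -> float:
    """Convert Hive's 'X minutes Y seconds Z msec' summary into seconds."""
    return minutes * 60 + seconds + msec / 1000.0


# "Total MapReduce CPU Time Spent: 12 minutes 1 seconds 830 msec"
total_cpu_sec = to_seconds(12, 1, 830)

# CPU seconds summed over all tasks divided by elapsed seconds gives the
# average number of CPU cores kept busy while the query ran.
avg_busy_cores = total_cpu_sec / WALL_CLOCK_SEC

print(f"total CPU time : {total_cpu_sec:.2f} s (log says {CUMULATIVE_CPU_SEC} s)")
print(f"wall-clock time: {WALL_CLOCK_SEC:.3f} s")
print(f"avg busy cores : {avg_busy_cores:.2f}")
```

The ratio comes out at about 3. This means roughly three cores' worth of work ran concurrently on average, which is why the CPU figure is so much larger than the elapsed time.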
Cheers
Dr Mich Talebzadeh
of transaction activity using ORC files with
Insert/Update/Delete that need to communicate with metastore with heartbeat
etc?
HTH
Dr Mich Talebzadeh
Hi
Is this available in Hive 2?
hive> set hive.async.log.enabled=false;
Query returned non-zero code: 1, cause: hive configuration
hive.async.log.enabled does not exists.
Thanks
Dr Mich Talebzadeh
16/07/07 11:36:22 [main]: DEBUG conf.VariableSubstitution: Substitution is
on: hive
HTH
Dr Mich Talebzadeh
erested please register here
<http://www.meetup.com/futureofdata-london/events/232423292/>
Looking forward to seeing those who can make it to have an interesting
discussion and leverage your experience.
Regards,
Dr Mich Talebzadeh
mal(10,0))
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6
Statistics: Num rows: 689132 Data size: 203983072 Basic
stats: COMPLETE Column stats: NONE
ListSink
thanks
Dr Mich Talebzadeh
(10,0))
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6
ListSink
*And Hive on Spark returns the same 24 rows in 30 seconds*
OK, so the Hive query is just slower with the Spark engine.
Assuming that the time taken is optimization time + query time, then it
appear
,
sb_gu_key ORDER BY t_ev_st_dt )
select from_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') AS EndTime;
HTH
Dr Mich Talebzadeh
Hi,
Can you please send the output of
DESC FORMATTED <table_name>
after running (if you have not done so already)
ANALYZE TABLE <table_name> COMPUTE STATISTICS FOR COLUMNS
for both tables?
HTH,
Dr Mich Talebzadeh
5",
"orc.stripe.size"="16777216",
"orc.row.index.stride"="1"
)
Others may have better ideas.
HTH
Dr Mich Talebzadeh
Do you also have the output from
desc formatted tuning_dd_key?
If so, please send it.
Dr Mich Talebzadeh
Sounds like it is picking up results from both metastores!
Maybe the cluster is not set up correctly. It should always pick up from
the active node (just one).
Dr Mich Talebzadeh
Hi Karthi,
Those database names are picked up from the Hive metastore. Do you know
the type of RDBMS that holds your Hive metastore database?
Check hive-site.xml for
javax.jdo.option.ConnectionURL
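You can read the RDBMS type straight off that property. A JDBC URL always starts with jdbc:<subprotocol>:, and the subprotocol (mysql, oracle, postgresql, derby, ...) names the database. Below is a minimal Python sketch, assuming the standard <property><name>/<value> layout of hive-site.xml. The sample URL is made up. For a real file, ET.parse("/path/to/hive-site.xml").getroot() can take the place of ET.fromstring.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical hive-site.xml content; host, port and SID are placeholders.
SAMPLE_HIVE_SITE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@metastore-host:1521:mydb</value>
  </property>
</configuration>
"""


def metastore_url(hive_site_xml: str) -> Optional[str]:
    """Return the value of javax.jdo.option.ConnectionURL, or None if absent."""
    root = ET.fromstring(hive_site_xml.strip())
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "javax.jdo.option.ConnectionURL":
            return (prop.findtext("value") or "").strip()
    return None


def rdbms_type(jdbc_url: str) -> str:
    """JDBC URLs look like jdbc:<subprotocol>:<rest>; the subprotocol names the RDBMS."""
    parts = jdbc_url.split(":", 2)
    if len(parts) < 3 or parts[0].lower() != "jdbc":
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    return parts[1].lower()


url = metastore_url(SAMPLE_HIVE_SITE)
print(url, "->", rdbms_type(url) if url else "not set")
```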
HTH
Dr Mich Talebzadeh
for
transaction logic but Spark somehow cannot do that.
In short that is it. You need to do that through Hive.
Cheers,
Dr Mich Talebzadeh
Hi
Are you using map-reduce as the execution engine?
What version of Hive are you on?
HTH
Dr Mich Talebzadeh
of rows returned) of 100M
rows.
Yes, I noticed your version of Hive is 1.1, from a vendor's package.
At this stage the question is what other alternatives there are to fetch
those 100 million rows.
HTH
Dr Mich Talebzadeh
y 1;
DB_ID  DBNAME
-----  ------------
1      default
2      asehadoop
6      oraclehadoop
11     test
16     iqhadoop
22     mytable_db
31     accounts
36     twitterdb
HTH
Dr Mich Talebzadeh
is the underlying table partitioned i.e.
'SELECT FROM `db`.`table` WHERE (year=2016 AND month=6
AND day=1 AND hour=10)'
and also what result set (RS) size is expected?
JDBC on its own should work. Is this an ORC table?
What version of Hive are you using?
HTH
Dr Mich Talebzadeh
Hi David,
What are you actually trying to do with the data?
Hive and map-reduce are notoriously slow for this type of operation. Hive
is good for storage; that is what I vouch for.
There are other alternatives.
HTH
Dr Mich Talebzadeh
I could find the stats time for columns.
HTH
Dr Mich Talebzadeh
OK, use EXPLAIN EXTENDED <your SQL query> to see whether the optimizer makes a good
decision.
Help the optimizer by updating stats at column level:
ANALYZE TABLE <table_name> COMPUTE STATISTICS FOR COLUMNS
and use DESC FORMATTED to see the stats.
HTH
Dr Mich Talebzadeh
and in-memory computing will do a much better
job.
So
1. Use Hive with its metadata to store data on HDFS.
2. Use Spark SQL to query that data. It is orders of magnitude faster.
However, I am all for you trying what Jorn suggested.
HTH
Dr Mich Talebzadeh
Nothing.
Hive does not support external indexes even in version 2.
In other words, although you create indexes, they are not visible to Hive
optimizer as you have found out.
I wrote an article on this, hoping that we would have external indexes
being used.
HTH
Dr Mich Talebzadeh
Hi,
Does this automatic stats update collect basic statistics only, or statistics for all columns?
Thanks
Dr Mich Talebzadeh
tatistics is not going to make that much
difference. I would rather spend more time on making external indexes
useful for the optimizer.
HTH
Dr Mich Talebzadeh
t new. It has been around for a good while in
RDBMS and can impact the performance of other running queries. So I am not
sure it can be considered a blessing.
HTH
Dr Mich Talebzadeh
Hi,
My point was that we are where we are, and at this juncture there is no
collection of statistics for complex columns. That may be a future
enhancement.
But then the obvious question is how useful or meaningful these statistics
would be.
HTH
Dr Mich Talebzadeh
Thank you for the cut-and-paste monologue.
Very impressive. I will try to remember it.
Dr Mich Talebzadeh
You must excuse my ignorance,
but can you please elaborate on this, as it seems something has gone wrong
somewhere?
Dr Mich Talebzadeh
Hi Mahendar,
Did you load the metadata DB/schema from a backup, and are you now seeing this error?
Dr Mich Talebzadeh
ou stating that there is somehow some
explanation for optimiser "access path" that comes out independent of the
optimizer and is called syntax tree?
Dr Mich Talebzadeh
on Hive 2 and it does not.
hive> analyze table foo compute statistics for columns;
FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
but array is passed.
Dr Mich Talebzadeh
Mich Talebzadeh
On 14 June 2016 at 08:37, Aviral Agarwal <
you want to flatten the query I understand.
create temporary table tmp as select c from d;
INSERT INTO TABLE a
SELECT c from tmp where
condition
Is the INSERT code correct?
HTH
Dr Mich Talebzadeh
Which version of Hive are you using?
Dr Mich Talebzadeh
On 13 Jun
Well I know that the script works fine for Oracle (both base and
transactional).
Ok this is what this table is in Oracle. That column is 256 bytes.
HTH
Dr Mich Talebzadeh
thanks Gopal
that link
404 - OOPS!
Looks like you wandered too far from the herd!
LOL
Any reason why that table in Hive cannot read data in?
cheers
Dr Mich Talebzadeh
455594 2016-06-09 09:54
/twitter_data/FlumeData.1465462435124
Thanks
Dr Mich Talebzadeh
that Spark assumes no concurrency
for Hive table. It is probably the same reason why updates/deletes to Hive
ORC transactional tables through Spark fail.
HTH
Dr Mich Talebzadeh
,
hive.compactor.worker.threads, hive.support.concurrency (true),
hive.enforce.bucketing
(true), and hive.exec.dynamic.partition.mode (nonstrict).
The default DummyTxnManager replicates pre-Hive-0.13 behavior and
provides
no transactions.
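Here is a minimal Python sketch for checking a configuration against that list. The snippet above is cut off before some of the values. The ones used here for hive.txn.manager, hive.compactor.initiator.on (true) and hive.compactor.worker.threads (a positive number) come from the Hive Transactions documentation. That documentation also places the compactor settings on the metastore side, and notes that hive.enforce.bucketing is no longer required as of Hive 2.0. The configuration dicts below are hypothetical.

```python
from typing import Dict, List

# Settings needed to turn on Hive ACID transactions (compactor values taken
# from the Hive Transactions documentation, since the snippet is cut off).
REQUIRED = {
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",          # not required as of Hive 2.0
    "hive.exec.dynamic.partition.mode": "nonstrict",
}


def txn_problems(conf: Dict[str, str]) -> List[str]:
    """Return a list of settings that would stop transactions from working."""
    problems = []
    for key, wanted in REQUIRED.items():
        actual = conf.get(key)
        # Values are compared case-insensitively for simplicity.
        if actual is None or actual.strip().lower() != wanted.lower():
            problems.append(f"{key} should be {wanted}, found {actual!r}")
    threads = conf.get("hive.compactor.worker.threads", "0")
    try:
        enough_threads = int(threads) > 0
    except ValueError:
        enough_threads = False
    if not enough_threads:
        problems.append(f"hive.compactor.worker.threads should be > 0, found {threads!r}")
    return problems


# Hypothetical configuration with everything set as required.
good_conf = {
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
}
# Keeping the default DummyTxnManager means no transactions.
default_conf = {"hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"}

for problem in txn_problems(default_conf):
    print(problem)
```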
Dr Mich Talebzadeh
ive there is the issue with DDL + DML locks applied in a
single transaction i.e. --> create table A as select * from b
HTH
Dr Mich Talebzadeh
issue here
HTH
Dr Mich Talebzadeh
On 8 June 2016 at 22:36, Michael
table is locked as SHARED_READ
2. With Spark --> No locks at all
3. With HIVE --> No locks on the target table
4. With Spark --> No locks at all
HTH
Dr Mich Talebzadeh
ask 0 in stage 1.0 failed 1 times;
aborting job
Suggested solution:
In a concurrent environment, Spark should apply locks in order to prevent such
operations. Locks are kept in the Hive metastore table HIVE_LOCKS.
HTH
Dr Mich Talebzadeh
data
from compressed to none when you read it or whatever. So yes there is a
performance price to pay albeit small using more CPU to uncompress the data
and present it. However, that is a small price to pay to reduce the storage
cost for data.
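A rough way to see that trade-off is to compress and decompress some repetitive, table-like data with zlib (ZLIB is also ORC's default codec, set via orc.compress). This Python sketch uses made-up rows, so the size reduction and timings are only indicative. ORC also compresses each stream after applying its own encodings, which this sketch does not imitate.

```python
import time
import zlib

# Made-up, repetitive "rows" standing in for table data.
block = "".join(f"2016-06-{d:02d},LONDON,SALES,{d % 7}\n" for d in range(1, 31))
raw = (block * 2000).encode("utf-8")

t0 = time.perf_counter()
packed = zlib.compress(raw, 6)           # CPU paid once, when writing
t1 = time.perf_counter()
unpacked = zlib.decompress(packed)       # CPU paid again on every read
t2 = time.perf_counter()

assert unpacked == raw                   # lossless round trip
print(f"raw size       : {len(raw):,} bytes")
print(f"compressed size: {len(packed):,} bytes ({len(raw) / len(packed):.0f}x smaller)")
print(f"compress time  : {(t1 - t0) * 1000:.2f} ms")
print(f"decompress time: {(t2 - t1) * 1000:.2f} ms")
```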
HTH
Dr Mich Talebzadeh
Sorry, that should read *staging* tables.
Dr Mich Talebzadeh
On 6 Jun
/Hadoop/hdfs-site.xml
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
There are other ways as well.
Check this
http://stackoverflow.com/questions/11593374/permission-denied-at-hdfs
HTH
Dr Mich Talebzadeh
t.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deplo
and password?
The data is immutable. What is the use case for this table? Are you going to
use the data later in an app/Hive, and if so, do you have permission to read it?
HTH
Dr Mich Talebzadeh
improves performance.
I very much doubt that whichever way you go it is really going to have that much impact on
your performance.
Dr Mich Talebzadeh
That order datetime/userid/customerId looks more natural to me.
Two questions:
What is the type of table in Hive?
Are you doing this for certain queries where you think userid as the most
significant column is going to help queries better?
HTH
Dr Mich Talebzadeh
7T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12))
as timestamp) as adddate;
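The substring arithmetic above can be checked outside Hive. Hive's substring(s, 1, 10) is 1-based, so it keeps "2016-05-17", and substring(s, 12) keeps everything after the T. The snippet cuts off the part of the expression that joins the two pieces. This Python sketch assumes they are joined with a single space. That produces the yyyy-MM-dd HH:mm:ss.fff form that CAST(... AS TIMESTAMP) accepts in the Hive versions discussed here.

```python
from datetime import datetime


def iso_t_to_timestamp(s: str) -> datetime:
    """Mimic cast(<join>(substring(s,1,10), substring(s,12)) as timestamp)."""
    date_part = s[0:10]   # Hive substring(s, 1, 10): 1-based chars 1..10
    time_part = s[11:]    # Hive substring(s, 12): from char 12, skipping the 'T'
    # Assumption: the pieces are joined with one space (that call is cut off above).
    joined = f"{date_part} {time_part}"
    return datetime.strptime(joined, "%Y-%m-%d %H:%M:%S.%f")


adddate = iso_t_to_timestamp("2016-05-17T02:10:44.527")
print(adddate)  # 2016-05-17 02:10:44.527000
```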
HTH
Dr Mich Talebzadeh
thanks for that.
I will have a look
Dr Mich Talebzadeh
On 2 Jun
are going to have support for
transactions in Spark for Hive ORC tables. This will really be useful.
Thanks,
Dr Mich Talebzadeh
loy a resource manager. I do not use map-reduce I
use Spark as an execution engine and it does run on yarn-client mode in my
case.
HTH
Dr Mich Talebzadeh
t table 't' (1 rows affected)'
T. 2016/04/08 09:44:44. (89): Command sent to 'hiveserver2.asehadoop':
T. 2016/04/08 09:44:44. (89): 'Bulk insert table 't' (1 rows affected)'
T. 2016/04/08 09:45:37. (90): Command sent to 'hiveserver2.asehadoop':
Dr Mich Talebzadeh
. With Hive on Spark vs Hive on MapReduce, the
performance gains are an order of magnitude.
HTH
Dr Mich Talebzadeh
ffic I am afraid
- LLAP I don't know
It sounds like Hortonworks promotes Tez and Cloudera does not want to know
anything about Hive; they promote Impala, but that sounds like a sinking
ship these days.
Having said that I will try TEZ + LLAP :) No pun intended
Regards
Dr Mich Talebzadeh
only hdfs can write to it not user
Sandeep?
HTH
Dr Mich Talebzadeh
is this location correct and valid?
LOCATION '/data/SentimentFiles/*SentimentFiles*/upload/data/tweets_raw/'
Dr Mich Talebzadeh
data). 80-20 rule?
In reality it may be just 2TB, or the most recent partitions, etc. The rest is cold
data.
cheers
Dr Mich Talebzadeh
another stack like Tez.
Cloudera supports Impala instead of Hive, but it is not something I have
used.
HTH
Dr Mich Talebzadeh
with no access right given?
--
1> use ASEIMDB
2> go
Msg 10351, Level 14, State 1:
Server 'SYB_157', Line 1:
Server user id 24 is not a valid user in database 'ASEIMDB'
HTH
Dr Mich Talebzadeh
have access rights to
that database.
HTH
Dr Mich Talebzadeh
On 30 Ma
oup 1588 2016-05-25 16:46 hdfs://rhes564:9000/export/_metadata
drwxr-xr-x - hduser supergroup 0 2016-05-25 16:46 hdfs://rhes564:9000/export/data
and uses the metadata file to create the target table which somehow does
not work in this case!
HTH
Dr Mich Talebzadeh
Hi Gopal,
please see my correspondence about Tez in tez user group. I forwarded to
hive user group.
thanks
Dr Mich Talebzadeh
select count(1) from test.sales_staging;
exit;
Dr Mich Talebzadeh
On 30 May 201
ke it work as I have hive on spark engine as well.
please tell me what version of tez and yarn etc. I
thanks
Dr Mich Talebzadeh
thanks Damien.
I tried TEZ 0.82 with Hive 2 although I did not persevere.
When you say "Not stable" are you referring to using it with YARN etc.
In short at the simplest set up what Resource Manager it works with?
Cheers
Dr Mich Talebzadeh
Thanks. I think the problem is that the Tez user group is exceptionally
quiet. I just sent an email to the Hive user group to see if anyone has managed to
build a vendor-independent version.
Dr Mich Talebzadeh
Please bear in mind that I am talking about your own build, not anything
that comes as part of a vendor's package.
If so kindly specify both Hive and TEZ versions.
Thanks
Dr Mich Talebzadeh
but of course Spark has both plus
in-memory capability.
It would be interesting to see what version of TEZ works as execution
engine with Hive.
Vendors are divided on this (use Hive with Tez, or use Impala instead of
Hive, etc.), as I am sure you already know.
Cheers,
Dr Mich Talebzadeh
yep
Dr Mich Talebzadeh
On 29 May 2016 at 18:01, Igor Kravzov <
ate intermittent "No
such lock.." and "No such transaction..." errors. Setting
"datanucleus.connectionPoolingType=DBCP" is recommended in this case
So I changed the setting to DBCP. Don't know how useful it is going to be.
Regards,
Dr Mich Talebzadeh
be an option. NAS is better, as it
saves the scp and copy across, with the target having enough external space to get
the files in.
More useful tool would be to export the full Hive database in binary format
and import it in target.
Cheers
Dr Mich Talebzadeh
the reuse of connection objects and reduce the number of
times that connection objects are created. Connection pools significantly
improve performance for database-intensive applications because creating
connection objects is costly both in terms of time and resources.
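The idea can be shown with a minimal Python sketch. Connections are created lazily up to a fixed limit, then handed back and reused instead of being opened again. This is a generic illustration, not how DBCP or BoneCP are implemented. make_connection is a hypothetical stand-in for an expensive JDBC/ODBC connect call.

```python
import queue
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Fixed-size pool: create connections lazily up to max_size, then reuse them."""

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        self._factory = factory
        self._max_size = max_size
        self._idle: "queue.LifoQueue[T]" = queue.LifoQueue()
        self._created = 0
        self._lock = threading.Lock()

    @property
    def created(self) -> int:
        """How many connections have actually been opened so far."""
        return self._created

    def acquire(self, timeout: float = 5.0) -> T:
        try:
            return self._idle.get_nowait()          # cheap: reuse an idle one
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:      # costly: open a new one
                conn = self._factory()
                self._created += 1
                return conn
        # Pool exhausted: wait for a connection to be released.
        # Raises queue.Empty if none comes back within the timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        self._idle.put(conn)


def make_connection() -> object:
    # Hypothetical stand-in for an expensive JDBC/ODBC connect call.
    return object()


pool = SimplePool(make_connection, max_size=2)
for _ in range(10):          # ten sequential "queries"
    conn = pool.acquire()
    pool.release(conn)
print(pool.created)          # 1: the same connection was reused every time
```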
Thanks
Dr Mich Talebzadeh
not sure vendors do
parallelise this sort of things.
HTH
Dr Mich Talebzadeh
r_expression1* is equal to
*char_expression2* or *uchar_expression2*.
-1 – indicates that *char_expression1* or *uchar_expression1* is less
than *char_expression2* or *uchar_expression2*.
hive> select compare("aaa", "bbb");
FAILED: SemanticException [Error 100
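For reference, here is a Python sketch of the three-way semantics quoted above. It assumes that the part of the list cut off in the snippet returns 1 when the first expression is greater, as is conventional. It compares by Unicode code point rather than any database sort order. Within Hive, the same result could be written as a CASE WHEN a < b THEN -1 WHEN a = b THEN 0 ELSE 1 END expression.

```python
def compare(a: str, b: str) -> int:
    """Three-way string comparison: 1 if a > b, 0 if equal, -1 if a < b.

    Uses Python's code-point ordering, not a database collation.
    """
    return (a > b) - (a < b)


print(compare("aaa", "bbb"))  # -1
```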
only col4
hive> insert into testme (col4) values(6);
Loading data to table test.testme
OK
HTH
Dr Mich Talebzadeh
connections: 2
2016-05-24T16:16:44,864 INFO [0deb842d-9b15-4dd9-8d60-0e198a9d3865
0deb842d-9b15-4dd9-8d60-0e198a9d3865 main]: hive.metastore
(HiveMetaStoreClient.java:open(505)) - Connected to metastore.
Thanks
Dr Mich Talebzadeh
large table from Oracle to Hive and decided to use Spark
1.6.1 with Hive 2 on Spark 1.3.1 and that worked fine. We just used JDBC
connection with temp table and it was good. We could have used sqoop but
decided to settle for Spark so it all depends on use case.
HTH
Dr Mich Talebzadeh
Thanks Seth.
Dr Mich Talebzadeh
On 23 May 2016 at 21:08, Siddharth Se
Hi Sharath
See this thread
Using Spark on Hive with Hive also using Spark as its execution engine
HTH
Dr Mich Talebzadeh