Re: 答复: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-13 Thread Mich Talebzadeh
to Hive on Spark or they apply equally to Hive on MapReduce as well. In other words a general issue with Hive optimizer case hive-9044? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/prof

Re: can't start up hive 2.1 hiveserver2/metastore services

2016-07-13 Thread Mich Talebzadeh
default that runs on port 1 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:*

Re: Verifying Hive execution engine used within a session

2016-07-13 Thread Mich Talebzadeh
Please send a brief message to Unsubscribe: user-unsubscr...@hive.apache.org in here <https://hive.apache.org/mailing_lists.html> HTH

Re: Verifying Hive execution engine used within a session

2016-07-13 Thread Mich Talebzadeh
Nice one Shaw hive> set hive.execution.engine; hive.execution.engine=mr Thanks

Verifying Hive execution engine used within a session

2016-07-13 Thread Mich Talebzadeh
can switch the engines set hive.execution.engine=tez; Thanks

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
ut the cluster. It must be using some clever algorithm to do so. Cheers.

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
later but it will be very useful to remove thriftserver, if we can. " Cheers, Mich

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
I guess that is what DAG adds up to with Tez

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
Thanks Marcin. What is your guesstimate on the order of "faster", please? Cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
to Hive on MR. One experiment is worth hundreds of opinions

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
I suggest that you try it for yourself then

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
memory computing. As usual your mileage varies. HTH

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink Time taken: 0.1 seconds, Fetched: 44 row(s) HTH

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
compared to Hive or not? Will it keep the data in memory for reuse or not? 6. What I don't understand is what makes Tez and LLAP more efficient compared to Spark! Cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
14.38 ORC 202.33317.77 Still I would use Spark if I had a choice and I agree that on VLT (very large tables), the limitation in available memory may be the overriding factor in using Spark. HTH

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
umulative CPU: 721.83 sec HDFS Read: 400442823 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 12 minutes 1 seconds 830 msec OK 1 *Time taken: 239.532 seconds, Fetched: 1 row(s)* I leave it to you guys to guess which one is better :) Cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. A

Re: Hive Metastore on Amazon Aurora

2016-07-11 Thread Mich Talebzadeh
of transaction activity using ORC files with Insert/Update/Delete that need to communicate with metastore with heartbeat etc? HTH

Re: hive 2.1.0 beeline cannot show verbose log

2016-07-07 Thread Mich Talebzadeh
Hi Is this available in Hive 2? hive> set hive.async.log.enabled=false; Query returned non-zero code: 1, cause: hive configuration hive.async.log.enabled does not exists. Thanks

Re: hive 2.1.0 beeline cannot show verbose log

2016-07-07 Thread Mich Talebzadeh
16/07/07 11:36:22 [main]: DEBUG conf.VariableSubstitution: Substitution is on: hive HTH

Presentation in London: Running Spark on Hive or Hive on Spark

2016-07-06 Thread Mich Talebzadeh
erested please register here <http://www.meetup.com/futureofdata-london/events/232423292/> Looking forward to seeing those who can make it to have an interesting discussion and leverage your experience. Regards,

Re: Querying Hive tables from Spark

2016-06-27 Thread Mich Talebzadeh
mal(10,0)) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 Statistics: Num rows: 689132 Data size: 203983072 Basic stats: COMPLETE Column stats: NONE ListSink thanks

Querying Hive tables from Spark

2016-06-27 Thread Mich Talebzadeh
(10,0)) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6 ListSink *And Hive on Spark returns the same 24 rows in 30 seconds* Ok Hive query is just slower with Spark engine. Assuming that the time taken will be optimization time + query time then it appear

Re: Optimize Hive Query

2016-06-24 Thread Mich Talebzadeh
, sb_gu_key ORDER BY t_ev_st_dt ) select from_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') AS EndTime; HTH

Re: Hive/Tez ORC tables -- rawDataSize value

2016-06-23 Thread Mich Talebzadeh
Hi, Can you please send the output of DESC FORMATTED after running (if you have not done so already) ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS for both tables? HTH,

Re: Optimize Hive Query

2016-06-23 Thread Mich Talebzadeh
5", "orc.stripe.size"="16777216", "orc.row.index.stride"="1" ) Others may have better ideas. HTH

Re: Optimize Hive Query

2016-06-23 Thread Mich Talebzadeh
Can you also run desc formatted tuning_dd_key and send the output, please?

Re: Show Redudant database name in Beeline -Hive 2.0

2016-06-22 Thread Mich Talebzadeh
Sounds like it is picking up results from both metastores! Maybe the cluster is not set up correctly. It should always pick up from the active node (just one).

Re: Show Redudant database name in Beeline -Hive 2.0

2016-06-22 Thread Mich Talebzadeh
Hi Karthi, Those database names are picked up from the metadata of Hive. Do you know the type of RDBMS that holds your Hive database? Check hive-site.xml for javax.jdo.option.ConnectionURL HTH
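The hive-site.xml check suggested above can be sketched in Python. This is only an illustration: the helper name `get_hive_property`, the sample XML, and the JDBC URL are made up, and the RDBMS guess simply reads the JDBC sub-protocol from the URL, which is an assumption about how the URL is written.

```python
import xml.etree.ElementTree as ET


def get_hive_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Hypothetical hive-site.xml fragment; host, port and database are invented.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive</value>
  </property>
</configuration>"""

url = get_hive_property(SAMPLE, "javax.jdo.option.ConnectionURL")
# The second colon-separated field of a JDBC URL names the driver family,
# which hints at the RDBMS that holds the metastore.
rdbms = url.split(":")[1]
print(rdbms)
```

In practice the XML text would be read from the hive-site.xml file in the Hive configuration directory rather than from an inline string.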

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Mich Talebzadeh
for transaction logic but Spark somehow cannot do that. In short that is it. You need to do that through Hive. Cheers,

Re: loading in ORC from big compressed file

2016-06-22 Thread Mich Talebzadeh
Hi, Are you using map-reduce as the execution engine? What version of Hive are you on? HTH

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread Mich Talebzadeh
of rows returned) of 100M rows. Yes I noticed your version of Hive at 1.1 on a vendor's package. At this stage the question is what other alternatives there are to fetch those 100 million rows. HTH

Re: Show Redudant database name in Beeline -Hive 2.0

2016-06-21 Thread Mich Talebzadeh
y 1; DB_ID DBNAME -- 1 default 2 asehadoop 6 oraclehadoop 11 test 16 iqhadoop 22 mytable_db 31 accounts 36 twitterdb HTH

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread Mich Talebzadeh
is the underlying table partitioned i.e. 'SELECT FROM `db`.`table` WHERE (year=2016 AND month=6 AND day=1 AND hour=10)' and also what RS size is expected? JDBC on its own should work. Is this an ORC table? What version of Hive are you using? HTH

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread Mich Talebzadeh
Hi David, What are you actually trying to do with the data? Hive and map-reduce are notoriously slow for this type of operation. Hive is good for storage; that is what I vouch for. There are other alternatives. HTH

Re: last stats time on table columns

2016-06-17 Thread Mich Talebzadeh
I could find the stats time for columns. HTH

Re: Hive indexes without improvement of performance

2016-06-16 Thread Mich Talebzadeh
OK, use explain extended on your SQL query to see if the optimizer makes a good decision. Help the optimizer by doing a stats update at column level: ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS. Use desc formatted to see the stats. HTH

Re: Hive indexes without improvement of performance

2016-06-16 Thread Mich Talebzadeh
and in-memory computing will do a much better job. So 1. Use Hive with its metadata to store data on HDFS 2. Use Spark SQL to query that data. Orders of magnitude faster. However, I am all for you trying what Jorn suggested. HTH

Re: Hive indexes without improvement of performance

2016-06-16 Thread Mich Talebzadeh
Nothing. Hive does not support external indexes even in version 2. In other words, although you create indexes, they are not visible to the Hive optimizer, as you have found out. I wrote an article on this hoping that we should have external indexes being used. HTH

Re: column statistics for non-primitive types

2016-06-14 Thread Mich Talebzadeh
Hi, Is this automatic stats update for basic statistics or for all columns? Thanks

Re: column statistics for non-primitive types

2016-06-14 Thread Mich Talebzadeh
tatistics is not going to make that much difference. I would rather spend more time on making external indexes useful for the optimizer. HTH

Re: column statistics for non-primitive types

2016-06-14 Thread Mich Talebzadeh
t new. Has been around for a good time in RDBMS and can impact the performance of other queries running. So I am not sure it can be considered as blessing. HTH

Re: column statistics for non-primitive types

2016-06-14 Thread Mich Talebzadeh
Hi, My point was we are where we are and at this juncture there is no collection of statistics for complex columns. That may be a future enhancement. But then the obvious question is how useful or meaningful these statistics are going to be? HTH

Re: Optimized Hive query

2016-06-14 Thread Mich Talebzadeh
Thank you for the cut-and-paste monologue. Very impressive. I will try to remember it

Re: ORC does not support type conversion from INT to STRING.

2016-06-14 Thread Mich Talebzadeh
You must excuse my ignorance. Can you please elaborate on this, as it seems something has gone wrong somewhere?

Re: ORC does not support type conversion from INT to STRING.

2016-06-14 Thread Mich Talebzadeh
Hi Mahendar, Did you load the meta-data DB/schema from backup and are now seeing this error?

Re: Optimized Hive query

2016-06-14 Thread Mich Talebzadeh
ou stating that there is somehow some explanation for optimiser "access path" that comes out independent of the optimizer and is called syntax tree?

Re: column statistics for non-primitive types

2016-06-14 Thread Mich Talebzadeh
on Hive 2 and it does not. hive> analyze table foo compute statistics for columns; FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array is passed.

Re: Optimized Hive query

2016-06-14 Thread Mich Talebzadeh
Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 14 June 2016 at 08:37, Aviral Agarwal <

Re: Optimized Hive query

2016-06-13 Thread Mich Talebzadeh
you want to flatten the query I understand. create temporary table tmp as select c from d; INSERT INTO TABLE a SELECT c from tmp where condition Is the INSERT code correct? HTH

Re: column statistics for non-primitive types

2016-06-13 Thread Mich Talebzadeh
Which version of Hive are you using?

Re: Hive Table Creation failure on Postgres

2016-06-09 Thread Mich Talebzadeh
Well I know that the script works fine for Oracle (both base and transactional). Ok this is what this table is in Oracle. That column is 256 bytes. HTH

Re: Using Hive table for twitter data

2016-06-09 Thread Mich Talebzadeh
Thanks Gopal, that link returns 404 - "OOPS! Looks like you wandered too far from the herd!" LOL. Any reason why that table in Hive cannot read data in? Cheers

Using Hive table for twitter data

2016-06-09 Thread Mich Talebzadeh
455594 2016-06-09 09:54 /twitter_data/FlumeData.1465462435124 Thanks

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Mich Talebzadeh
that Spark assumes no concurrency for Hive table. It is probably the same reason why updates/deletes to Hive ORC transactional tables through Spark fail. HTH

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Mich Talebzadeh
, hive.compactor.worker.threads, hive.support.concurrency (true), hive.enforce.bucketing (true), and hive.exec.dynamic.partition.mode (nonstrict). The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Mich Talebzadeh
ive there is the issue with DDL + DML locks applied in a single transaction i.e. --> create table A as select * from b HTH

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Mich Talebzadeh
issue here HTH

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Mich Talebzadeh
table is locked as SHARED_READ 2. With Spark --> No locks at all 3. With HIVE --> No locks on the target table 4. With Spark --> No locks at all HTH

Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Mich Talebzadeh
ask 0 in stage 1.0 failed 1 times; aborting job Suggested solution. In a concurrent env, Spark should apply locks in order to prevent such operations. Locks are kept in Hive meta data table HIVE_LOCKS HTH

Re: Why does the user need write permission on the location of external hive table?

2016-06-06 Thread Mich Talebzadeh
data from compressed to none when you read it or whatever. So yes there is a performance price to pay albeit small using more CPU to uncompress the data and present it. However, that is a small price to pay to reduce the storage cost for data. HTH

Re: Why does the user need write permission on the location of external hive table?

2016-06-06 Thread Mich Talebzadeh
Sorry, should read *staging* tables.

Re: Why does the user need write permission on the location of external hive table?

2016-06-06 Thread Mich Talebzadeh
/Hadoop/hdfs-site.xml dfs.permissions false There are other ways as well. Check this http://stackoverflow.com/questions/11593374/permission-denied-at-hdfs HTH

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-06 Thread Mich Talebzadeh
t.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deplo

Re: Why does the user need write permission on the location of external hive table?

2016-06-06 Thread Mich Talebzadeh
and password? The data is immutable. What is the use case for this table? Are you going to use the data later in app/Hive and, if so, do you have permission to read it? HTH

Re: alter partitions on hive external table

2016-06-06 Thread Mich Talebzadeh
improves performance. I very much doubt that, whichever way you go, it is really going to have that much impact on your performance.

Re: alter partitions on hive external table

2016-06-06 Thread Mich Talebzadeh
That order datetime/userid/customerId looks more natural to me. Two questions: What is the type of table in Hive? Are you doing this for certain queries where you think userid as the most significant column is going to help queries better? HTH

Re: Convert date in string format to timestamp in table definition

2016-06-04 Thread Mich Talebzadeh
7T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12)) as timestamp) as adddate; HTH
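The visible part of the query above takes characters 1 to 10 (the date) and everything from position 12 onward (the time), skipping the "T", and casts the result to a timestamp. Since the start of the statement is cut off, the way the two pieces are joined is an assumption here (a single space is used). A rough Python equivalent, with a made-up function name, for checking the slicing logic outside Hive:

```python
from datetime import datetime


def iso_t_to_timestamp(s):
    """Split an ISO-style 'date T time' string the way the two substring calls do."""
    date_part = s[:10]   # like substring(s, 1, 10): '2016-05-17'
    time_part = s[11:]   # like substring(s, 12): skips the 'T' at position 11
    # Assumption: the two pieces are joined with one space before the cast.
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


ts = iso_t_to_timestamp("2016-05-17T02:10:44.527")
print(ts)
```

Hive's substring is 1-based while Python slices are 0-based, which is why position 12 in the query corresponds to index 11 here.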

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-02 Thread Mich Talebzadeh
Thanks for that. I will have a look

Spark support for update/delete operations on Hive ORC transactional tables

2016-06-02 Thread Mich Talebzadeh
are going to have support for transactions in Spark for Hive ORC tables. This will really be useful. Thanks,

Re: why does HIVE can run normally without starting yarn?

2016-06-01 Thread Mich Talebzadeh
loy a resource manager. I do not use map-reduce. I use Spark as an execution engine and it does run in yarn-client mode in my case. HTH

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-31 Thread Mich Talebzadeh
t table 't' (1 rows affected)' T. 2016/04/08 09:44:44. (89): Command sent to 'hiveserver2.asehadoop': T. 2016/04/08 09:44:44. (89): 'Bulk insert table 't' (1 rows affected)' T. 2016/04/08 09:45:37. (90): Command sent to 'hiveserver2.asehadoop':

Fwd: [ANNOUNCE] Apache Hive 2.0.1 Released

2016-05-31 Thread Mich Talebzadeh
. With Hive on Spark vs Hive on MapReduce the performance gains are an order of magnitude. HTH

Re: [ANNOUNCE] Apache Hive 2.0.1 Released

2016-05-31 Thread Mich Talebzadeh
. With Hive on Spark vs Hive on MapReduce the performance gains are an order of magnitude. HTH

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-31 Thread Mich Talebzadeh
ffic I am afraid - LLAP I don't know. Sounds like Hortonworks promote TEZ, and Cloudera does not want to know anything about Hive and they promote Impala, but that sounds like a sinking ship these days. Having said that, I will try TEZ + LLAP :) No pun intended. Regards

Re: Why does the user need write permission on the location of external hive table?

2016-05-31 Thread Mich Talebzadeh
only hdfs can write to it not user Sandeep? HTH

Re: Why does the user need write permission on the location of external hive table?

2016-05-31 Thread Mich Talebzadeh
Is this location correct and valid? LOCATION '/data/SentimentFiles/*SentimentFiles*/upload/data/tweets_raw/'

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Mich Talebzadeh
data). 80-20 rule? In reality may be just 2TB or most recent partitions etc. The rest is cold data. cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Mich Talebzadeh
another stack like Tez. Cloudera support Impala instead of Hive but it is not something I have used. HTH

Re: SHOW DATABASES/TABLES with SQL standard authorization

2016-05-30 Thread Mich Talebzadeh
with no access right given? -- 1> use ASEIMDB 2> go Msg 10351, Level 14, State 1: Server 'SYB_157', Line 1: Server user id 24 is not a valid user in database 'ASEIMDB' HTH

Re: SHOW DATABASES/TABLES with SQL standard authorization

2016-05-30 Thread Mich Talebzadeh
have access rights to that database. HTH

Re: Does hive need exact schema in Hive Export/Import?

2016-05-30 Thread Mich Talebzadeh
oup 1588 2016-05-25 16:46 hdfs://rhes564:9000/export/ *_metadata*drwxr-xr-x - hduser supergroup 0 2016-05-25 16:46 hdfs://rhes564:9000/export/data and uses the metadata file to create the target table which somehow does not work in this case! HTH

Re: Anyone successfully deployed Hive on TEZ engine?

2016-05-30 Thread Mich Talebzadeh
Hi Gopal, please see my correspondence about Tez in the Tez user group. I forwarded it to the Hive user group. Thanks

Re: Does hive need exact schema in Hive Export/Import?

2016-05-30 Thread Mich Talebzadeh
select count(1) from test.sales_staging; exit;

Re: Anyone successfully deployed Hive on TEZ engine?

2016-05-30 Thread Mich Talebzadeh
ke it work as I have Hive on Spark engine as well. Please tell me what version of Tez and YARN etc. Thanks

Re: Anyone successfully deployed Hive on TEZ engine?

2016-05-30 Thread Mich Talebzadeh
Thanks Damien. I tried TEZ 0.82 with Hive 2 although I did not persevere. When you say "Not stable" are you referring to using it with YARN etc.? In short, at the simplest set up, what Resource Manager does it work with? Cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
Thanks. I think the problem is that the TEZ user group is exceptionally quiet. Just sent an email to the Hive user group to see if anyone has managed to build a vendor-independent version.

Anyone successfully deployed Hive on TEZ engine?

2016-05-29 Thread Mich Talebzadeh
Please bear in mind that I am talking about your own build, not anything that comes as part of a vendor's package. If so kindly specify both Hive and TEZ versions. Thanks

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
but of course Spark has both plus in-memory capability. It would be interesting to see what version of TEZ works as an execution engine with Hive. Vendors are divided on this: use Hive with TEZ, or use Impala instead of Hive etc., as I am sure you already know. Cheers,

Re: Test

2016-05-29 Thread Mich Talebzadeh
yep

Re: Hive and using Pooled Connections

2016-05-26 Thread Mich Talebzadeh
ate intermittent "No such lock.." and "No such transaction..." errors. Setting "datanucleus.connectionPoolingType=DBCP" is recommended in this case. So I changed the setting to DBCP. I don't know how useful it is going to be. Regards,

Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Mich Talebzadeh
be an option. NAS is better as it saves scp and copy across, with the target having enough external space to get the files in. A more useful tool would be one that exports the full Hive database in binary format and imports it in the target. Cheers

Hive and using Pooled Connections

2016-05-25 Thread Mich Talebzadeh
the reuse of connection objects and reduce the number of times that connection objects are created. Connection pools significantly improve performance for database-intensive applications because creating connection objects is costly both in terms of time and resources.

Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread Mich Talebzadeh
not sure vendors do parallelise this sort of thing. HTH

Re: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-05-24 Thread Mich Talebzadeh
r_expression1* is equal to *char_expression2* or *uchar_expression2*. - -1 – indicates that *char_expression1* or *uchar_expression1* is less than *char_expression2* or *uchar_expression2*. hive> select compare("aaa", "bbb"); FAILED: SemanticException [Error 100

Re: Insert query with selective columns in Hive

2016-05-24 Thread Mich Talebzadeh
only col4 hive> insert into testme (col4) values(6); Loading data to table test.testme OK HTH

Hive 2 loss of connection to metadstore and multiple connections/disconnect in the same session

2016-05-24 Thread Mich Talebzadeh
connections: 2 2016-05-24T16:16:44,864 INFO [0deb842d-9b15-4dd9-8d60-0e198a9d3865 0deb842d-9b15-4dd9-8d60-0e198a9d3865 main]: hive.metastore (HiveMetaStoreClient.java:open(505)) - Connected to metastore. Thanks

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-24 Thread Mich Talebzadeh
large table from Oracle to Hive and decided to use Spark 1.6.1 with Hive 2 on Spark 1.3.1 and that worked fine. We just used a JDBC connection with a temp table and it was good. We could have used Sqoop but decided to settle for Spark, so it all depends on the use case. HTH

Re: Compatibility of Hive 2 with TEZ

2016-05-23 Thread Mich Talebzadeh
Thanks Seth.

Re: Using Spark as execution engine for Hive

2016-05-23 Thread Mich Talebzadeh
Hi Sharath, see this thread: "Using Spark on Hive with Hive also using Spark as its execution engine". HTH
