Re: De-identification_in Hive

2016-03-19 Thread Mich Talebzadeh
REPLACE(net,'[^\\d\\.]','') AS DECIMAL(20,2)) , NULL , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) FROM stg_t2 WHERE --INVOICENUMBER > 0 AND CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0.0 -- Exclude empty rows HTH Dr Mich
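
The fragment above cleans currency-style strings by deleting every character that is not a digit or a dot and then casting the result to DECIMAL(20,2). The same idea can be sketched outside Hive as below. This is only an illustration: the function name `to_decimal`, the sample values and the rounding mode are assumptions, not part of the original post, and the pattern also drops a leading minus sign, just as the regular expression in the post would.

```python
import re
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

def to_decimal(raw):
    """Keep only digits and dots, then round to two decimal places.

    Returns None when nothing numeric is left or the remainder is not a
    valid number (for example "1.2.3").
    """
    cleaned = re.sub(r"[^\d.]", "", raw)
    try:
        return Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None

print(to_decimal("£1,234.50"))   # currency symbol and thousands separator removed
print(to_decimal("n/a"))         # nothing numeric left
```

Rows that come back as None here correspond to the empty rows that the WHERE clause in the post is meant to exclude.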

Re: De-identification_in Hive

2016-03-19 Thread Mich Talebzadeh
Are you loading your CSV file from an external table into a Hive table? Basically, you want to scramble that column before putting it into the Hive table? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/prof

Re: select count(*) from table;

2016-03-22 Thread Mich Talebzadeh
', | | *'numRows'='25'*,| File statistics, Stripe statistics and row group statistics are kept. So ORC table will rely on those if needed HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Mechanism when doing a select *

2016-03-21 Thread Mich Talebzadeh
ime taken: 0.047 seconds INFO : Executing command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318): select * from countries INFO : Completed executing command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318); Time taken: 0.001 seconds INFO : OK Dr Mich Talebza

Error selecting from a Hive ORC table in Spark-sql

2016-03-21 Thread Mich Talebzadeh
til.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAE

Re: Mechanism when doing a select *

2016-03-21 Thread Mich Talebzadeh
nodes from your client. So in summary Hive server 2 collects data from all blocks and forwards it to the client. The actual collection and filtering of result set in SQL query will depend on many factors. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Mechanism when doing a select *

2016-03-21 Thread Mich Talebzadeh
Well I use Spark as engine. Now the question is have you updated statistics on ORC table? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Re: Error selecting from a Hive ORC table in Spark-sql

2016-03-21 Thread Mich Talebzadeh
sounds like with ORC transactional table this happens When I create that table as ORC but non transactional it works! Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Mich Talebzadeh
By Operator That is the only time I have seen through explain plan that partition elimination is working. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Column type conversion in Hive

2016-03-20 Thread Mich Talebzadeh
storage for Integer. In a conventional RDBMS this needs to be done through cast (CHAR AS INT) etc? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6Ac

Re: Issue joining 21 HUGE Hive tables

2016-03-24 Thread Mich Talebzadeh
, c.channel_desc ; select from_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') AS FirstQuery; SELECT calendar_month_desc AS MONTH, channel_desc AS CHANNEL, TotalSales from tmp ORDER BY MONTH, CHANNEL LIMIT 5 ; HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Hive on Spark engine

2016-03-26 Thread Mich Talebzadeh
Thanks Jorn. Just to be clear they get Hive working with Spark 1.6 out of the box (binary download)? The usual work-around is to build your own package and get the Hadoop-assembly jar file copied over to $HIVE_HOME/lib. Cheers Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile

Hive on Spark engine

2016-03-26 Thread Mich Talebzadeh
Hive 2 on Spark 1.6 as the execution engine and it crashed. I do not know the development state of this cross-breed, but it would be very desirable if we could sort out this spark-assembly-1.x.1-hadoop2.4.0.jar issue once and for all. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com

Re: Hive on Spark engine

2016-03-26 Thread Mich Talebzadeh
Thanks Ted, More interested in general availability of Hive 2 on Spark 1.6 engine as opposed to Vendors specific custom built. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: Hive 2 insert error

2016-03-08 Thread Mich Talebzadeh
ated tables with bucketing so this was never an issue. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzad

Re: Field delimiter in hive

2016-03-08 Thread Mich Talebzadeh
try "~|~" as field delimiter. It normally works for most conditions Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV
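
A short sketch of why a multi-character delimiter such as "~|~" tends to be safe: it is unlikely to occur inside the data, so fields that themselves contain commas or single pipes survive intact. The sample record and the helper name `parse_line` are made up for illustration, and how the Hive table itself is declared to use this delimiter is not shown in the post.

```python
DELIM = "~|~"

def parse_line(line):
    """Split one text record on the multi-character delimiter."""
    return line.rstrip("\n").split(DELIM)

# Hypothetical record: the second field contains a comma, the third a pipe.
record = "1001~|~Smith, John~|~a|b~|~12.50"
print(parse_line(record))
```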

Re: Hive alter table concatenate loses data - can parquet help?

2016-03-08 Thread Mich Talebzadeh
Hi can you please provide DDL for this table "show create table " Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Hive 2 and versions of Spark as the execution engine for Hive

2016-03-08 Thread Mich Talebzadeh
ark like 1.5.2 and 1.6, given that Hive 2 encourages using Hive on Spark and/or Tez. Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>

Re: Hive and Impala

2016-03-02 Thread Mich Talebzadeh
OK two questions here please: 1. Which version of Hive are you running 2. Have you tried Hive on Spark which does both DAG & In-memory calculation. Query Hive on Spark job[1] stages: INFO : 2 INFO : 3 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/vie

Parquet versus ORC

2016-03-06 Thread Mich Talebzadeh
on this. Thanks , Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com

Re: Parquet versus ORC

2016-03-06 Thread Mich Talebzadeh
ase besides using queries that select whole row (much like "a row based" type relational database does). Cheers. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2g

Re: Updating column in table throws error

2016-03-06 Thread Mich Talebzadeh
Hi, This update will throw an error as any column used for bucketing (read for hash partitioning) cannot be updated as it is used for physical ordering of rows in the table. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
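
To see why a bucketing column cannot simply be updated in place, consider how a row is assigned to a bucket. The sketch below assumes the commonly described scheme in which an integer key hashes to itself and the bucket is the non-negative hash modulo the number of buckets; the function name and the values are illustrative only, and the exact hashing used by a given Hive version should be checked against its documentation.

```python
def bucket_for(key, num_buckets):
    """Assumed scheme: non-negative hash of an integer key, modulo bucket count."""
    return (key & 0x7FFFFFFF) % num_buckets

old_bucket = bucket_for(7, 4)   # row currently stored with key 7
new_bucket = bucket_for(8, 4)   # the same row if the key were updated to 8
print(old_bucket, new_bucket)
```

Because the two values differ, an in-place update of the key would leave the row in a file that no longer matches its hash, which is consistent with the error described in the post.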

Re: Hive and Impala

2016-03-02 Thread Mich Talebzadeh
I forgot: besides LLAP you are going to have Hive Hybrid Procedural SQL On Hadoop (HPL/SQL), which is going to add another dimension to Hive Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/v

Re: Hive and Impala

2016-03-02 Thread Mich Talebzadeh
rk on Hive. If you look around from Impala to Spark the architecture is essentially a query tool. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Hive and Impala

2016-03-01 Thread Mich Talebzadeh
or Spark using Hive metastore what we cannot achieve that we can achieve with Impala? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6Ac

Re: Hive and Impala

2016-03-01 Thread Mich Talebzadeh
Just to clarify the statement in quotes was made by the author of the article "We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this results in latency." Dr Mich

Re: count(*) not allowed in order by

2016-03-07 Thread Mich Talebzadeh
Hi, You are looking at the top 25 of the result set, so you will have to get the full result set before looking at the top 25. Something like this: select rs.prod_id, rs.score from ( select prod_id, count(prod_id) AS Score from sales GROUP BY prod_id ORDER BY Score DESC ) rs LIMIT 25; HTH Dr Mich Talebzadeh
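
The approach in this reply (aggregate in a derived table under an alias, then take the top 25) can be tried end to end with SQLite standing in for Hive. Everything about the data is invented for the demonstration. One deliberate difference from the post: the outer query repeats the ORDER BY, since standard SQL does not promise that a derived table's ordering carries through to the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (prod_id INTEGER)")

# Made-up data: product p appears p times, for p = 1..30.
rows = [(p,) for p in range(1, 31) for _ in range(p)]
conn.executemany("INSERT INTO sales VALUES (?)", rows)

top = conn.execute("""
    SELECT rs.prod_id, rs.score
    FROM (
        SELECT prod_id, COUNT(prod_id) AS score
        FROM sales
        GROUP BY prod_id
    ) rs
    ORDER BY rs.score DESC
    LIMIT 25
""").fetchall()

print(top[:3])
```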

Hive 2 insert error

2016-03-07 Thread Mich Talebzadeh
( | | 'orc.compress'='SNAPPY', | | 'transactional'='true', | | 'transient_lastDdlTime'='1457396808') | +-+--+ Dr Mich

Re: SELECT without FROM

2016-03-09 Thread Mich Talebzadeh
+ 1 4> go --- Adaptive Server Enterprise/15.7/EBF 21708 SMP SP110 /P/x86_64/Enterprise Linux/ase157sp11x/3546/64-bit/FBO/Fri Nov 8 05:39:38 2013 --- 2 Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/pro

Re: read-only mode for hive

2016-03-08 Thread Mich Talebzadeh
Hive, much like MSSQL or SAP ASE, has multiple databases. Are you suggesting putting one of these databases in READ ONLY mode? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: How to setup Hive JDBC client to connect remote Hiveserver

2016-04-04 Thread Mich Talebzadeh
This telnet does not specify the port that HiveServer2 is running on (default 1). Mine is running on 10010: *telnet 50.140.197.217 10010* Trying 50.140.197.217... Connected to rhes564 (50.140.197.217). Escape character is '^]'. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com
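
The telnet test above only establishes that something is listening on the given host and port. The same check can be scripted as in the sketch below; the helper name `port_open` and the timeout are assumptions made for illustration. For reference, the documented default for hive.server2.thrift.port is 10000, while the server in this post listens on 10010.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example call, using the host name and port quoted in the post:
#   port_open("rhes564", 10010)
```

A True result says nothing about whether the listener is actually HiveServer2; it only rules out basic connectivity and firewall problems before the JDBC URL is tried.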

Re: Can't able to access temp table via jdbc client

2016-04-05 Thread Mich Talebzadeh
(s.amount_sold) AS TotalSales will only be visible to that session and is created under /tmp/hive/hduser. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: Hive Metadata tables of a schema

2016-04-05 Thread Mich Talebzadeh
eed to understand the Hive schema tables and relationships among the tables. There is no package or proc to provide the info you need. You need to write your own queries. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <

Re: Automatic Update statistics on ORC tables in Hive

2016-03-28 Thread Mich Talebzadeh
Thanks. This does not seem to be implemented although the Jira says resolved. It also mentions the timestamp of the last update stats. I do not see it yet. Regards, Mich Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <ht

Re: Unable to start Hive CLI after install

2016-04-04 Thread Mich Talebzadeh
HI Raj, Hive 2 is as good to go :) Check this <https://www.linkedin.com/pulse/apache-hive-2-now-released-mich-talebzadeh-ph-d-?trk=prof-post> I see that you are using Oracle DB as your metastore. Mine is Oracle as well javax.jdo.option.ConnectionURL jdbc:oracle:thin:@rhes56

Re: Unable to start Hive CLI after install

2016-04-04 Thread Mich Talebzadeh
Interesting why you did not download Hive 2.0, which is out now. The error says: HiveConf of name hive.metastore.local does not exist. In your hive-site.xml, how have you configured parameters for hive.metastore? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Unable to start Hive CLI after install

2016-04-04 Thread Mich Talebzadeh
What are you getting when trying $HIVE_HOME/bin/hive The error! Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Standard Deviation in Hive 2 is still incorrect

2016-04-04 Thread Mich Talebzadeh
ev_pop | ++-++-+--+ | 260.7270919450411 | 260.7270722861637 | 260.7270722861637 | 260.72704617042166 | ++-++-+--+ Hopefully The Hive one will be corrected. Thanks Dr Mich Talebzadeh LinkedIn *
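
When comparing standard deviation figures across engines, it helps to first pin down which formula each column uses, since the population form (divide by n) and the sample form (divide by n - 1) give different answers on the same data. The sketch below shows the two on a small made-up data set. It does not reproduce the figures in the post, and whether this distinction explains the discrepancy reported there is not established by the fragment.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop = statistics.pstdev(values)    # population form: divides by n
samp = statistics.stdev(values)    # sample form: divides by n - 1

print(pop, samp)
```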

Re: Automatic Update statistics on ORC tables in Hive

2016-03-28 Thread Mich Talebzadeh
as millions and millions of rows then stats matter and then ORC adds its value. Cheers Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http:/

Re: Hive footprint

2016-04-25 Thread Mich Talebzadeh
by investing in the existing tools rather than trying to fragment it further. There seems to be little effort in this area for reasons that I may not be aware of. However, I am more than happy to contribute to this case. Kind regards, Mich Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com

Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Is the parameter --set hive.enforce.bucketing = true; deprecated in Hive 2, as it causes hql code not to work? hive> set hive.enforce.bucketing = true; Query returned non-zero code: 1, cause: hive configuration hive.enforce.bucketing does not exists. Dr Mich Talebzadeh LinkedIn * ht

Re: Issue with correlated subqueries being case-sensitive

2016-04-29 Thread Mich Talebzadeh
sts operator SubQuery must be Correlated. As a work around This works but not that efficient hive> select count(1) from smallsales where PROD_ID IN (SELECT PROD_ID FROM sales_staging); HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOAB

Re: Issue with correlated subqueries being case-sensitive

2016-04-29 Thread Mich Talebzadeh
Why not just try the standard way SELECT * FROM P WHERE EXISTS(SELECT 1 FROM B WHERE P.ID = B.ID) You don't need '*' that is not standard SQL as far as I know HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <ht
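
The two formulations discussed in this thread, a correlated EXISTS and an uncorrelated IN, select the same rows. The sketch below checks that with SQLite standing in for Hive; the table contents are invented, and the table names P and B are taken from the query in the post. It says nothing about which form Hive executes more efficiently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P (ID INTEGER, name TEXT);
    CREATE TABLE B (ID INTEGER);
    INSERT INTO P VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO B VALUES (2), (3), (3), (4);
""")

# Correlated form: the subquery refers to the outer row through P.ID.
exists_rows = conn.execute(
    "SELECT ID FROM P WHERE EXISTS (SELECT 1 FROM B WHERE P.ID = B.ID) ORDER BY ID"
).fetchall()

# Uncorrelated form: the subquery is evaluated independently of the outer row.
in_rows = conn.execute(
    "SELECT ID FROM P WHERE ID IN (SELECT ID FROM B) ORDER BY ID"
).fetchall()

print(exists_rows, in_rows)
```

Note that the duplicate value in B does not duplicate rows of P in either form, which is one reason to prefer these over a plain join for existence checks.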

Re: Disable Hive autogather optimization

2016-04-29 Thread Mich Talebzadeh
RWRITE operation is involved in an existing table, then column stats kicks in and that adds to timing process? Sounds like it is a general feature and can be disabled as part of table struct. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP

Re: Hive TTransportException - Create Table

2016-04-27 Thread Mich Talebzadeh
ary tables (private to that session). A DDL in any database is a heavy operation; if you can truncate or overwrite the existing tables instead, it would be prudent to do so. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com

Re: Sqoop_Sql_blob_types

2016-04-27 Thread Mich Talebzadeh
Is the source of data Oracle? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 27 April 2016 at

Re: Hive query to split one row into many rows such that Row 1 will have col 1 Name, col 1 Value and Row 2 will have col 2 Name and col 2 value

2016-04-23 Thread Mich Talebzadeh
thanks I may have missed something. Deepak might clarify. cheers Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Re: Hive query to split one row into many rows such that Row 1 will have col 1 Name, col 1 Value and Row 2 will have col 2 Name and col 2 value

2016-04-23 Thread Mich Talebzadeh
user_parameters t2 JOIN user_details t1 ON t2.user_id = t1.user_id; Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Re: Disable Hive autogather optimization

2016-04-29 Thread Mich Talebzadeh
apologies should read "Udit" Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 30 Apr

Re: Disable Hive autogather optimization

2016-04-29 Thread Mich Talebzadeh
ther feature for existing tables. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 2

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Well having it in the old code causes the query to crash as well! Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Unfortunately that needs to be done or better the whole line removed in every hql code where it is set as true . Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view
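
Removing the setting from every script by hand is tedious, so a small filter along the following lines may help. It is a sketch under stated assumptions: the function name is made up, the pattern only matches a statement that sits alone on its line, and any trailing newline of the input is not preserved.

```python
import re

# Matches a whole line of the form: set hive.enforce.bucketing = <value>;
ENFORCE_BUCKETING = re.compile(
    r"^\s*set\s+hive\.enforce\.bucketing\s*=\s*\w+\s*;\s*$", re.IGNORECASE
)

def strip_enforce_bucketing(hql_text):
    """Return hql_text without any 'set hive.enforce.bucketing = ...;' lines."""
    kept = [ln for ln in hql_text.splitlines() if not ENFORCE_BUCKETING.match(ln)]
    return "\n".join(kept)

script = "set hive.enforce.bucketing = true;\nINSERT INTO t SELECT * FROM s;"
print(strip_enforce_bucketing(script))
```

Other `set` statements are left untouched, so the filter can be run over a whole directory of hql files before moving them to Hive 2.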

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-30 Thread Mich Talebzadeh
Ok thanks Lefty Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 30 April 2016 at 02:23,

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
e.execution.engine=spark does not matter. Sqoop seems to internally set hive.execution.engine=mr anyway. Maybe there should be an option --hive-execution-engine='mr/tez/spark' etc. in the above command? Cheers, Mich Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAA

Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Hi, What is the simplest way of making sqoop import use spark engine as opposed to the default mapreduce when putting data into hive table. I did not see any parameter for this in sqoop command line doc. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
yes I was thinking of that. use Spark to load JDBC data from Oracle and flush it into ORC table in Hive. Now I am using Spark 1.6.1 and JDBC driver as I recall (I raised a thread for it) throwing error. This was working under Spark 1.5.2. Cheers Dr Mich Talebzadeh LinkedIn * https

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
into temp table. The code actually creates the Hive ORC table in Hive database and populates it from temp table. ​ See How it goes Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Compatibility of Hive 2 with TEZ

2016-05-21 Thread Mich Talebzadeh
Hi, I see in a matrix that Hive 2 is compatible with Tez 0.8.2 as its execution engine. Can someone verify this please as I am trying to test Hive 2 with Tez. I normally use Hive 2 on Spark 1.3 engine fine. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Mich Talebzadeh
Have a look at this thread Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 23 May 2016 at 09:10

Re: Using Spark as execution engine for Hive

2016-05-23 Thread Mich Talebzadeh
Hi Sharath See this thread Using Spark on Hive with Hive also using Spark as its execution engine HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: Compatibility of Hive 2 with TEZ

2016-05-23 Thread Mich Talebzadeh
Thanks Seth. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 23 May 2016 at 21:08, Siddharth Se

Re: Hive setup on Hadoop cluster

2016-05-18 Thread Mich Talebzadeh
Hi John, I see this error Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL Can you check in case you have a problem under Hadoop storage or you have an issue with your user say hduser on Linux! HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Missing HIVE Execution JAR

2016-05-18 Thread Mich Talebzadeh
I don't use windows but check bin/hive.cmd for environment variables. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Re: Missing HIVE Execution JAR

2016-05-18 Thread Mich Talebzadeh
how about CLASSPATH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 18 May 2016 at 20:

Re: Hive setup on Hadoop cluster

2016-05-19 Thread Mich Talebzadeh
ndLoadMain(LauncherHelper.java:486) However, sounds like you may have an issue with yarn container memory. How big is the underlying table. Also can you just do a plain select count(1) from itself (no distinct etc) and see it works? HTH Dr Mich Talebzadeh LinkedIn * https://www.li

Re: Hive system catalog

2016-05-18 Thread Mich Talebzadeh
Hi Braj, Any tool GUI or OS level can log in and see the schema created for Hive. For example my metadata for Hive is on Oracle and I can use SQL Developer Data Model to create a logical model from the physical model HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view

Re: Hive 2 database Entity-Relationship Diagram

2016-05-19 Thread Mich Talebzadeh
down Thanks[image: Inline images 1] Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 19 May 2016 at

Hive 2 database Entity-Relationship Diagram

2016-05-19 Thread Mich Talebzadeh
Please have a look; I would appreciate comments, and if it is useful we can load it into the wiki. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw&

Re: Hive 2 Metastore Entity-Relationship Diagram, Base tables

2016-05-22 Thread Mich Talebzadeh
for now to be used as a quick reference for hive metadata tables, columns, pk and constraint. It only covers the base tables excluding transactional add ons in hive-txn-schema-2.0.0.oracle.sql HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-24 Thread Mich Talebzadeh
large table from Oracle to Hive and decided to use Spark 1.6.1 with Hive 2 on Spark 1.3.1 and that worked fine. We just used JDBC connection with temp table and it was good. We could have used sqoop but decided to settle for Spark so it all depends on use case. HTH Dr Mich Talebzadeh LinkedIn

Hive 2 loss of connection to metadstore and multiple connections/disconnect in the same session

2016-05-24 Thread Mich Talebzadeh
connections: 2 2016-05-24T16:16:44,864 INFO [0deb842d-9b15-4dd9-8d60-0e198a9d3865 0deb842d-9b15-4dd9-8d60-0e198a9d3865 main]: hive.metastore (HiveMetaStoreClient.java:open(505)) - Connected to metastore. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Hive and XML

2016-05-22 Thread Mich Talebzadeh
That is interesting. DBs like MarkLogic are adept at this. BTW, how do you define your base Hive table for XML and what table type have you used? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.

Re: Hive 2 database Entity-Relationship Diagram

2016-05-19 Thread Mich Talebzadeh
Thanks These are the list of tables and views Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpre

Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-19 Thread Mich Talebzadeh
agreed but it still needs to know where the hive top node directory starts from, which is normally under ../../ warehouse Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

2016-05-19 Thread Mich Talebzadeh
Hi, I am not familiar with CDH, but in a default set-up, the hive directory is under hdfs://https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordpress.com

Re: Create external table

2016-05-10 Thread Mich Talebzadeh
| | +--+---+---+--+ 13 rows selected (0.13 seconds) 0: jdbc:hive2://rhes564:10010/default> Closing: 0: jdbc:hive2://rhes564:10010/default HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/v

Re: Create external table

2016-05-10 Thread Mich Talebzadeh
Yes, but the table then exists, correct? I mean the second time. Did you try *use default;* *drop table if exists trips;*? It is still registered within the Hive metadata as an existing table. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Mich Talebzadeh
rom old_table ALTER old_table RENAME to old_table_KEEP RENAME new_table TO old_table That should work. Check the syntax. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/v
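
The swap described above (build a new table from the old one, keep the old one under another name, then rename the new table into place) can be sketched with SQLite standing in for Hive. The column names and rows are invented, and the first part of the statement is truncated in the post, so the plain `CREATE TABLE ... AS SELECT *` used here is an assumption about what it looked like. The `ALTER TABLE ... RENAME TO ...` form shown is also valid HiveQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (id INTEGER, val TEXT);
    INSERT INTO old_table VALUES (1, 'x'), (2, 'y');

    -- Build the replacement, then swap the names.
    CREATE TABLE new_table AS SELECT * FROM old_table;
    ALTER TABLE old_table RENAME TO old_table_KEEP;
    ALTER TABLE new_table RENAME TO old_table;
""")

tables = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
count = conn.execute("SELECT COUNT(*) FROM old_table").fetchone()[0]
print(sorted(tables), count)
```

Keeping the original under the `_KEEP` name gives a fallback until the new table has been verified.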

Re: Query Failing while querying on ORC Format

2016-05-17 Thread Mich Talebzadeh
I am afraid AFAIK the old partitions cannot be modified as they are fixed in size. That is the existing partition file. I agree this is very tedious. We should come up with a more flexible design for ORC tables. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Query Failing while querying on ORC Format

2016-05-16 Thread Mich Talebzadeh
k. thanks http://permalink.gmane.org/gmane.comp.lang.scala.spark.user/32484 | http://post.gmane.org/post.php?group=gmane.comp.lang.scala.spark.user=32484 | http://search.gmane.org/?author=Mich+Talebzadeh=date | 10 Apr 12:41 2016 Re: alter table add columns

Re: clustered bucket and tablesample

2016-05-14 Thread Mich Talebzadeh
pleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink In general, in my experience, bucketing is the only area where ORC tables come in handy. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6

Re: Query Failing while querying on ORC Format

2016-05-15 Thread Mich Talebzadeh
Hi Mahender, Please check this thread https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.

Re: clustered bucket and tablesample

2016-05-15 Thread Mich Talebzadeh
ctable. With integer it is fine. I believe there is an underlying bug here. Another alternative is to use an integer as a surrogate column for hash partitioning, like a sequence in Oracle or identity in Sybase/MSSQL. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP

Re: Query Failing while querying on ORC Format

2016-05-14 Thread Mich Talebzadeh
Check this thread: "alter table add columns aternatives or hive refresh". That might help. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Re: Hive setup on Hadoop cluster

2016-05-18 Thread Mich Talebzadeh
. You also need to set up environment variables for both Hadoop and hive in your start up script like .profile .kshrc etc Have a look anyway. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/prof

Re: Hive setup on Hadoop cluster

2016-05-18 Thread Mich Talebzadeh
Hi John, can you please start a new thread for your problem so we can deal with it separately. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Re: clustered bucket and tablesample

2016-05-14 Thread Mich Talebzadeh
vel in Hive, the number of partitions/files will be fixed. In contrast, with partitioning you do not have this limitation. can you do show create table X and send the output. please. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view

Hive-Hbase vs Phoenix-Hbase

2016-05-05 Thread Mich Talebzadeh
elies on memory (what else) to speed up this process. Hive on newer engine can do most of this these days. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAE

Re: NullPointerException when dropping database backed by S3

2016-05-06 Thread Mich Talebzadeh
in your metastore. Cheers, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 6 May 2016 at 17:29,

Re: Predicates for 'like' and 'between' operators to custom storage handler.

2016-05-05 Thread Mich Talebzadeh
Hi, Do you have the equivalent of that operation in pure SQL. Also have you tried Spark query tool with Hive table. I gather you are doing this through Java? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <ht

Re: Spark Streaming, Batch interval, Windows length and Sliding Interval settings

2016-05-05 Thread Mich Talebzadeh
Any ideas/experience on this? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 4 May 2016 at

Re: Predicates for 'like' and 'between' operators to custom storage handler.

2016-05-05 Thread Mich Talebzadeh
ted (153.959 seconds) So it does work HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 5

Re: Predicates for 'like' and 'between' operators to custom storage handler.

2016-05-05 Thread Mich Talebzadeh
')) AS TransactionDate Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 5 May 2016 at 13:25, Amey Barve <

Spark Streaming, Batch interval, Windows length and Sliding Interval settings

2016-05-04 Thread Mich Talebzadeh
ing on what is being measured. However, I believe having sliding interval = batch interval makes sense? Appreciate any views on this. Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/v
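
For context on the three settings in this question: in Spark Streaming both the window length and the sliding interval have to be multiples of the batch interval. The relationship can be checked with a small helper like the one below. The function name and the numbers are illustrative only, and this is plain arithmetic rather than Spark code.

```python
def check_window(batch_s, window_s, slide_s):
    """Validate window settings against the batch interval (all in seconds).

    Returns (batches per window, batches per slide); raises ValueError if
    either setting is not a positive multiple of the batch interval.
    """
    for name, value in (("window length", window_s), ("sliding interval", slide_s)):
        if value <= 0 or value % batch_s != 0:
            raise ValueError(f"{name} must be a positive multiple of the batch interval")
    return window_s // batch_s, slide_s // batch_s

# Sliding interval equal to the batch interval: the window is re-evaluated on every batch.
print(check_window(batch_s=2, window_s=10, slide_s=2))
```

With the sliding interval equal to the batch interval, each new batch triggers a computation over the most recent window, which is the case the post asks about.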

Re: Insert after typecast fails for Timestamp

2016-04-18 Thread Mich Talebzadeh
inished Stage-4_0: 1/1 Finished Status: Finished successfully in 2.26 seconds Loading data to table default.dummy OK Time taken: 2.586 seconds Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com

Re: [VOTE] Bylaws change to allow some commits without review

2016-04-18 Thread Mich Talebzadeh
+1 Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 18 April 2016 at 18:24, Alan Gates &l

Re: Standard Deviation in Hive 2 is still incorrect

2016-04-19 Thread Mich Talebzadeh
Will do thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 19 April 2016 at 23:33, Alan

Re: Hive footprint

2016-04-20 Thread Mich Talebzadeh
Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 20 April 2016 at 13:07, Sabarish Sasidharan

Re: Hive external indexes incorporation into Hive CBO

2016-04-21 Thread Mich Talebzadeh
Kindly provide an example where one can see EXPLAIN SELECT .shows external index usage? That will be great. Choose your table and block size Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.

Re: Mappers spawning Hive queries

2016-04-18 Thread Mich Talebzadeh
What is the version of Hive and the execution engine (MR, Tez, Spark)? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
