Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Sai Gopalakrishnan
Hi everyone, Thank you for your responses. I think Mich's suggestion is a great one and will go with it. As Alan suggested, using the compactor in Hive should help with managing the delta files. @Dasun, pardon me for deviating from the topic. Regarding configuration, you could try a packaged di

Re: Query performance correlated to increase in delta files?

2015-11-20 Thread Sai Gopalakrishnan
Hi Alan, Thanks for replying. I haven't tried the compactor yet, will do. Can it be scheduled, or does it automatically run when it detects a very high number of delta files? The documentation says 'All compactions are done in the background and do not prevent concurrent reads and writes of the d
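
For reference, a compaction can also be requested by hand rather than waiting for the automatic thresholds; a minimal sketch, assuming a hypothetical ACID table named web_logs_tx:

  hive -e "ALTER TABLE web_logs_tx COMPACT 'minor';"   # queue a minor compaction for this table
  hive -e "ALTER TABLE web_logs_tx COMPACT 'major';"   # or a major one, which rewrites base + deltas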

Supporting special characters in nested column name

2015-11-20 Thread Mohammad Islam
Hi, Looks like Hive supports special characters (unicode) with "`" in first-level column names since Hive 0.13 (HIVE-6013). Similarly, does Hive support special characters in nested column names? For example, struct: create table test1_mislam(a int, `$b_` int); OK Time taken: 0.036 seconds hive> create
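
For clarity, the two cases being contrasted look roughly like this; the first is the statement shown above, the second (test2_mislam, a hypothetical table) is the open question about nested field names:

  hive -e 'CREATE TABLE test1_mislam (a int, `$b_` int);'              # top-level backticked name: works since 0.13 (HIVE-6013)
  hive -e 'CREATE TABLE test2_mislam (a int, s struct<`$b_`:int>);'    # nested field name: does this parse?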

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Dasun Hegoda
Hi Mich, Hi Sai, Hi Jorn, Thank you very much for the information. I think we are deviating from the original question: Hive on Spark on Ubuntu. Can you please tell me the configuration steps? On Fri, Nov 20, 2015 at 11:10 PM, Jörn Franke wrote: > I think the most recent versions of cl

Behavior of typed/untyped NULL in various UDFs

2015-11-20 Thread Aaron Tokhy
Hello, I've encountered a strange error when using NULL in a number of ways. I would expect to_date(NULL) to just return NULL. However, when the NULL is cast to a timestamp, this works fine. Should this case be handled by the to_date UDF? What about other UDFs that may need to handle NULL
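
A minimal reproduction of the two cases, assuming a hypothetical single-row table named one_row:

  hive -e "SELECT to_date(CAST(NULL AS timestamp)) FROM one_row;"   # typed NULL: returns NULL as expected
  hive -e "SELECT to_date(NULL) FROM one_row;"                      # untyped (void) NULL: the case in question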

Re: Query performance correlated to increase in delta files?

2015-11-20 Thread Alan Gates
Are you running the compactor as part of your metastore? It occasionally compacts the delta files in order to reduce read time. See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions for details. Alan. Sai Gopalakrishnan November
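
The compactor runs inside the metastore service, so the related properties have to be set on that side; a sketch of starting a metastore with the compactor enabled (normally these would live in the metastore's hive-site.xml):

  hive --service metastore \
    --hiveconf hive.compactor.initiator.on=true \
    --hiveconf hive.compactor.worker.threads=2    # background worker threads that perform compactions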

RE: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-11-20 Thread Mich Talebzadeh
Not necessarily. If you shut down the Hive metastore, you will see that spark-shell tries to connect to the Hive metastore and fails with the following messages: 15/11/20 21:43:04 WARN metastore: Failed to connect to the MetaStore Server... 15/11/20 21:43:05 WARN metastore: Failed to conn
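
A sketch of the setup the warning implies: spark-shell needs a running metastore and a hive-site.xml it can see (paths and the default thrift port are assumptions here):

  hive --service metastore &                            # metastore listens on thrift://localhost:9083 by default
  cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/    # so spark-shell's HiveContext can locate the metastore
  spark-shell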

Re: How to capture query log and duration

2015-11-20 Thread Gopal Vijayaraghavan
>Can you please also let me know what argument list this script wants. > >I was trying the following in the HDP Sandbox, but did not get JSON output The JSON output is saved into a .zip file if you hit ^C. > https://gist.github.com/t3rmin4t0r/e4bf835f10271b9e466e Look for a file named atsdump*.zip
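
Putting those hints together, the run might look roughly like this (host, port and the HIVE_QUERY_ID filter are taken from Rajit's message below; the zip name pattern is from the gist):

  python ats-plan-fetcher.py --ats=http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID? --count=1
  # let it run, then hit ^C; the JSON is written into a zip rather than printed
  unzip -o atsdump*.zip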

Re: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-11-20 Thread Xuefu Zhang
This seems to belong to the Spark user list. I don't see any relevance to Hive except that the directory name contains the word "hive". --Xuefu On Fri, Nov 20, 2015 at 1:13 PM, Mich Talebzadeh wrote: > Hi, > > > > Has this been resolved? I don’t think this has anything to do with > /tmp/hive directory permissi

starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-11-20 Thread Mich Talebzadeh
Hi, Has this been resolved? I don't think this has anything to do with the /tmp/hive directory permission. spark-shell: log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). log4j:WARN Please initialize the log4j system properly. log4j:WARN S
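
For completeness, the permission fix the error message usually points at looks like this; Mich's point is that it may not be the real cause here, and /tmp/hive may be on HDFS or on the local filesystem depending on how spark-shell is run:

  hdfs dfs -ls /tmp/hive                # check owner and mode of the scratch dir on HDFS
  hdfs dfs -chmod -R 777 /tmp/hive      # the commonly suggested (blunt) fix
  chmod -R 777 /tmp/hive                # local-mode runs check the local directory instead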

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Jörn Franke
I think the most recent versions of Cloudera or Hortonworks should include all these components - try their sandboxes. > On 20 Nov 2015, at 12:54, Dasun Hegoda wrote: > > Where can I get a Hadoop distribution containing these technologies? Link? > >> On Fri, Nov 20, 2015 at 5:22 PM, Jörn Fran

Re: How to capture query log and duration

2015-11-20 Thread Rajit Saha
Hi Gopal, Thanks for the help. Can you please also let me know what argument list this script wants. I was trying the following in the HDP Sandbox, but did not get JSON output: [root@sandbox ~]# python ats-plan-fetcher.py --ats=http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID? --count=1 ats-plan-fetc

RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Mich Talebzadeh
Hi Sai, Sqoop will not be able to do that. You can use Sqoop to get the data in for the first time and populate your Hive table at time T0. At later times T1, T2, ... you can get Sqoop to read new rows based on the primary key of your source table, assuming the primary key is a monotonically increasing number s
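
A sketch of the incremental pull described above, with hypothetical connection details and an orders table keyed by a monotonically increasing order_id (the --last-value shown is just an example watermark):

  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl -P \
    --table orders \
    --incremental append \
    --check-column order_id \
    --last-value 1500000 \
    --target-dir /user/etl/orders_delta    # new rows only; load them into Hive as a delta afterwards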

Re: hive transaction strange behaviour

2015-11-20 Thread Eugene Koifman
hive.compactor.delta.num.threshold controls when compaction is triggered. If you don't have enough delta files it won't run. Assuming you have compactions running
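
For reference, the thresholds involved and a quick way to see what the compactor has done (property names and defaults as documented on the transactions wiki; they are read by the metastore):

  # hive.compactor.delta.num.threshold=10     number of delta files that triggers a minor compaction
  # hive.compactor.delta.pct.threshold=0.1    delta-to-base size ratio that triggers a major compaction
  hive -e "SHOW COMPACTIONS;"                 # check whether anything has been queued or run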

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Sai Gopalakrishnan
Hi Mich, Could you please explain more about how to efficiently reflect updates and deletes done in the RDBMS in HDFS via Sqoop? Even if Hive supports ACID properties in ORC, it still needs to know which records are to be updated/deleted, right? You had mentioned feeding deltas from the RDBMS to Hive, but
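
Roughly what applying RDBMS deltas inside Hive looks like once ACID is set up, with a hypothetical customers table; the full transaction configuration is on the wiki page Alan linked:

  hive -e "
  SET hive.support.concurrency=true;
  SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
  CREATE TABLE customers_acid (id int, name string, status string)
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');
  UPDATE customers_acid SET status = 'inactive' WHERE id = 42;   -- an update fed from the RDBMS delta
  DELETE FROM customers_acid WHERE id = 43;                      -- a delete fed from the RDBMS delta
  "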

RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Mich Talebzadeh
Hi, I don’t think there is any packaged distribution including all these components. Indeed, one needs to get the architecture right to make this work seamlessly. What I did was: 1. Installed and configured Hadoop 2. Installed Hive 3. Installed Sqoop 4. Used Sqoop to get
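
Step 4 (the first-time load at T0) might look roughly like this, with hypothetical MySQL connection details:

  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl -P \
    --table orders \
    --hive-import --create-hive-table --hive-table orders   # full initial copy straight into a Hive table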

Hive JVM initialization - slf4j

2015-11-20 Thread Rajat Jain
Hi folks, It seems to me that the initialization of the Hive JVM takes ~10s and most of this time goes into initializing the slf4j library. Is there a known reason why it takes so much time? Can this be reduced somehow? Thanks, Rajat

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Dasun Hegoda
Where can I get a Hadoop distribution containing these technologies? Link? On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke wrote: > I recommend to use a Hadoop distribution containing these technologies. I > think you get also other useful tools for your scenario, such as Auditing > using sentry or

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Jörn Franke
I recommend using a Hadoop distribution containing these technologies. I think you also get other useful tools for your scenario, such as auditing using Sentry or Ranger. > On 20 Nov 2015, at 10:48, Mich Talebzadeh wrote: > > Well > > “I'm planning to deploy Hive on Spark but I can't find t

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Dasun Hegoda
With respect to your steps: 1. It's a MySQL database 2. Yes 3. Not sure whether I need it. Do you think I need it? If so, why? 4. Sqoop will get data from MySQL to Hadoop 5. Correct 6. I want to use Hive on Spark for real-time data processing on Hadoop. Daily/periodic changes fr

RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Mich Talebzadeh
Right, your steps look reasonable. Let me try to understand your approach: 1. You have a current RDBMS (Oracle, Sybase, MSSQL?) 2. You want to feed that data daily, in batch or real time, from the RDBMS to Hadoop as relational tables (that is where Hive comes into it) 3. You need to have f

Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Dasun Hegoda
Hi, Do you have a step-by-step guide to get it done? A better one? On Fri, Nov 20, 2015 at 3:18 PM, Mich Talebzadeh wrote: > Well > > > > “I'm planning to deploy Hive on Spark but I can't find the installation > steps. I tried to read the official '[Hive on Spark][1]' guide but it has > problems.

RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Mich Talebzadeh
Well “I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to read the official '[Hive on Spark][1]' guide but it has problems. As an example it says under 'Configuring Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.
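
For what it's worth, the Hive-side settings from that guide boil down to roughly the following; values are illustrative rather than tuned, and some_table is a hypothetical table used only to exercise the engine:

  # yarn-site.xml, per the 'Configuring Yarn' line quoted above:
  #   yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
  hive -e "
  set hive.execution.engine=spark;
  set spark.master=yarn-client;
  set spark.executor.memory=2g;
  set spark.executor.cores=2;
  set spark.serializer=org.apache.spark.serializer.KryoSerializer;
  select count(*) from some_table;
  "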

Hive on Spark - Hadoop 2 - Installation - Ubuntu

2015-11-20 Thread Dasun Hegoda
Hi, What I'm planning to do is develop a reporting platform using existing data. I have an existing RDBMS which has a large number of records. So I'm using ( http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture ): - Sqoop - Extract data from RDBMS to