Re: Hive metadata on Hbase

2016-10-25 Thread Furcy Pin
Hi Mich,

No, I am not using HBase as a metastore now, but I am eager for it to
become production ready and released in CDH and HDP.

Concerning locks, I think HBase would do fine because it is ACID at the row
level. It only appends data on HDFS, but it works by keeping regions in RAM,
plus a write-ahead log for failure recovery, so updates on rows are atomic
and ACID.
This gives ACID guarantees between elements that are stored in the same row.
Since HBase supports a great number of dynamic columns in each row
(a wide-column store, like Cassandra), the smart way to design your tables
is quite different from an RDBMS.
I would expect them to have something like an HBase table with one row per
Hive table, with all the associated metadata stored alongside it. This would
make all modifications on a table atomic.

Concerning locks, as they involve multiple tables, I guess they would have
to manually put a global lock on the "hbase lock table" before editing it.

I agree that you should not touch the system tables too much, but sometimes
you have to remove a deadlock or fix an inconsistency yourself. I guess
removing deadlocks in HBase should not be much harder, using the
hbase shell (a new syntax to learn, however).

It would be nice if Hive had some syntax to manually remove deadlocks when
they happen; then you would not have to worry about the metastore
implementation at all.
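
For reference, the manual cleanup looks roughly like the sketch below against
an RDBMS-backed metastore. The HIVE_LOCKS table and column names are from the
standard ACID metastore schema but should be verified for your Hive version,
and the lock id is purely hypothetical:

-- from the Hive prompt: inspect the current state first
SHOW LOCKS;
SHOW TRANSACTIONS;  -- available on recent Hive releases

-- last resort, directly against the metastore RDBMS:
SELECT HL_LOCK_EXT_ID, HL_DB, HL_TABLE, HL_LOCK_STATE, HL_LOCK_TYPE FROM HIVE_LOCKS;
DELETE FROM HIVE_LOCKS WHERE HL_LOCK_EXT_ID = 1234;  -- 1234 is a hypothetical stuck lock id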



On Wed, Oct 26, 2016 at 12:58 AM, Mich Talebzadeh  wrote:

> Hi Furcy,
>
> Having used HBase as part of the batch layer in a Lambda Architecture, I have
> come to the conclusion that it is a very good product, even though its cryptic
> nature means it is not much loved or appreciated. However, it may be useful to
> have a Hive metastore skin on top of HBase tables so admins and others can
> interrogate them. There is definitely a need for some sort of interface to a
> Hive metastore on HBase, whether through Hive or Phoenix.
>
> Then we still have to handle locking and concurrency on the metastore tables.
> An RDBMS is transactional and ACID compliant; I do not know enough about
> HBase, but as far as I know HBase appends data. Currently, when I have an issue
> with transactions and locks, I go into the metadata and do some plastic surgery
> on the TRXN and LOCKS tables, which resolves the issue. I am not sure how I
> would achieve that in HBase. Purists might argue that one should not touch
> these system tables, but things are not generally that simple.
>
> Are you using Hbase as Hive metastore now?
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 25 October 2016 at 13:44, Furcy Pin  wrote:
>
>> Hi Mich,
>>
>> I mostly agree with you, but I would comment on the part about using
>> HBase as a maintenance free core product:
>> I would say that most medium-sized companies using Hadoop rely on Hortonworks
>> or Cloudera, which both provide a pre-packaged HBase installation. It would
>> probably make sense for them to ship pre-installed versions of Hive relying
>> on HBase as the metastore.
>> And as Alan stated, it would also be a good way to improve the
>> integration between Hive and HBase.
>>
>> I am not well placed to give an opinion on this, but I agree that
>> maintaining integration between both HBase and regular RDBMS might be a
>> real pain.
>> I am also worried about the fact that if indeed HBase grant us the
>> possibility to have all nodes calling the metastore, then any optimization
>> making use
>> of this will only work for a cluster with a Hive metastore on HBase?
>>
>> Anyway, I am still looking forward to this, as despite working in a small
>> company, our metastore sometimes seems to be a bottleneck, especially
>> when running more than 20 queries on tables with 10 000 partitions...
>> But perhaps migrating it on a bigger host would be enough for us...
>>
>>
>>
>> On Mon, Oct 24, 2016 at 10:21 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks Alan for detailed explanation.
>>>
>>> Please bear in mind that any tool that needs to work with some
>>> repository (Oracle TimesTen IMDB has its metastore on Oracle classic,
>>> SAP Replication Server has its repository RSSD on SAP ASE, and others)
>>> does the same first thing: they cache those tables and keep them in the
>>> memory of the big-brother database until they are shut down. I reverse
>>> engineered and created a Hive data model from the physical schema (on
>>> Oracle). There are around 194 tables in total that can be easily cached.

Re: hive transactional table compaction fails

2016-10-25 Thread aft
Well, the auto compactor fails every time, and it has been going on
for a couple of days. As "day" is a partition key, I have to assume it
happens with all of them.

Both manual and automatic compaction fail.
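
For what it's worth, the compactor's view can also be checked from the hive
prompt; SHOW COMPACTIONS has been around since Hive 0.13, though exactly what
it reports about failed attempts varies by version:

hive> SHOW COMPACTIONS;
-- lists database, table, partition, type and state for queued/running
-- (and, on some versions, failed) compaction requests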

On Wed, Oct 26, 2016 at 12:52 AM, Eugene Koifman
 wrote:
> does this happen for 1 specific partition or all of them?
>
> On 10/25/16, 12:47 AM, "aft"  wrote:
>
>>Hi,
>>
>>Table created with this :
>>
>>$hive>create table syslog_staged (id string, facility string,
>>sender string, severity string, tstamp string, service string, msg
>>string) partitioned by (hostname string,  year string, month string,
>>day string) clustered by (id) into 20 buckets stored as orc
>>tblproperties("transactional"="true");
>>
>>the table is populated with Apache nifi's PutHiveStreaming...
>>
>>$hive>alter table syslog_staged partition
>>(hostname="cloudserver19", year="2016", month="10", day="24") compact
>>'major';
>>
>>Now it turns out compaction fails for some reason.(from job history)
>>
>>No of maps and reduces are 0 job_1476884195505_0031
>>Job commit failed: java.io.FileNotFoundException: File
>>hdfs://hadoop1.openstacksetup.com:8020/apps/hive/warehouse/log.db/syslog_s
>>taged/hostname=cloudserver19/year=2016/month=10/day=24/_tmp_27c40005-658e-
>>48c1-90f7-2acaa124e2fa
>>does not exist.
>>at
>>org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(Distribute
>>dFileSystem.java:904)
>>at
>>org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSys
>>tem.java:113)
>>at
>>org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSyst
>>em.java:966)
>>at
>>org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSyst
>>em.java:962)
>>at
>>org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver
>>.java:81)
>>at
>>org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSys
>>tem.java:962)
>>at
>>org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitt
>>er.commitJob(CompactorMR.java:776)
>>at
>>org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:29
>>1)
>>at
>>org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProce
>>ssor.handleJobCommit(CommitterEventHandler.java:285)
>>at
>>org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProce
>>ssor.run(CommitterEventHandler.java:237)
>>at
>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
>>1142)
>>at
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
>>:617)
>>
>>from hive metastore log :
>>
>>2016-10-24 16:33:35,503 WARN  [Thread-14]: compactor.Initiator
>>(Initiator.java:run(132)) - Will not initiate compaction for
>>log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24
>>since last hive.compactor.initiator.failed.compacts.threshold attempts
>>to compact it failed.
>>
>>
>>Hive version:
>>1.2.1000
>>
>


Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-25 Thread Eugene Koifman
Which of your tables are transactional?  Can you provide the DDL?

I don't think the "File does not exist" error is causing your queries to fail.
It's an INFO-level message.
There should be some other error.
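
(If it helps, the DDL can usually be pulled straight out of the metastore with
SHOW CREATE TABLE; the table names below are simply the ones referenced in the
failing query:)

SHOW CREATE TABLE access_logs.crawlstats_dpp;
SHOW CREATE TABLE mls_public_record_association_snapshot_orc;
SHOW CREATE TABLE mls_listing_snapshot_orc;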

Eugene


From: satyajit vegesna <satyajit.apas...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, October 25, 2016 at 5:46 PM
To: "user@hive.apache.org" <user@hive.apache.org>, "d...@hive.apache.org" <d...@hive.apache.org>
Subject: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

HI All,

I am using Hive 2.1.0 and Hadoop 2.7.2, but when I try running queries like a
simple insert,

set mapreduce.job.queuename=default;set hive.exec.dynamic.partition=true;set 
hive.exec.dynamic.partition.mode=nonstrict;set 
hive.exec.max.dynamic.partitions.pernode=400;set 
hive.exec.max.dynamic.partitions=2000;set mapreduce.map.memory.mb=5120;set 
mapreduce.reduce.memory.mb=5120;set mapred.tasktracker.map.tasks.maximum=30;set 
mapred.tasktracker.reduce.tasks.maximum=20;set 
mapred.reduce.child.java.opts=-Xmx2048m;set 
mapred.map.child.java.opts=-Xmx2048m; set hive.support.concurrency=true; set 
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set 
hive.compactor.initiator.on=false; set hive.compactor.worker.threads=1;set 
mapreduce.job.queuename=default;set hive.exec.dynamic.partition=true;set 
hive.exec.dynamic.partition.mode=nonstrict;INSERT INTO 
access_logs.crawlstats_dpp PARTITION(day="2016-10-23") select pra.url as 
prUrl,pra.url_type as urlType,CAST(pra.created_at AS timestamp) as prCreated, 
CAST(pra.updated_at AS timestamp) as prUpdated, CAST(ml.created_at AS 
timestamp) as mlCreated, CAST(ml.updated_at AS timestamp) as mlUpdated, 
a.name as status, pra.public_record_id as prId, acl.accesstime 
as crawledon, pra.id as propId, pra.primary_listing_id as 
listingId, datediff(CAST(acl.accesstime AS timestamp),CAST(ml.created_at AS 
timestamp)) as mlcreateage, datediff(CAST(acl.accesstime AS 
timestamp),CAST(ml.updated_at AS timestamp)) as mlupdateage, 
datediff(CAST(acl.accesstime AS timestamp),CAST(pra.created_at AS timestamp)) 
as prcreateage, datediff(CAST(acl.accesstime AS timestamp),CAST(pra.updated_at 
AS timestamp)) as prupdateage,  (case when (pra.public_record_id is not null 
and TRIM(pra.public_record_id) <> '')  then (case when (pra.primary_listing_id 
is null or TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)  else 
(case when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id) 
<> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,  acl.httpstatuscode, 
 acl.httpverb,  acl.requesttime, acl.upstreamheadertime , 
acl.upstreamresponsetime,  acl.page_id,  useragent AS user_agent,  
substring(split(pra.url,'/')[0], 0,length(split(pra.url,'/')[0])-3) as city,  
substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1,2) as state,  
ml.mls_id  FROM access_logs.loadbalancer_accesslogs acl  inner join 
mls_public_record_association_snapshot_orc pra on acl.listing_url = pra.url  
left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = 
ml.id  left outer join attribute a on a.id = 
ml.standard_status  WHERE acl.accesstimedate="2016-10-23";

I finally end up getting the error below,

2016-10-25 17:40:18,725 Stage-2 map = 100%,  reduce = 52%, Cumulative CPU 
1478.96 sec
2016-10-25 17:40:19,761 Stage-2 map = 100%,  reduce = 62%, Cumulative CPU 
1636.58 sec
2016-10-25 17:40:20,794 Stage-2 map = 100%,  reduce = 64%, Cumulative CPU 
1764.97 sec
2016-10-25 17:40:21,820 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 
1879.61 sec
2016-10-25 17:40:22,842 Stage-2 map = 100%,  reduce = 80%, Cumulative CPU 
2051.38 sec
2016-10-25 17:40:23,872 Stage-2 map = 100%,  reduce = 90%, Cumulative CPU 
2151.49 sec
2016-10-25 17:40:24,907 Stage-2 map = 100%,  reduce = 93%, Cumulative CPU 
2179.67 sec
2016-10-25 17:40:25,944 Stage-2 map = 100%,  reduce = 94%, Cumulative CPU 
2187.86 sec
2016-10-25 17:40:29,062 Stage-2 map = 100%,  reduce = 95%, Cumulative CPU 
2205.22 sec
2016-10-25 17:40:30,107 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 
2241.25 sec
MapReduce Total cumulative CPU time: 37 minutes 21 seconds 250 msec
Ended Job = job_1477437520637_0009
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2016-10-25 17:40:35 Starting to launch local task to process map join; maximum
memory = 514850

Re: Can I specify database name in hive metastore service?

2016-10-25 Thread Huang Meilong
Look at this local metastore architecture:


[inline image omitted]


If I set a different database name in javax.jdo.option.ConnectionURL for each, say,

"jdbc:mysql://x/hivemeta_1?createDatabaseIfNotExist=true&characterEncoding=UTF-8"
and
"jdbc:mysql://x/hivemeta_2?createDatabaseIfNotExist=true&characterEncoding=UTF-8",
will the two metastore services work fine?


In short, I want to use the same RDBMS instance for the two Hive metastore
services, with the metadata isolated from each other. How can I achieve that?




From: Peter Vary
Sent: 26 October 2016, 0:49
To: user@hive.apache.org
Subject: Re: Can I specify database name in hive metastore service?


Hi Huang,

Hive metastore is a component of the "Hive database". See: 
https://cwiki.apache.org/confluence/display/Hive/Design

The metastore uses traditional RDBMS to store "the structure information of the 
various tables and partitions in the warehouse". The 
javax.jdo.option.ConnectionURL and the javax.jdo.option.ConnectionDriverName 
configuration options are used to access this RDBMS database. The 
hive.metastore.uris is the endpoint where the metastore will communicate with 
the other Hive components, like the HiveServer2.
So you can change the database name in the ConnectionURL; that only changes the
database in which the metadata is stored in the relational database. You cannot
add a database name to the thrift URI (metastore URI), since HiveServer2 will
use the same URI to access metadata regardless of which Hive database is used
by the client.

I hope this helps,
Peter

On 25 October 2016 at 17:32, "Huang Meilong" <ims...@outlook.com> wrote:

Hi,


To use hive metastore service, I must set `javax.jdo.option.ConnectionURL`, 
`javax.jdo.option.ConnectionDriverName` and `hive.metastore.uris` in 
hive-site.xml, like this:


  

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://x/hivemeta?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://xxx:9083</value>
</property>



I'm confused: can I change the database name (usually it's `hivemeta`) to
another name?


If I change the database name from `hivemeta` to `my_hivemeta`, will the Hive
metastore still work? We cannot specify a database name in `hive.metastore.uris`;
we can only specify the hostname and port of the metastore service.


Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-25 Thread satyajit vegesna
HI All,

I am using Hive 2.1.0 and Hadoop 2.7.2, but when I try running queries like a
simple insert,

set mapreduce.job.queuename=default;set
hive.exec.dynamic.partition=true;set
hive.exec.dynamic.partition.mode=nonstrict;set
hive.exec.max.dynamic.partitions.pernode=400;set
hive.exec.max.dynamic.partitions=2000;set mapreduce.map.memory.mb=5120;set
mapreduce.reduce.memory.mb=5120;set
mapred.tasktracker.map.tasks.maximum=30;set
mapred.tasktracker.reduce.tasks.maximum=20;set
mapred.reduce.child.java.opts=-Xmx2048m;set
mapred.map.child.java.opts=-Xmx2048m; set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set
hive.compactor.initiator.on=false; set hive.compactor.worker.threads=1;set
mapreduce.job.queuename=default;set hive.exec.dynamic.partition=true;set
hive.exec.dynamic.partition.mode=nonstrict;INSERT INTO
access_logs.crawlstats_dpp PARTITION(day="2016-10-23") select pra.url as
prUrl,pra.url_type as urlType,CAST(pra.created_at AS timestamp) as
prCreated, CAST(pra.updated_at AS timestamp) as prUpdated,
CAST(ml.created_at AS timestamp) as mlCreated, CAST(ml.updated_at AS
timestamp) as mlUpdated, a.name as status, pra.public_record_id as prId,
acl.accesstime as crawledon, pra.id as propId, pra.primary_listing_id as
listingId, datediff(CAST(acl.accesstime AS timestamp),CAST(ml.created_at AS
timestamp)) as mlcreateage, datediff(CAST(acl.accesstime AS
timestamp),CAST(ml.updated_at AS timestamp)) as mlupdateage,
datediff(CAST(acl.accesstime AS timestamp),CAST(pra.created_at AS
timestamp)) as prcreateage, datediff(CAST(acl.accesstime AS
timestamp),CAST(pra.updated_at AS timestamp)) as prupdateage,  (case when
(pra.public_record_id is not null and TRIM(pra.public_record_id) <> '')
 then (case when (pra.primary_listing_id is null or
TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)  else (case
when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id)
<> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,
 acl.httpstatuscode,  acl.httpverb,  acl.requesttime,
acl.upstreamheadertime , acl.upstreamresponsetime,  acl.page_id,  useragent
AS user_agent,  substring(split(pra.url,'/')[0],
0,length(split(pra.url,'/')[0])-3) as city,
 substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1,2) as
state,  ml.mls_id  FROM access_logs.loadbalancer_accesslogs acl  inner join
mls_public_record_association_snapshot_orc pra on acl.listing_url = pra.url
 left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id =
ml.id  left outer join attribute a on a.id = ml.standard_status  WHERE
acl.accesstimedate="2016-10-23";

I finally end up getting the error below,

2016-10-25 17:40:18,725 Stage-2 map = 100%,  reduce = 52%, Cumulative CPU
1478.96 sec
2016-10-25 17:40:19,761 Stage-2 map = 100%,  reduce = 62%, Cumulative CPU
1636.58 sec
2016-10-25 17:40:20,794 Stage-2 map = 100%,  reduce = 64%, Cumulative CPU
1764.97 sec
2016-10-25 17:40:21,820 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU
1879.61 sec
2016-10-25 17:40:22,842 Stage-2 map = 100%,  reduce = 80%, Cumulative CPU
2051.38 sec
2016-10-25 17:40:23,872 Stage-2 map = 100%,  reduce = 90%, Cumulative CPU
2151.49 sec
2016-10-25 17:40:24,907 Stage-2 map = 100%,  reduce = 93%, Cumulative CPU
2179.67 sec
2016-10-25 17:40:25,944 Stage-2 map = 100%,  reduce = 94%, Cumulative CPU
2187.86 sec
2016-10-25 17:40:29,062 Stage-2 map = 100%,  reduce = 95%, Cumulative CPU
2205.22 sec
2016-10-25 17:40:30,107 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU
2241.25 sec
MapReduce Total cumulative CPU time: 37 minutes 21 seconds 250 msec
Ended Job = job_1477437520637_0009
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type
[org.apache.logging.slf4j.Log4jLoggerFactory]
2016-10-25 17:40:35 Starting to launch local task to process map join; maximum
memory = 514850816
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID:
  Stage-14

Logs:

FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 106  Reduce: 45   Cumulative CPU: 3390.11 sec   HDFS
Read: 8060555201 HDFS Write: 757253756 SUCCESS
Stage-Stage-2: Map: 204  Reduce: 85   Cumulative CPU: 2241.25 sec   HDFS
Read: 2407914653 HDFS Write: 805874953 SUCCESS
Total MapReduce CPU Time Spent: 0 days 1 hours 33 minutes 51 seconds 360
msec

Could not find any errors in the logs, but when I check the namenode logs, I
get the following error,

2016-10-25 17:01:51,923 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 9000, call
org.apache.hadoop.hdfs.protocol.Clie

Re: Hive metadata on Hbase

2016-10-25 Thread Mich Talebzadeh
Hi Furcy,

Having used HBase as part of the batch layer in a Lambda Architecture, I have
come to the conclusion that it is a very good product, even though its cryptic
nature means it is not much loved or appreciated. However, it may be useful to
have a Hive metastore skin on top of HBase tables so admins and others can
interrogate them. There is definitely a need for some sort of interface to a
Hive metastore on HBase, whether through Hive or Phoenix.

Then we still have to handle locking and concurrency on the metastore tables.
An RDBMS is transactional and ACID compliant; I do not know enough about
HBase, but as far as I know HBase appends data. Currently, when I have an issue
with transactions and locks, I go into the metadata and do some plastic surgery
on the TRXN and LOCKS tables, which resolves the issue. I am not sure how I
would achieve that in HBase. Purists might argue that one should not touch
these system tables, but things are not generally that simple.

Are you using Hbase as Hive metastore now?



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 October 2016 at 13:44, Furcy Pin  wrote:

> Hi Mich,
>
> I mostly agree with you, but I would comment on the part about using HBase
> as a maintenance free core product:
> I would say that most medium-sized companies using Hadoop rely on Hortonworks
> or Cloudera, which both provide a pre-packaged HBase installation. It would
> probably make sense for them to ship pre-installed versions of Hive relying
> on HBase as the metastore.
> And as Alan stated, it would also be a good way to improve the integration
> between Hive and HBase.
>
> I am not well placed to give an opinion on this, but I agree that
> maintaining integration between both HBase and regular RDBMS might be a
> real pain.
> I am also worried about the fact that if indeed HBase grant us the
> possibility to have all nodes calling the metastore, then any optimization
> making use
> of this will only work for a cluster with a Hive metastore on HBase?
>
> Anyway, I am still looking forward to this, as despite working in a small
> company, our metastore sometimes seems to be a bottleneck, especially
> when running more than 20 queries on tables with 10 000 partitions...
> But perhaps migrating it on a bigger host would be enough for us...
>
>
>
> On Mon, Oct 24, 2016 at 10:21 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Thanks Alan for detailed explanation.
>>
>> Please bear in mind that any tool that needs to work with some repository
>> (Oracle TimesTen IMDB has its metastore on Oracle classic, SAP Replication
>> Server has its repository RSSD on SAP ASE, and others) does the same first
>> thing: they cache those tables and keep them in the memory of the big-brother
>> database until they are shut down. I reverse engineered and created a Hive
>> data model from the physical schema (on Oracle). There are around 194 tables
>> in total that can be easily cached.
>>
>> For small medium enterprise (SME), they don't really have much data so
>> anything will do and they are the ones that use open source databases. For
>> bigger companies, they already pay bucks for Oracle and alike and they are
>> the one that would not touch an open source database (not talking about big
>> data), because in this new capital-sensitive risk-averse world, they do
>> not want to expose themselves to unnecessary risk.  So I am not sure
>> whether they will take something like Hbase as a core product, unless it is
>> going to be maintenance free.
>>
>> Going back to your point
>>
>> ".. but you have to pay for an expensive commercial license to make the
>> metadata really work well is a non-starter"
>>
>> They already do and pay more if they have to. We will stick with Hive
>> metadata on Oracle with schema on SSD
>> .
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 24 October 2016 at 20:14, Alan Gates  wr

Re: hive transactional table compaction fails

2016-10-25 Thread Eugene Koifman
does this happen for 1 specific partition or all of them?

On 10/25/16, 12:47 AM, "aft"  wrote:

>Hi,
>
>Table created with this :
>
>$hive>create table syslog_staged (id string, facility string,
>sender string, severity string, tstamp string, service string, msg
>string) partitioned by (hostname string,  year string, month string,
>day string) clustered by (id) into 20 buckets stored as orc
>tblproperties("transactional"="true");
>
>the table is populated with Apache nifi's PutHiveStreaming...
>
>$hive>alter table syslog_staged partition
>(hostname="cloudserver19", year="2016", month="10", day="24") compact
>'major';
>
>Now it turns out compaction fails for some reason.(from job history)
>
>No of maps and reduces are 0 job_1476884195505_0031
>Job commit failed: java.io.FileNotFoundException: File
>hdfs://hadoop1.openstacksetup.com:8020/apps/hive/warehouse/log.db/syslog_s
>taged/hostname=cloudserver19/year=2016/month=10/day=24/_tmp_27c40005-658e-
>48c1-90f7-2acaa124e2fa
>does not exist.
>at 
>org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(Distribute
>dFileSystem.java:904)
>at 
>org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSys
>tem.java:113)
>at 
>org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSyst
>em.java:966)
>at 
>org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSyst
>em.java:962)
>at 
>org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver
>.java:81)
>at 
>org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSys
>tem.java:962)
>at 
>org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitt
>er.commitJob(CompactorMR.java:776)
>at 
>org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:29
>1)
>at 
>org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProce
>ssor.handleJobCommit(CommitterEventHandler.java:285)
>at 
>org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProce
>ssor.run(CommitterEventHandler.java:237)
>at 
>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
>1142)
>at 
>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
>:617)
>
>from hive metastore log :
>
>2016-10-24 16:33:35,503 WARN  [Thread-14]: compactor.Initiator
>(Initiator.java:run(132)) - Will not initiate compaction for
>log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24
>since last hive.compactor.initiator.failed.compacts.threshold attempts
>to compact it failed.
>
>
>Hive version:
>1.2.1000
>



Re: Can I specify database name in hive metastore service?

2016-10-25 Thread Peter Vary
Hi Huang,

Hive metastore is a component of the "Hive database". See:
https://cwiki.apache.org/confluence/display/Hive/Design

The metastore uses traditional RDBMS to store "the structure information of
the various tables and partitions in the warehouse". The
javax.jdo.option.ConnectionURL and the
javax.jdo.option.ConnectionDriverName configuration options are used to
access this RDBMS database. The hive.metastore.uris is the endpoint where
the metastore will communicate with the other Hive components, like the
HiveServer2.
So you can change the database name in the ConnectionURL; that only changes the
database in which the metadata is stored in the relational database. You cannot
add a database name to the thrift URI (metastore URI), since HiveServer2 will
use the same URI to access metadata regardless of which Hive database is used
by the client.
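
If the goal is two isolated metastores sharing one MySQL server, one possible
layout (sketched below; user names and passwords are placeholders, and the
database names hivemeta_1/hivemeta_2 just follow the naming in this thread) is
a separate MySQL database per metastore service, each referenced by its own
ConnectionURL:

-- on the shared MySQL server:
CREATE DATABASE hivemeta_1;
CREATE DATABASE hivemeta_2;
GRANT ALL PRIVILEGES ON hivemeta_1.* TO 'hive1'@'%' IDENTIFIED BY 'password1';
GRANT ALL PRIVILEGES ON hivemeta_2.* TO 'hive2'@'%' IDENTIFIED BY 'password2';
-- each metastore service then sets javax.jdo.option.ConnectionURL to its own
-- database, e.g. jdbc:mysql://host/hivemeta_1?createDatabaseIfNotExist=true&characterEncoding=UTF-8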

I hope this helps,
Peter

On 25 October 2016 at 17:32, "Huang Meilong" wrote:

> Hi,
>
>
> To use hive metastore service, I must set `javax.jdo.option.ConnectionURL`,
> `javax.jdo.option.ConnectionDriverName` and `hive.metastore.uris` in
> hive-site.xml, like this:
>
>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:mysql://x/hivemeta?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
> </property>
>
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>com.mysql.jdbc.Driver</value>
> </property>
>
> <property>
>   <name>hive.metastore.uris</name>
>   <value>thrift://xxx:9083</value>
> </property>
>
>
>
> I'm confused: can I change the database name (usually it's `hivemeta`) to
> another name?
>
>
> If I change the database name from `hivemeta` to `my_hivemeta`, will the Hive
> metastore still work? We cannot specify a database name in `hive.metastore.uris`;
> we can only specify the hostname and port of the metastore service.
>


Can I specify database name in hive metastore service?

2016-10-25 Thread Huang Meilong
Hi,


To use hive metastore service, I must set `javax.jdo.option.ConnectionURL`, 
`javax.jdo.option.ConnectionDriverName` and `hive.metastore.uris` in 
hive-site.xml, like this:


  

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://x/hivemeta?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://xxx:9083</value>
</property>



I'm confused: can I change the database name (usually it's `hivemeta`) to
another name?


If I change the database name from `hivemeta` to `my_hivemeta`, will the Hive
metastore still work? We cannot specify a database name in `hive.metastore.uris`;
we can only specify the hostname and port of the metastore service.


Re: Hive metadata on Hbase

2016-10-25 Thread Furcy Pin
Hi Mich,

I mostly agree with you, but I would comment on the part about using HBase
as a maintenance free core product:
I would say that most medium company using Hadoop rely on Hortonworks or
Cloudera, that both provides a pre-packaged HBase installation. It would
probably make sense for them to ship pre-installed versions of Hive relying
on HBase as metastore.
And as Alan stated, it would also be a good way to improve the integration
between Hive and HBase.

I am not well placed to give an opinion on this, but I agree that
maintaining integration with both HBase and a regular RDBMS might be a
real pain.
I am also worried that if HBase does indeed give us the possibility of having
all nodes call the metastore, then any optimization making use of this will
only work on a cluster with a Hive metastore on HBase.

Anyway, I am still looking forward to this, as despite working in a small
company, our metastore sometimes seems to be a bottleneck, especially
when running more than 20 queries on tables with 10 000 partitions...
But perhaps migrating it on a bigger host would be enough for us...



On Mon, Oct 24, 2016 at 10:21 PM, Mich Talebzadeh  wrote:

> Thanks Alan for detailed explanation.
>
> Please bear in mind that any tool that needs to work with some repository
> (Oracle TimesTen IMDB has its metastore on Oracle classic, SAP Replication
> Server has its repository RSSD on SAP ASE, and others) does the same first
> thing: they cache those tables and keep them in the memory of the big-brother
> database until they are shut down. I reverse engineered and created a Hive
> data model from the physical schema (on Oracle). There are around 194 tables
> in total that can be easily cached.
>
> For small medium enterprise (SME), they don't really have much data so
> anything will do and they are the ones that use open source databases. For
> bigger companies, they already pay bucks for Oracle and alike and they are
> the one that would not touch an open source database (not talking about big
> data), because in this new capital-sensitive risk-averse world, they do
> not want to expose themselves to unnecessary risk.  So I am not sure
> whether they will take something like Hbase as a core product, unless it is
> going to be maintenance free.
>
> Going back to your point
>
> ".. but you have to pay for an expensive commercial license to make the
> metadata really work well is a non-starter"
>
> They already do and pay more if they have to. We will stick with Hive
> metadata on Oracle with schema on SSD
> .
>
> HTH
>
>
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 October 2016 at 20:14, Alan Gates  wrote:
>
>> Some thoughts on this:
>>
>> First, there’s no plan to remove the option to use an RDBMS such as
>> Oracle as your backend.  Hive’s RawStore interface is built such that
>> various implementations of the metadata storage can easily coexist.
>> Obviously different users will make different choices about what metadata
>> store makes sense for them.
>>
>> As to why HBase:
>> 1) We desperately need to get rid of the ORM layer.  It’s causing us
>> performance problems, as evidenced by things like it taking several minutes
>> to fetch all of the partition data for queries that span many partitions.
>> HBase is a way to achieve this, not the only way.  See in particular
>> Yahoo’s work on optimizing Oracle access
>> https://issues.apache.org/jira/browse/HIVE-14870
>> The question around this is whether we can optimize
>> for Oracle, MySQL, Postgres, and SQLServer without creating a maintenance
>> and testing nightmare for ourselves.  I’m skeptical, but others think it’s
>> possible.  See comments on that JIRA.
>>
>> 2) We’d like to scale to much larger sizes, both in terms of data and
>> access from nodes.  Not that we’re worried about the amount of metadata,
>> but we’d like to be able to cache more stats, file splits, etc.  And we’d
>> like to allow nodes in the cluster to contact the metastore, which we do
>> not today since many RDBMSs don’t handle a thousand plus simultaneous
>> connections well.  Obviously both data and connection scale can be met with
>> high end commercial stores.  But saying that we have this great open source
>> database but you have to pay for an expensive commercial license to make
>> the metadata really work well is a non-starter.
>>
>> 3) By using tools within the Hadoop ecosystem like HBase we are helping
>> to dr

Re: class not found exception

2016-10-25 Thread Rajendra Bhat
Hi ,



I have configured a MySQL metastore. The issue is with the S3 supporting jars.

On Tue, Oct 25, 2016 at 4:43 PM, dv akhil  wrote:

> Hi,
>which metastore are you using for hive? . Have you copied the jar
> containing the JDBC driver for your metadata db into hive's lib dir?
>
>
>
>
>
> On Tue, Oct 25, 2016 at 4:29 PM, Rajendra Bhat 
> wrote:
>
>> hive> list jars;
>> /opt/apache-hive-2.0.1-bin/lib/hadoop-aws-2.7.1.jar
>> /opt/apache-hive-2.0.1-bin/lib/aws-java-sdk-1.7.4.jar
>> hive>
>>
>>
>> On Tue, Oct 25, 2016 at 4:27 PM, Rajendra Bhat 
>> wrote:
>>
>>> yes, I have added the jar on hive prompt.
>>>
>>> On Tue, Oct 25, 2016 at 4:25 PM, aft  wrote:
>>>
 On Tue, Oct 25, 2016 at 6:11 AM, Rajendra Bhat 
 wrote:
 > Hi,
 >
 > i am getting below error on create extrnal table. I have copied
 > hadoop-aws-2.7.1.jar and aws-java-sdk-1.7.4.jar to hive_home/lib
 folder.
 > please let me know where should need to place supporting jar..
 >
 >
 > hive> create external table kv (key int, value string)  location
 > 's3a://myntra-datasciences/cip/test';
 > FAILED: Execution Error, return code 1 from
 > org.apache.hadoop.hive.ql.exec.DDLTask.
 > MetaException(message:java.lang.RuntimeException:
 > java.lang.ClassNotFoundException: Class
 > org.apache.hadoop.fs.s3a.S3AFileSystem not found)

 Have you added the jars at hive prompt with "add jar " command?

 >
 > --
 > Thanks and
 > Regards
 >
 > Rajendra Bhat

>>>
>>>
>>>
>>> --
>>> Thanks and
>>> Regards
>>>
>>> Rajendra Bhat
>>>
>>
>>
>>
>> --
>> Thanks and
>> Regards
>>
>> Rajendra Bhat
>>
>
>


-- 
Thanks and
Regards

Rajendra Bhat


Re: class not found exception

2016-10-25 Thread dv akhil
Hi,
Which metastore are you using for Hive? Have you copied the jar
containing the JDBC driver for your metastore DB into Hive's lib dir?





On Tue, Oct 25, 2016 at 4:29 PM, Rajendra Bhat  wrote:

> hive> list jars;
> /opt/apache-hive-2.0.1-bin/lib/hadoop-aws-2.7.1.jar
> /opt/apache-hive-2.0.1-bin/lib/aws-java-sdk-1.7.4.jar
> hive>
>
>
> On Tue, Oct 25, 2016 at 4:27 PM, Rajendra Bhat 
> wrote:
>
>> yes, I have added the jar on hive prompt.
>>
>> On Tue, Oct 25, 2016 at 4:25 PM, aft  wrote:
>>
>>> On Tue, Oct 25, 2016 at 6:11 AM, Rajendra Bhat 
>>> wrote:
>>> > Hi,
>>> >
>>> > i am getting below error on create extrnal table. I have copied
>>> > hadoop-aws-2.7.1.jar and aws-java-sdk-1.7.4.jar to hive_home/lib
>>> folder.
>>> > please let me know where should need to place supporting jar..
>>> >
>>> >
>>> > hive> create external table kv (key int, value string)  location
>>> > 's3a://myntra-datasciences/cip/test';
>>> > FAILED: Execution Error, return code 1 from
>>> > org.apache.hadoop.hive.ql.exec.DDLTask.
>>> > MetaException(message:java.lang.RuntimeException:
>>> > java.lang.ClassNotFoundException: Class
>>> > org.apache.hadoop.fs.s3a.S3AFileSystem not found)
>>>
>>> Have you added the jars at hive prompt with "add jar " command?
>>>
>>> >
>>> > --
>>> > Thanks and
>>> > Regards
>>> >
>>> > Rajendra Bhat
>>>
>>
>>
>>
>> --
>> Thanks and
>> Regards
>>
>> Rajendra Bhat
>>
>
>
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>


Re: class not found exception

2016-10-25 Thread Rajendra Bhat
hive> list jars;
/opt/apache-hive-2.0.1-bin/lib/hadoop-aws-2.7.1.jar
/opt/apache-hive-2.0.1-bin/lib/aws-java-sdk-1.7.4.jar
hive>


On Tue, Oct 25, 2016 at 4:27 PM, Rajendra Bhat  wrote:

> yes, I have added the jar on hive prompt.
>
> On Tue, Oct 25, 2016 at 4:25 PM, aft  wrote:
>
>> On Tue, Oct 25, 2016 at 6:11 AM, Rajendra Bhat 
>> wrote:
>> > Hi,
>> >
>> > i am getting below error on create extrnal table. I have copied
>> > hadoop-aws-2.7.1.jar and aws-java-sdk-1.7.4.jar to hive_home/lib folder.
>> > please let me know where should need to place supporting jar..
>> >
>> >
>> > hive> create external table kv (key int, value string)  location
>> > 's3a://myntra-datasciences/cip/test';
>> > FAILED: Execution Error, return code 1 from
>> > org.apache.hadoop.hive.ql.exec.DDLTask.
>> > MetaException(message:java.lang.RuntimeException:
>> > java.lang.ClassNotFoundException: Class
>> > org.apache.hadoop.fs.s3a.S3AFileSystem not found)
>>
>> Have you added the jars at hive prompt with "add jar " command?
>>
>> >
>> > --
>> > Thanks and
>> > Regards
>> >
>> > Rajendra Bhat
>>
>
>
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>



-- 
Thanks and
Regards

Rajendra Bhat


Re: class not found exception

2016-10-25 Thread Rajendra Bhat
yes, I have added the jar on hive prompt.

On Tue, Oct 25, 2016 at 4:25 PM, aft  wrote:

> On Tue, Oct 25, 2016 at 6:11 AM, Rajendra Bhat 
> wrote:
> > Hi,
> >
> > i am getting below error on create extrnal table. I have copied
> > hadoop-aws-2.7.1.jar and aws-java-sdk-1.7.4.jar to hive_home/lib folder.
> > please let me know where should need to place supporting jar..
> >
> >
> > hive> create external table kv (key int, value string)  location
> > 's3a://myntra-datasciences/cip/test';
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.DDLTask.
> > MetaException(message:java.lang.RuntimeException:
> > java.lang.ClassNotFoundException: Class
> > org.apache.hadoop.fs.s3a.S3AFileSystem not found)
>
> Have you added the jars at hive prompt with "add jar " command?
>
> >
> > --
> > Thanks and
> > Regards
> >
> > Rajendra Bhat
>



-- 
Thanks and
Regards

Rajendra Bhat


Re: class not found exception

2016-10-25 Thread aft
On Tue, Oct 25, 2016 at 6:11 AM, Rajendra Bhat  wrote:
> Hi,
>
> i am getting below error on create extrnal table. I have copied
> hadoop-aws-2.7.1.jar and aws-java-sdk-1.7.4.jar to hive_home/lib folder.
> please let me know where should need to place supporting jar..
>
>
> hive> create external table kv (key int, value string)  location
> 's3a://myntra-datasciences/cip/test';
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found)

Have you added the jars at hive prompt with "add jar " command?
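
(For reference, that would look something like the following at the hive
prompt, using the jar paths from the list jars output; note the class may
also need to be on the classpath of the HiveServer2/metastore process, not
just the client session:)

hive> ADD JAR /opt/apache-hive-2.0.1-bin/lib/hadoop-aws-2.7.1.jar;
hive> ADD JAR /opt/apache-hive-2.0.1-bin/lib/aws-java-sdk-1.7.4.jar;
hive> LIST JARS;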

>
> --
> Thanks and
> Regards
>
> Rajendra Bhat


class not found exception

2016-10-25 Thread Rajendra Bhat
Hi,

I am getting the below error on creating an external table. I have
copied hadoop-aws-2.7.1.jar and aws-java-sdk-1.7.4.jar to the hive_home/lib
folder. Please let me know where I should place the supporting jars.


hive> create external table kv (key int, value string)  location
's3a://myntra-datasciences/cip/test';
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:java.lang.RuntimeException:
java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3a.S3AFileSystem not found)

-- 
Thanks and
Regards

Rajendra Bhat


Re: Connect metadata

2016-10-25 Thread Rajendra Bhat
I have set up the metastore. I need to create a Hive external table; the table
data is stored in S3 in Parquet format.

Since I have not installed a Hive server to execute the create command, how can
I execute the create table commands?

On Tue, Oct 25, 2016 at 1:41 PM, Rajendra Bhat  wrote:

> I need external table. data should refer from S3.
>
> On Tue, Oct 25, 2016 at 1:00 PM, Damien Carol 
> wrote:
>
>> You could use CTAS in presto
>>
>> 2016-10-25 9:09 GMT+02:00 Rajendra Bhat :
>>
>>> Hi Team,
>>>
>>> I have configured only meta store and started the meta store service,
>>> hwich i ma used on presto.
>>>
>>> I need create table on metastore.. how can i able create that.. as i am
>>> not started hive server service, bcoz hadoop not installed on my syatem.
>>>
>>> --
>>> Thanks and
>>> Regards
>>>
>>> Rajendra Bhat
>>>
>>
>>
>
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>



-- 
Thanks and
Regards

Rajendra Bhat


Re: Connect metadata

2016-10-25 Thread Rajendra Bhat
I need an external table; the data should be read from S3.

On Tue, Oct 25, 2016 at 1:00 PM, Damien Carol 
wrote:

> You could use CTAS in presto
>
> 2016-10-25 9:09 GMT+02:00 Rajendra Bhat :
>
>> Hi Team,
>>
>> I have configured only meta store and started the meta store service,
>> hwich i ma used on presto.
>>
>> I need create table on metastore.. how can i able create that.. as i am
>> not started hive server service, bcoz hadoop not installed on my syatem.
>>
>> --
>> Thanks and
>> Regards
>>
>> Rajendra Bhat
>>
>
>


-- 
Thanks and
Regards

Rajendra Bhat


Re: Connect metadata

2016-10-25 Thread Mich Talebzadeh
I don't understand what you mean by Hive installing your schema at startup.

If you have created an empty database/schema in postgres (I am not familiar
with it), you can run the relevant script in directory

$HIVE_HOME/scripts/metastore/upgrade/postgres

That will create all the associated tables and views in your schema. You only
need to do it once, and the schema will then be populated by the hive user
whose details you have specified in hive-site.xml.
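
A hedged sketch of doing that from psql, assuming a Hive 2.x layout; pick the
schema script that matches your Hive version (schematool is the more usual
route):

-- from psql, connected to the empty metastore database as the hive user
-- (substitute your actual Hive install path and script version):
\i /path/to/hive/scripts/metastore/upgrade/postgres/hive-schema-2.0.0.postgres.sql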


HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 October 2016 at 08:50, Per Ullberg  wrote:

> Sorry, no help from me I'm afraid, but I have a similar question:
>
> Could someone provide me with a code snippet (preferably Java) that
> installs the schema (through datanucleus) on my empty metastore (postgres).
> I don't want Hive to install the schema at startup, but rather do it
> explicitly myself.
>
> regards
> /Pelle
>
> On Tue, Oct 25, 2016 at 9:09 AM, Rajendra Bhat 
> wrote:
>
>> Hi Team,
>>
>> I have configured only meta store and started the meta store service,
>> hwich i ma used on presto.
>>
>> I need create table on metastore.. how can i able create that.. as i am
>> not started hive server service, bcoz hadoop not installed on my syatem.
>>
>> --
>> Thanks and
>> Regards
>>
>> Rajendra Bhat
>>
>
>
>
> --
>
> *Per Ullberg*
> Data Vault Tech Lead
> Odin Uppsala
> +46 701612693 <+46+701612693>
>
> Klarna AB (publ)
> Sveavägen 46, 111 34 Stockholm
> Tel: +46 8 120 120 00 <+46812012000>
> Reg no: 556737-0431
> klarna.com
>
>


Re: Connect metadata

2016-10-25 Thread Gopal Vijayaraghavan
> Could someone provide me with a code snippet (preferably Java) that installs 
> the schema (through datanucleus) on my empty metastore (postgres)

I wish it was that simple, but do not leave it to the Hive startup to create it 
- create it explicitly with schematool

https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool#HiveSchemaTool-TheHiveSchemaTool

or alternatively 

https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/postgres/hive-schema-2.2.0.postgres.sql

Cheers,
Gopal




Re: Connect metadata

2016-10-25 Thread Per Ullberg
Sorry, no help from me I'm afraid, but I have a similar question:

Could someone provide me with a code snippet (preferably Java) that
installs the schema (through datanucleus) on my empty metastore (postgres).
I don't want Hive to install the schema at startup, but rather do it
explicitly myself.

regards
/Pelle

On Tue, Oct 25, 2016 at 9:09 AM, Rajendra Bhat  wrote:

> Hi Team,
>
> I have configured only meta store and started the meta store service,
> hwich i ma used on presto.
>
> I need create table on metastore.. how can i able create that.. as i am
> not started hive server service, bcoz hadoop not installed on my syatem.
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>



-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693 <+46+701612693>

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00 <+46812012000>
Reg no: 556737-0431
klarna.com


hive transactional table compaction fails

2016-10-25 Thread aft
Hi,

Table created with this :

$hive>create table syslog_staged (id string, facility string,
sender string, severity string, tstamp string, service string, msg
string) partitioned by (hostname string,  year string, month string,
day string) clustered by (id) into 20 buckets stored as orc
tblproperties("transactional"="true");

the table is populated with Apache nifi's PutHiveStreaming...

$hive>alter table syslog_staged partition
(hostname="cloudserver19", year="2016", month="10", day="24") compact
'major';

Now it turns out compaction fails for some reason (from the job history):

No of maps and reduces are 0 job_1476884195505_0031
Job commit failed: java.io.FileNotFoundException: File
hdfs://hadoop1.openstacksetup.com:8020/apps/hive/warehouse/log.db/syslog_staged/hostname=cloudserver19/year=2016/month=10/day=24/_tmp_27c40005-658e-48c1-90f7-2acaa124e2fa
does not exist.
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:966)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:962)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:962)
at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:776)
at 
org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

from hive metastore log :

2016-10-24 16:33:35,503 WARN  [Thread-14]: compactor.Initiator
(Initiator.java:run(132)) - Will not initiate compaction for
log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24
since last hive.compactor.initiator.failed.compacts.threshold attempts
to compact it failed.


Hive version:
1.2.1000


Re: Connect metadata

2016-10-25 Thread Damien Carol
You could use CTAS in Presto.
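
A hedged sketch for data that already sits in S3: rather than CTAS, the Presto
Hive connector's plain CREATE TABLE with an external_location can register it.
The catalog, schema, columns and property names below are assumptions to
verify against your Presto version:

CREATE TABLE hive.default.kv (
  key   integer,
  value varchar
)
WITH (
  format = 'PARQUET',
  external_location = 's3a://your-bucket/your/path'
);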

2016-10-25 9:09 GMT+02:00 Rajendra Bhat :

> Hi Team,
>
> I have configured only meta store and started the meta store service,
> hwich i ma used on presto.
>
> I need create table on metastore.. how can i able create that.. as i am
> not started hive server service, bcoz hadoop not installed on my syatem.
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>


Connect metadata

2016-10-25 Thread Rajendra Bhat
Hi Team,

I have configured only the metastore and started the metastore service, which
I am using with Presto.

I need to create a table in the metastore. How can I create it, given that I
have not started the Hive server service, because Hadoop is not installed on
my system?

-- 
Thanks and
Regards

Rajendra Bhat

