Difference between join and inner join

2017-02-11 Thread Divya Gehlot
Hi ,
What's the difference between" join " and "inner join" in hive ?

Thanks ,
Divya
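
For reference: in HiveQL, a JOIN written with an ON clause and no qualifier is an
inner join, so the two queries below should return the same rows. A minimal sketch
using two hypothetical tables, orders and customers:

SELECT o.id, c.name
FROM orders o JOIN customers c ON (o.customer_id = c.id);

SELECT o.id, c.name
FROM orders o INNER JOIN customers c ON (o.customer_id = c.id);

The only practical difference is readability: INNER simply makes the join type explicit.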


Re: Working Hive--> Spark --> HDFS

2016-11-23 Thread Divya Gehlot
Can you please share the stack trace of the exception you get.


Thanks,
Divya

On 24 November 2016 at 06:33, Joaquin Alzola 
wrote:

> Hi Guys
>
>
>
> Can somebody tell me a working version of HoSoHDFS (Hive on Spark on HDFS).
>
>
>
> So far I have tested:
>
> Hive 1.2.1 → Spark 1.6.3 → Hadoop 2.6
>
>
>
> Hive 2.1 → Spark 2.0.2 → Hadoop 2.7
>
>
>
> And both of them give me various exceptions.
>
> I have to say the first one creates the job in HDFS and finishes it
> successfully, but gives back an error on Spark.
>
>
>
> BR
>
>
>
> Joaquin
>
>
>
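
For context, Hive on Spark is switched on per session (or in hive-site.xml) with
settings roughly like the ones below; this is only a sketch of the knobs and does
not vouch for any particular Hive/Spark/Hadoop version combination:

SET hive.execution.engine=spark;
-- point Hive's Spark session at YARN (or a standalone master URL)
SET spark.master=yarn-client;
-- illustrative executor sizing
SET spark.executor.memory=2g;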


Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Divya Gehlot
When are you getting this issue?

On 5 October 2016 at 16:26, Raj hadoop  wrote:

> Hi All,
>
> Could someone help in to solve this issue,
>
> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/
> hive-log4j.properties
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:292)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:213)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:190)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1780)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1764)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkAncestorAccess(FSDirectory.java:1747)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(
> FSDirMkdirOp.java:71)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(
> FSNamesystem.java:3972)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> mkdirs(NameNodeRpcServer.java:1081)
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(
> ClientNamenodeProtocolServerSideTranslatorPB.java:630)
> at org.apache.hadoop.hdfs.protocol.proto.
> ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1709)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
>
> at org.apache.hadoop.hive.ql.session.SessionState.start(
> SessionState.java:516)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission
> denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:292)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:213)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:190)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1780)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1764)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkAncestorAccess(FSDirectory.java:1747)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(
> FSDirMkdirOp.java:71)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(
> FSNamesystem.java:3972)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> mkdirs(NameNodeRpcServer.java:1081)
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(
> ClientNamenodeProtocolServerSideTranslatorPB.java:630)
> at org.apache.hadoop.hdfs.protocol.proto.
> ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at 

Re: hive need access the hdfs of hbase?

2016-03-19 Thread Divya Gehlot
 Do you have hbase-site.xml in classpath ?


On 17 March 2016 at 17:08, songj songj <songjun...@gmail.com> wrote:

> 
>zookeeper.znode.parent
>/hbase
> 
>
> and I found it that ,bind any ip which the hive can access to
> 'hbase-cluster' ,they are all ok!
>
>
>
> 2016-03-17 16:46 GMT+08:00 Divya Gehlot <divya.htco...@gmail.com>:
>
>> Hi,
>> Please check your zookeeper.znode.parent property
>> where is it pointing to ?
>>
>> On 17 March 2016 at 15:21, songj songj <songjun...@gmail.com> wrote:
>>
>>> hi all:
>>> I have 2 cluster,one is hive cluster(2.0.0),another is hbase
>>> cluster(1.1.1),
>>> this two clusters have dependent hdfs:
>>>
>>> hive cluster:
>>> 
>>>fs.defaultFS
>>>hdfs://*hive-cluster*
>>> 
>>>
>>> hbase cluster:
>>> 
>>>fs.defaultFS
>>>hdfs://*hbase-cluster*
>>> 
>>>
>>> *1)*but when I use hive shell to access hbase cluster
>>> >set hbase.zookeeper.quorum=10.24.19.88;
>>> >CREATE EXTERNAL TABLE IF NOT EXISTS pagecounts_hbase (rowkey STRING,
>>> pageviews STRING, bytes STRING) STORED BY
>>> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
>>> ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('
>>> hbase.table.name' = 'test');
>>>
>>> *2)*then I got exceptions:
>>>
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.DDLTask.
>>> MetaException(message:MetaException(message:java.io.IOException:
>>> java.lang.reflect.InvocationTargetException
>>>  at
>>> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
>>>  at
>>> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:420)
>>>  at
>>> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:413)
>>>  at
>>> org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:291)
>>>  at
>>> org.apache.hadoop.hbase.client.HBaseAdmin.(HBaseAdmin.java:222)
>>>  at
>>> org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:102)
>>>  at
>>> org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:182)
>>>  at
>>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:608)
>>>  at
>>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:601)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>  at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>  at java.lang.reflect.Method.invoke(Method.java:606)
>>>  at
>>> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>>>  at com.sun.proxy.$Proxy15.createTable(Unknown Source)
>>>  at
>>> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:671)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3973)
>>>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:295)
>>>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>>>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
>>>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
>>>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
>>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
>>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
>>>  at
>>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
>>>  at
>>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
>>>  at
>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
>>>  at
>>> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
>>>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
>>>  at org.apache.hadoop.hi

Re: hive need access the hdfs of hbase?

2016-03-19 Thread Divya Gehlot
Hi,
Please check your zookeeper.znode.parent property
where is it pointing to ?
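
If it is pointing somewhere unexpected, you can try overriding it for the session
before creating the table; a rough sketch (the quorum and znode values are placeholders):

SET hbase.zookeeper.quorum=10.24.19.88;
SET zookeeper.znode.parent=/hbase;
-- then re-run the CREATE EXTERNAL TABLE ... STORED BY
-- 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' statement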

On 17 March 2016 at 15:21, songj songj  wrote:

> hi all:
> I have 2 cluster,one is hive cluster(2.0.0),another is hbase
> cluster(1.1.1),
> this two clusters have dependent hdfs:
>
> hive cluster:
> 
>fs.defaultFS
>hdfs://*hive-cluster*
> 
>
> hbase cluster:
> 
>fs.defaultFS
>hdfs://*hbase-cluster*
> 
>
> *1)*but when I use hive shell to access hbase cluster
> >set hbase.zookeeper.quorum=10.24.19.88;
> >CREATE EXTERNAL TABLE IF NOT EXISTS pagecounts_hbase (rowkey STRING,
> pageviews STRING, bytes STRING) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
> ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('
> hbase.table.name' = 'test');
>
> *2)*then I got exceptions:
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:MetaException(message:java.io.IOException:
> java.lang.reflect.InvocationTargetException
>  at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
>  at
> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:420)
>  at
> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:413)
>  at
> org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:291)
>  at
> org.apache.hadoop.hbase.client.HBaseAdmin.(HBaseAdmin.java:222)
>  at
> org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:102)
>  at
> org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:182)
>  at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:608)
>  at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:601)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:606)
>  at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>  at com.sun.proxy.$Proxy15.createTable(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:671)
>  at
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3973)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:295)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>  at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
>  at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
>  at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
>  at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:606)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>  at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
>  ... 36 more
> Caused by: java.lang.ExceptionInInitializerError
>  at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:64)
>  at
> org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:75)
>  at
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
>  

Re: hive read/write hbase

2016-03-14 Thread Divya Gehlot
Hi,
Do you have hive-hbase-handler.jar in hive classpath ?
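
If not, one quick way to test is to add the handler and the HBase client jars for
the session; a sketch, with illustrative paths and versions:

ADD JAR /path/to/hive-hbase-handler-<version>.jar;
ADD JAR /path/to/hbase-client-<version>.jar;
ADD JAR /path/to/hbase-common-<version>.jar;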



On 14 March 2016 at 21:28, songj songj  wrote:

> hi,i have two cluster,one is hbase(with a hdfs://A ) ,another is hive(with
> a hdfs://B).
>
> i want to use hive shell to read/write hbase.
>
> when i enter hive shell and input as follows:
>
> hive>set hbase.zookeeper.quorum=10.24.31.99;
> hive>
> create external table inputTable (key string, value string)
>  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>   with serdeproperties ("hbase.columns.mapping" = ":key,fam1:col1")
>   tblproperties ("hbase.table.name" = "inputTable");
>
> but I got this error,and how this happened? thank you.
>
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:MetaException(message:java.io.IOException:
> java.lang.reflect.InvocationTargetException
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
> at
> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:420)
> at
> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:413)
> at
> org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:291)
> at org.apache.hadoop.hbase.client.HBaseAdmin.(HBaseAdmin.java:222)
> at
> org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:102)
> at
> org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:182)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:608)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:601)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
> at com.sun.proxy.$Proxy15.createTable(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:671)
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3973)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:295)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
> ... 36 more
> Caused by: java.lang.ExceptionInInitializerError
> at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:64)
> at
> org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:75)
> at
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
> at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:879)
> at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.(ConnectionManager.java:635)
> ... 41 more
> Caused by: java.lang.IllegalArgumentException:
> java.net.UnknownHostException: A
> at
> 

Re: HBase table map to hive

2016-03-11 Thread Divya Gehlot
Yes, you can. Register the table in Hive on top of the HBase table:

CREATE EXTERNAL TABLE IF NOT EXISTS DB_NAME.TABLE_NAME(COL1 STRING,COL2
STRING,COL3 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:COL2,cf:COL3")
TBLPROPERTIES ("hbase.table.name" = "HBASE_TABLE_NAME",
"hbase.mapred.output.outputtable" = "HBASE_TABLE_NAME");


On 11 March 2016 at 16:49, ram kumar  wrote:

> Hi,
> I have a HBase table with rowkey and column family.
> Is it possible to map HBase table to hive table?
>
> Thanks
>


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-09 Thread Divya Gehlot
Congratulations Wei for being part of one of the most successful Apache projects!!


On 10 March 2016 at 09:40, Szehon Ho  wrote:

> Congratulations Wei!
>
> On Wed, Mar 9, 2016 at 5:26 PM, Vikram Dixit K  wrote:
>
>> The Apache Hive PMC has voted to make Wei Zheng a committer on the Apache
>> Hive Project. Please join me in congratulating Wei.
>>
>> Thanks
>> Vikram.
>>
>
>


[Issue:]Getting null values for Numeric types while accessing hive tables (Registered on Hbase,created through Phoenix)

2016-03-03 Thread Divya Gehlot
Hi,
I am registering a Hive table on HBase:

CREATE EXTERNAL TABLE IF NOT EXISTS TEST(NAME STRING,AGE INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:AGE")
TBLPROPERTIES ("hbase.table.name" = "TEST",
"hbase.mapred.output.outputtable" = "TEST");

When I try to access the data I am getting null for age, as it is a numeric
field.

test.name test.age
John  null
Paul  null
Peter null


Versions I am using:
Phoenix 4.4
HBase 1.1.2
Hive 1.2
Has anybody faced this issue?


Would really appreciate the help.


Thanks,
Divya
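
For reference: with the plain HBase storage handler, the default column mapping
decodes cell values as UTF-8 strings, so numbers written as raw bytes come back as
NULL. A column stored in binary form can be mapped with the #b suffix, roughly as
below; note, though, that Phoenix uses its own serialization for numeric types, so
this sketch may still not decode Phoenix-written integers, and the Phoenix storage
handler may be the better fit:

CREATE EXTERNAL TABLE IF NOT EXISTS TEST(NAME STRING, AGE INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:AGE#b")
TBLPROPERTIES ("hbase.table.name" = "TEST",
"hbase.mapred.output.outputtable" = "TEST");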


[Error]: Spark 1.5.2 + HiveHbase Integration

2016-02-29 Thread Divya Gehlot
Hi,
I am trying to access a Hive table which has been created using HBase integration.

I am able to access the data in the Hive CLI,
but when I try to access the table using the HiveContext of Spark
I get the following error:

> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
> at
> org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
> at
> org.apache.hadoop.hive.hbase.HBaseSerDeParameters.(HBaseSerDeParameters.java:73)
> at
> org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
> at
> org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
> at
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
> at
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
> at
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
> at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
> at
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:330)
> at
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:325)



I have added the following jars to the Spark classpath:
/usr/hdp/2.3.4.0-3485/hive/lib/hive-hbase-handler.jar,
/usr/hdp/2.3.4.0-3485/hive/lib/zookeeper-3.4.6.2.3.4.0-3485.jar,
/usr/hdp/2.3.4.0-3485/hive/lib/guava-14.0.1.jar,
/usr/hdp/2.3.4.0-3485/hive/lib/protobuf-java-2.5.0.jar

Which jar files am I missing?


Thanks,
Regards,
Divya


[BEST PRACTICES]: Registering Hbase table as hive external table

2016-02-28 Thread Divya Gehlot
Hi,
Has anyone worked on registering HBase tables as Hive external tables?
I would like to know the best practices as well as the pros and cons of it.

Would really appreciate it if you could refer me to good blogs, study materials,
etc.
If anybody has hands-on/production experience, could you please share some
tips?


Thanks,
Divya


[Error] : while registering Hbase table with hive

2016-02-28 Thread Divya Gehlot
Hi,
I am trying to register an HBase table with Hive and am getting the following error:

 Error while processing statement: FAILED: Execution Error, return code 1
> from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error:
> the HBase columns mapping contains a badly formed column family, column
> qualifier specification.)



May I know what could be the possible reason ?


Thanks,
Divya
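
For reference: this error usually means the hbase.columns.mapping string does not
line up with the Hive column list; every Hive column needs exactly one entry, either
:key for the row key or family:qualifier for a data column. A well-formed sketch with
placeholder names:

CREATE EXTERNAL TABLE example_table (rowkey STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2")
TBLPROPERTIES ("hbase.table.name" = "example_hbase_table");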


Re: External table returns no result.

2016-02-20 Thread Divya Gehlot
Yes, Gabriel is correct.
I ran into the same issue and it was resolved by the MSCK REPAIR TABLE
command.
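Something along these lines (untested against your layout), using the table and
partition columns from the quoted mail:

MSCK REPAIR TABLE stats;

-- or register a single partition by hand, pointing at its directory
ALTER TABLE stats ADD IF NOT EXISTS
PARTITION (years=2016, months=201602, days=20160202, hours=1)
LOCATION '/user/goibibo/external/logs/provider=stats/years=2016/months=201602/days=20160202/hours=01/';
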
On Feb 20, 2016 12:35 AM, "Gabriel Balan"  wrote:

> Hi
>
> It's not enough to make dirs in hdfs. You need to let the metastore know
> you're adding partitions.
> Try to Recover Partitions (MSCK REPAIR TABLE).
>
>
> hth
> Gabriel Balan
>
> The statements and opinions expressed here are my own and do not
> necessarily represent those of Oracle Corporation.
>
> - Original Message -
> From: amrit.jan...@goibibo.com
> To: hue-u...@cloudera.org, user@hive.apache.org
> Sent: Friday, February 19, 2016 2:21:29 AM GMT -05:00 US/Canada Eastern
> Subject: External table returns no result.
>
> Hi,
>
> Trying to run queries over HDFS data using Hive external table.
>
> Created a table using the following syntax but select * from stats returns
> no result.
>
> CREATE EXTERNAL TABLE `stats`(
>> `filename` string,
>> `ts` string,
>> `type` string,
>> `module` string,
>> `method` string,
>> `line` string,
>> `query` string,
>> `qt` string,
>> `num_results` string,
>> `result_count` int,
>> `search_time` string,
>> `millis` float,
>> `ip` string)
>> PARTITIONED BY (
>> `years` bigint,
>> `months` bigint,
>> `days` bigint,
>> `hours` int)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '\t'
>> STORED AS INPUTFORMAT
>> 'org.apache.hadoop.mapred.TextInputFormat'
>> OUTPUTFORMAT
>> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>> LOCATION
>> 'hdfs://nmlgo1912:8020/user/goibibo/external/logs/provider=stats'
>
>
> The folder structure is as given below, there are *multiple bzip2 files* 
> residing
> inside hours folder containing required data.
>
>
> /user/goibibo/external/logs/provider=stats/years=2016/months=201602/days=20160202/hours=01/
> { 1.bzip2, 2.bzip2 ...}
>
>
> Also, if table is created without partition and we point LOCATION directly
> to any particular hour everything works fine. Issue is with the partitioned
> table.
>
> Hive 0.13 ( CDH 5.3 )
>
> Please help.
> --
>
> Regards,
> Amrit
> DataPlatform Team
>
>


Re: Apache sqoop and hive

2016-02-19 Thread Divya Gehlot
sqoop import-all-tables \
  --connect "jdbc:mysql://host_name:3306/db_name" \
  --username=username \
  --password=password \
  --warehouse-dir=/user/hive/warehouse/hive_db.db

Have you tried this




On 19 February 2016 at 12:17, Archana Patel <archa...@vavni.com> wrote:

>
> hi  , Divya
>
> Actually i am able to import single table but want to import whole
> database into hive.
> I did this.
>
> sqoop import --connect jdbc:mysql://localhost:3306/sample --username
> root -P --table demo --hive-import --hive-table default.demo -m 1
> --driver com.mysql.jdbc.Driver
>
>
> ____
> From: Divya Gehlot [divya.htco...@gmail.com]
> Sent: Friday, February 19, 2016 8:37 AM
> To: user@hive.apache.org
> Subject: Re: Apache sqoop and hive
>
> Can you please post the steps how are you doing it ?
>
>
>
> On 18 February 2016 at 19:19, Archana Patel <archa...@vavni.com> wrote:
> hi,
>
> I am trying to import all tables from mysql to hive by using apache sqoop
> but its having problem. And i have added mysql-connector-java-5.0.8-bin.jar
> also still having problem. Can anybody help on this who has tried this
> already or any idea.
>
> Thanks ,
> Archana Patel
> skype(live:archana961)
>
>


Re: How can we find Hive version from Hive CLI or Hive shell?

2016-02-16 Thread Divya Gehlot
Try this:

ps -aux | grep -i "Hive"



On 17 February 2016 at 14:25, Abhishek Dubey 
wrote:

> Well, thanks for the reply, but it didn't seem to work on my side.
>
>
>
> To be more precise, I want to determine hive version while hive is running
> like by querying or something…
>
>
>
> *Thanks & Regards,*
> *Abhishek Dubey*
>
>
>
>
>
> *From:* Amrit Jangid [mailto:amrit.jan...@goibibo.com]
> *Sent:* Wednesday, February 17, 2016 11:43 AM
> *To:* user@hive.apache.org
> *Subject:* Re: How can we find Hive version from Hive CLI or Hive shell?
>
>
>
> >>hive --version
>
> Hive 0.13.1-cdh5.3.5
>
>
>
> On Wed, Feb 17, 2016 at 11:33 AM, Abhishek Dubey <
> abhishek.du...@xoriant.com> wrote:
>
> Hi,
>
>
>
> How can we find Hive version from Hive CLI or Hive shell?
>
>
>
> *Thanks & Regards,*
> *Abhishek Dubey*
>
>
>
>
>
>
>
> --
>
>
> Regards,
>
> Amrit
>
> DataPlatform Team
>
>
>
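
For reference: on Hive 2.1.0 and later there is also a built-in version() UDF, so
the running version can be checked from a query itself (it is not available on the
0.13 build quoted above):

SELECT version();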


Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi Sabarish,
Thanks a lot for your help.
I am able to view the logs now.

Thank you very much .

Cheers,
Divya


On 15 February 2016 at 16:51, Sabarish Sasidharan <
sabarish.sasidha...@manthan.com> wrote:

> You can setup SSH tunneling.
>
>
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
>
> Regards
> Sab
>
> On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi,
>> I have hadoop cluster set up in EC2.
>> I am unable to view application logs in Web UI as its taking internal IP
>> Like below :
>> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042
>> <http://ip-172-31-22-136.ap-southeast-1.compute.internal:8042/>
>>
>> How can I change this to external one or redirecting to external ?
>> Attached screenshots for better understanding of my issue.
>>
>> Would really appreciate help.
>>
>>
>> Thanks,
>> Divya
>>
>>
>>
>>
>>
>
>
>
> --
>
> Architect - Big Data
> Ph: +91 99805 99458
>
> Manthan Systems | *Company of the year - Analytics (2014 Frost and
> Sullivan India ICT)*
> +++
>


Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi,
I have hadoop cluster set up in EC2.
I am unable to view application logs in the Web UI as it is using the internal IP,
like below:
http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042


How can I change this to the external one, or redirect to the external address?
Attached screenshots for better understanding of my issue.

Would really appreciate help.


Thanks,
Divya


optimize joins in hive 1.2.1

2016-01-18 Thread Divya Gehlot
Hi,
Need tips/guidance to optimize (increase the performance of) joins over billions of
rows in Hive.

Any help would be appreciated.


Thanks,
Divya
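
For reference, a few common starting points for large joins in Hive 1.2; a sketch
with illustrative values and a placeholder table name, since the right choices
depend heavily on the data and its layout:

-- gather statistics so the optimizer can pick better join strategies
ANALYZE TABLE big_table COMPUTE STATISTICS;
ANALYZE TABLE big_table COMPUTE STATISTICS FOR COLUMNS;

-- let small dimension tables be broadcast as map joins
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=268435456;

-- if both sides are bucketed and sorted on the join key, allow sort-merge bucket joins
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

-- ORC storage plus vectorized execution usually helps the scans feeding the join
SET hive.vectorized.execution.enabled=true;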


Re: how does Hive Partitioning works ?

2015-12-30 Thread Divya Gehlot
 same number of buckets (anyone has tried this?)
>
>
>
> One more things. When one defines the number of buckets at table creation
> level in Hive, the number of partitions/files will be fixed. In contrast,
> with partitioning you do not have this limitation.
>
>
>
> Mich Talebzadeh
>
>
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>
> *From:* Divya Gehlot [mailto:divya.htco...@gmail.com]
> *Sent:* 30 December 2015 10:44
> *To:* user@hive.apache.org
> *Subject:* how does Hive Partitioning works ?
>
>
>
> Hi,
>
> I am a newbie to Hive and am trying to understand Hive partitioning.
> My files are in CSV format.
> Steps which I followed:
> CREATE EXTERNAL TABLE IF NOT EXISTS loan_depo_part(COLUMN1 String ,COLUMN2
> String ,COLUMN3 String ,
>COLUMN4 String,COLUMN5
> String,COLUMN6 String,
>COLUMN7 Int ,COLUMN8 Int
> ,COLUMN9 String ,
>COLUMN10 String ,COLUMN11
> String ,COLUMN12 String,
>COLUMN13 String ,COLUMN14
> String ,
>COLUMN15 String ,COLUMN16
> String ,
>COLUMN17 String ,COLUMN18
> String ,
>COLUMN19 String ,COLUMN20
> String ,
>COLUMN21 String ,COLUMN22
> String )
> COMMENT 'testing Partition'
> PARTITIONED BY (Year String,Month String ,Day String)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES ("skip.header.line.count"="1") ;
>
>
>
> ALTER TABLE loan_depo_part ADD IF NOT EXISTS PARTITION(
> Year=2015,Month=01,Day=01);
>
> ALTER TABLE loan_depo_part PARTITION(Year=2015,Month=01,Day=01)
>  SET LOCATION
> 'hdfs://namenode:8020/tmp/TestDivya/HiveInput/year=2015/month=01/day=01/';
>
>
>
> Whereas my HDFS data location is
> /TestDivya/HiveInput/year=2015/month=01/day=01/
>
> I have a few queries regarding the above partitioning:
>
> 1. It creates the table when I run the second step, but when I issue the select
> command it doesn't display any data.
>
> 2. Do I need to create a normal external table first and the partitioned one
> next,
>
>  and then do the insert overwrite?
>
> Basically I am not able to understand the partitioning steps mentioned
> above.
>
> I followed this link
> <http://deanwampler.github.io/polyglotprogramming/papers/Hive-SQLforHadoop.pdf>
>
> Would really appreciate the help/pointers.
>
> Thanks,
>
> Divya
>
>
>


RE: how does Hive Partitioning works ?

2015-12-30 Thread Divya Gehlot
 1428862151905  | hduser  | rhes564   |
>
>
> +--+-+++-+--+-+-++-+---+--+
>
>
>
> Now your point 2, bucketing in Hive refers to hash partitioning where a
> hashing function is applied. Likewise an RDBMS, Hive will apply a linear
> hashing algorithm to prevent data from clustering within specific
> partitions. Hashing is very effective if the column selected for bucketing
> has very high selectivity like an ID column where selectivity (*select
> count(distinct(column))/count(column)* ) = 1.  In this case, the created
> partitions/ files will be as evenly sized as possible. In a nutshell
> bucketing is a method to get data evenly distributed over many
> partitions/files.  One should define the number of buckets by a power of
> two -- 2^n,  like 2, 4, 8, 16 etc to achieve best results. Again bucketing
> will help concurrency in Hive. It may even allow a *partition wise join*
> i.e. a join between two tables that are bucketed on the same column with
> the same number of buckets (anyone has tried this?)
>
>
>
> One more things. When one defines the number of buckets at table creation
> level in Hive, the number of partitions/files will be fixed. In contrast,
> with partitioning you do not have this limitation.
>
>
>
> Mich Talebzadeh
>
>
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>
> *From:* Divya Gehlot [mailto:divya.htco...@gmail.com]
> *Sent:* 30 December 2015 10:44
> *To:* user@hive.apache.org
> *Subject:* how does Hive Partitioning works ?
>
>
>
> Hi,
>
> I am a newbie to Hive and am trying to understand Hive partitioning.
> My files are in CSV format.
> Steps which I followed:
> CREATE EXTERNAL TABLE IF NOT EXISTS loan_depo_part(COLUMN1 String ,COLUMN2
> String ,COLUMN3 String ,
>COLUMN4 String,COLUMN5
> String,COLUMN6 String,
>COLUMN7 Int ,COLUMN8 Int
> ,COLUMN9 String ,
>COLUMN10 String ,COLUMN11
> String ,COLUMN12 String,
>COLUMN13 String ,COLUMN14
> String ,
>COLUMN15 String ,COLUMN16
> String ,
>COLUMN17 String ,COLUMN18
> String ,
>COLUMN19 String ,COLUMN20
> String ,
>COLUMN21 String ,COLUMN22
> String )
> COMMENT 'testing Partition'
> PARTITIONED BY (Year String,Month String ,Day String)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES ("skip.header.line.count"="1") ;
>
>
>
> ALTER TABLE loan_depo_part ADD IF NOT EXISTS PARTITION(
> Year=2015,Month=01,Day=01);
>
> ALTER TABLE loan_depo_part PARTITION(Year=2015,Month=01,Day=01)
>  SET LOCATION
> 'hdfs://namenode:8020/tmp/TestDivya/HiveInput/year=2015/month=01/day=01/';
>
>
>
> Whereas my HDFS data location is
> /TestDivya/HiveInput/year=2015/month=01/day=01/
>
> I have a few queries regarding the above partitioning:
>
> 1. It creates the table when I run the second step, but when I issue the select
> command it doesn't display any data.
>
> 2. Do I need to create a normal external table first and the partitioned one
> next,
>
>  and then do the insert overwrite?
>
> Basically I am not able to understand the partitioning steps mentioned
> above.
>
> I followed this link
> <http://deanwampler.github.io/polyglotprogramming/papers/Hive-SQLforHadoop.pdf>
>
> Would really appreciate the help/pointers.
>
> Thanks,
>
> Divya
>
>
>
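
For reference, the two ALTER statements in the quoted mail can be combined so the
partition is registered together with its HDFS location in one step (table, columns
and path taken from the mail):

ALTER TABLE loan_depo_part ADD IF NOT EXISTS
PARTITION (Year='2015', Month='01', Day='01')
LOCATION 'hdfs://namenode:8020/tmp/TestDivya/HiveInput/year=2015/month=01/day=01/';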


error while defining custom schema in Spark 1.5.0

2015-12-22 Thread Divya Gehlot
Hi,
I am a newbie to Apache Spark, using the CDH 5.5 QuickStart VM with Spark
1.5.0.
I am working on a custom schema and getting an error:

import org.apache.spark.sql.hive.HiveContext
>>
>> scala> import org.apache.spark.sql.hive.orc._
>> import org.apache.spark.sql.hive.orc._
>>
>> scala> import org.apache.spark.sql.types.{StructType, StructField,
>> StringType, IntegerType};
>> import org.apache.spark.sql.types.{StructType, StructField, StringType,
>> IntegerType}
>>
>> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> 15/12/21 23:41:53 INFO hive.HiveContext: Initializing execution hive,
>> version 1.1.0
>> 15/12/21 23:41:53 INFO client.ClientWrapper: Inspected Hadoop version:
>> 2.6.0-cdh5.5.0
>> 15/12/21 23:41:53 INFO client.ClientWrapper: Loaded
>> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.5.0
>> hiveContext: org.apache.spark.sql.hive.HiveContext =
>> org.apache.spark.sql.hive.HiveContext@214bd538
>>
>> scala> val customSchema = StructType(Seq(StructField("year", IntegerType,
>> true),StructField("make", StringType, true),StructField("model",
>> StringType, true),StructField("comment", StringType,
>> true),StructField("blank", StringType, true)))
>> customSchema: org.apache.spark.sql.types.StructType =
>> StructType(StructField(year,IntegerType,true),
>> StructField(make,StringType,true), StructField(model,StringType,true),
>> StructField(comment,StringType,true), StructField(blank,StringType,true))
>>
>> scala> val customSchema = (new StructType).add("year", IntegerType,
>> true).add("make", StringType, true).add("model", StringType,
>> true).add("comment", StringType, true).add("blank", StringType, true)
>> customSchema: org.apache.spark.sql.types.StructType =
>> StructType(StructField(year,IntegerType,true),
>> StructField(make,StringType,true), StructField(model,StringType,true),
>> StructField(comment,StringType,true), StructField(blank,StringType,true))
>>
>> scala> val customSchema = StructType( StructField("year", IntegerType,
>> true) :: StructField("make", StringType, true) :: StructField("model",
>> StringType, true) :: StructField("comment", StringType, true) ::
>> StructField("blank", StringType, true)::StructField("blank", StringType,
>> true))
>> :24: error: value :: is not a member of
>> org.apache.spark.sql.types.StructField
>>val customSchema = StructType( StructField("year", IntegerType,
>> true) :: StructField("make", StringType, true) :: StructField("model",
>> StringType, true) :: StructField("comment", StringType, true) ::
>> StructField("blank", StringType, true)::StructField("blank", StringType,
>> true))
>>
>
I tried like below also:

scala> val customSchema = StructType( StructField("year", IntegerType,
true), StructField("make", StringType, true) ,StructField("model",
StringType, true) , StructField("comment", StringType, true) ,
StructField("blank", StringType, true),StructField("blank", StringType,
true))
:24: error: overloaded method value apply with alternatives:
  (fields:
Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType

  (fields:
java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType

  (fields:
Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
 cannot be applied to (org.apache.spark.sql.types.StructField,
org.apache.spark.sql.types.StructField,
org.apache.spark.sql.types.StructField,
org.apache.spark.sql.types.StructField,
org.apache.spark.sql.types.StructField,
org.apache.spark.sql.types.StructField)
   val customSchema = StructType( StructField("year", IntegerType,
true), StructField("make", StringType, true) ,StructField("model",
StringType, true) , StructField("comment", StringType, true) ,
StructField("blank", StringType, true),StructField("blank", StringType,
true))
  ^
Would really appreciate it if somebody could share an example which works with
Spark 1.4 or Spark 1.5.0.

Thanks,
Divya



configure spark for hive context

2015-12-21 Thread Divya Gehlot
Hi,
I am trying to configure Spark for Hive context (please don't confuse this
with Hive on Spark).
I placed hive-site.xml in Spark's conf directory.
Now when I run spark-shell I am getting the error below.
Versions I am using: Hadoop 2.6.2, Spark 1.5.2, Hive 1.2.1


Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>   /_/
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.8.0_66)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> java.lang.RuntimeException: java.lang.IllegalArgumentException:
> java.net.URISyntaxException: Relative path in absolute URI:
> ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
> at
> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171)
> at
> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
> at
> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
> at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at
> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
> at $iwC$$iwC.(:9)
> at $iwC.(:18)
> at (:20)
> at .(:24)
> at .()
> at .(:7)
> at .()
> at $print()
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> at
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
> at
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> at
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> at
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
> at
> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
> at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
> at
> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
> at
> org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
> at
> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
> at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
> at
> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
> at
> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org
> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
> at
> 

Difference between Local Hive Metastore server and A Hive-based Metastore server

2015-12-17 Thread Divya Gehlot
Hi,
I am a newbie to Spark and am using 1.4.1.
I got confused between a local metastore server and a Hive-based metastore
server.
Can somebody share the use cases for when to use which one, and the pros and cons?

I am using HDP 2.3.2, in which hive-site.xml is already in the Spark
configuration directory; that means HDP 2.3.2 already uses a Hive-based
metastore server.


Pros and cons -Saving spark data in hive

2015-12-15 Thread Divya Gehlot
Hi,
I am a newbie to Spark and I am exploring the options, and their pros and cons,
for what will work best in the Spark and Hive context. My dataset inputs are CSV
files; I am using Spark to process my data and saving it in Hive using
HiveContext.

1) Process the CSV file using the spark-csv package, create a temp table, and
store the data in Hive using the Hive context.
2) Process the file as a normal text file in SQLContext, register it as a
temp table in SQLContext, store it as an ORC file, then read that ORC file in
the Hive context and store it in Hive.

Are there any other better options apart from those mentioned above?
Would really appreciate the inputs.
Thanks in advance.

Thanks,
Regards,
Divya


Re: What hive-site.xml including in $SPARK_HOME looks like ?

2015-12-10 Thread Divya Gehlot
The hive-site.xml in spark/conf looks like below:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://your cluster ip:9083</value>
  </property>
</configuration>

On 11 December 2015 at 13:09, zml张明磊  wrote:

> Hi,
>
>
>
>  I am a beginner at Hive. Something happened (cannot find table)
> when I start a Spark job and read data from Hive. I don't set hive-site.xml
> in $SPARK_HOME/conf. What does the default hive-site.xml look like?
>
>
>
> Thanks,
>
> Minglei.
>
>
>


org.apache.spark.SparkException: Task failed while writing rows.+ Spark output data to hive table

2015-12-10 Thread Divya Gehlot
Hi,

I am using HDP 2.3.2 with Spark 1.4.1 and am trying to insert data into a Hive
table using the Hive context.

Below is the sample code


spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

// Sample code
import org.apache.spark.sql.SQLContext
import sqlContext.implicits._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val people = sc.textFile("/user/spark/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
  StructType(
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
// Create hive context
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Apply the schema to the RDD
val df = hiveContext.createDataFrame(rowRDD, schema);
val options = Map("path" -> "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")

Getting below error :


org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:191)
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
  at org.apache.spark.scheduler.Task.run(Task.scala:70)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
  at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(:29)
  at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(:29)
  at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
  at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:182)
  ... 8 more

Is it a configuration issue?

When I googled it I found out that an environment variable named HIVE_CONF_DIR
should be present in spark-env.sh.

Then I checked spark-env.sh in HDP 2.3.2, and I couldn't find the environment
variable HIVE_CONF_DIR.

Do I need to add the above-mentioned variable to insert Spark output data into
Hive tables?

Would really appreciate pointers.

Thanks,

Divya



getting error while persisting in hive

2015-12-09 Thread Divya Gehlot
Hi,
I am using Spark 1.4.1.
I am getting an error when persisting Spark DataFrame output to Hive:

> scala>
> df.select("name","age").write().format("com.databricks.spark.csv").mode(SaveMode.Append).saveAsTable("PersonHiveTable");
> :39: error: org.apache.spark.sql.DataFrameWriter does not take
> parameters
>
>

Can somebody point out what's wrong here?

Would really appreciate your help.

Thanks in advance

Divya


Unable to acces hive table (created through hive context) in hive console

2015-12-07 Thread Divya Gehlot
Hi,

I am a newbie to Spark and am using HDP 2.2, which comes with Spark 1.3.1.
I tried the following code example:

> import org.apache.spark.sql.SQLContext
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
>
> val personFile = "/user/hdfs/TestSpark/Person.csv"
> val df = sqlContext.load(
> "com.databricks.spark.csv",
> Map("path" -> personFile, "header" -> "true", "inferSchema" -> "true"))
> df.printSchema()
> val selectedData = df.select("Name", "Age")
> selectedData.save("NewPerson.csv", "com.databricks.spark.csv")
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> hiveContext.sql("CREATE TABLE IF NOT EXISTS PersonTable (Name STRING, Age
> STRING)")
> hiveContext.sql("LOAD DATA  INPATH '/user/hdfs/NewPerson.csv' INTO TABLE
> PersonTable")
> hiveContext.sql("from PersonTable SELECT Name, Age
> ").collect.foreach(println)


I am able to access the above table in HDFS:

> [hdfs@sandbox ~]$ hadoop fs -ls /user/hive/warehouse/persontable
> Found 3 items
> -rw-r--r--   1 hdfs hdfs  0 2015-12-08 04:40
> /user/hive/warehouse/persontable/_SUCCESS
> -rw-r--r--   1 hdfs hdfs 47 2015-12-08 04:40
> /user/hive/warehouse/persontable/part-0
> -rw-r--r--   1 hdfs hdfs 33 2015-12-08 04:40
> /user/hive/warehouse/persontable/part-1


But when I run show tables in the Hive console, I cannot find the table.

> hive> use default ;
> OK
> Time taken: 0.864 seconds
> hive> show tables;
> OK
> dataframe_test
> sample_07
> sample_08
> Time taken: 0.521 seconds, Fetched: 3 row(s)
> hive> use xademo ;
> OK
> Time taken: 0.791 seconds
> hive> show tables;
> OK
> call_detail_records
> customer_details
> recharge_details
> Time taken: 0.256 seconds, Fetched: 3 row(s)


Can somebody guide me in the right direction, whether something is wrong with the
code or I am misunderstanding the concepts?
Would really appreciate your help.

Thanks,
Divya