sqoop issue

2017-02-21 Thread Raj hadoop
I'm using hadoop 2.5.1 and sqoop 1.4.6.

I am using Sqoop import to bring a table from a MySQL database into Hadoop. It
fails with the following error:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.fs.FSOutputSummer
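
A NoSuchMethodError on an HDFS class such as FSOutputSummer usually means Sqoop is
picking up Hadoop jars from a different release than the cluster is running. A
minimal sketch, assuming Sqoop 1.4.6 installed next to the Hadoop 2.5.1 client
(the install paths below are hypothetical):

export HADOOP_COMMON_HOME=/usr/lib/hadoop            # point Sqoop at the cluster's own client jars
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce  # so HDFS classes resolve from one version only
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl -P \
  --table orders \
  --target-dir /user/etl/orders

If the error persists, looking for a second, older hadoop-common jar on the Sqoop
classpath is usually the next step.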


How to handle RAW data type of oracle in SQOOP import

2017-02-21 Thread Raj hadoop
How to handle RAW data type of oracle in SQOOP import
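
Sqoop has no direct Hive mapping for Oracle's RAW type, so imports of such columns
typically fail or arrive as nulls. A hedged sketch of the common workaround,
assuming a table DOCS with a RAW column DOC_ID (names are hypothetical): ask Sqoop
to treat the column as a Java String with --map-column-java.

sqoop import \
  --connect jdbc:oracle:thin:@//orahost:1521/orcl \
  --username etl -P \
  --table DOCS \
  --map-column-java DOC_ID=String \
  --target-dir /data/docs

The RAW bytes then arrive as a hex string; if the downstream Hive table needs
binary, converting afterwards with unhex() is one option.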


Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Raj hadoop
It seems it's already present, Amrit:

hdpmaster001:~ # useradd -G hdfs root
useradd: Account `root' already exists.
hdpmaster001:~ #
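
The error is about the HDFS directory /user/root (owned hdfs:hdfs, mode
drwxr-xr-x), not about local OS accounts, so adding root to the hdfs group will not
by itself grant write access. A minimal sketch of the usual fix, run as the HDFS
superuser:

sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root

After that, Hive started as root can create its scratch directories under
/user/root. Running Hive as a non-root service user is generally preferable, but
the commands above unblock the session shown here.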


On Wed, Oct 5, 2016 at 2:46 PM, Amrit Jangid <amrit.jan...@goibibo.com>
wrote:

> Hi Raj
>
> Do add root user into hdfs group.
> Run this command on your NameNode server.
>
>
> useradd -G hdfs root
>
> On Wed, Oct 5, 2016 at 2:07 PM, Raj hadoop <raj.had...@gmail.com> wrote:
>
>> I'm getting it when I'm trying to start Hive:
>>
>> hdpmaster001:~ # hive
>> WARNING: Use "yarn jar" to launch YARN applications.
>>
>> How can I get past this?
>> Thanks,
>> Raj.
>>
>> On Wed, Oct 5, 2016 at 1:56 PM, Raj hadoop <raj.had...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Could someone help me solve this issue?
>>>
>>> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
>>> Exception in thread "main" java.lang.RuntimeException:
>>> org.apache.hadoop.security.AccessControlException: Permission denied:
>>> user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
>>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
>>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
>>> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
>>>
>>> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:516)
>>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>> Caused by: org.apache.hadoop.security.AccessControlException:
>>> Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionCh

Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Raj hadoop
I'm getting it when I'm trying to start Hive:

hdpmaster001:~ # hive
WARNING: Use "yarn jar" to launch YARN applications.

How can I get past this?
Thanks,
Raj.

On Wed, Oct 5, 2016 at 1:56 PM, Raj hadoop <raj.had...@gmail.com> wrote:

> Hi All,
>
> Could someone help me solve this issue?
>
> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
>
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:516)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission
> denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamen

Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Raj hadoop
Hi All,

Could someone help me solve this issue?

Logging initialized using configuration in
file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
at
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)

at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:516)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.security.AccessControlException: Permission
denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
at
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at

Re: hive concurrency not working

2016-08-04 Thread Raj hadoop
Thanks, everyone.

We are raising a case with Hortonworks.

On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop <raj.had...@gmail.com> wrote:

> Dear All,
>
> I need your help.
>
> We have a Hortonworks 4-node cluster, and the problem is that Hive is allowing
> only one user at a time.
>
> If a second user needs to log in, Hive does not work.
>
> Could someone please help me with this?
>
> Thanks,
> Rajesh
>


hive concurrency not working

2016-08-03 Thread Raj hadoop
Dear All,

I need your help.

We have a Hortonworks 4-node cluster, and the problem is that Hive is allowing
only one user at a time.

If a second user needs to log in, Hive does not work.

Could someone please help me with this?

Thanks,
Rajesh
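
Two hedged things are worth checking when a second Hive user cannot connect.
First, if the metastore is the embedded Derby database, it supports only a single
open session; pointing the metastore at MySQL or Postgres removes that limit.
Second, for concurrent queries Hive's lock manager has to be enabled. A sketch of
the hive-site.xml settings (the ZooKeeper hosts below are hypothetical):

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
<property>
  <name>hive.lock.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager</value>
</property>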


Re: Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Hi Mich - I did all those steps. Somehow I am not able to find out what the
issue is. Can you suggest any debugging tips? Regards, Rajendra

 

On Monday, April 4, 2016 12:16 PM, Mich Talebzadeh 
<mich.talebza...@gmail.com> wrote:
 

 Hi Raj,
Hive 2 is good to go :) Check this.
I see that you are using an Oracle DB as your metastore. Mine is Oracle as well:
  
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@rhes564:1521:mydb</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

Also need username/password for your metastore:

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>xxx</value>
    <description>password to use against metastore database</description>
  </property>

You also need to put the Oracle jar file ojdbc6.jar in $HIVE_HOME/lib, otherwise
you won't be able to connect.
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 20:02, Raj Hadoop <hadoop...@yahoo.com> wrote:

Sorry for the typo with your name - Mich.
 

On Monday, April 4, 2016 12:01 PM, Raj Hadoop <hadoop...@yahoo.com> wrote:
 

 Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me
troubleshoot the 1.1.1 issues I am facing now.
Here is my hive-site.xml. Can you please let me know if I am missing anything?


<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://z1:8899/user/hive/warehouse</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@//z4:1521/xe</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.oracle.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

<property>
  <name>hive.querylog.location</name>
  <value>$HIVE_HOME/iotmp</value>
  <description>Location of Hive run time structured log file</description>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>$HIVE_HOME/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>$HIVE_HOME/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>



  

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh 
<mich.talebza...@gmail.com> wrote:
 

 Interesting - why did you not download Hive 2.0, which is out now?
The error says:
 HiveConf of name hive.metastore.local does not exist
In your hive-site.xml, how have you configured the parameters for hive.metastore?
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 18:25, Raj Hadoop <hadoop...@yahoo.com> wrote:

Hi,
I have downloaded Apache Hive 1.1.1 and am trying to set up the Hive environment in my
Hadoop cluster.
On one of the nodes I installed Hive, and when I set all the variables and
environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.H

Re: Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me
troubleshoot the 1.1.1 issues I am facing now.
Here is my hive-site.xml. Can you please let me know if I am missing anything?


hive.exec.scratchdir
/tmp/hive




hive.metastore.local
false




hive.metastore.warehouse.dir
hdfs://z1:8899/user/hive/warehouse



javax.jdo.option.ConnectionURL
jdbc:oracle:thin:@//z4:1521/xe

 

javax.jdo.option.ConnectionDriverName
com.oracle.jdbc.Driver

 

javax.jdo.option.ConnectionUserName
hive

 

javax.jdo.option.ConnectionPassword
hive

 


    hive.querylog.location
    $HIVE_HOME/iotmp
    Location of Hive run time structured log file
  

  
    hive.exec.local.scratchdir
    $HIVE_HOME/iotmp
    Local scratch space for Hive jobs
  

  
    hive.downloaded.resources.dir
    $HIVE_HOME/iotmp
    Temporary local directory for added resources in the remote 
file system.
  



  

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh 
<mich.talebza...@gmail.com> wrote:
 

 Interesting why you did not download Hive 2.0 which is out now
The error says:
 HiveConf of name hive.metastore.local does not exist
In you hive-site.xml how have you configured parameters for hive.metastore?
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 18:25, Raj Hadoop <hadoop...@yahoo.com> wrote:

Hi,
I have downloaded apache hive 1.1.1 and trying to setup hive environment in my 
hadoop cluster.
On one of the nodes i installed hive and when i set all the variables and 
environment i am getting the following error.Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
    ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78


Regards,Raj





  

Re: Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Sorry for the typo with your name - Mich.
 

On Monday, April 4, 2016 12:01 PM, Raj Hadoop <hadoop...@yahoo.com> wrote:
 

 Thanks Mike. If Hive 2.0 is stable - i would definitely go for it. But let me 
troubleshoot 1.1.1 issues i am facing now.
here is my hive-site.xml. Can you please let me know if i am missing anything.


hive.exec.scratchdir
/tmp/hive




hive.metastore.local
false




hive.metastore.warehouse.dir
hdfs://z1:8899/user/hive/warehouse



javax.jdo.option.ConnectionURL
jdbc:oracle:thin:@//z4:1521/xe

 

javax.jdo.option.ConnectionDriverName
com.oracle.jdbc.Driver

 

javax.jdo.option.ConnectionUserName
hive

 

javax.jdo.option.ConnectionPassword
hive

 


    hive.querylog.location
    $HIVE_HOME/iotmp
    Location of Hive run time structured log file
  

  
    hive.exec.local.scratchdir
    $HIVE_HOME/iotmp
    Local scratch space for Hive jobs
  

  
    hive.downloaded.resources.dir
    $HIVE_HOME/iotmp
    Temporary local directory for added resources in the remote 
file system.
  



  

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh 
<mich.talebza...@gmail.com> wrote:
 

 Interesting why you did not download Hive 2.0 which is out now
The error says:
 HiveConf of name hive.metastore.local does not exist
In you hive-site.xml how have you configured parameters for hive.metastore?
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 18:25, Raj Hadoop <hadoop...@yahoo.com> wrote:

Hi,
I have downloaded apache hive 1.1.1 and trying to setup hive environment in my 
hadoop cluster.
On one of the nodes i installed hive and when i set all the variables and 
environment i am getting the following error.Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
    ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78


Regards,Raj





   

  

Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Hi,
I have downloaded Apache Hive 1.1.1 and am trying to set up the Hive environment in my
Hadoop cluster.
On one of the nodes I installed Hive, and when I set all the variables and
environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
    ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78


Regards,Raj
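
The root cause here ("Error creating transactional connection factory" with a
nested InvocationTargetException) typically points at the metastore JDBC settings
rather than at Hive itself. A hedged checklist for an Oracle-backed metastore: the
driver class shown in the hive-site.xml quoted earlier in this thread,
com.oracle.jdbc.Driver, is not a real Oracle class, and the ojdbc jar also has to
sit in $HIVE_HOME/lib. A sketch of the corrected pieces (host/SID as in that
listing):

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.OracleDriver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@//z4:1521/xe</value>
</property>

cp ojdbc6.jar $HIVE_HOME/lib/

The hive.metastore.local property was removed around Hive 0.10, which is why the
CLI warns that it "does not exist"; it can simply be dropped from hive-site.xml.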



HCatStorer error

2015-11-24 Thread Raj hadoop
We are facing the below-mentioned error when storing a dataset using HCatStorer. Can
someone please help us?



STORE F INTO 'default.CONTENT_SVC_USED' using
org.apache.hive.hcatalog.pig.HCatStorer();



ERROR hive.log - Got exception: java.net.URISyntaxException Malformed
escape pair at index 9: thrift://%HOSTGROUP::host_group_master1%:9933

java.net.URISyntaxException: Malformed escape pair at index 9:
thrift://%HOSTGROUP::host_group_master1%:9933


Thanks,

Raj
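
The value thrift://%HOSTGROUP::host_group_master1%:9933 is an Ambari blueprint
placeholder that was never substituted with a real hostname, so the metastore URI
cannot be parsed. A hedged sketch of the fix: set hive.metastore.uris to the
actual metastore host (the hostname below is hypothetical) and make sure the Pig
job is started with HCatalog support.

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master1.example.com:9933</value>
</property>

pig -useHCatalog my_script.pig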


select * from table and select column from table in hive

2014-10-20 Thread Raj Hadoop
I am able to see the data in the table for all the columns when I issue the
following:

SELECT * FROM t1 WHERE dt1='2013-11-20'

But I am unable to see the column data when I issue the following:

SELECT cust_num FROM t1 WHERE dt1='2013-11-20'

The above shows null values.

How should I debug this?
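
This pattern (SELECT * returns data, a named column returns NULL) usually means the
table's declared columns do not line up with how the files are actually delimited,
or that the column was mapped to the wrong position or type. A hedged way to start
checking, assuming t1 is a delimited text table:

DESCRIBE FORMATTED t1;
SELECT * FROM t1 WHERE dt1='2013-11-20' LIMIT 5;

Comparing the SerDe, field delimiter and column order from DESCRIBE FORMATTED with
a few raw rows usually shows which position cust_num's values really land in; if
they appear under a different column, recreating the table with the columns in the
order matching the files is the usual fix.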


Re: Remove duplicate records in Hive

2014-09-11 Thread Raj Hadoop
Thank you all for your suggestions. This group is the best.

I am working with the different options you guys suggested.

One big question I have is:

I am good at writing Oracle SQL queries, but the syntax with Hive is different.
In particular, writing multiple SELECT statements in a single Hive query has
become a challenge. Can the group suggest a good tutorial that explains the
basics of the syntax needed to develop complex queries in Hive?

Regards,
Rajendra





On Thursday, September 11, 2014 2:48 AM, vivek thakre vivek.tha...@gmail.com 
wrote:
 


Considering that the records only differ by one column, i.e. if the first two
columns are unique (distinct), then you can simply use GROUP BY with MAX as the
aggregation function to eliminate duplicates, i.e.

select cno, sqno, max (date) 
from table 
group by cno, sqno

If the above assumption is not true, i.e. if cno and sqno are not unique and for
a particular cno you want to get the sqno with the latest date, then you can do an
inner join against a max() subquery, something like:

select a.cno, a.sqno, a.date
from table a 
join (select cno, max(date)  as max_date from table group by cno) b
on a.cno=b.cno
and a.date = b.max_date
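
Since the question mentions Hive 0.11, windowing functions should also be
available (0.11 is the first release with them), and ROW_NUMBER gives the same
de-duplication without a self-join. A hedged sketch, assuming the date column
(called dt here) is stored in a sortable yyyy-MM-dd form:

SELECT cno, sqno, dt
FROM (
  SELECT cno, sqno, dt,
         ROW_NUMBER() OVER (PARTITION BY cno, sqno ORDER BY dt DESC) AS rn
  FROM t
) ranked
WHERE rn = 1;

Each (cno, sqno) group keeps only its most recent row, which matches the requested
output.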



On Wed, Sep 10, 2014 at 3:39 PM, Nishant Kelkar nishant@gmail.com wrote:

Try something like this then:


SELECT A.cno, A.sqno, A.sorted_dates[A.size-1] AS latest_date
FROM 
(
SELECT cno, sqno,
SORT_ARRAY(COLLECT_SET(date)) AS sorted_dates, SIZE(COLLECT_SET(date)) AS size 
FROM table GROUP BY cno, sqno
) A;



There are better ways of doing this, but this one's quick and dirty :)


Best Regards,
Nishant Kelkar


On Wed, Sep 10, 2014 at 12:48 PM, Raj Hadoop hadoop...@yahoo.com wrote:

sort_array returns in ascending order. so the first element cannot be the 
largest date. the last element is the largest date.




On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar 
nishant@gmail.com wrote:
 


Hi Raj,


You'll have to change the format of your date to something like YYYY-MM-DD.
For example, for 2-oct-2013 it will be 2013-10-02.


Best Regards,
Nishant Kelkar





On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop hadoop...@yahoo.com wrote:

The

SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date

is returning the lowest date. I need the largest date.




On Wed, 9/10/14, Raj Hadoop hadoop...@yahoo.com wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org
 Date: Wednesday, September 10, 2014, 2:41 PM


 Thanks. I will try it.
 
 On Wed, 9/10/14, Nishant Kelkar nishant@gmail.com
 wrote:

  Subject: Re: Remove
 duplicate records in Hive
  To: user@hive.apache.org,
 hadoop...@yahoo.com
  Date: Wednesday, September 10, 2014, 1:59
 PM

  Hi

 Raj, 
  You can do something
  along these lines: 

  SELECT
  cno, sqno,
 SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
  FROM table GROUP BY cno, sqno;
  However, you have to make sure your
  date format is such that sorting it gives you
 the most
  recent date. The best way to do
 that is to have it in
  format:
 -MM-DD.
  Hope this helps.
  Best Regards,Nishant

 Kelkar
  On Wed, Sep 10, 2014 at
  10:04 AM, Raj Hadoop hadoop...@yahoo.com
  wrote:


  Hi,



  I have a requirement in Hive
 to remove duplicate records (
  they differ
 only by one column i.e a date column) and keep
  the latest date record.



  Sample
 :

  Hive Table :

   d2 is a higher

  cno,sqno,date



  100 1 1-oct-2013

  101 2 1-oct-2013

  100 1 2-oct-2013

  102 2 2-oct-2013





  Output needed:



  100 1 2-oct-2013

  101 2 1-oct-2013

  102 2 2-oct-2013



  I am using
 Hive 0.11



  Any suggestions please ?



  Regards,


 Raj









Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop

Hi,

I have a requirement in Hive to remove duplicate records ( they differ only by 
one column i.e a date column) and keep the latest date record.

Sample :
Hive Table :
 d2 is a higher 
cno,sqno,date

100 1 1-oct-2013
101 2 1-oct-2013
100 1 2-oct-2013
102 2 2-oct-2013


Output needed:

100 1 2-oct-2013
101 2 1-oct-2013
102 2 2-oct-2013

I am using Hive 0.11

Any suggestions please ?

Regards,
Raj


Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
Thanks. I will try it.

On Wed, 9/10/14, Nishant Kelkar nishant@gmail.com wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org, hadoop...@yahoo.com
 Date: Wednesday, September 10, 2014, 1:59 PM
 
 Hi
 Raj, 
 You can do something
 along these lines: 
 
 SELECT
 cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
 FROM table GROUP BY cno, sqno;
 However, you have to make sure your
 date format is such that sorting it gives you the most
 recent date. The best way to do that is to have it in
  format: YYYY-MM-DD.
 Hope this helps.
 Best Regards,Nishant
 Kelkar
 On Wed, Sep 10, 2014 at
 10:04 AM, Raj Hadoop hadoop...@yahoo.com
 wrote:
 
 
 Hi,
 
 
 
 I have a requirement in Hive to remove duplicate records (
 they differ only by one column i.e a date column) and keep
 the latest date record.
 
 
 
 Sample :
 
 Hive Table :
 
  d2 is a higher
 
 cno,sqno,date
 
 
 
 100 1 1-oct-2013
 
 101 2 1-oct-2013
 
 100 1 2-oct-2013
 
 102 2 2-oct-2013
 
 
 
 
 
 Output needed:
 
 
 
 100 1 2-oct-2013
 
 101 2 1-oct-2013
 
 102 2 2-oct-2013
 
 
 
 I am using Hive 0.11
 
 
 
 Any suggestions please ?
 
 
 
 Regards,
 
 Raj
 
 



Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
The

SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date

is returning the lowest date. I need the largest date.




On Wed, 9/10/14, Raj Hadoop hadoop...@yahoo.com wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org
 Date: Wednesday, September 10, 2014, 2:41 PM
 
 Thanks. I will try it.
 
 On Wed, 9/10/14, Nishant Kelkar nishant@gmail.com
 wrote:
 
  Subject: Re: Remove
 duplicate records in Hive
  To: user@hive.apache.org,
 hadoop...@yahoo.com
  Date: Wednesday, September 10, 2014, 1:59
 PM
  
  Hi
 
 Raj, 
  You can do something
  along these lines: 
  
  SELECT
  cno, sqno,
 SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
  FROM table GROUP BY cno, sqno;
  However, you have to make sure your
  date format is such that sorting it gives you
 the most
  recent date. The best way to do
 that is to have it in
  format:
 -MM-DD.
  Hope this helps.
  Best Regards,Nishant
 
 Kelkar
  On Wed, Sep 10, 2014 at
  10:04 AM, Raj Hadoop hadoop...@yahoo.com
  wrote:
  
  
  Hi,
  
  
  
  I have a requirement in Hive
 to remove duplicate records (
  they differ
 only by one column i.e a date column) and keep
  the latest date record.
  
  
  
  Sample
 :
  
  Hive Table :
  
   d2 is a higher
  
  cno,sqno,date
  
  
  
  100 1 1-oct-2013
  
  101 2 1-oct-2013
  
  100 1 2-oct-2013
  
  102 2 2-oct-2013
  
  
  
  
  
  Output needed:
  
  
  
  100 1 2-oct-2013
  
  101 2 1-oct-2013
  
  102 2 2-oct-2013
  
  
  
  I am using
 Hive 0.11
  
  
  
  Any suggestions please ?
  
  
  
  Regards,
  
 
 Raj
  
  



Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
sort_array returns in ascending order. so the first element cannot be the 
largest date. the last element is the largest date.



On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar 
nishant@gmail.com wrote:
 


Hi Raj,

You'll have to change the format of your date to something like YYYY-MM-DD. For
example, for 2-oct-2013 it will be 2013-10-02.

Best Regards,
Nishant Kelkar





On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop hadoop...@yahoo.com wrote:

The

SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date

is returning the lowest date. I need the largest date.




On Wed, 9/10/14, Raj Hadoop hadoop...@yahoo.com wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org
 Date: Wednesday, September 10, 2014, 2:41 PM


 Thanks. I will try it.
 
 On Wed, 9/10/14, Nishant Kelkar nishant@gmail.com
 wrote:

  Subject: Re: Remove
 duplicate records in Hive
  To: user@hive.apache.org,
 hadoop...@yahoo.com
  Date: Wednesday, September 10, 2014, 1:59
 PM

  Hi

 Raj, 
  You can do something
  along these lines: 

  SELECT
  cno, sqno,
 SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
  FROM table GROUP BY cno, sqno;
  However, you have to make sure your
  date format is such that sorting it gives you
 the most
  recent date. The best way to do
 that is to have it in
  format:
 -MM-DD.
  Hope this helps.
  Best Regards,Nishant

 Kelkar
  On Wed, Sep 10, 2014 at
  10:04 AM, Raj Hadoop hadoop...@yahoo.com
  wrote:


  Hi,



  I have a requirement in Hive
 to remove duplicate records (
  they differ
 only by one column i.e a date column) and keep
  the latest date record.



  Sample
 :

  Hive Table :

   d2 is a higher

  cno,sqno,date



  100 1 1-oct-2013

  101 2 1-oct-2013

  100 1 2-oct-2013

  102 2 2-oct-2013





  Output needed:



  100 1 2-oct-2013

  101 2 1-oct-2013

  102 2 2-oct-2013



  I am using
 Hive 0.11



  Any suggestions please ?



  Regards,


 Raj





Can I update just one row in Hive table using Hive INSERT OVERWRITE

2014-04-04 Thread Raj Hadoop

Can I update ( delete and insert kind of)just one row keeping the remaining 
rows intact in Hive table using Hive INSERT OVERWRITE. There is no partition in 
the Hive table.



INSERT OVERWRITE TABLE tablename SELECT col1,col2,col3 from tabx where 
col2='abc';

Does the above work ? Please advise.

HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop
Hello everyone,

The HiveThrift service was started successfully.


netstat -nl | grep 10000


tcp    0  0 0.0.0.0:10000   0.0.0.0:*   LISTEN



I am able to read tables from Hive through Tableau. When executing queries 
through Tableau I am getting the following error -

Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MapRedTask

Can anyone suggest what the problem is?

Regards,
Raj


Re: HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop
Hi Szehon,

It is not showing on http://xyzserver:50030/jobtracker.jsp.

I checked this log, and it shows the following:


/tmp/root/hive.log
 
 
 exec.ExecDriver (ExecDriver.java:addInputPaths(853)) - Processing
alias table_emp
exec.ExecDriver (ExecDriver.java:addInputPaths(871)) - Adding input
file hdfs://xyzserver:8020/user/hive/warehouse/table_emp 
2014-03-20 11:57:26,352
INFO  exec.ExecDriver (ExecDriver.java:createTmpDirs(221)) - Making Temp
Directory: 

hdfs://xyzserver:8020:8020/tmp/hive-root/hi
ve_2014-03-20_11-57-25_822_1668300320164798948-3/-ext-10001
2014-03-20 11:57:26,377 ERROR
security.UserGroupInformation (UserGroupInformation.java:doAs(1411)) -
PriviledgedActionException as:root (auth:SIMPLE) 

cause:org.apache.hadoop.sec





On Thursday, March 20, 2014 3:53 PM, Szehon Ho sze...@cloudera.com wrote:
 
Hi Raj,

There are map-reduce job logs generated if the MapRedTask fails, those might 
give some clue.

Thanks,
Szehon




On Thu, Mar 20, 2014 at 12:29 PM, Raj Hadoop hadoop...@yahoo.com wrote:

I am struggling with this one. Can anyone throw some pointers on how to
troubleshoot this issue, please?




On Thursday, March 20, 2014 3:09 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
Hello everyone,


The  HiveThrift Service was started succesfully.



netstat -nl | grep 1




tcp    0  0 0.0.0.0:1   0.0.0.0:*   
LISTEN 





I am able to read tables from Hive through Tableau. When executing queries 
through Tableau I am getting the following error -


Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MapRedTask


Can any one suggest what the problem is ?


Regards,
Raj




Re: Hive append

2014-03-06 Thread Raj hadoop
Hi Nitin,

The existing records should remain the same and the new records should get
inserted into the table.


On Thu, Mar 6, 2014 at 2:11 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 are you talking about adding new records to tables or updating records in
 already existing table?


 On Thu, Mar 6, 2014 at 1:59 PM, Raj hadoop raj.had...@gmail.com wrote:

 Query in HIVE



  I tried a merge kind of operation in Hive to retain the existing records
  and append the new records, instead of dropping the table and populating it
  again.

  If anyone can help with any other approach than this, or with the right way to
  perform a merge operation, it would be a great help.




 --
 Nitin Pawar
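
For plain appends (keep existing rows, add new ones) Hive has supported INSERT
INTO since 0.8, so no drop-and-reload is needed. A minimal sketch, assuming the
new day's records have been loaded into a staging table with the same schema
(table names are hypothetical):

INSERT INTO TABLE emp SELECT * FROM emp_staging;

INSERT INTO adds new files under the table's directory and leaves the existing
data untouched; true updates or merges of existing rows are a separate problem on
these Hive versions.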




Merge records in hive

2014-03-05 Thread Raj hadoop
Hi,



Help required to merge data in hive,



Ex:

Today file

-

Empno  ename

1  abc

2  def

3  ghi



Tomorrow file

-

Empno  ename

5  abcd

6  defg

7  ghij





Note: we should not drop the Hive table and then recreate it; what I actually
require, as shown in the example, is to merge the data.



Thanks,

Raj


Sqoop import to HDFS and then Hive table - Issue with data type

2014-03-04 Thread Raj Hadoop
All,

I loaded data from an Oracle query through Sqoop into an HDFS file. These are
bzip-compressed files partitioned by one date column.

I created a Hive table to point to the above location.

After loading a lot of data, I realized the data type of one of the columns was
given wrongly.

When I changed the data type of the column to the new type using ALTER, it is
still showing NULL values.

How should I resolve this?

Do I need to recreate the table? If so, I have loaded a lot of data and I should
not lose the data. This is an external table.

Please advise.



Regards,
Raj
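
ALTER TABLE ... CHANGE COLUMN only rewrites the metadata, so if the values were
written in a representation the new type cannot read, they will still come back as
NULL. Since this is an external table, one hedged option is to drop and recreate
just the table definition with the corrected type; the files under the LOCATION
are not touched (all names below are hypothetical):

DROP TABLE mydb.orders;                      -- external table: data files stay in place
CREATE EXTERNAL TABLE mydb.orders (
  order_id BIGINT,
  amount   DOUBLE                            -- the corrected column type
)
PARTITIONED BY (load_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','   -- match whatever Sqoop wrote
LOCATION '/data/orders';
MSCK REPAIR TABLE mydb.orders;               -- re-register the existing date partitions

If the NULLs persist after that, the stored text itself probably does not parse as
the new type, and the data would have to be re-imported or cast in a view.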


Connection refused error - Getting repeatedly

2014-02-26 Thread Raj Hadoop
All, 

I have a 3-node Hadoop cluster (CDH 4.4), and every few days, whenever I load
some data through Sqoop or query through Hive, I sometimes get the following
error:


Call From server 1 to server 2 failed on connection exception: 
java.net.ConnectException: Connection refused

This has become very frequent. What can the reasons be, and how should I
troubleshoot this? Is hardware or the network the most common cause of this kind
of error? Please advise.

Regards,
Raj
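
An intermittent "Connection refused" between cluster nodes usually means the
target daemon is down or bound to the wrong address rather than a generic network
fault. A few hedged checks to run, with the port adjusted to whichever service the
call was going to (8020 for the NameNode, 8021/8032 for the JobTracker or
ResourceManager, 9083 for the metastore):

jps                          # on server 2: are the expected daemons actually running?
netstat -nlp | grep 8020     # on server 2: is the port listening, and on which interface?
telnet server2 8020          # from server 1: can it reach the port at all?

If the daemon restarts itself periodically, its own log (and /etc/hosts
consistency across the nodes) is the next place to look.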


Can a hive partition contain a string like 'tr_date=2014-01-01'

2014-02-25 Thread Raj Hadoop
I am trying to create a Hive partition like 'tr_date=2014-01-01'

FAILED: ParseException line 1:58 mismatched input '-' expecting ) near '2014' 
in add partition statement

hive_ret_val:  64
Errors while executing Hive for bksd table for 2014-01-01

Are hyphens not allowed in the partition directory?

Please advise.

Regards.
Raj


Re: Can a hive partition contain a string like 'tr_date=2014-01-01'

2014-02-25 Thread Raj Hadoop
Thanks. Will try it.





On Tuesday, February 25, 2014 8:23 PM, Kuldeep Dhole kuldeepr...@gmail.com 
wrote:
 
Probably you should use tr_date='2014-01-01'
Considering tr_date partition is there

On Tuesday, February 25, 2014, Raj Hadoop hadoop...@yahoo.com wrote:

I am trying to create a Hive partition like 'tr_date=2014-01-01'


FAILED: ParseException line 1:58 mismatched input '-' expecting ) near '2014' 
in add partition statement

hive_ret_val:  64
Errors while executing Hive for bksd table for 2014-01-01


Are hyphen's not allowed in the partition directory. ?


Please advise.


Regards.
Raj
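
Putting the value in quotes keeps the hyphens inside a string literal, which is
what the parser was complaining about. A hedged example of the full statement
(table name and location are placeholders):

ALTER TABLE bksd ADD IF NOT EXISTS PARTITION (tr_date='2014-01-01')
LOCATION '/data/bksd/tr_date=2014-01-01';

Unquoted, 2014-01-01 is parsed as arithmetic, which produces exactly the
"mismatched input '-'" error quoted above.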


part-m-00000 files and their size - Hive table

2014-02-25 Thread Raj Hadoop
Hi,

I am loading data to HDFS files through sqoop and creating a Hive table to 
point to these files.

The mapper files through sqoop example are generated like this below.

part-m-00000

part-m-00001

part-m-00002

My question is:
1) For Hive query performance, how important or significant is the
distribution of the file sizes above?

part_m_0 say 1 GB
part_m_1 say 3 GB
part_m_2 say 0.25 GB

Vs

part_m_0 say 1.4 GB
part_m_1 say 1.4 GB
part_m_2 say 1.45 GB


NOTE: The sizes and number of files are just a sample. The real numbers are far
bigger.


I am assuming the uniform distribution has a performance benefit.

If so, what is the reason, and can I know the technical details?


Re: part-m-00000 files and their size - Hive table

2014-02-25 Thread Raj Hadoop
Thanks for the detailed explanation Yong. It helps.

Regards,
Raj





On Tuesday, February 25, 2014 9:18 PM, java8964 java8...@hotmail.com wrote:
 
Yes, it is good for the file sizes to be close to even, but it is not very important,
unless some files are very small (compared to the block size).

The reasons are:

Your files should be splitable to be used in Hadoop (Or in Hive, it is the same 
thing). If they are splitable, then 1G file will use 10 blocks (assume the 
block size is 128M), and 256M file will take 2 blocks. So these 2 files will 
generate 12 mapper tasks, and will be equally run in your cluster. From 
performance point of view, you have 12 mapper tasks, and they are equally 
processed in the cluster. So one 1G file plus one 256M file are not big deal. 
But if you have one file are very small, like 10M, that one file will also 
consume one mapper task, and that is kind of bad for performance, as hadoop 
starting one mapper task only consuming 10M data, which is bad, because 
starting/stop tasks is using quite some resource, but only processing 10M data.

The reason you see uneven file sizes in the output of sqoop is that it is hard
for sqoop to split your source data evenly. For example, if you dump table A
from a DB to Hive, sqoop will do the following:

1) Identify the primary/unique keys of the table.
2) Find out the min/max values of the keys; let's say they are (1 to 1,000,000).
3) Based on the number of your mapper tasks, split them. If you run sqoop with 4 mappers,
then the data will be split into 4 groups: (1, 250,000) (250,001, 500,000)
(500,001, 750,000) (750,001, 1,000,000). As you can imagine, your data most
likely is not evenly distributed by the primary keys across those 4 groups, so you
will get uneven output in the part-m-xxx files.

Keep in mind that you are not required to use primary keys or unique keys as the
split column. You can choose whatever column in your table makes sense. Pick
whatever makes the split more even.

Yong
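
Following on from the explanation above, the split column is chosen on the Sqoop
command line with --split-by; picking a column whose values are spread evenly
gives part-m-* files of roughly the same size. A hedged sketch (connection
details and column names are hypothetical):

sqoop import \
  --connect jdbc:oracle:thin:@//orahost:1521/orcl \
  --username etl -P \
  --table SALES_FACT \
  --split-by SALE_ID \
  --num-mappers 8 \
  --target-dir /data/sales_fact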




Date: Tue, 25 Feb 2014 17:42:20 -0800
From: hadoop...@yahoo.com
Subject: part-m-0 files and their size - Hive table
To: user@hive.apache.org


Hi,

I am loading data to HDFS files through sqoop and creating a Hive table to 
point to these files.

The mapper files through sqoop example are generated like this below.

part-m-0

 part-m-1

part-m-2

My question is -
1) For Hive query performance , how important or significant is the 
distribution of the file sizes above.

part_m_0 say 1 GB
part_m_1 say 3 GB
part_m_1 say 0.25 GB

Vs

part_m_0 say 1.4 GB
part_m_1 say 1.4 GB
part_m_1 say  1.45 B


NOTE : The size and no of files is just for sample. The real numbers are far 
bigger.


I am assuming the uniform distribution has a performance benefit .

If so, what is the reason and can I know the technical details. 

Finding Hive and Hadoop version from command line

2014-02-09 Thread Raj Hadoop
All,

Is there any way from the command prompt to find out which Hive version I am
using, and the Hadoop version too?


Thanks in advance.

Regards,
Raj
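
Recent Hive releases print their version with a command-line flag, and Hadoop has
always had one; a hedged sketch:

hive --version
hadoop version

If the Hive flag is not recognised on an older release, the version is also
visible in the jar names, e.g. ls $HIVE_HOME/lib/hive-exec-*.jar.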

Add few record(s) to a Hive table or a HDFS file on a daily basis

2014-02-09 Thread Raj Hadoop



Hi,

My requirement is a typical data warehouse and ETL requirement. I need to
accomplish:

1) Daily insert of transaction records into a Hive table or an HDFS file. This table
or file is not big (approximately 10 records per day). I don't want to
partition the table / file.


I am reading a few articles on this. They mention that we need to
load into a staging table in Hive, and then insert like the below:

insert overwrite table finaltable select * from staging;


I am not getting this logic. How should I populate the staging table daily?

Thanks,
Raj
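
A hedged sketch of the usual daily flow (table and path names are hypothetical):
the staging table is simply reloaded from each day's extract, and its rows are
then appended to the final table, so the final table never has to be overwritten.

-- 1. replace yesterday's staging contents with today's file(s)
LOAD DATA INPATH '/landing/tx/2014-02-09' OVERWRITE INTO TABLE staging;

-- 2. append the day's rows; existing rows in finaltable are untouched
INSERT INTO TABLE finaltable SELECT * FROM staging;

With only ~10 new rows a day and no partitions, INSERT INTO (available since Hive
0.8) is simpler than the INSERT OVERWRITE pattern in the articles, which rewrites
the whole final table on every run.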

How can I just find out the physical location of a partitioned table in Hive

2014-02-06 Thread Raj Hadoop
Hi,

How can I just find out the physical location of a partitioned table in Hive.


Show partitions tab name

gives me just the partition column info.

I want the location of the hdfs directory / files where the table is created.

Please advise.

Thanks,
Raj
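
DESCRIBE FORMATTED shows the Location field for both the table and each partition;
a hedged example (names are placeholders):

DESCRIBE FORMATTED tab_name;
DESCRIBE FORMATTED tab_name PARTITION (dt='2014-01-01');

The per-partition form matters because a partition can live outside the table
directory if it was added with an explicit LOCATION.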

Hive Query Error

2014-02-05 Thread Raj Hadoop
I am trying to create a Hive sequence file from another table by running the 
following -

Your query has the following error(s):
FAILED: ParseException line 5:0 cannot recognize input near 'STORED' 'STORED' 'AS'
in constant (click the Error Log tab above for details)

1 CREATE TABLE temp_xyz as
2 SELECT prop1,prop2,prop3,prop4,prop5
3 FROM hitdata
4 WHERE dateoflog=20130101 and prop1='785-ou'
5 STORED AS SEQUENCEFILE;
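
In a CREATE TABLE ... AS SELECT, the storage clause belongs to the CREATE TABLE
part, so STORED AS has to come before AS SELECT rather than at the end of the
query. The statement above, rearranged:

CREATE TABLE temp_xyz
STORED AS SEQUENCEFILE
AS
SELECT prop1, prop2, prop3, prop4, prop5
FROM hitdata
WHERE dateoflog=20130101 AND prop1='785-ou';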

Re: GenericUDF Testing in Hive

2014-02-04 Thread Raj Hadoop
How do I test a Hive GenericUDF which accepts two parameters, List<T> and T?

List<T> - can it be the output of a collect_set? Please advise.

I have a generic UDF which takes List<T>, T. I want to test how it works
through Hive.





On Monday, January 20, 2014 5:19 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
 
The following is an example of a GenericUDF. I wanted to test this through a
Hive query - basically, I want to pass parameters something like: select
ComplexUDFExample('a','b','c') from employees limit 10.


 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
class ComplexUDFExample extends GenericUDF {

  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {

    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }

}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
--
 
How to test this example in Hive query. I know I am invoking it wrong. But how 
can I invoke it correctly.
 
My requirement is to pass a String of arrays as first argument and another 
string as second argument in Hive like below.
 
 
Select col1, ComplexUDFExample( collectset(col2) , 'xyz')
from 
Employees
Group By col1;
 
How do i do that?
 
Thanks in advance.
 
Regards,
Raj
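For reference, a minimal sketch of registering and invoking the UDF the way the question describes (the jar path and the function alias are placeholders; the class name comes from the linked source file). The error above happened because three scalar arguments were passed instead of an array plus a string:

ADD JAR /path/to/complex-udf-example.jar;
CREATE TEMPORARY FUNCTION array_contains_example AS 'com.matthewrathbone.example.ComplexUDFExample';

SELECT col1, array_contains_example(collect_set(col2), 'xyz')
FROM employees
GROUP BY col1;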

Re: GenericUDF Testing in Hive

2014-02-04 Thread Raj Hadoop

I want to do a simple test like this - but not working -

select ComplexUDFExample(List(a, b, c), b) from table1 limit 10;


FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'
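For reference, Hive has no List() constructor; array literals are built with the array() function. A sketch of the same test, assuming the UDF has been registered under its class name:

select ComplexUDFExample(array('a', 'b', 'c'), 'b') from table1 limit 10;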






On Tuesday, February 4, 2014 2:34 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
How to test a Hive GenericUDF which accepts two parameters List<T>, T?

List<T> - can it be the output of a collect_set? Please advise.

I have a generic UDF which takes List<T>, T. I want to test how it works 
through Hive. 





On Monday, January 20, 2014 5:19 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
 
The following is an example of a GenericUDF. I wanted to test this through a 
Hive query. Basically I want to pass parameters, something like select 
ComplexUDFExample('a','b','c') from employees limit 10.


 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
class ComplexUDFExample extends GenericUDF {

  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }
}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
--
 
How to test this example in Hive query. I know I am invoking it wrong. But how 
can I invoke it correctly.
 
My requirement is to pass a String of arrays as first argument and another 
string as second argument in Hive like below.
 
 
Select col1, ComplexUDFExample( collectset(col2) , 'xyz')
from 
Employees
Group By col1;
 
How do i do that?
 
Thanks in advance.
 
Regards,
Raj

Find a date that is in the range of any array dates in Hive

2014-01-31 Thread Raj Hadoop
Hi,


I have the following requirement from a Hive table below.

CustNum | ActivityDates                  | Rates
100     | 10-Aug-13,12-Aug-13,20-Aug-13  | 10,15,20

The data above says that

From 10 Aug to 11 Aug the rate is 10.
From 12 Aug to 19 Aug the rate is 15.

From 20-Aug to till date the rate is 20.

Note : The order is maintained in 'ActivityDates' and 'Rates'.

From the above table , I need to find the rate on say a given date 15-Aug-13. 
In the above case , the rate for 15-Aug-13 is 15.

How should I get this result in Hive.

I was reading about a Generic UDF and was thinking to write one like this.
The Generic UDF takes two inputs (an input date, an array of input dates). The 
output should be an int that gives the element number in the array. 

In the above case 
GenericUDF('15-Aug-13', array('10-Aug-13','12-Aug-13','20-Aug-13')) should return the position of the 2nd 
element in the array - 2.



Please advise if there is an alternative solution or if the above solution 
works. I have never written a UDF or Generic UDF and would need some help from 
the forum members. Please advise.


Regards,
Raj
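For reference, a sketch of one alternative that avoids writing a UDF, assuming a Hive release that ships posexplode (0.13+) and a table rates_tbl(custnum INT, activitydates ARRAY<STRING>, rates ARRAY<INT>) with dates in 'dd-MMM-yy' format. It finds the last activity date on or before the lookup date and indexes the rates array at that position:

SELECT t.custnum, t.rates[x.idx] AS rate_on_date
FROM rates_tbl t
JOIN (
  SELECT custnum, MAX(pos) AS idx
  FROM rates_tbl
  LATERAL VIEW posexplode(activitydates) d AS pos, dt
  WHERE unix_timestamp(dt, 'dd-MMM-yy') <= unix_timestamp('15-Aug-13', 'dd-MMM-yy')
  GROUP BY custnum
) x ON t.custnum = x.custnum;

A GenericUDF that walks the array and returns the matching index, as proposed above, would also work and avoids the explode-and-join.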

delete duplicate records in Hive table

2014-01-30 Thread Raj hadoop
Hi,

Can someone help me with how to delete duplicate records in a Hive table?

I know that delete and update are not supported by Hive, but still,

if someone knows an alternative it could help me with this

Thanks,
Raj.


Re: delete duplicate records in Hive table

2014-01-30 Thread Raj hadoop
Hi Nitin,

Thanks a ton for quick response,

Could you please share the SQL syntax for this, if any?

Thanks,
Raj.


On Thu, Jan 30, 2014 at 3:29 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

 easiest way to do is .. write it in a temp table and then select uniq of
 each column and writing to real table


 On Thu, Jan 30, 2014 at 3:19 PM, Raj hadoop raj.had...@gmail.com wrote:

 Hi,

 Can someone help me how to delete duplicate records in Hive table,

 I know that delete and update are not supported by hive but still,

 if some know's some alternative can help me in this

 Thanks,
 Raj.




 --
 Nitin Pawar
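For reference, a minimal sketch of Nitin's suggestion (mytable and its columns col1, col2, col3 are placeholders): stage the distinct rows in a temporary table, then overwrite the real table from it.

CREATE TABLE mytable_tmp AS
SELECT col1, col2, col3
FROM mytable
GROUP BY col1, col2, col3;

INSERT OVERWRITE TABLE mytable
SELECT col1, col2, col3 FROM mytable_tmp;

DROP TABLE mytable_tmp;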



Basic UDF in Hive - How to setup

2014-01-17 Thread Raj Hadoop
Hi,

I am trying to compile a basic Hive UDF Java file. I am using all the jar files 
in my classpath but I am not able to compile it, and I get the following 
error. I am using CDH4. Can anyone advise please?

$ javac HelloWorld.java
HelloWorld.java:3: package org.apache.hadoop.hive.ql.exec does not exist
import org.apache.hadoop.hive.ql.exec.Description;
 ^
HelloWorld.java:4: package org.apache.hadoop.hive.ql.exec does not exist
import org.apache.hadoop.hive.ql.exec.UDF;
 ^
HelloWorld.java:5: package org.apache.hadoop.hive.ql.udf does not exist
import org.apache.hadoop.hive.ql.udf.UDFType;
    ^
HelloWorld.java:8: cannot find symbol
symbol: class UDF
public class HelloWorld extends UDF
    ^
4 errors
$ echo $CLASSPATH
/usr/lib/hive/lib/hive-beeline-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-builtins-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-cli-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hwi-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-jdbc-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-metastore-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-pdk-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-serde-0.10.0-cdh4.4.0.jar::/usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-shims-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/parquet-hive-1.0.jar:/usr/lib/hive/lib/sentry-binding-hive-1.1.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-auth.ja
r:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4




Thanks,
Raj

Re: Basic UDF in Hive - How to setup

2014-01-17 Thread Raj Hadoop
Ok. I just figured it out. I have to set the classpath with export. It's working now.
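For reference, a minimal sketch of the fix (the jar list is abbreviated; adjust the directories to your distribution):

export CLASSPATH=$(echo /usr/lib/hive/lib/*.jar /usr/lib/hadoop/*.jar | tr ' ' ':')
javac HelloWorld.java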





On Friday, January 17, 2014 3:37 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
Hi,

I am trying to compile a basic hive UDF java file. I am using all the jar files 
in my classpath but I am not able to compile it and getting the following 
error. I am using CDH4. Can any one advise please?

$ javac HelloWorld.java
HelloWorld.java:3: package org.apache.hadoop.hive.ql.exec does not exist
import org.apache.hadoop.hive.ql.exec.Description;
 ^
HelloWorld.java:4: package org.apache.hadoop.hive.ql.exec does not exist
import
 org.apache.hadoop.hive.ql.exec.UDF;
 ^
HelloWorld.java:5: package org.apache.hadoop.hive.ql.udf does not exist
import org.apache.hadoop.hive.ql.udf.UDFType;
    ^
HelloWorld.java:8: cannot find symbol
symbol: class UDF
public class HelloWorld extends UDF
    ^
4 errors
$ echo
 $CLASSPATH
/usr/lib/hive/lib/hive-beeline-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-builtins-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-cli-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hwi-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-jdbc-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-metastore-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-pdk-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-serde-0.10.0-cdh4.4.0.jar::/usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-shims-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/parquet-hive-1.0.jar:/usr/lib/hive/lib/sentry-binding-hive-1.1.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop
/hadoop-auth.jar:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4




Thanks,
Raj

Re: JSON data to HIVE table

2014-01-07 Thread Raj Hadoop
 
All,
 
If I have to load JSON data into a Hive table (default record format while 
creating the table) - is it a requirement to convert each JSON record into 
one line?
 
How would I do this?
 
 
Thanks,
Raj
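For reference, one-record-per-line is what the text-based JSON SerDes expect. A minimal sketch of flattening a file, assuming the input is a JSON array of records and jq is available (file names are placeholders):

jq -c '.[]' input.json > records_one_per_line.json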



From: Rok Kralj rok.kr...@gmail.com
To: user@hive.apache.org 
Sent: Tuesday, January 7, 2014 3:54 AM
Subject: Re: JSON data to HIVE table



Also, if you have large or dynamic schemas which are a pain to write by hand, 
you can use this simple tool: 

https://github.com/strelec/hive-serde-gen




2014/1/7 Roberto Congiu roberto.con...@openx.com

Also https://github.com/rcongiu/Hive-JSON-Serde ;)



On Mon, Jan 6, 2014 at 12:00 PM, Russell Jurney russell.jur...@gmail.com 
wrote:

Check these out: 


http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/

http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/

https://github.com/kevinweil/elephant-bird




On Mon, Jan 6, 2014 at 9:36 AM, Raj Hadoop hadoop...@yahoo.com wrote:

Hi,

I am trying to load a data that is in JSON format to Hive table. Can any one 
suggest what is the method I need to follow?

Thanks,
Raj



-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com 



-- 
-- 
Good judgement comes with experience.
Experience comes with bad judgement.

--

Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141


-- 
eMail: rok.kr...@gmail.com 

JSON data to HIVE table

2014-01-06 Thread Raj Hadoop
Hi,
 
I am trying to load data that is in JSON format into a Hive table. Can anyone 
suggest what method I need to follow?
 
Thanks,
Raj

Re: Dynamic columns in Hive Table - Best Design for the problem

2013-12-29 Thread Raj Hadoop
Matt,

Thanks for the suggestion. Can you please provide more details on what type of 
UDAF I should develop? I have never worked on a UDAF before, but I would like 
to explore it. Any tips on how to proceed?

Thanks,
Raj



On Saturday, December 28, 2013 2:47 PM, Matt Tucker matthewt...@gmail.com 
wrote:
 
It looks like you're essentially doing a pivot function. Your best bet is to 
write a custom UDAF or look at the windowing functions available in recent 
releases.
Matt
On Dec 28, 2013 12:57 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Dear All Hive Group Members,


I have the following requirement.


Input:


Ticket#|Date of booking|Price
100|20-Oct-13|54

100|21-Oct-13|56
100|22-Oct-13|54
100|23-Oct-13|55
100|27-Oct-13|60
100|30-Oct-13|47


101|10-Sep-13|12
101|13-Sep-13|14
101|20-Oct-13|6




Expected Output:


Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||


The number of columns in the expected output is a dynamic list depending on 
the number of price changes of a ticket.


1) What is the best design to solve the above problem in Hive? 
2) How do we implement it?


Please advise.


Regards,
Raj
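For reference, a sketch of the windowing route Matt mentions, assuming Hive 0.11+ and a table bookings(ticket INT, booking_date STRING, price INT) with dates in 'dd-MMM-yy' format. It computes the per-booking price delta; the per-ticket rows can then be pivoted into one line with collect_list/concat_ws or a custom UDAF:

SELECT ticket,
       booking_date,
       price - LAG(price) OVER (PARTITION BY ticket ORDER BY booking_ts) AS delta
FROM (
  SELECT ticket, booking_date, price,
         unix_timestamp(booking_date, 'dd-MMM-yy') AS booking_ts
  FROM bookings
) t;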











Dynamic columns in Hive Table - Best Design for the problem

2013-12-28 Thread Raj Hadoop
Dear All Hive Group Members,

I have the following requirement.

Input:

Ticket#|Date of booking|Price
100|20-Oct-13|54

100|21-Oct-13|56
100|22-Oct-13|54
100|23-Oct-13|55
100|27-Oct-13|60
100|30-Oct-13|47

101|10-Sep-13|12
101|13-Sep-13|14
101|20-Oct-13|6


Expected Output:

Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||

The number of columns in the expected output is a dynamic list depending on the 
number of price changes of a ticket.

1) What is the best design to solve the above problem in Hive? 
2) How do we implement it?

Please advise.

Regards,
Raj

how to find number of elements in an array in Hive

2013-12-02 Thread Raj Hadoop
hi,

how to find number of elements in an array in Hive table?

thanks,
Raj

Re: how to find number of elements in an array in Hive

2013-12-02 Thread Raj Hadoop


Thanks Brad



On Monday, December 2, 2013 5:09 PM, Brad Ruderman bruder...@radiumone.com 
wrote:
 
Check out

size

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF


Thanks,
Brad




On Mon, Dec 2, 2013 at 5:05 PM, Raj Hadoop hadoop...@yahoo.com wrote:

hi,


how to find number of elements in an array in Hive table?


thanks,
Raj
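For reference, a one-line sketch using the size() function Brad points to (table and column names are placeholders):

select size(my_array_col) from my_table;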




Compression for a HDFS text file - Hive External Partition Table

2013-11-13 Thread Raj Hadoop
Hi ,
  
1)  My requirement is to load a file ( a tar.gz file which has multiple tab 
separated values files and one file is the main file which has huge data – 
about 10 GB per day) to an externally partitioned hive table.
 
2)  What I am doing is I have automated the process by extracting the 
tar.gz file and get the main data file (10GB text file) and then loading to a 
hdfs file as text file.
 
3)  I want to compress the files. What is the procedure for it?
 
4)  Do I need to use any utility to compress the hit data file before 
loading to HDFS? And also should I define an Input Structure for HDFS File 
format through a Java Program?
 
Regards,
Raj
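For reference, a minimal sketch of one simple option (file and partition paths are placeholders): gzip the extracted data file before putting it into the partition directory. Hive reads gzip-compressed text files transparently through the text input format, although a single .gz file is not splittable, so a very large file will be read by one mapper:

gzip hit_data.tsv
hadoop fs -put hit_data.tsv.gz /data/weblogs/dt=2013-11-13/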

Re: Hive external table partitions with less than symbol ?

2013-11-04 Thread Raj Hadoop
How can i use concat function? I did not get it. Can you please elaborate. 

My requirement is to create HDFS directories like 
(cust_id<1000 and cust_id>2000)


and map this to a Hive External table.

can i do that?



On Monday, November 4, 2013 3:34 AM, Matouk IFTISSEN 
matouk.iftis...@ysance.com wrote:
 
Hello
You can use concat function or case to do this like:
Concat ('/data1/customer/', id) 
.
Where id < 1000 
Etc..
Hope this help you ;)
Le 3 nov. 2013 23:51, Raj Hadoop hadoop...@yahoo.com a écrit :

All,


I want to create partitions like the below and create a hive external table. 
How can i do that ?


/data1/customer/id<1000
/data1/customer/id>1000 and id < 2000

/data1/customer/id>2000



Is this possible ( < and > symbols in folders ?)


My requirement is to partition the hive table based on some customer id's.


Thanks,
Raj

Re: Hive external table partitions with less than symbol ?

2013-11-04 Thread Raj Hadoop
Hi -

I have this doubt.

Why do I need to use an INSERT INTO?

Can I just create HDFS directories and map them to a Hive external table by setting 
the location of the HDFS directories?

Will this work? Please advise.

Thanks,
Raj







On Monday, November 4, 2013 8:34 AM, Matouk IFTISSEN 
matouk.iftis...@ysance.com wrote:
 
Yes it is possible:
hadoop fs -mkdir /hdfs_path/'cust_id<1000'

I tested it and works, then you can store data in this directory .

for concat function you do simple:

insert into your_table_partitioned PARTITION (path_xxx)
select attr, id, concat ('/data1/customer/', id) as path_xxx from your_table
where id < 1000
..


Cdt.





2013/11/4 Raj Hadoop hadoop...@yahoo.com

How can i use concat function? I did not get it. Can you please elaborate. 


My requirement is to create a HDFS directory like 
(cust_id<1000 and cust_id>2000)



and map this to a Hive External table.


can i do that?



On Monday, November 4, 2013 3:34 AM, Matouk IFTISSEN 
matouk.iftis...@ysance.com wrote:
 
Hello
You can use concat function or case to do this like:
Concat ('/data1/customer/', id) 
.
Where id < 1000 
Etc..
Hope this help you ;)
Le 3 nov. 2013 23:51, Raj Hadoop hadoop...@yahoo.com a écrit :

All,


I want to create partitions like the below and create a hive external table. 
How can i do that ?


/data1/customer/id<1000
/data1/customer/id>1000 and id < 2000

/data1/customer/id>2000



Is this possible ( < and > symbols in folders ?)


My requirement is to partition the hive table based on some customer id's.


Thanks,
Raj




-- 

Matouk IFTISSEN | Consultant BI  Big Data
 
24 rue du sentier - 75002 Paris - www.ysance.com
Mob : +33 6 78 51 18 69 || Fax : +33 1 73 72 97 26 
Ysance sur :Twitter | Facebook | Google+ | LinkedIn | Newsletter
Nos autres sites : ys4you | labdecisionnel | decrypt
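For reference, a minimal sketch of mapping such directories to an external table without using < or > in folder names: encode the id range in a string partition value and attach each directory explicitly (column names, bucket labels and paths are placeholders):

CREATE EXTERNAL TABLE customer_ext (
  cust_id INT,
  cust_name STRING
)
PARTITIONED BY (id_bucket STRING)
LOCATION '/data1/customer';

ALTER TABLE customer_ext ADD PARTITION (id_bucket='id_lt_1000')
  LOCATION '/data1/customer/id_lt_1000';
ALTER TABLE customer_ext ADD PARTITION (id_bucket='id_1000_to_2000')
  LOCATION '/data1/customer/id_1000_to_2000';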

Oracle to HDFS through Sqoop and a Hive External Table

2013-11-03 Thread Raj Hadoop
Hi,

I am sending this to the three dist-lists of Hadoop, Hive and Sqoop as this 
question is closely related to all the three areas.

I have this requirement.

I have a big table in Oracle (about 60 million rows - Primary Key Customer Id). 
I want to bring this to HDFS and then create
a Hive external table. My requirement is running queries on this Hive table (at 
this time i do not know what queries i would be running).

Is the following a good design for the above problem ? Any pros and cons of 
this.


1) Load the table to HDFS using Sqoop into multiple folders (divide Customer 
Id's into 100 segments).
2) Create Hive external partition table based on the above 100 HDFS directories.


Thanks,
Raj
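For reference, a minimal sketch of step 1 with Sqoop splitting the table on the primary key into 100 map tasks (connection string, credentials and paths are placeholders):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT --password tiger \
  --table CUSTOMERS \
  --split-by CUSTOMER_ID \
  --num-mappers 100 \
  --target-dir /data/customer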

Re: Oracle to HDFS through Sqoop and a Hive External Table

2013-11-03 Thread Raj Hadoop
Manish,

Thanks for reply.


1. Load to Hdfs, beware of Sqoop error handling, as its a mapreduce based 
framework, so if 1 mapper fails it might happen that you get partial data.
So do you say that - if I can handle errors in Sqoop, going for 100 HDFS 
folders/files - is it OK ?

2. Create partition based on date and hour, if customer table has some date or 
timestamp column.
I cannot rely on date or timestamp column. So can I go with Customer ID ?

3. Think about file format also, as that will affect the load and query time.
Can you please suggest a file format that I have to use ?

4. Think about compression as well before hand, as that will govern the data 
split, and performance of your queries as well.
Does compression increase or reduce performance? Isn't the advantage of 
compression the saving in storage? 

- Raj



On Sunday, November 3, 2013 11:03 AM, manish.hadoop.work 
manish.hadoop.w...@gmail.com wrote:
 
1. Load to Hdfs, beware of Sqoop error handling, as its a mapreduce based 
framework, so if 1 mapper fails it might happen that you get partial data.

2. Create partition based on date and hour, if customer table has some date or 
timestamp column.

3. Think about file format also, as that will affect the load and query time.

4. Think about compression as well before hand, as that will govern the data 
split, and performance of your queries as well.

Regards,
Manish



Sent from my T-Mobile 4G LTE Device


 Original message 
From: Raj Hadoop hadoop...@yahoo.com 
Date: 11/03/2013  7:39 AM  (GMT-08:00) 
To: Hive user@hive.apache.org,Sqoop u...@sqoop.apache.org,User 
u...@hadoop.apache.org 
Subject: Oracle to HDFS through Sqoop and a Hive External Table 



Hi,

I am sending this to the three dist-lists of Hadoop, Hive and Sqoop as this 
question is closely related to all the three areas.

I have this requirement.

I have a big table in Oracle (about 60 million rows - Primary Key Customer Id). 
I want to bring this to HDFS and then create
a Hive external table. My requirement is running queries on this Hive table (at 
this time i do not know what queries i would be running).

Is the following a good design for the above problem ? Any pros and cons of 
this.


1) Load the table to HDFS using Sqoop into multiple folders (divide Customer 
Id's into 100 segments).
2) Create Hive external partition table based on the above 100 HDFS directories.


Thanks,
Raj

External Partition Table

2013-10-31 Thread Raj Hadoop
Hi,

I am planning for a Hive External Partition Table based on a date.

Which one of the below yields a better performance or both have the same 
performance?

1) Partition based on one folder per day
LIKE date INT
2) Partition based on one folder per year / month / day ( So it has three 
folders) 
LIKE year INT, month INT, day INT

Thanks,
Raj
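For reference, sketches of the two layouts being compared (column definitions and paths are placeholders):

CREATE EXTERNAL TABLE logs_by_day (msg STRING)
PARTITIONED BY (dt INT)                        -- one folder per day, e.g. dt=20131031
LOCATION '/data/logs_by_day';

CREATE EXTERNAL TABLE logs_by_ymd (msg STRING)
PARTITIONED BY (year INT, month INT, day INT)  -- year=2013/month=10/day=31
LOCATION '/data/logs_by_ymd';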


Re: External Partition Table

2013-10-31 Thread Raj Hadoop


Thanks Tim. I am using a String column for the partition column. 



On Thursday, October 31, 2013 6:49 PM, Timothy Potter thelabd...@gmail.com 
wrote:
 
Hi Raj,
This seems like a matter of style vs. any performance benefit / cost ... if 
you're going to do a lot of queries just based on month or year, then #2 might 
be easier, e.g.

select * from foo where year = 2013 seems a little cleaner than select * from 
foo where date >= 20130101 and date <= 20131231 (not sure how you're encoding 
dates into an INT but I think you get the idea)

I do something similar but my partition fields are strings, like 
2013-10-31_ (which has the nice property of lexically sorting the same as 
numeric sort).

I'm assuming they will both have the same performance because Hive is still 
selecting the same number of input paths in both scenarios, one just happens to 
be a little deeper.

Cheers,
Tim



On Thu, Oct 31, 2013 at 4:34 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Hi,


I am planning for a Hive External Partition Table based on a date.


Which one of the below yields a better performance or both have the same 
performance?


1) Partition based on one folder per day
LIKE date INT
2) Partition based on one folder per year / month / day ( So it has three 
folders) 
LIKE year INT, month INT, day INT


Thanks,
Raj



Re: Hive Query Questions - is null in WHERE

2013-10-17 Thread Raj Hadoop
 
Thanks. It worked for me now when i use it as an empty string.



From: Krishnan K kkrishna...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com 
Sent: Thursday, October 17, 2013 11:11 AM
Subject: Re: Hive Query Questions - is null in WHERE



For string columns, null will be interpreted as an empty string and for others, 
it will be interpreted as null...

On Wednesday, October 16, 2013, Raj Hadoop wrote:

All,

When a query is executed like the below

select field1 from table1 where field1 is null;

I am getting the results which have empty values or nulls in field1. How does 
is null work in Hive queries.

Thanks,
Raj
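For reference, a minimal sketch that catches both cases, since string columns loaded from text files often end up as empty strings rather than real NULLs:

select field1 from table1 where field1 is null or field1 = '';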

Hive Query Questions - is null in WHERE

2013-10-16 Thread Raj Hadoop
All,
 
When a query is executed like the below
 
select field1 from table1 where field1 is null;
 
I am getting the results which have empty values or nulls in field1. How does 
is null work in Hive queries.
 
Thanks,
Raj

Re: How to load /t /n file to Hive

2013-10-07 Thread Raj Hadoop


Yes, I have it.

Thanks,
Raj


 From: Sonal Goyal sonalgoy...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com 
Sent: Monday, October 7, 2013 1:38 AM
Subject: Re: How to load /t /n file to Hive
 


Do you have the option to escape your tabs and newlines in your base file? 


Best Regards,
Sonal
Nube Technologies 







On Sat, Sep 21, 2013 at 12:34 AM, Raj Hadoop hadoop...@yahoo.com wrote:

Hi,
 
I have a file which is delimted by a tab. Also, there are some fields in the 
file which has a tab /t character and a new line /n character in some fields.
 
Is there any way to load this file using Hive load command? Or do i have to 
use a Custom Map Reduce (custom) Input format with java ? Please advise.
 
Thanks,
Raj

How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi,
 
I have a file which is delimited by a tab. Also, there are some fields in the 
file which have a tab /t character and a new line /n character in some fields.
 
Is there any way to load this file using Hive load command? Or do i have to use 
a Custom Map Reduce (custom) Input format with java ? Please advise.
 
Thanks,
Raj

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Please note that there is an escape character in the fields where the /t and /n 
are present.




From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org 
Sent: Friday, September 20, 2013 3:04 PM
Subject: How to load /t /n file to Hive



Hi,

I have a file which is delimted by a tab. Also, there are some fields in the 
file which has a tab /t character and a new line /n character in some fields.

Is there any way to load this file using Hive load command? Or do i have to use 
a Custom Map Reduce (custom) Input format with java ? Please advise.

Thanks,
Raj

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi Nitin,
 
Thanks for the reply. I have a huge file in unix.
 
As per the file definition, the file is a tab-separated file of fields. But I 
am sure that within some fields I have some newline characters. 
 
How should I find a record? It is a huge file. Is there some command?
 
Thanks,
 



From: Nitin Pawar nitinpawar...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com 
Sent: Friday, September 20, 2013 3:15 PM
Subject: Re: How to load /t /n file to Hive



If your data contains new line chars, its better you write a custom map reduce 
job and convert the data into a single line removing all unwanted chars in 
column separator as well just having single new line char per line 



On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop hadoop...@yahoo.com wrote:

Please note that there is an escape chacter in the fields where the /t and /n 
are present.



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org 
Sent: Friday, September 20, 2013 3:04 PM
Subject: How to load /t /n file to Hive



Hi,

I have a file which is delimted by a tab. Also, there are some fields in the 
file which has a tab /t character and a new line /n character in some fields.

Is there any way to load this file using Hive load command? Or do i have to 
use a Custom Map Reduce (custom) Input format with java ? Please advise.

Thanks,
Raj




-- 
Nitin Pawar
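For reference, a minimal sketch for spotting the broken records before deciding on a fix: count the tab-separated fields per line and report lines that do not match the expected count (25 and the file name are placeholders). A record broken by an embedded newline shows up as lines with too few fields, while an embedded tab shows up as too many:

awk -F'\t' 'NF != 25 {print NR, NF}' big_file.tsv | head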

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi Gabo,

Are you suggesting using java.net.URLEncoder? Can you be more specific? I 
have a lot of fields in the file which are not only URL related but also some text 
fields which have newline characters.

Thanks,
Raj



 From: Gabriel Eisbruch gabrieleisbr...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com 
Sent: Friday, September 20, 2013 4:43 PM
Subject: Re: How to load /t /n file to Hive
 


Hi 
One way that we used to solve that problem is to transform the data when you 
are creating/loading it; for example, we've applied UrlEncode to each field at 
create time.

Thanks,
Gabo.



2013/9/20 Raj Hadoop hadoop...@yahoo.com

Hi Nitin,
 
Thanks for the reply. I have a huge file in unix.
 
As per the file definition, the file is a tab separated file of fields. But I 
am sure that within some field's I have some new line character. 
 
How should I find a record? It is a huge file. Is there some command?
 
Thanks,
 


From: Nitin Pawar nitinpawar...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com 
Sent: Friday, September 20, 2013 3:15 PM
Subject: Re: How to load /t /n file to Hive



If your data contains new line chars, its better you write a custom map reduce 
job and convert the data into a single line removing all unwanted chars in 
column separator as well just having single new line char per line 



On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop hadoop...@yahoo.com wrote:

Please note that there is an escape chacter in the fields where the /t and /n 
are present.



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org 
Sent: Friday, September 20, 2013 3:04 PM
Subject: How to load /t /n file to Hive



Hi,
 
I have a file which is delimted by a tab. Also, there are some fields in the 
file which has a tab /t character and a new line /n character in some fields.
 
Is there any way to load this file using Hive load command? Or do i have to 
use a Custom Map Reduce (custom) Input format with java ? Please advise.
 
Thanks,
Raj





-- 
Nitin Pawar




Hive Thrift Service - Not Running Continously

2013-08-05 Thread Raj Hadoop
Hi,
 
 
The Hive Thrift service is not running continuously. I had to execute the 
command (hive --service hiveserver) very frequently. Can anyone help me with 
this?
 
Thanks,
Raj
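For reference, a minimal sketch that keeps the Thrift service alive after the launching shell exits (the log path is a placeholder):

nohup hive --service hiveserver > /tmp/hiveserver.log 2>&1 &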

Help in debugging Hive Query

2013-07-25 Thread Raj Hadoop
All,
 
I am trying to determine visits for customer from omniture weblog file using 
Hive.
 
Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num
 
Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20
 
I want to calculate for each Customer Number ( evar23 is  Customer Number ) , 
total visits. visid_high and visid_low determines a unique visit.
For each distinct visitor, calculate sum of maximum visit_page_num. In above 
example
 
14 + 20 = 34 should be the total visits for the customer 1003.
 
I am trying to run the following queries - Method 1 is almost the same as 
Method 2. Except in Method 1 I only choose a particualr customer number 1003. 
In method 2 , i generalized to all.
 
In Method 1, I am getting the accurate result. In Method 2, I am not getting 
the same result as Method 1. 
 
Any suggestions on how to troubleshoot? Also, any alternative approaches?
 
// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where 
evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
/ Result of Method 1
 
1003    34
 
// Method 2

create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
select * from temp123 where evar23='1003';
 
// The Result of Method 2 is not the same as Method 1. It is showing a 
different number.
 
 
 
Thanks,
Raj

Re: Help in debugging Hive Query

2013-07-25 Thread Raj Hadoop
Hi Sanjay,
 
Thanks for taking the time to write all the details. I made a silly mistake: I 
created the data type for visit_page_num as string. The string was causing 
issues when I was using the max function. A type cast to int in the query worked 
for me.
 
Regards,
Raj
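For reference, the fix described above amounts to casting before aggregating, e.g. in the inner query:

select visid_high, visid_low, max(cast(visit_page_num as int)) as max_visit_page_num
from omniture_web_data
group by visid_high, visid_low;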



From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: user@hive.apache.org user@hive.apache.org 
Sent: Thursday, July 25, 2013 1:41 PM
Subject: Re: Help in debugging Hive Query



The query is correct but since u r creating a managed table , that is possibly 
creating some issue and the records are not all getting created

This is what I would propose

CHECKPOINT  1 : Is this query running at all ?
===
Use this option in BOLD and run the QUERY ONLY (without any table creation) to 
log errors and pipe to a log file by using nohup or some other way that u prefer
hive -hiveconf hive.root.logger=INFO,console -e

select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;


CHECKPOINT 2 : Run the query (using the CREATE TABLE option) with these 
additional options
===
Required params:

SET mapreduce.job.maps=500; 
SET mapreduce.job.reduces=8; 
SET mapreduce.tasktracker.map.tasks.maximum=12; 
SET mapreduce.tasktracker.reduce.tasks.maximum=8; 
SET 
mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec; 
SET mapreduce.map.output.compress=true; 


Optional params:
---
If u r using compression in output , use the following ; u can change the 
LzoCodec to whatever u r using for compression 
SET hive.exec.compress.intermediate=true; 
SET hive.exec.compress.output=true;
SET 
mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
 
SET mapreduce.output.fileoutputformat.compress=true; 


Thanks

Sanjay

From: Raj Hadoop hadoop...@yahoo.com
Reply-To: user@hive.apache.org user@hive.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Date: Thursday, July 25, 2013 5:00 AM
To: Hive user@hive.apache.org
Subject: Help in debugging Hive Query


All,

I am trying to determine visits for customer from omniture weblog file using 
Hive.

Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num

Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20

I want to calculate for each Customer Number ( evar23 is  Customer Number ) , 
total visits. visid_high and visid_low determines a unique visit.
For each distinct visitor, calculate sum of maximum visit_page_num. In above 
example

14 + 20 = 34 should be the total visits for the customer 1003.

I am trying to run the following queries - Method 1 is almost the same as 
Method 2. Except in Method 1 I only choose a particualr customer number 1003. 
In method 2 , i generalized to all.

In Method 1 , I am getting the accurate result. In metnhod 2 , I am not getting 
the same result as Method 1. 

Any suggestions on how to trouble shoot. ALso, any alternative approaches.

// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where 
evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

/ Result of Method 1

100334

// Method 2

create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

select * from temp123 where evar23='1003';

// The Result of Method 2 is not the same as Method 1. It is showing a 
different number.



Thanks,
Raj

 

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System

Special characters in web log file causing issues

2013-07-08 Thread Raj Hadoop


Hi ,
 
The log file that I am trying to load through Hive has some special characters. 
 
The field is shown below and the special characters ¿¿ are also shown.
 Shockwave Flash
in;Motive ManagementPlug-in;Google Update;Java(TM)Platform SE 7U21;McAfee 
SiteAdvisor;McAfee Virtual Technician;Windows Live¿¿ Photo Gallery;McAfee 
SecurityCenter;Silverlig
 
 
The above is causing the record to be terminated and another line to be loaded. How 
can I avoid this type of issue and load the proper data? Any 
suggestions please.
Thanks,
Raj;Chrome Remote Desktop Viewer;NativeClient;Chrome PDF Viewer;Adobe 
Acrobat;Microsoft Office 2010;Motive Plug- 

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop
 

hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from 
customers

*** customers is a Hive table



 From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org user@hive.apache.org 
Sent: Friday, July 5, 2013 12:10 AM
Subject: Re: How Can I store the Hive query result in one file ?
 


Normally if use set mapred.reduce.tasks=1 you get one output file. You can also 
look at
hive.merge.mapfiles, mapred.reduce.tasks, hive.merge.reducefiles also you can 
use a separate tool https://github.com/edwardcapriolo/filecrush




On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

will hive -e query > filename  or hive -f query.q > filename do? 


you specially want it to write into a named file on hdfs only? 



On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN matouk.iftis...@ysance.com 
wrote:

Hello Hive users,
Is there a way to store the Hive query result (SELECT * ...) in a 
specific, single file (given the file name), like (INSERT OVERWRITE LOCAL 
DIRECTORY '/directory_path_name/')?
Thanks for your answers






-- 
Nitin Pawar


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop


Adding to that

- Multiple files can be concatenated from the directory like
Example:  cat 0-0 00-1 0-2 > final




 From: Raj Hadoop hadoop...@yahoo.com
To: user@hive.apache.org user@hive.apache.org; matouk.iftis...@ysance.com 
matouk.iftis...@ysance.com 
Sent: Friday, July 5, 2013 12:17 AM
Subject: Re: How Can I store the Hive query result in one file ?
 


 

 hive  set hive.io.output.fileformat=CSVTextFile;
 hive  insert overwrite local directory '/usr/home/hadoop/da1/' select * from 
customers

*** customers is a Hive table



 From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org user@hive.apache.org 
Sent: Friday, July 5, 2013 12:10 AM
Subject: Re: How Can I store the Hive query result in one file ?
 


Normally if use set mapred.reduce.tasks=1 you get one output file. You can also 
look at
hive.merge.mapfiles, mapred.reduce.tasks, hive.merge.reducefiles also you can 
use a separate tool https://github.com/edwardcapriolo/filecrush




On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

will hive -e query  filename  or hive -f query.q  filename will do ? 


you specially want it to write into a named file on hdfs only? 



On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN matouk.iftis...@ysance.com 
wrote:

Hello Hive users,
Is there a manner to store the Hive  query result (SELECT *.) in a 
specfique  and alone file (given the file name) like (INSERT OVERWRITE LOCAL 
DIRECTORY '/directory_path_name/')?
Thanks for your answers






-- 
Nitin Pawar


Issue with Oracle Hive Metastore (SEQUENCE_TABLE)

2013-07-03 Thread Raj Hadoop
Hi,
 
When I installed Hive earlier on my machine I used an Oracle Hive meta script. 
Please find attached the script. Hive worked fine for me on this box with no 
issues.
 
I am trying to install Hive on another machine with a different Oracle metastore. 
I executed the meta script but I am having issues with my Hive on the second box.
 
$ hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in 
jar:file:/software/hadoop/hive/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201307031616_605717324.txt
hive show tables;
FAILED: Error in metadata: javax.jdo.JDOException: Couldnt obtain a new 
sequence (unique id) : ORA-00942: table or view does not exist
NestedThrowables:
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

I found the difference between the two metastores: one table is missing in 
the new one. The table is SEQUENCE_TABLE. I do not know whether this table will be 
created automatically by Hive or whether it should be in the script. I don't remember 
what I did earlier and I am assuming I used the same script. Has anyone had 
this issue earlier? Please advise.
 
Also, Where to get the hive 0.9 oracle meta script?
 
Thanks,
Raj
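For reference, a sketch of the missing table as it appears in the Hive 0.9 Oracle schema script; verify it against the script shipped with your release before running it:

CREATE TABLE SEQUENCE_TABLE
(
  SEQUENCE_NAME VARCHAR2(255) NOT NULL,
  NEXT_VAL NUMBER NOT NULL
);

ALTER TABLE SEQUENCE_TABLE ADD CONSTRAINT SEQUENCE_TABLE_PK PRIMARY KEY (SEQUENCE_NAME);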

hive-schema-0.9.0.oracle.sql
Description: Binary data


Hive Table to CSV file

2013-07-01 Thread Raj Hadoop
Hi,

My requirement is to load data from a (one-column) Hive view to a CSV file. 
After loading it, I don't see any file generated.

I used the following commands to load data to file from a view v_june1


hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from 
v_june1_pgnum 

.The output at console is like the below. 



MapReduce Total cumulative CPU time: 4 minutes 15 seconds 590 msec
Ended Job = job_201306141336_0113
Copying data to local directory /usr/home/hadoop/da1
Copying data to local directory /usr/home/hadoop/da1
3281 Rows loaded to /usr/home/hadoop/da1
MapReduce Jobs Launched:
Job 0: Map: 21  Reduce: 6   Cumulative CPU: 255.59 sec   HDFS Read: 5373722496 
HDFS Write: 389069 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 15 seconds 590 msec
OK
Time taken: 148.764 second



My Question : I do not see any files created under /usr/home/hadoop/da1. Where 
are the files created?

Thanks,
Raj

Re: Hive Table to CSV file

2013-07-01 Thread Raj Hadoop
Sorry. It's my bad. I see the files now. I was looking in a different directory 
earlier.





 From: Mohammad Tariq donta...@gmail.com
To: user user@hive.apache.org 
Sent: Monday, July 1, 2013 8:26 PM
Subject: Re: Hive Table to CSV file
 


Do you have permissions to write to this path?And make sure you are looking 
into the local FS, as Stephen has specified.


Warm Regards,
Tariq
cloudfront.blogspot.com



On Tue, Jul 2, 2013 at 5:25 AM, Stephen Sprague sprag...@gmail.com wrote:

you gotta admit that's kinda funny.  Your stderr output shows not only once but 
three times where it put the output and in fact how many rows it put there.  
and to top it off it reported 'SUCCESS'.

but you're saying there's nothing there? 

now. call me crazy but i would tend to believe hive over you - but that's just 
me. :)

are you looking at the local filesystem on the same box you ran hive?




On Mon, Jul 1, 2013 at 4:01 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Hi,

My requirement is to load data from a (one column) Hive view to a CSV file. 
After loading it, I dont see any file generated.

I used the following commands to load data to file from a view v_june1


hive  set hive.io.output.fileformat=CSVTextFile;
 hive  insert overwrite local directory '/usr/home/hadoop/da1/' select * 
from v_june1_pgnum 

.The output at console is like the below. 



MapReduce Total cumulative CPU time: 4 minutes 15 seconds 590 msec
Ended Job = job_201306141336_0113
Copying data to local directory /usr/home/hadoop/da1
Copying data to local directory /usr/home/hadoop/da1
3281 Rows loaded to /usr/home/hadoop/da1
MapReduce Jobs Launched:
Job 0: Map: 21  Reduce: 6   Cumulative CPU: 255.59 sec   HDFS Read: 
5373722496 HDFS Write: 389069 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 15 seconds 590 msec
OK
Time taken: 148.764 second



My Question : I do not see any files created under /usr/home/hadoop/da1. 
Where are the files created?

Thanks,
Raj






TempStatsStore derby.log

2013-06-21 Thread Raj Hadoop
Hi,
 
I have Hive metastore created in an Oracle database. 
 
But when I execute my Hive queries, I see the following directory and file created:
TempStatsStore  (directory)
derby.log
 
What are these? Can anyone suggest why a Derby log is created even though my 
javax.jdo.option.ConnectionURL is pointing to Oracle?
 
Thanks,
Raj
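For reference, the metastore connection is not involved here: Hive's per-query statistics collection (hive.stats.dbclass, which defaults to jdbc:derby) writes to its own Derby database named TempStatsStore. A minimal sketch of switching automatic stats gathering off if you don't need it:

set hive.stats.autogather=false;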

Apache Flume Properties File

2013-05-24 Thread Raj Hadoop
Hi,
 
I just installed Apache Flume 1.3.1 and am trying to run a small example to test it. 
Can anyone suggest how I can do this? I am going through the documentation 
right now.
 
Thanks,
Raj
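For reference, a minimal sketch of the classic single-node test from the Flume user guide: a netcat source feeding a logger sink through a memory channel (agent name, file name and port are arbitrary):

# example.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start it with:

bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

and send test lines with telnet localhost 44444.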

Hive tmp logs

2013-05-22 Thread Raj Hadoop
Hi,
 
My Hive job logs are being written to the /tmp/hadoop directory. I want to change 
this to a different location, i.e. a subdirectory somewhere under the 'hadoop' 
user home directory.
How do I change it?
 
Thanks,
Raj
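For reference, the per-session job/history logs land in hive.querylog.location, which defaults to /tmp/<user name>. A sketch of overriding it in hive-site.xml (the path is a placeholder):

<property>
  <name>hive.querylog.location</name>
  <value>/software/home/hadoop/hive-logs</value>
</property>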

Sqoop Import Oracle Error - Attempted to generate class with no columns!

2013-05-22 Thread Raj Hadoop
Hi,
 
I just finished setting up Apache sqoop 1.4.3. I am trying to test basic sqoop 
import on Oracle.
 
sqoop import --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT --table 
usr1.testonetwo --username usr123 --password passwd123
 
 
I am getting the error as 
13/05/22 17:18:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM usr1.testonetwo t WHERE 1=0
13/05/22 17:18:16 ERROR tool.ImportTool: Imported Failed: Attempted to generate 
class with no columns!
 
I checked the database and the query runs fine from Oracle sqlplus client and 
Toad.
 
Thanks,
Raj
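For reference, a common cause of "Attempted to generate class with no columns!" against Oracle is identifier case: Oracle stores unquoted object names in upper case, so the owner and table name passed to Sqoop need to be upper case as well. A sketch of the same command with that change:

sqoop import \
  --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT \
  --username usr123 --password passwd123 \
  --table USR1.TESTONETWO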

hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Hi,
 
I am configuring Hive. I have a question on the property 
hive.metastore.warehouse.dir.
 
Should this point to a physical directory? I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to create 
any directory for the variable hive.metastore.warehouse.dir.
 
Thanks,
Raj

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Can someone help me with this? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really appreciated.




From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:08 PM
Subject: hive.metastore.warehouse.dir - Should it point to a physical directory



Hi,

I am configurinig Hive. I ahve a question on the property 
hive.metastore.warehouse.dir.

Should this point to a physical directory. I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to create 
any directory for the variable hive.metastore.warehouse.dir

Thanks,
Raj

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Thanks Sanjay.
 
My environment is  like this.
 
$ echo $HADOOP_HOME
/software/home/hadoop/hadoop/hadoop-1.1.2
 
$ echo $HIVE_HOME
/software/home/hadoop/hive/hive-0.9.0

$ id
uid=50052(hadoop) gid=600(apps) groups=600(apps)

 
So can i do like this:
 
$pwd
/software/home/hadoop/hive/hive-0.9.0
 
$mkdir warehouse
 
$cd /software/home/hadoop/hive/hive-0.9.0/warehouse
 
$ in hive-site.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/software/home/hadoop/hive/hive-0.9.0/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
 
Where should I create the HDFS directory ?
 
 


From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com; Dean Wampler deanwamp...@gmail.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:53 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Notes below
From: Raj Hadoop hadoop...@yahoo.com
Reply-To: user@hive.apache.org user@hive.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Date: Tuesday, May 21, 2013 10:49 AM
To: Dean Wampler deanwamp...@gmail.com, user@hive.apache.org 
user@hive.apache.org
Cc: User u...@hadoop.apache.org
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory


Ok.I got it. My questions -
 
1) Should a local physical directory be created before using this property?
I created a directory in HDFS during Hive installation
/user/hive/warehouse

My hive-site.xml has the following property defined


<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
2) Should a HDFS file directory be created from Hadoop before using this 
property?
hdfs dfs -mkdir /user/hive/warehouse
Change the owner:group to hive:hive 
 



From: Dean Wampler deanwamp...@gmail.com
To: user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:44 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



The name is misleading; this is the directory within HDFS where Hive stores the 
data, by default. (External tables can go elsewhere). It doesn't really have 
anything to do with the metastore. 

dean


On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Can some one help me on this ? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really aprreciated.



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:08 PM
Subject: hive.metastore.warehouse.dir - Should it point to a physical directory



Hi,

I am configurinig Hive. I ahve a question on the property 
hive.metastore.warehouse.dir.

Should this point to a physical directory. I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to create 
any directory for the variable hive.metastore.warehouse.dir

Thanks,
Raj




-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/



CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
yes thats what i meant. local physical directory. thanks.




From: bharath vissapragada bharathvissapragada1...@gmail.com
To: user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:59 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Hi, 

If by local physical directory you mean a directory in the underlying OS file 
system, then No. You just need to create a directory in HDFS and ad it to that 
xml config file.

Thanks,



On Tue, May 21, 2013 at 11:19 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Ok.I got it. My questions -
 
1) Should a local physical directory be created before using this property?
2) Should a HDFS file directory be created from Hadoop before using this 
property?
 
 


From: Dean Wampler deanwamp...@gmail.com
To: user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:44 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



The name is misleading; this is the directory within HDFS where Hive stores 
the data, by default. (External tables can go elsewhere). It doesn't really 
have anything to do with the metastore. 


dean


On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Can some one help me on this ? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really aprreciated.



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:08 PM
Subject: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Hi,

I am configurinig Hive. I ahve a question on the property 
hive.metastore.warehouse.dir.

Should this point to a physical directory. I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to 
create any directory for the variable hive.metastore.warehouse.dir

Thanks,
Raj





-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/ 



Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
So that means I need to create an HDFS directory (not an OS physical directory) 
under Hadoop that needs to be used in the Hive config file for this 
property. Right?




From: Dean Wampler deanwamp...@gmail.com
To: Raj Hadoop hadoop...@yahoo.com 
Cc: Sanjay Subramanian sanjay.subraman...@wizecommerce.com; 
user@hive.apache.org user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 2:06 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



No, you only need a directory in HDFS, which will be virtually located 
somewhere in your cluster automatically by HDFS. 

Also there's a typo in your hive.xml:

  <value>/software/home/hadoop/hive/hive-0.9.0/warehouse</value>
Should be

  <value>/correct/path/in/hdfs/to/your/warehouse/directory</value>

On Tue, May 21, 2013 at 1:04 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Thanks Sanjay.
 
My environment is  like this.
 
$ echo $HADOOP_HOME
/software/home/hadoop/hadoop/hadoop-1.1.2
 
$ echo $HIVE_HOME
/software/home/hadoop/hive/hive-0.9.0

$ id
uid=50052(hadoop) gid=600(apps) groups=600(apps)

 
So can i do like this:
 
$pwd
/software/home/hadoop/hive/hive-0.9.0
 
$mkdir warehouse
 
$cd /software/home/hadoop/hive/hive-0.9.0/warehouse
 
$ in hive-site.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/software/home/hadoop/hive/hive-0.9.0/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
 
Where should I create the HDFS directory ?
 

From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com; Dean Wampler deanwamp...@gmail.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:53 PM 

Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Notes below

From: Raj Hadoop hadoop...@yahoo.com
Reply-To: user@hive.apache.org user@hive.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Date: Tuesday, May 21, 2013 10:49 AM
To: Dean Wampler deanwamp...@gmail.com, user@hive.apache.org 
user@hive.apache.org
Cc: User u...@hadoop.apache.org
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Ok. I got it. My questions -
 
1) Should a local physical directory be created before using this property?
I created a directory in HDFS during Hive installation
/user/hive/warehouse


My hive-site.xml has the following property defined


<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

2) Should a HDFS file directory be created from Hadoop before using this 
property?
hdfs dfs -mkdir /user/hive/warehouse
Change the owner:group to hive:hive 
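
For reference, a minimal sketch of those two steps (the /user/hive/warehouse path and the hive:hive owner follow the notes above; on Hadoop 1.x use "hadoop fs" instead of "hdfs dfs"):

# create the warehouse directory inside HDFS (not on the local filesystem)
hdfs dfs -mkdir /user/hive/warehouse
# hand it to the hive user/group and let the group write into it
hdfs dfs -chown hive:hive /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse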
 


From: Dean Wampler deanwamp...@gmail.com
To: user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:44 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



The name is misleading; this is the directory within HDFS where Hive stores 
the data, by default. (External tables can go elsewhere). It doesn't really 
have anything to do with the metastore. 


dean


On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Can someone help me on this? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really appreciated.



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:08 PM
Subject: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Hi,

I am configuring Hive. I have a question on the property 
hive.metastore.warehouse.dir.

Should this point to a physical directory? I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to 
create any directory for the variable hive.metastore.warehouse.dir

Thanks,
Raj





-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/








-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/ 

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Thanks Sanjay




From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: bharath vissapragada bharathvissapragada1...@gmail.com; 
user@hive.apache.org user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 2:27 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Hi Raj

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3.html

Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode

On the left panel of the page u will find info on Hive installation etc.

I suggest CDH4 distribution only because it helps u to get started quickly…as 
developers I love to install from individual tar balls but sometimes there is 
little time to learn and execute

There are some great notes here 

sanjay

From: bharath vissapragada bharathvissapragada1...@gmail.com
Date: Tuesday, May 21, 2013 11:12 AM
To: user@hive.apache.org user@hive.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Cc: Sanjay Subramanian sanjay.subraman...@wizecommerce.com, User 
u...@hadoop.apache.org
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory




Yes !

On Tue, May 21, 2013 at 11:41 PM, Raj Hadoop hadoop...@yahoo.com wrote:

So that means I need to create an HDFS directory (not an OS physical directory) 
under Hadoop, which then needs to be referenced in the Hive config file for this 
property. Right?



From: Dean Wampler deanwamp...@gmail.com
To: Raj Hadoop hadoop...@yahoo.com 
Cc: Sanjay Subramanian sanjay.subraman...@wizecommerce.com; 
user@hive.apache.org user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 2:06 PM 

Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



No, you only need a directory in HDFS, which will be virtually located 
somewhere in your cluster automatically by HDFS. 


Also there's a typo in your hive-site.xml:

  <value>/software/home/hadoop/hive/hive-0.9.0/warehouse</value>

Should be

  <value>/correct/path/in/hdfs/to/your/warehouse/directory</value>


On Tue, May 21, 2013 at 1:04 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Thanks Sanjay.
 
My environment is  like this.
 
$ echo $HADOOP_HOME
/software/home/hadoop/hadoop/hadoop-1.1.2
 
$ echo $HIVE_HOME
/software/home/hadoop/hive/hive-0.9.0

$ id
uid=50052(hadoop) gid=600(apps) groups=600(apps)

 
So can i do like this:
 
$pwd
/software/home/hadoop/hive/hive-0.9.0
 
$mkdir warehouse
 
$cd /software/home/hadoop/hive/hive-0.9.0/warehouse
 
$ in hive-site.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/software/home/hadoop/hive/hive-0.9.0/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
 
Where should I create the HDFS directory ?
 

From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com; Dean Wampler deanwamp...@gmail.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:53 PM 

Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Notes below

From: Raj Hadoop hadoop...@yahoo.com
Reply-To: user@hive.apache.org user@hive.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Date: Tuesday, May 21, 2013 10:49 AM
To: Dean Wampler deanwamp...@gmail.com, user@hive.apache.org 
user@hive.apache.org
Cc: User u...@hadoop.apache.org
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Ok. I got it. My questions -
 
1) Should a local physical directory be created before using this property?
I created a directory in HDFS during Hive installation
/user/hive/warehouse


My hive-site.xml has the following property defined


<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

2) Should a HDFS file directory be created from Hadoop before using this 
property?
hdfs dfs -mkdir /user/hive/warehouse
Change the owner:group to hive:hive 
 


From: Dean Wampler deanwamp...@gmail.com
To: user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
Cc: User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:44 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



The name is misleading; this is the directory within HDFS where Hive stores 
the data, by default. (External tables can go elsewhere). It doesn't really 
have anything to do with the metastore. 


dean


On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop hadoop...@yahoo.com wrote:

Can someone help me on this? I am stuck installing and configuring Hive 
with Oracle. Your timely help is really appreciated.



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 1:08 PM
Subject: hive.metastore.warehouse.dir - Should it point to a physical

Where to get Oracle scripts for Hive Metastore

2013-05-21 Thread Raj Hadoop
I am trying to get Oracle scripts for Hive Metastore.
 
http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E
 
The scripts in the above link have a '+' at the beginning of each line. How am 
I supposed to execute scripts like this through Oracle sqlplus?
 
+CREATE TABLE PART_COL_PRIVS
+(
+    PART_COLUMN_GRANT_ID NUMBER NOT NULL,
+    COLUMN_NAME VARCHAR2(128) NULL,
+    CREATE_TIME NUMBER (10) NOT NULL,
+    GRANT_OPTION NUMBER (5) NOT NULL,
+    GRANTOR VARCHAR2(128) NULL,
+    GRANTOR_TYPE VARCHAR2(128) NULL,
+    PART_ID NUMBER NULL,
+    PRINCIPAL_NAME VARCHAR2(128) NULL,
+    PRINCIPAL_TYPE VARCHAR2(128) NULL,
+    PART_COL_PRIV VARCHAR2(128) NULL
+);
+
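
One way to run that listing anyway (a minimal sketch; the file names and the sqlplus connect string are placeholders, not from this thread):

# strip the leading '+' diff markers from the saved commit mail, then run the
# cleaned DDL in sqlplus as the metastore schema owner
sed 's/^+//' hive-schema-from-commit-mail.sql > hive-schema-0.9.0.oracle.sql
sqlplus hiveuser/hivepassword@ORCL @hive-schema-0.9.0.oracle.sql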

Re: Where to get Oracle scripts for Hive Metastore

2013-05-21 Thread Raj Hadoop
I got it. This is the link.
 
http://svn.apache.org/viewvc/hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql?revision=1329416view=copathrev=1329416



From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org; User u...@hadoop.apache.org 
Sent: Tuesday, May 21, 2013 3:08 PM
Subject: Where to get Oracle scripts for Hive Metastore



I am trying to get Oracle scripts for Hive Metastore.

http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E

The scripts in the above link have a '+' at the beginning of each line. How am 
I supposed to execute scripts like this through Oracle sqlplus?

+CREATE TABLE PART_COL_PRIVS
+(
+    PART_COLUMN_GRANT_ID NUMBER NOT NULL,
+    COLUMN_NAME VARCHAR2(128) NULL,
+    CREATE_TIME NUMBER (10) NOT NULL,
+    GRANT_OPTION NUMBER (5) NOT NULL,
+    GRANTOR VARCHAR2(128) NULL,
+    GRANTOR_TYPE VARCHAR2(128) NULL,
+    PART_ID NUMBER NULL,
+    PRINCIPAL_NAME VARCHAR2(128) NULL,
+    PRINCIPAL_TYPE VARCHAR2(128) NULL,
+    PART_COL_PRIV VARCHAR2(128) NULL
+);
+

Re: Where to get Oracle scripts for Hive Metastore

2013-05-21 Thread Raj Hadoop
Sanjay -
 
This is the first location I tried. But Apache Hive 0.9.0 doesn't have an oracle 
folder. It only had mysql and derby.
 
Thanks,
Raj



From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: u...@hadoop.apache.org u...@hadoop.apache.org; Raj Hadoop 
hadoop...@yahoo.com; Hive user@hive.apache.org 
Sent: Tuesday, May 21, 2013 3:12 PM
Subject: Re: Where to get Oracle scripts for Hive Metastore



Raj

The correct location of the script is where u extracted the hive tar 

For example 
/usr/lib/hive/scripts/metastore/upgrade/oracle

You will find a file in this directory called hive-schema-0.9.0.oracle.sql

Use this

sanjay
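
A minimal sketch of running that packaged script (the connect string is a placeholder; the directory follows the example above):

cd /usr/lib/hive/scripts/metastore/upgrade/oracle
sqlplus hiveuser/hivepassword@ORCL @hive-schema-0.9.0.oracle.sql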
From: Raj Hadoop hadoop...@yahoo.com
Reply-To: u...@hadoop.apache.org u...@hadoop.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Date: Tuesday, May 21, 2013 12:08 PM
To: Hive user@hive.apache.org, User u...@hadoop.apache.org
Subject: Where to get Oracle scripts for Hive Metastore


I am trying to get Oracle scripts for Hive Metastore.

http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E

The scripts in the above link have a '+' at the beginning of each line. How am 
I supposed to execute scripts like this through Oracle sqlplus?

+CREATE TABLE PART_COL_PRIVS
+(
+    PART_COLUMN_GRANT_ID NUMBER NOT NULL,
+    COLUMN_NAME VARCHAR2(128) NULL,
+    CREATE_TIME NUMBER (10) NOT NULL,
+    GRANT_OPTION NUMBER (5) NOT NULL,
+    GRANTOR VARCHAR2(128) NULL,
+    GRANTOR_TYPE VARCHAR2(128) NULL,
+    PART_ID NUMBER NULL,
+    PRINCIPAL_NAME VARCHAR2(128) NULL,
+    PRINCIPAL_TYPE VARCHAR2(128) NULL,
+    PART_COL_PRIV VARCHAR2(128) NULL
+);
+





ORA-01950: no privileges on tablespace

2013-05-21 Thread Raj Hadoop
 
I am setting up a metastore on Oracle for Hive. I executed the script 
hive-schema-0.9.0-sql file successfully too.
 
When I ran this:
hive> show tables;
 
I am getting the following error.
 
ORA-01950: no privileges on tablespace
 
What kind of Oracle privileges (quota-wise) are required for the Hive oracle 
user in the metastore? Please advise.
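
ORA-01950 usually means the metastore schema owner has no quota on its default tablespace. A minimal sketch of the kind of grants that clear it (user name, password, connect string and tablespace are placeholders, not from this thread):

# run as a DBA account
sqlplus system/manager@ORCL <<'EOF'
ALTER USER hiveuser QUOTA UNLIMITED ON users;
GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE TO hiveuser;
EXIT;
EOF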

Unable to stop Thrift Server

2013-05-20 Thread Raj Hadoop
Hi,

I was not able to stop the Thrift Server after performing the following steps.


$ bin/hive --service hiveserver 
Starting Hive Thrift Server

$ netstat -nl | grep 1
tcp 0 0 :::1 :::* LISTEN


I gave the following to stop. but not working.


hive --service hiveserver --action stop 1

How can I stop this service?


Thanks,
Raj


Re: Unable to stop Thrift Server

2013-05-20 Thread Raj Hadoop
Hi Sanjay,

I am using 0.9 version.
I do not have sudo access. Is there any other command to stop the service?

thanks,
raj





 From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: user@hive.apache.org user@hive.apache.org; Raj Hadoop 
hadoop...@yahoo.com; User u...@hadoop.apache.org 
Sent: Monday, May 20, 2013 5:11 PM
Subject: Re: Unable to stop Thrift Server
 


Raj
Which version r u using ?

I think from 0.9+ onwards it's best to use service to stop and start and NOT 
hive 

sudo service hive-metastore stop
sudo service hive-server stop

sudo service hive-metastore start
sudo service hive-server start

Couple of general things that might help 

1. Use linux screens : then u can start many screen sessions and u don't have 
to run things in synchronous mode 
     It's very easy to manage several screen sessions and they keep running till 
your server restarts….and generally u can ssh to some jump host and create your 
screen sessions there  

2. Run the following (a sketch of both suggestions follows below) 
     pstree -pulac | less
     U can possibly search for hive or your username or root which was used to 
start the service…and kill the process
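
A minimal sketch of both suggestions (no sudo needed; the grep pattern and session name are only guesses at your setup):

# (re item 1) start the server inside a named screen session so it keeps running after logout
screen -S hiveserver
bin/hive --service hiveserver      # detach with Ctrl-a d, reattach later with: screen -r hiveserver

# (re item 2) find the HiveServer java process started by your own user and stop it
ps -ef | grep -i '[h]iveserver'    # or: pstree -pulac | grep -i hive
kill <pid>                         # use the pid printed above; kill -9 only as a last resort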

sanjay 
From: Raj Hadoop hadoop...@yahoo.com
Reply-To: user@hive.apache.org user@hive.apache.org, Raj Hadoop 
hadoop...@yahoo.com
Date: Monday, May 20, 2013 2:03 PM
To: Hive user@hive.apache.org, User u...@hadoop.apache.org
Subject: Unable to stop Thrift Server


Hi,

I was not able to stop the Thrift Server after performing the following steps.


$ bin/hive --service hiveserver 
Starting Hive Thrift Server

$ netstat -nl | grep 1
tcp 0 0 :::1 :::* LISTEN


I gave the following to stop. but not working.


hive --service hiveserver --action stop 1

How can I stop this service?


Thanks,
Raj



Did any one used Hive on Oracle Metastore

2013-05-18 Thread Raj Hadoop
Hi,
I wanted to know whether anyone has used Hive on an Oracle metastore. Can you 
please share your experiences?
Thanks,
Raj

Hive on Oracle

2013-05-17 Thread Raj Hadoop
Hi,

I am planning to install Hive and want to set up the metastore on Oracle. What is 
the procedure? Which JDBC driver do I need to use?


Thanks,
Raj


Re: Hive on Oracle

2013-05-17 Thread Raj Hadoop

Thanks for the reply.

Can you specify which jar file needs to be used? Where can I get the jar file? 
Does Oracle provide one for free? Please let me know.

Thanks,
Raj






 From: bejoy...@yahoo.com bejoy...@yahoo.com
To: user@hive.apache.org; Raj Hadoop hadoop...@yahoo.com; User 
u...@hadoop.apache.org 
Sent: Friday, May 17, 2013 11:42 PM
Subject: Re: Hive on Oracle
 


Hi

The procedure is same as setting up mysql metastore. You need to use the jdbc 
driver/jar corresponding to the oracle version/release you are intending to use.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos
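
For reference, a minimal sketch of the Oracle-specific pieces (the jar name, host, port, SID and credentials are placeholders; the javax.jdo.option.* names are the standard metastore connection properties):

# 1) drop the Oracle JDBC driver jar, downloaded from Oracle for your DB release, into Hive's lib dir
cp ojdbc6.jar $HIVE_HOME/lib/

# 2) point the metastore at Oracle in hive-site.xml (values shown here as comments):
#    javax.jdo.option.ConnectionURL        = jdbc:oracle:thin:@//dbhost:1521/ORCL
#    javax.jdo.option.ConnectionDriverName = oracle.jdbc.OracleDriver
#    javax.jdo.option.ConnectionUserName   = hiveuser
#    javax.jdo.option.ConnectionPassword   = hivepassword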


From:  Raj Hadoop hadoop...@yahoo.com 
Date: Fri, 17 May 2013 17:10:07 -0700 (PDT)
To: Hiveuser@hive.apache.org; Useru...@hadoop.apache.org
ReplyTo:  user@hive.apache.org 
Subject: Hive on Oracle

Hi,

I am planning to install Hive and want to set up the metastore on Oracle. What is 
the procedure? Which JDBC driver do I need to use?


Thanks,
Raj