How to extract data from hive effectively

2018-04-28 Thread
Does anyone know how to extract data from Hive to other databases effectively, 
such as HAWQ?

Sent from my iPhone

What does the ORC SERDE do

2018-05-13 Thread
Hello, everyone,
   I know the JSON SerDe turns the fields in a row into JSON format, and the CSV 
SerDe turns them into CSV format, each according to its SERDEPROPERTIES. But I 
wonder what the ORC SerDe does when I choose STORED AS ORC, and why there are 
still escape and separator characters in the ORC SERDEPROPERTIES. The same goes 
for RC and Parquet. I think these formats just control how data is stored and 
compressed via their input and output formats respectively, but I don't know 
what their SerDes do. Can anyone give me a hint?

Re: What does the ORC SERDE do

2018-05-13 Thread
Thank you, that makes the concept clearer to me. I think I need to look at the 
source code for some of the details.
> On May 13, 2018, at 10:42 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> In detail you can check the source code, but a SerDe needs to translate an 
> object to a Hive object and vice versa. Usually this is very simple (simply 
> passing the object through, or creating a HiveDecimal, etc.). It also provides 
> an ObjectInspector that describes an object in more detail (e.g. so it can be 
> processed by a UDF); for example, it can tell you the precision and scale of 
> an object. In the case of ORC it also describes how a batch of objects 
> (vectorized) can be mapped to Hive objects and the other way around. 
> Furthermore, it provides statistics and means to deal with partitions as well 
> as table properties (which are distinct from input/output format properties).
> Although it sounds complex, Hive provides most of the functionality, so 
> implementing a SerDe is usually easy.
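For concreteness, `STORED AS ORC` is shorthand for naming the SerDe and the input/output formats explicitly. A sketch of the expanded DDL (class names as shipped with Hive 1.x/2.x; the table and column names here are made up):

```sql
-- Hypothetical table: STORED AS ORC is equivalent to spelling out the
-- OrcSerde plus the ORC input/output format classes, as below.
CREATE TABLE orders_orc (id INT, amount DECIMAL(10,2))
  ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- The SerDe maps Hive rows to and from the objects the formats read/write;
-- the input/output formats own the on-disk layout and compression.
```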
> 
>> On 13. May 2018, at 16:34, 侯宗田 <zongtian...@icloud.com> wrote:
>> 
>> Hello, everyone,
>>  I know the JSON SerDe turns the fields in a row into JSON format, and the 
>> CSV SerDe turns them into CSV format, each according to its SERDEPROPERTIES. 
>> But I wonder what the ORC SerDe does when I choose STORED AS ORC, and why 
>> there are still escape and separator characters in the ORC SERDEPROPERTIES. 
>> The same goes for RC and Parquet. I think these formats just control how data 
>> is stored and compressed via their input and output formats respectively, but 
>> I don't know what their SerDes do. Can anyone give me a hint?



Does one table have only one HDFS directory

2018-05-17 Thread
Hello,
A Hive table is stored under the path given by hive.metastore.warehouse.dir. 
Does a table have only one path, or can it have multiple directories?

Re: How to extract data from hive effectively

2018-04-28 Thread
Thank you!
> On Apr 28, 2018, at 11:09 PM, Johannes Alberti <johan...@altiscale.com> wrote:
> 
> sqoop 1.x ...sqoop export -connect ...
> 
> Sent from my iPhone
> 
>> On Apr 28, 2018, at 4:29 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>> 
>> Does anyone know how to extract data from Hive to other databases 
>> effectively, such as HAWQ?
>> 
>> Sent from my iPhone
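Expanded a bit, a Sqoop 1.x export from a Hive warehouse directory into an external RDBMS might look like the sketch below. The JDBC URL, credentials, table name, and export path are all hypothetical; `\001` is Hive's default field delimiter for text-format tables.

```shell
# Hypothetical example: push the files backing a Hive text table into MySQL.
# Adjust the connection string, table, and warehouse path for your cluster.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username sqoop_user -P \
  --table customers \
  --export-dir /user/hive/warehouse/customers \
  --input-fields-terminated-by '\001'
```

Note this reads the raw files, so it only works directly for text-format tables; columnar formats generally need an intermediate step or HCatalog integration.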



HCatalog responds very slowly

2018-04-21 Thread
Hello, I am writing an application which needs the Hive metastore. I plan to 
use WebHCat to get information about tables and process it. But a simple 
request takes over eight seconds to respond on localhost. Why is it so slow, 
and how can I fix it?
$ time curl -s 
'http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean'
{"columns": 
  [{"name":"id","type":"int"}],
  "database":"default",
  "table":"haha"}

real    0m8.400s
user    0m0.053s
sys     0m0.019s
The webhcat.log is very short; it seems to run hcat.py. I have looked through 
the log and I can't figure out what is going on. Below is the output when I 
simply run hcat; it appears to take all of the time. Thank you very much for 
your kind reply.


$hcat.py -e "use default; desc haha; "
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file 
file:/usr/local/hive/conf/hive-site.xml
18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: 
/tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
18/04/21 16:38:16 INFO session.SessionState: Created local directory: 
/tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: 
/tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
18/04/21 16:38:16 INFO ql.Driver: Compiling 
command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): 
use default
18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store with 
implementation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize called
18/04/21 16:38:18 INFO DataNucleus.Persistence: Property 
hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/04/21 16:38:18 INFO DataNucleus.Persistence: Property 
datanucleus.cache.level2 unknown - will be ignored
18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object pin 
classes with 
hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, 
underlying DB is MYSQL
18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in metastore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in metastore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in admin role, 
since config is empty
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
ip=unknown-ip-addr  cmd=get_all_functions
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
ip=unknown-ip-addr  cmd=get_database: default
18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema: 
Schema(fieldSchemas:null, properties:null)
18/04/21 16:38:20 INFO ql.Driver: Completed compiling 
command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); 
Time taken: 3.936 seconds
18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not creating a 
lock manager
18/04/21 16:38:20 INFO ql.Driver: Executing 
command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): 
use default
18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created 
SQLStdHiveAccessController for session context : HiveAuthzSessionContext 
[sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
ignored, since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration 
hive.metastore.filter.hook changed from 
org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread local 
RawStore...
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
ip=unknown-ip-addr  cmd=Cleaning up thread local RawStore...
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up thread 
local 

Hive can't be installed properly

2018-04-22 Thread
Hi,

I cloned Hive from the master branch and built it with Maven, but I always get 
the following errors. Does anyone know what is going on here?

[INFO] Running org.apache.hadoop.hive.metastore.conf.TestMetastoreConf
[ERROR] Tests run: 22, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.356 
s <<< FAILURE! - in org.apache.hadoop.hive.metastore.conf.TestMetastoreConf
[ERROR] 
readHiveMetastoreSiteWithHiveHomeDir(org.apache.hadoop.hive.metastore.conf.TestMetastoreConf)
  Time elapsed: 0.036 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hive.metastore.conf.TestMetastoreConf.readHiveMetastoreSiteWithHiveHomeDir(TestMetastoreConf.java:222)

[ERROR] 
readHiveSiteWithHiveHomeDir(org.apache.hadoop.hive.metastore.conf.TestMetastoreConf)
  Time elapsed: 0.008 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hive.metastore.conf.TestMetastoreConf.readHiveSiteWithHiveHomeDir(TestMetastoreConf.java:203)

[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.63 s 
<<< FAILURE! - in org.apache.hadoop.hive.metastore.metrics.TestMetrics
[ERROR] defaults(org.apache.hadoop.hive.metastore.metrics.TestMetrics)  Time 
elapsed: 0.018 s  <<< ERROR!
java.lang.RuntimeException: Unknown metric type  jmx
at 
org.apache.hadoop.hive.metastore.metrics.TestMetrics.defaults(TestMetrics.java:119)

[INFO] Running org.apache.hadoop.hive.metastore.TestAdminUser
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.163 s 
<<< FAILURE! - in org.apache.hadoop.hive.metastore.TestAdminUser
[ERROR] testCreateAdminNAddUser(org.apache.hadoop.hive.metastore.TestAdminUser) 
 Time elapsed: 1.161 s  <<< ERROR!
javax.jdo.JDOFatalInternalException: Error creating transactional connection 
factory
at 
org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
Caused by: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the 
"BONECP" plugin to create a ConnectionPool gave an error : The specified 
datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. 
Please check your CLASSPATH specification, and the name of the driver.
at 
org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
Caused by: 
org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: 
The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the 
CLASSPATH. Please check your CLASSPATH specification, and the name of the 
driver.
at 
org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)

Re: error when beeline connecting to hiveserver2

2018-04-25 Thread
Hi Antal,

Thank you. I had followed a web guide and set the HiveServer2 transport mode to 
http, which changed the port number to 10001. I changed it back, set the proxy 
user name in HDFS, and it worked.

Regards,
Hou
> On Apr 25, 2018, at 6:23 PM, Antal Sinkovits <asinkov...@cloudera.com> wrote:
> 
> Hi,
> 
> First of all, I would check whether HiveServer2 is running and listening on 
> the given port.
> 
> E.g:
> lsof -i -P |grep java
> 
> You should see something like:
> java  33169 asinkovits  349u  IPv6 0x  0t0  TCP 
> *:1 (LISTEN)
> java  33169 asinkovits  350u  IPv6 0x  0t0  TCP 
> *:10002 (LISTEN)
> 
> If it does, try
> beeline -u jdbc:hive2://localhost:1
> 
> Regards,
> Antal
> 
> 
> On Tue, Apr 24, 2018 at 10:19 AM, 侯宗田 <zongtian...@icloud.com> wrote:
> Thank you very much for your reply. I have changed the port number and set 
> thrift.bind.host to localhost, but I still get the error. Do you have any 
> ideas about this?
> 
> beeline> !connect jdbc:hive2://localhost:1 anonymous anonymous
> Connecting to jdbc:hive2://localhost:1
> 18/04/24 16:13:59 [main]: WARN jdbc.HiveConnection: Failed to connect to 
> localhost:1
> Could not open connection to the HS2 server. Please check the server URI and 
> if the URI is correct, then ask the administrator to check the server status.
> Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://localhost:1: java.net.ConnectException: Connection 
> refused (Connection refused) (state=08S01,code=0)
> 
>> On Apr 24, 2018, at 1:55 AM, Johannes Alberti <johan...@altiscale.com> wrote:
>> 
>> You should connect by default to 1, the webui port is not the port 
>> beeline connects with. Regards, Johannes
>> 
>> Sent from my iPhone
>> 
>> On Apr 23, 2018, at 6:38 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>> 
>>> Hi,
>>> 
>>> I have started HiveServer2 and tried to connect to it with beeline using 
>>> the following command:
>>> >!connect jdbc:hive2://localhost:10002/default
>>> 
>>> But I get the following error:
>>> 
>>> WARN jdbc.HiveConnection: Failed to connect to localhost:10002
>>> Unknown HS2 problem when communicating with Thrift server.
>>> Error: Could not open client transport with JDBC Uri: 
>>> jdbc:hive2://localhost:10002/default: Invalid status 72 
>>> (state=08S01,code=0)
>>> beeline>
>>> 
>>> I have set the web UI port to 10002 and the mode to http; am I still 
>>> missing something? 
>>> Does anyone know what the problem is and how to solve it?
> 
> 



error when beeline connecting to hiveserver2

2018-04-23 Thread
Hi,

I have started HiveServer2 and tried to connect to it with beeline using the 
following command:
>!connect jdbc:hive2://localhost:10002/default 

But I get the following error:

WARN jdbc.HiveConnection: Failed to connect to localhost:10002
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:10002/default: Invalid status 72 (state=08S01,code=0)
beeline>

I have set the web UI port to 10002 and the mode to http; am I still missing 
something? Does anyone know what the problem is and how to solve it?
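For reference, the ports in this thread are controlled by separate HiveServer2 properties. A hive-site.xml sketch with the usual defaults (beeline talks to the Thrift port, 10000 in binary mode or 10001 in http mode; 10002 is only the monitoring web UI):

```xml
<!-- Port beeline connects to in binary transport mode (default 10000). -->
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<!-- Port used instead when hive.server2.transport.mode is "http" (default 10001). -->
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
</property>
<!-- Web UI for monitoring only; beeline cannot connect here (default 10002). -->
<property>
  <name>hive.server2.webui.port</name>
  <value>10002</value>
</property>
```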