How to extract data from Hive effectively
Does anyone know how to extract data from Hive into other databases efficiently, such as HAWQ?

Sent from my iPhone
What does the ORC SerDe do
Hello, everyone. I know the JSON SerDe turns the fields of a row into JSON and the CSV SerDe turns them into CSV, according to their SERDEPROPERTIES. But I wonder what the ORC SerDe does when I choose STORED AS ORC, and why there are still escape and separator settings in the ORC SERDEPROPERTIES. The same goes for RC and Parquet. I think those formats are just about how data is stored and compressed, via their respective input and output formats, but I don't know what their SerDes do. Can anyone give me a hint?
Re: What does the ORC SerDe do
Thank you, that makes the concept clearer to me. I think I need to look at the source code for some of the details.

> On May 13, 2018, at 10:42 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> For the details you can check the source code, but a SerDe needs to translate an object to a Hive object and vice versa. Usually this is very simple (simply passing the object through, or creating a HiveDecimal, etc.). It also provides an ObjectInspector that describes an object in more detail (e.g. so it can be processed by a UDF); for example, it can tell you the precision and scale of an object. In the case of ORC, it also describes how a bunch of objects (vectorized) can be mapped to Hive objects and the other way around. Furthermore, it provides statistics and the means to deal with partitions as well as table properties (which are not the same as input/output format properties).
> Although it sounds complex, Hive provides most of the functionality, so implementing a SerDe is usually easy.
>
>> On 13. May 2018, at 16:34, 侯宗田 <zongtian...@icloud.com> wrote:
>>
>> Hello, everyone. I know the JSON SerDe turns the fields of a row into JSON and the CSV SerDe turns them into CSV, according to their SERDEPROPERTIES. But I wonder what the ORC SerDe does when I choose STORED AS ORC, and why there are still escape and separator settings in the ORC SERDEPROPERTIES. The same goes for RC and Parquet. I think those formats are just about how data is stored and compressed, via their respective input and output formats, but I don't know what their SerDes do. Can anyone give me a hint?
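Jörn's point that even a binary format keeps a SerDe entry can be checked from the table metadata itself. A minimal sketch, assuming a working Hive CLI; the table name and columns are invented for illustration, and the commands are printed rather than executed:

```shell
# Sketch only: example_orc is a made-up table. Even a table STORED AS ORC
# records a SerDe class in its metadata; on a live cluster DESCRIBE FORMATTED
# would list it (org.apache.hadoop.hive.ql.io.orc.OrcSerde) in the Storage
# Information section, next to the ORC input/output formats.
DDL='CREATE TABLE example_orc (id INT, name STRING) STORED AS ORC'
INSPECT='DESCRIBE FORMATTED example_orc'
echo "hive -e \"$DDL; $INSPECT;\""
```

The SerDe is what maps Hive rows to the format's records and back; the input/output formats only read and write the files themselves.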
Does one table have only one HDFS directory
Hello, a Hive table is stored under the path given by hive.metastore.warehouse.dir. Does one table have only one path, or can it have multiple directories?
Re: How to extract data from Hive effectively
Thank you,

> On Apr 28, 2018, at 11:09 PM, Johannes Alberti <johan...@altiscale.com> wrote:
>
> sqoop 1.x ... sqoop export -connect ...
>
> Sent from my iPhone
>
>> On Apr 28, 2018, at 4:29 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>>
>> Does anyone know how to extract data from Hive into other databases efficiently, such as HAWQ?
>>
>> Sent from my iPhone
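Johannes's sqoop 1.x suggestion might look like the sketch below. Every concrete value (JDBC URL, user, table name, HDFS path) is a placeholder for illustration, not something from this thread:

```shell
# Hypothetical sqoop export: it reads the files under --export-dir (typically
# the Hive table's warehouse directory) and inserts the rows into the target
# database table. All values below are placeholders.
JDBC_URL='jdbc:postgresql://dbhost:5432/targetdb'
EXPORT_DIR='/user/hive/warehouse/sales'
echo "sqoop export \
  --connect $JDBC_URL \
  --username dbuser \
  --table sales \
  --export-dir $EXPORT_DIR \
  --input-fields-terminated-by '\\001'"
```

Note that sqoop export reads the HDFS files directly, so the table's on-disk format matters; \001 (Ctrl-A) is Hive's default field delimiter for plain text tables.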
HCatalog responds very slowly
Hello, I am writing an application that needs the Hive metastore. I am planning to use WebHCat to get information about tables and process it. But a simple request takes over eight seconds to respond on localhost. Why is this so slow, and how can I fix it?

$ time curl -s 'http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean'
{"columns": [{"name":"id","type":"int"}], "database":"default", "table":"haha"}

real 0m8.400s
user 0m0.053s
sys  0m0.019s

The webhcat.log is very short; it seems to run hcat.py. I have looked at the log and I can't figure out what is going on. Here is the output when I simply run hcat; it seems to take all the time. Thank you very much for your kind reply.

$ hcat.py -e "use default; desc haha;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
18/04/21 16:38:16 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
18/04/21 16:38:16 INFO ql.Driver: Compiling command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): use default
18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize called
18/04/21 16:38:18 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/04/21 16:38:18 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in metastore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in metastore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda ip=unknown-ip-addr cmd=get_all_functions
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda ip=unknown-ip-addr cmd=get_database: default
18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
18/04/21 16:38:20 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); Time taken: 3.936 seconds
18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
18/04/21 16:38:20 INFO ql.Driver: Executing command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): use default
18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread local RawStore...
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda ip=unknown-ip-addr cmd=Cleaning up thread local RawStore...
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local
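The timestamps in the log above suggest where the time goes: each hcat.py invocation starts a fresh session and opens its own ObjectStore (16:38:17 to 16:38:20 is DataNucleus/metastore initialization alone). One common mitigation, sketched below assuming the conventional default port, is to run a single long-lived standalone metastore and point clients at it over Thrift instead of re-initializing the store on every call:

```shell
# Sketch only: 9083 is the conventional metastore Thrift port, and the
# snippet is printed rather than applied to any real configuration.
SNIPPET='# 1) start a long-running standalone metastore service:
hive --service metastore &

# 2) point clients at it in hive-site.xml so each hcat/WebHCat call
#    reuses the running service instead of opening its own ObjectStore:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>'
echo "$SNIPPET"
```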
Hive can't be installed properly
Hi, I have git cloned Hive from the master branch and built it with Maven, but I always get the following errors. Does anyone know what is going on here?

[INFO] Running org.apache.hadoop.hive.metastore.conf.TestMetastoreConf
[ERROR] Tests run: 22, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.356 s <<< FAILURE! - in org.apache.hadoop.hive.metastore.conf.TestMetastoreConf
[ERROR] readHiveMetastoreSiteWithHiveHomeDir(org.apache.hadoop.hive.metastore.conf.TestMetastoreConf) Time elapsed: 0.036 s <<< FAILURE!
java.lang.AssertionError
    at org.apache.hadoop.hive.metastore.conf.TestMetastoreConf.readHiveMetastoreSiteWithHiveHomeDir(TestMetastoreConf.java:222)
[ERROR] readHiveSiteWithHiveHomeDir(org.apache.hadoop.hive.metastore.conf.TestMetastoreConf) Time elapsed: 0.008 s <<< FAILURE!
java.lang.AssertionError
    at org.apache.hadoop.hive.metastore.conf.TestMetastoreConf.readHiveSiteWithHiveHomeDir(TestMetastoreConf.java:203)
[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.63 s <<< FAILURE! - in org.apache.hadoop.hive.metastore.metrics.TestMetrics
[ERROR] defaults(org.apache.hadoop.hive.metastore.metrics.TestMetrics) Time elapsed: 0.018 s <<< ERROR!
java.lang.RuntimeException: Unknown metric type jmx
    at org.apache.hadoop.hive.metastore.metrics.TestMetrics.defaults(TestMetrics.java:119)
[INFO] Running org.apache.hadoop.hive.metastore.TestAdminUser
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.163 s <<< FAILURE! - in org.apache.hadoop.hive.metastore.TestAdminUser
[ERROR] testCreateAdminNAddUser(org.apache.hadoop.hive.metastore.TestAdminUser) Time elapsed: 1.161 s <<< ERROR!
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
    at org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
Caused by: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser(TestAdminUser.java:41)
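The root cause in the last trace is explicit: DataNucleus cannot find com.mysql.jdbc.Driver on the test classpath, so the metastore tests that connect to MySQL fail. A sketch of two standard Maven-level workarounds (generic Maven usage, not commands from this thread; the connector version is a placeholder):

```shell
# Option 1 -- build without running the unit tests:
BUILD='mvn clean install -DskipTests'
echo "$BUILD"
# Option 2 -- make the driver visible to the tests, e.g. by placing
# mysql-connector-java-<version>.jar (placeholder version) on the classpath
# the metastore tests use, so DataNucleus can create its BoneCP connection pool.
```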
Re: Error when connecting Beeline to HiveServer2
Hi Antal,

Thank you. I had followed a web guide and set the HiveServer2 transport mode to http, which is why the port number became 10001. I changed it back, set the proxy user name in HDFS, and it worked.

Regards,
Hou

> On Apr 25, 2018, at 6:23 PM, Antal Sinkovits <asinkov...@cloudera.com> wrote:
>
> Hi,
>
> First of all, I would check whether HiveServer2 is running and listening on the given port.
>
> E.g.:
> lsof -i -P | grep java
>
> You should see something like:
> java 33169 asinkovits 349u IPv6 0x 0t0 TCP *:10000 (LISTEN)
> java 33169 asinkovits 350u IPv6 0x 0t0 TCP *:10002 (LISTEN)
>
> If it does, try:
> beeline -u jdbc:hive2://localhost:10000
>
> Regards,
> Antal
>
> On Tue, Apr 24, 2018 at 10:19 AM, 侯宗田 <zongtian...@icloud.com> wrote:
> Thank you very much for your reply. I have changed the port number and set thrift.bind.host to localhost, but I still get the error. Do you have any ideas about this?
>
> beeline> !connect jdbc:hive2://localhost:10000 anonymous anonymous
> Connecting to jdbc:hive2://localhost:10000
> 18/04/24 16:13:59 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
> Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
> Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
>
>> On Apr 24, 2018, at 1:55 AM, Johannes Alberti <johan...@altiscale.com> wrote:
>>
>> You should connect by default to 10000; the web UI port is not the port Beeline connects with.
>>
>> Regards, Johannes
>>
>> Sent from my iPhone
>>
>> On Apr 23, 2018, at 6:38 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>>
>>> Hi,
>>>
>>> I have started HiveServer2 and tried to connect to it with Beeline using the following command:
>>>
>>> !connect jdbc:hive2://localhost:10002/default
>>>
>>> But I get the following error:
>>>
>>> WARN jdbc.HiveConnection: Failed to connect to localhost:10002
>>> Unknown HS2 problem when communicating with Thrift server.
>>> Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10002/default: Invalid status 72 (state=08S01,code=0)
>>> beeline>
>>>
>>> I have set the web UI port to 10002 and the mode to http; am I still missing something?
>>> Does anyone know what the problem is and how to solve it?
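The resolution of this thread comes down to which port serves what. A sketch using the stock HiveServer2 property names and their usual defaults (these are general defaults, not values quoted from the thread):

```shell
# hive.server2.thrift.port      (default 10000): the port Beeline/JDBC connects to.
# hive.server2.webui.port       (default 10002): the HTTP status page only.
# hive.server2.thrift.http.port (default 10001): used when the transport mode
#   is http -- which matches the port change Hou observed after following the guide.
CHECK='lsof -i -P | grep java'
CONNECT='beeline -u jdbc:hive2://localhost:10000'
echo "$CHECK"
echo "$CONNECT"
```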
Error when connecting Beeline to HiveServer2
Hi, I have started HiveServer2 and tried to connect to it with Beeline using the following command:

!connect jdbc:hive2://localhost:10002/default

But I get the following error:

WARN jdbc.HiveConnection: Failed to connect to localhost:10002
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10002/default: Invalid status 72 (state=08S01,code=0)
beeline>

I have set the web UI port to 10002 and the mode to http; am I still missing something? Does anyone know what the problem is and how to solve it?