[GitHub] sankarh closed pull request #529: HIVE-21206: Bootstrap replication is slow as it opens lot of metastore connections.

2019-02-11 Thread GitBox
sankarh closed pull request #529: HIVE-21206: Bootstrap replication is slow as 
it opens lot of metastore connections.
URL: https://github.com/apache/hive/pull/529
 
 
   




[jira] [Created] (HIVE-21250) NPE in HiveProtoLoggingHook for eventPerFile mode.

2019-02-11 Thread Harish Jaiprakash (JIRA)
Harish Jaiprakash created HIVE-21250:


 Summary: NPE in HiveProtoLoggingHook for eventPerFile mode.
 Key: HIVE-21250
 URL: https://issues.apache.org/jira/browse/HIVE-21250
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Harish Jaiprakash
Assignee: Harish Jaiprakash
 Attachments: HIVE-21250.01.patch

When eventPerFile is enabled, the writer is set to null after the first event, 
which causes an NPE on the next write until the next handleTick run re-creates 
the writer.
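
A minimal sketch of the failure mode and the obvious guard, assuming an 
eventPerFile-style flow; the names below are hypothetical and this is not the 
actual HiveProtoLoggingHook code:

{code:java}
import java.io.IOException;

/**
 * Hypothetical sketch only, not the HiveProtoLoggingHook source: in
 * eventPerFile mode the writer is closed and nulled after each event, so the
 * next write must re-open it instead of dereferencing the null field.
 */
class EventPerFileWriterSketch {
  interface ProtoWriter {
    void write(byte[] event) throws IOException;
    void close() throws IOException;
  }

  private final boolean eventPerFile;
  private ProtoWriter writer;

  EventPerFileWriterSketch(boolean eventPerFile) {
    this.eventPerFile = eventPerFile;
  }

  void writeEvent(byte[] event) throws IOException {
    if (writer == null) {
      writer = openNewWriter(); // guard: re-open rather than NPE on the nulled field
    }
    writer.write(event);
    if (eventPerFile) {
      writer.close();
      writer = null; // stays null until the next event (or handleTick) re-opens it
    }
  }

  private ProtoWriter openNewWriter() {
    return new ProtoWriter() {
      @Override public void write(byte[] event) { /* no-op stand-in */ }
      @Override public void close() { /* no-op stand-in */ }
    };
  }
}
{code}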





[jira] [Created] (HIVE-21249) Reduce memory footprint in ObjectStore.refreshPrivileges

2019-02-11 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-21249:
-

 Summary: Reduce memory footprint in ObjectStore.refreshPrivileges  
 Key: HIVE-21249
 URL: https://issues.apache.org/jira/browse/HIVE-21249
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Daniel Dai
Assignee: Daniel Dai


We found there can be many records in TBL_COL_PRIVS for a single table (a 
table granted to many users), which results in an OOM in 
ObjectStore.listTableAllColumnGrants. We should reduce the memory footprint of 
ObjectStore.refreshPrivileges (a sketch of one batching approach follows the 
stack trace). Here is the stack trace of the OOM:
{code}
org.datanucleus.api.jdo.JDOPersistenceManager.retrieveAll(JDOPersistenceManager.java:690)
org.datanucleus.api.jdo.JDOPersistenceManager.retrieveAll(JDOPersistenceManager.java:710)
org.apache.hadoop.hive.metastore.ObjectStore.listTableAllColumnGrants(ObjectStore.java:6629)
org.apache.hadoop.hive.metastore.ObjectStore.refreshPrivileges(ObjectStore.java:6200)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
com.sun.proxy.$Proxy32.refreshPrivileges(, line not available)
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.refresh_privileges(HiveMetaStore.java:6507)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
com.sun.proxy.$Proxy34.refresh_privileges(, line not available)
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$refresh_privileges.getResult(ThriftHiveMetastore.java:17608)
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$refresh_privileges.getResult(ThriftHiveMetastore.java:17592)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
java.security.AccessController.doPrivileged(Native method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
{code}
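
One hedged way to attack this, sketched below under the assumption that the 
privilege rows can be processed incrementally: page through the JDO query with 
setRange instead of materializing everything via retrieveAll. The filter is 
simplified (newer ObjectStore filters also include the catalog name) and the 
batch size is illustrative; this is not the attached patch.

{code:java}
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;
import org.apache.hadoop.hive.metastore.model.MTableColumnPrivilege;

class ColumnGrantScanSketch {
  private static final int BATCH = 10000; // illustrative batch size

  void scanColumnGrants(PersistenceManager pm, String dbName, String tblName) {
    for (long start = 0;; start += BATCH) {
      Query query = pm.newQuery(MTableColumnPrivilege.class,
          "table.tableName == t1 && table.database.name == t2");
      query.declareParameters("java.lang.String t1, java.lang.String t2");
      query.setRange(start, start + BATCH); // fetch one window instead of everything
      @SuppressWarnings("unchecked")
      List<MTableColumnPrivilege> batch =
          (List<MTableColumnPrivilege>) query.execute(tblName, dbName);
      boolean done = batch.isEmpty();
      // ... process/refresh the privileges in this window ...
      query.closeAll(); // release this window before fetching the next one
      if (done) {
        break;
      }
    }
  }
}
{code}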





[jira] [Created] (HIVE-21248) WebHCat returns HTTP error code 500 rather than 429 when submitting large number of jobs in stress tests

2019-02-11 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-21248:
-

 Summary: WebHCat returns HTTP error code 500 rather than 429 when 
submitting large number of jobs in stress tests
 Key: HIVE-21248
 URL: https://issues.apache.org/jira/browse/HIVE-21248
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Daniel Dai
Assignee: Daniel Dai


Saw the exception in webhcat.log:
{code}
java.lang.NoSuchMethodError: javax.ws.rs.core.Response$Status$Family.familyOf(I)Ljavax/ws/rs/core/Response$Status$Family;
at org.glassfish.jersey.message.internal.Statuses$StatusImpl.<init>(Statuses.java:63) ~[jersey-common-2.25.1.jar:?]
at org.glassfish.jersey.message.internal.Statuses$StatusImpl.<init>(Statuses.java:54) ~[jersey-common-2.25.1.jar:?]
at org.glassfish.jersey.message.internal.Statuses.from(Statuses.java:132) ~[jersey-common-2.25.1.jar:?]
at org.glassfish.jersey.message.internal.OutboundJaxrsResponse$Builder.status(OutboundJaxrsResponse.java:414) ~[jersey-common-2.25.1.jar:?]
at javax.ws.rs.core.Response.status(Response.java:128) ~[jsr311-api-1.1.1.jar:?]
at org.apache.hive.hcatalog.templeton.SimpleWebException.buildMessage(SimpleWebException.java:67) ~[hive-webhcat-3.1.0.3.0.2.0-50.jar:3.1.0.3.0.2.0-50]
at org.apache.hive.hcatalog.templeton.SimpleWebException.getResponse(SimpleWebException.java:51) ~[hive-webhcat-3.1.0.3.0.2.0-50.jar:3.1.0.3.0.2.0-50]
at org.apache.hive.hcatalog.templeton.SimpleExceptionMapper.toResponse(SimpleExceptionMapper.java:33) ~[hive-webhcat-3.1.0.3.0.2.0-50.jar:3.1.0.3.0.2.0-50]
at org.apache.hive.hcatalog.templeton.SimpleExceptionMapper.toResponse(SimpleExceptionMapper.java:29) ~[hive-webhcat-3.1.0.3.0.2.0-50.jar:3.1.0.3.0.2.0-50]
at com.sun.jersey.spi.container.ContainerResponse.mapException(ContainerResponse.java:480) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.spi.container.ContainerResponse.mapMappableContainerException(ContainerResponse.java:417) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1477) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) ~[jersey-server-1.19.jar:1.19]
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) ~[jersey-servlet-1.19.jar:1.19]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) ~[jersey-servlet-1.19.jar:1.19]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) ~[jersey-servlet-1.19.jar:1.19]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) ~[jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) ~[jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.apache.hive.hcatalog.templeton.Main$XFrameOptionsFilter.doFilter(Main.java:299) ~[hive-webhcat-3.1.0.3.0.2.0-50.jar:3.1.0.3.0.2.0-50]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) ~[hadoop-auth-3.1.1.3.0.2.0-50.jar:?]
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) ~[hadoop-auth-3.1.1.3.0.2.0-50.jar:?]
at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:90) ~[hadoop-hdfs-3.1.1.3.0.2.0-50.jar:?]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

[jira] [Created] (HIVE-21247) Webhcat beeline in secure mode

2019-02-11 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-21247:
-

 Summary: Webhcat beeline in secure mode
 Key: HIVE-21247
 URL: https://issues.apache.org/jira/browse/HIVE-21247
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Daniel Dai
Assignee: Daniel Dai


Following up on HIVE-20550, we need to make beeline work in secure mode. That 
means we need to get a delegation token from HiveServer2 and pass it to 
beeline. This is similar to HIVE-5133; I made two changes (a sketch of the 
flow is below):
1. Make a JDBC connection to HS2, pull the delegation token from 
HiveConnection, and pass it along
2. In the Hive JDBC driver, check for a token file in 
HADOOP_TOKEN_FILE_LOCATION and extract the delegation token if it exists
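
A hedged sketch of that flow, not the attached patch: the token alias and 
renewer principal below are illustrative, while HiveConnection.getDelegationToken 
is the public JDBC API used for step 1.

{code:java}
import java.sql.DriverManager;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hive.jdbc.HiveConnection;

class WebHCatBeelineTokenSketch {
  static void stageToken(String jdbcUrl, String owner, String tokenFile) throws Exception {
    // Step 1: connect to HS2 and ask for a delegation token for the job user.
    try (HiveConnection conn = (HiveConnection) DriverManager.getConnection(jdbcUrl)) {
      String tokenStr = conn.getDelegationToken(owner, "hive/host@EXAMPLE.COM"); // illustrative renewer
      Token<TokenIdentifier> token = new Token<>();
      token.decodeFromUrlString(tokenStr);

      // Step 2: persist it in Hadoop's token-file format so the child beeline
      // process can pick it up via HADOOP_TOKEN_FILE_LOCATION.
      Credentials creds = new Credentials();
      creds.addToken(new Text("hive.server2.delegation.token"), token); // illustrative alias
      creds.writeTokenStorageFile(new Path(tokenFile), new Configuration());
    }
  }
}
{code}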





[jira] [Created] (HIVE-21246) Un-bury DelimitedJSONSerDe from PlanUtils.java

2019-02-11 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21246:
--

 Summary: Un-bury DelimitedJSONSerDe from PlanUtils.java
 Key: HIVE-21246
 URL: https://issues.apache.org/jira/browse/HIVE-21246
 Project: Hive
  Issue Type: Improvement
Reporter: BELUGA BEHR
 Attachments: HIVE-21246.1.patch

Ultimately, I'd like to get rid of 
{{org.apache.hadoop.hive.serde2.DelimitedJSONSerDe}}; for now, this change 
just makes it easier to remove later.  It's currently buried in 
{{PlanUtils.java}}.

A SerDe and a flag get passed into the utilities.  If the boolean is set, the 
passed-in SerDe is overwritten.  This is not documented anywhere, and it's a 
weird pattern: callers should just pass in the SerDe they actually want, 
rather than a SerDe they don't want plus a flag that swaps it.
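
A simplified illustration of the smell (not the real PlanUtils signatures; a 
String stands in for TableDesc here):

{code:java}
class SerdeFlagSketch {
  // Before: the boolean silently discards the SerDe class the caller passed in.
  static String getTableDescBefore(Class<?> serdeClass, boolean useJsonSerde) {
    if (useJsonSerde) {
      serdeClass = org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.class; // caller's choice overwritten
    }
    return buildDesc(serdeClass);
  }

  // After: callers pass exactly the SerDe they want; no hidden swap.
  static String getTableDescAfter(Class<?> serdeClass) {
    return buildDesc(serdeClass);
  }

  private static String buildDesc(Class<?> serdeClass) {
    return "serde=" + serdeClass.getName(); // stand-in for building a TableDesc
  }
}
{code}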





Setting up a Standalone Hive Metastore

2019-02-11 Thread Joo Wan Ro
Hello Hive Experts,

I am a software engineer at Microsoft, and I am having trouble trying to run a 
standalone Hive metastore service on my Windows 10 machine.  Your assistance 
would be greatly appreciated.

[0] Github project here (running branch "3.1"): 
link
[1] Related documentation here: 
link
[2] Path to setup scripts: ($HOME\hive\standalone-metastore\src\main\scripts)

I was able to build it successfully using Maven, but I am having trouble 
running the jar.  Here are my questions:


  1.  I believe this standalone metastore project is relatively new, as it was 
introduced in Hive 3.0.  When was the stable version of the standalone 
metastore released?

  2.  While the documentation link [1] above helps with the config setup, is 
there a resource for step-by-step guidance on how to bootstrap the standalone 
metastore service on a Windows 10 or Linux machine?  Without it, I have been 
trying to reverse-engineer how to get it running.



 *   For starters, it seems that the scripts under the directory above [2] 
are not Windows 10 friendly because of carriage returns.  And having to use 
Cygwin left me confused about which path convention to use for the 
environment/system variables (E:\src\hive\... vs. /cygdrive/e/src/hive/).  I 
removed the carriage returns and used the /cygdrive/ convention to get it 
partially working.



 *   I had no clue which environment/system variables I needed, or whether 
there were any dependencies; I assumed there were none, because the related 
documentation [1] above notes the independent nature of the standalone 
metastore project.  However, by studying the scripts (base, start-metastore, 
and metastore.sh) under the path above [2], I found the following:

   i.  The need to define the METASTORE_HOME 
($HOME\hive\standalone-metastore\target\apache-hive-metastore-3.1.1-bin\apache-hive-metastore-3.1.1-bin)
 and METASTORE_CONF_DIR environment variables

  ii.  The need to install Hadoop as a dependency, because the metastore.sh 
script uses it to start the metastore service; hence, I installed it and 
defined the HADOOP_HOME ($HOME\hadoop-3.1.1) environment variable (I also had 
to remove the carriage returns under $HOME\hadoop-3.1.1)

 iii.  No other environment variables or dependencies are needed beyond the 
ones mentioned above



 *   At this point, the metastore service started to come up; however, I ran 
into a "Failed to get schema version" exception - more information here: 
link.  I believe this is because the default Derby database was not 
initialized.



 *   So, using the schematool script under my apache-hive-metastore-3.1.1-bin 
directory, I ran schematool --dbType derby -initSchema.  Then I ran into an 
"Unknown version specified for initialization: 3.1.0" exception - the 
exception is thrown here: link.  It cannot find the Derby schema script, but I 
confirmed that it is there 
($HOME\hive\standalone-metastore\target\apache-hive-metastore-3.1.1-bin\apache-hive-metastore-3.1.1-bin\scripts\metastore\upgrade\derby\hive-schema-3.1.0.derby.sql).
  This led me to believe that there is again a conflict between the Windows 
"\" and Linux "/" path conventions, and I have hit a dead end.


For the time being, I am redirecting my efforts toward setting up a Linux 
machine to see whether I have a smoother experience, but any help with my 
concerns/issues above would be greatly appreciated.

Thank you!

Joo Wan



[jira] [Created] (HIVE-21245) Make hive.fetch.output.serde Default to LazySimpleSerde

2019-02-11 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21245:
--

 Summary: Make hive.fetch.output.serde Default to LazySimpleSerde
 Key: HIVE-21245
 URL: https://issues.apache.org/jira/browse/HIVE-21245
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 4.0.0, 3.2.0
Reporter: BELUGA BEHR
 Fix For: 4.0.0


For all intents and purposes, it already is:

{code:java|title=HiveSessionImpl.java}
  private static final String FETCH_WORK_SERDE_CLASS =
      "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";

  @Override
  public HiveConf getHiveConf() {
    sessionConf.setVar(HiveConf.ConfVars.HIVEFETCHOUTPUTSERDE, FETCH_WORK_SERDE_CLASS);
    return sessionConf;
  }
{code}

https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java#L489-L492

Ultimately, I'd like to get rid of 
{{org.apache.hadoop.hive.serde2.DelimitedJSONSerDe}} altogether.  It's a weird 
thing.





[jira] [Created] (HIVE-21244) NPE in Hive Proto Logger

2019-02-11 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-21244:


 Summary: NPE in Hive Proto Logger
 Key: HIVE-21244
 URL: https://issues.apache.org/jira/browse/HIVE-21244
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


[https://github.com/apache/hive/blob/4ddc9de90b6de032d77709c9631ab787cef225d5/ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java#L308]
 can cause an NPE. There is no uncaught exception handler for this thread, so 
the NPE can fail silently and drop the event.
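
A minimal sketch of one mitigation, assuming events are written on a single 
executor thread (names are hypothetical; this is not the committed fix): wrap 
the write task so a runtime failure such as this NPE is at least logged 
instead of vanishing.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ProtoLoggerSketch {
  private final ExecutorService eventHandler =
      Executors.newSingleThreadExecutor(r -> new Thread(r, "hive-proto-logger-sketch"));

  void submitEvent(Runnable writeEvent) {
    eventHandler.execute(() -> {
      try {
        writeEvent.run();
      } catch (Throwable t) {
        // The NPE lands here and is made visible instead of silently dropping the event.
        System.err.println("Proto logger failed to write event: " + t);
      }
    });
  }
}
{code}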





RE: Setting up a Standalone Hive Metastore

2019-02-11 Thread Joo Wan Ro
[Resending because I needed to add myself to the mailing lists]


[jira] [Created] (HIVE-21243) Create Default DB in default.db

2019-02-11 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21243:
--

 Summary: Create Default DB in default.db
 Key: HIVE-21243
 URL: https://issues.apache.org/jira/browse/HIVE-21243
 Project: Hive
  Issue Type: Improvement
Reporter: BELUGA BEHR


When a database is created in Hive, it is stored in 
{{/user/hive/warehouse/[MyDatabase.db]/}}.  It is very confusing that the Hive 
default database is not located in {{/user/hive/warehouse/[default.db]/}}. 
Please address this and make it consistent.





[jira] [Created] (HIVE-21242) Calcite Planner Logging Indicates UTF-16 Encoding

2019-02-11 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21242:
--

 Summary: Calcite Planner Logging Indicates UTF-16 Encoding
 Key: HIVE-21242
 URL: https://issues.apache.org/jira/browse/HIVE-21242
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0, 3.2.0
Reporter: BELUGA BEHR


I noticed some debug logging from Calcite, and it indicates UTF-16 encoding.  
I would expect UTF-8.

{code}
2019-02-10T19:08:06,393 DEBUG [7db4d3c5-0f88-49db-88fa-ad6428c23784 main] 
parse.CalcitePlanner: Plan after decorrelation:
HiveSortLimit(offset=[0], fetch=[2])
  HiveProject(_o__c0=[array(3, 2, 1)], _o__c1=[map(1, 2001-01-01, 2, null)], 
_o__c2=[named_struct(_UTF-16LE'c1', 123456, _UTF-16LE'c2', _UTF-16LE'hello', 
_UTF-16LE'c3', array(_UTF-16LE'aa', _UTF-16LE'bb', _UTF-16LE'cc'), 
_UTF-16LE'c4', map(_UTF-16LE'abc', 123, _UTF-16LE'xyz', 456), _UTF-16LE'c5', 
named_struct(_UTF-16LE'c5_1', _UTF-16LE'bye', _UTF-16LE'c5_2', 88))])
HiveTableScan(table=[[default, src]], table:alias=[src])
{code}

I'm not sure if this is a Calcite-internal thing that can be configured, or if 
it is only an artifact of the way the logging works.





[GitHub] BELUGABEHR opened a new pull request #531: HIVE-21241: Migrate TimeStamp Parser From Joda Time

2019-02-11 Thread GitBox
BELUGABEHR opened a new pull request #531: HIVE-21241: Migrate TimeStamp Parser 
From Joda Time
URL: https://github.com/apache/hive/pull/531
 
 
   




[jira] [Created] (HIVE-21241) Migrate TimeStamp Parser From Joda Time

2019-02-11 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21241:
--

 Summary: Migrate TimeStamp Parser From Joda Time
 Key: HIVE-21241
 URL: https://issues.apache.org/jira/browse/HIVE-21241
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.2.0
Reporter: BELUGA BEHR
 Fix For: 4.0.0


Hive uses Joda time for its TimeStampParser.

{quote}
Joda-Time is the de facto standard date and time library for Java prior to Java 
SE 8. Users are now asked to migrate to java.time (JSR-310).

https://www.joda.org/joda-time/
{quote}

Migrate TimeStampParser to {{java.time}}.

I also added a couple of new pre-canned timestamp parsers for convenience (a 
sketch of the java.time direction follows the list):

* ISO 8601
* RFC 1123
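
A hedged sketch of the java.time direction, not the attached patch: try a list 
of candidate DateTimeFormatters, including the pre-canned ISO 8601 and RFC 
1123 ones above, and return the first that parses (the custom pattern shown is 
only an example).

{code:java}
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.Arrays;
import java.util.List;

class TimestampParserSketch {
  private static final List<DateTimeFormatter> FORMATTERS = Arrays.asList(
      DateTimeFormatter.ISO_OFFSET_DATE_TIME,              // ISO 8601
      DateTimeFormatter.RFC_1123_DATE_TIME,                // RFC 1123
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); // example custom pattern

  static TemporalAccessor parse(String text) {
    for (DateTimeFormatter formatter : FORMATTERS) {
      try {
        return formatter.parse(text);
      } catch (DateTimeParseException e) {
        // fall through and try the next candidate format
      }
    }
    throw new IllegalArgumentException("Unparseable timestamp: " + text);
  }
}
{code}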





[GitHub] BELUGABEHR opened a new pull request #530: HIVE-21240: JSON SerDe Deserialize Re-Write

2019-02-11 Thread GitBox
BELUGABEHR opened a new pull request #530: HIVE-21240: JSON SerDe Deserialize 
Re-Write
URL: https://github.com/apache/hive/pull/530
 
 
   




[jira] [Created] (HIVE-21240) JSON SerDe Re-Write

2019-02-11 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21240:
--

 Summary: JSON SerDe Re-Write
 Key: HIVE-21240
 URL: https://issues.apache.org/jira/browse/HIVE-21240
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 3.1.1, 4.0.0
Reporter: BELUGA BEHR
 Fix For: 4.0.0


The JSON SerDe has a few issues; I will link them to this JIRA.  (A sketch of 
the tree-parser and column-index-cache ideas follows the list.)

* Use Jackson Tree parser instead of manually parsing
* Added support for base-64 encoded data (the expected format when using JSON)
* Added support to skip blank lines (returns all columns as null values)
* Current JSON parser accepts, but does not apply, custom timestamp formats in 
most cases
* Added some unit tests
* Added cache for column-name to column-index searches, currently O\(n\) for 
each row processed, for each column in the row
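
A hedged sketch of the Jackson tree parsing and the column-name-to-index cache 
(field handling is illustrative and much simplified compared to a real SerDe; 
this is not the patch itself):

{code:java}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JsonRowReaderSketch {
  private final ObjectMapper mapper = new ObjectMapper();
  private final List<String> columnNames; // assumed lower-cased
  // Cache name -> index so each (row, column) lookup is no longer an O(n) scan.
  private final Map<String, Integer> columnIndexCache = new HashMap<>();

  JsonRowReaderSketch(List<String> columnNames) {
    this.columnNames = columnNames;
  }

  Object[] readRow(String line) throws IOException {
    Object[] row = new Object[columnNames.size()];
    if (line == null || line.trim().isEmpty()) {
      return row; // blank line: all columns stay null, as in the bullet above
    }
    JsonNode tree = mapper.readTree(line); // Jackson tree parser, no hand-rolled parsing
    tree.fields().forEachRemaining(entry -> {
      int idx = columnIndexCache.computeIfAbsent(
          entry.getKey().toLowerCase(), columnNames::indexOf);
      if (idx >= 0) {
        row[idx] = entry.getValue().asText(); // type conversion elided
      }
    });
    return row;
  }
}
{code}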





[GitHub] cricket007 commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
cricket007 commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r28896
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -114,8 +114,21 @@
   <version>1.7.25</version>
   <scope>test</scope>
 </dependency>
+<dependency>
+  <groupId>io.confluent</groupId>
+  <artifactId>kafka-streams-avro-serde</artifactId>
 
 Review comment:
   You're not using Kafka Streams or the serde.
   
   Maybe you want `kafka-avro-serializer`? Or the schema registry client like 
you had before? 




[GitHub] cricket007 commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
cricket007 commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r29248
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,24 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) {
+    String avroByteConverterType = tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName(), "none");
+    int avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName(), "5"));
+    switch (avroByteConverterType) {
+      case "confluent": return new AvroSkipBytesConverter(schema, 5);
+      case "skip": return new AvroSkipBytesConverter(schema, avroSkipBytes);
+      default: return new AvroBytesConverter(schema);
 
 Review comment:
   Would it be better if this were an enum rather than a string comparison? 
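   
   One hedged reading of that suggestion (names hypothetical, not part of the PR): map the table property onto an enum, so an unknown value fails loudly instead of silently falling through to the default branch.
   
   ```java
   enum AvroConverterType {
     NONE, SKIP, CONFLUENT;
   
     // Parse the table property; an unexpected value now raises a clear error.
     static AvroConverterType fromProperty(String value) {
       try {
         return valueOf(value.trim().toUpperCase());
       } catch (IllegalArgumentException e) {
         throw new IllegalArgumentException("Unknown avro.serde.type: " + value, e);
       }
     }
   }
   ```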




[GitHub] cricket007 commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
cricket007 commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r28285
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -131,20 +131,27 @@
   bytesConverter = new TextBytesConverter();
 } else if (delegateSerDe.getSerializedClass() == AvroGenericRecordWritable.class) {
   String schemaFromProperty = tbl.getProperty(AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(), "");
-  String magicBitProperty = tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_MAGIC_BYTES.getPropName(), Boolean.FALSE.toString());
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = (Boolean.valueOf(magicBitProperty)) ? new ConfluentAvroBytesConverter(schema) : new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) {
+    String avroByteConverterType = tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName(), "none");
+    int avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName(), "5"));
 
 Review comment:
   I'm not sure entirely sure this should have a default value 




[jira] [Created] (HIVE-21239) Beeline help LDAP connection example incorrect

2019-02-11 Thread Zsolt Herczeg (JIRA)
Zsolt Herczeg created HIVE-21239:


 Summary: Beeline help LDAP connection example incorrect
 Key: HIVE-21239
 URL: https://issues.apache.org/jira/browse/HIVE-21239
 Project: Hive
  Issue Type: Bug
 Environment: This was tested on a test environment with SSL and LDAP 
authentication enabled, and seems to be reproducible on any environment with 
LDAP authentication available in HiveServer2.
Reporter: Zsolt Herczeg


There's the following connection example string in the beeline -h command 
output:

 
{code:java}
5. Connect using LDAP authentication
$ beeline -u jdbc:hive2://hs2.local:10013/default  

{code}
When a user attempts to connect like the above, it will fail with an LDAP 
authentication failure. This is because the username and password are not 
picked up in the form shown. A working example would be:
{code:java}
$ beeline -n <ldap-username> -p <ldap-password> -u jdbc:hive2://hs2.local:10013/default
{code}
 





[GitHub] sankarh opened a new pull request #529: HIVE-21206: Bootstrap replication is slow as it opens lot of metastore connections.

2019-02-11 Thread GitBox
sankarh opened a new pull request #529: HIVE-21206: Bootstrap replication is 
slow as it opens lot of metastore connections.
URL: https://github.com/apache/hive/pull/529
 
 
   




Re: Please unsubscribe me

2019-02-11 Thread Lefty Leverenz
Daniel, to unsubscribe please send a message to
dev-unsubscr...@hive.apache.org as described here: Mailing Lists.

Make sure you're sending the message from the email address that should be
unsubscribed, since it's an automated system.

Thanks.

-- Lefty


On Wed, Jan 30, 2019 at 6:41 AM Daniel Takacs wrote:

>
> I tried to unsubscribe several times but still it is not working.  Please
> help me unsubscribe
> Sent from my iPhone


[GitHub] xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r255412146
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -25,6 +25,10 @@ If you want to switch serializer/deserializer classes you can use alter table.
 ```sql
 ALTER TABLE kafka_table SET TBLPROPERTIES ("kafka.serde.class"="org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use the Confluent Avro serializer/deserializer with the schema registry, you may want to remove the first 4 bytes, which represent the schema ID from the registry.
+It can be done with the setting `"avro.serde.magic.bytes"="true"`. It's recommended to set `"avro.schema.url"="http://schemaregistry/SimpleDocument.avsc"`
 
 Review comment:
   I've added two properties, `avro.serde.type` and `avro.serde.skip.bytes`; if 
the type is set to confluent, it skips the first 5 bytes. The skip count can be 
set to a different value with the avro.serde.skip.bytes property.




[GitHub] xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r255411250
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -369,6 +379,20 @@ private SubStructObjectInspector(StructObjectInspector baseOI, int toIndex) {
 }
   }
 
+  static class ConfluentAvroBytesConverter extends AvroBytesConverter {
+    ConfluentAvroBytesConverter(Schema schema) {
+      super(schema);
+    }
+
+    @Override
+    Decoder getDecoder(byte[] value) {
+      /**
+       * Confluent prepends a magic byte plus a 4-byte schema ID (5 bytes in total) before the value bytes.
+       */
+      return DecoderFactory.get().binaryDecoder(value, 5, value.length - 5, null);
 
 Review comment:
   Well, I think the schema registry is out of scope. It should be addressed in 
another PR.




[GitHub] xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r255410938
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -114,8 +114,21 @@
   <version>1.7.25</version>
   <scope>test</scope>
 </dependency>
+<dependency>
+  <groupId>io.confluent</groupId>
+  <artifactId>kafka-streams-avro-serde</artifactId>
+  <version>4.1.0</version>
 
 Review comment:
   Changed to 5.1.0




[GitHub] xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r255410838
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -180,5 +197,28 @@
 
   
 
+  <plugin>
+    <groupId>org.apache.avro</groupId>
+    <artifactId>avro-maven-plugin</artifactId>
+    <version>1.8.2</version>
 
 Review comment:
   Changed to latest version.




[GitHub] xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r255410708
 
 

 ##
 File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,70 @@
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Map;
+
+/**
+ * Created by Milan Baran on 1/29/19 15:03.
 
 Review comment:
   Removed.




[GitHub] xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe doesn't support topics created via Confluent

2019-02-11 Thread GitBox
xbaran commented on a change in pull request #526: HIVE-21218: KafkaSerDe 
doesn't support topics created via Confluent
URL: https://github.com/apache/hive/pull/526#discussion_r255410759
 
 

 ##
 File path: serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
 ##
 @@ -88,6 +89,7 @@ public String getPropName(){
   @Deprecated public static final String SCHEMA_NAME = "avro.schema.name";
   @Deprecated public static final String SCHEMA_DOC = "avro.schema.doc";
   @Deprecated public static final String AVRO_SERDE_SCHEMA = AvroTableProperties.AVRO_SERDE_SCHEMA.getPropName();
+  @Deprecated public static final String AVRO_SERDE_MAGIC_BIT = AvroTableProperties.AVRO_SERDE_MAGIC_BYTES.getPropName();
 
 Review comment:
   Removed

