Re: [VOTE] Apache Hive 2.3.9 Release Candidate 0

2021-06-04 Thread Xuefu Zhang
+1. Thanks, Chao, for doing this. I performed the following actions:

1. Downloaded the release candidate artifacts and verified the signatures
and checksums.
2. Built from the source.
3. Initialized the schema using schematool and launched the Hive metastore
service on top of it.
4. Used the Hive client to connect to the above metastore service.
5. Created some databases and tables and loaded data into the tables.
6. Ran some simple queries.
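For reference, the checksum half of step 1 can be sketched as below. This is an illustrative sketch only: the tarball file name is hypothetical, and GPG signature (.asc) verification, which needs GnuPG or Bouncy Castle, is not shown.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

// Compute a release artifact's SHA-256 so it can be compared against the
// published .sha256 file. The file name in main() is a hypothetical example.
public class ChecksumCheck {
    static String sha256Hex(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n); // stream the file through the digest
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b)); // hex-encode the digest
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path tarball = Path.of(args.length > 0 ? args[0]
                : "apache-hive-2.3.9-bin.tar.gz"); // hypothetical name
        System.out.println(sha256Hex(tarball));
    }
}
```

Comparing the printed digest with the contents of the published checksum file completes the verification.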

Thanks,
Xuefu

On Tue, Jun 1, 2021 at 6:02 PM Chao Sun  wrote:

> Apache Hive 2.3.9 Release Candidate 0 is available here:
> https://people.apache.org/~sunchao/apache-hive-2.3.9-rc-0/
> Maven artifacts are available here:
> https://repository.apache.org/content/repositories/orgapachehive-1106/
> The tag release-2.3.9-rc0 has been applied to the source for this
> release in github, you can see it at
> https://github.com/apache/hive/tree/release-2.3.9-rc0
> Voting will conclude in 72 hours (or whenever I scrounge together enough
> votes).
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>


Re: [VOTE] Apache Hive 2.3.8 Release Candidate 3

2021-01-10 Thread Xuefu Zhang
+1. I performed the following:

1. Downloaded the bin and src tarballs and verified the signatures and checksums.
2. Initialized a fresh MySQL-based metastore and ran the Hive metastore service
on top of it.
3. Created a simple table and queried it with Hive CLI.
4. Didn't test beeline, however.

Thanks,
Xuefu

On Thu, Jan 7, 2021 at 11:25 PM Chao Sun  wrote:

> Apache Hive 2.3.8 Release Candidate 3 is available here:
> https://people.apache.org/~sunchao/apache-hive-2.3.8-rc-3
> Maven artifacts are available here:
> https://repository.apache.org/content/repositories/orgapachehive-1105
> The tag release-2.3.8-rc3 has been applied to the source for this
> release in github, you can see it at
> https://github.com/apache/hive/tree/release-2.3.8-rc3
> Voting will conclude in 72 hours (or whenever I scrounge together enough
> votes).
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>


[jira] [Created] (HIVE-24280) Fix a potential NPE

2020-10-15 Thread Xuefu Zhang (Jira)
Xuefu Zhang created HIVE-24280:
--

 Summary: Fix a potential NPE
 Key: HIVE-24280
 URL: https://issues.apache.org/jira/browse/HIVE-24280
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Affects Versions: 3.1.2
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


{code:java}
case STRING:
case CHAR:
case VARCHAR: {
  BytesColumnVector bcv = (BytesColumnVector) cols[colIndex];
  String sVal = value.toString();
  if (sVal == null) {
    bcv.noNulls = false;
    bcv.isNull[0] = true;
    bcv.isRepeating = true;
  } else {
    bcv.fill(sVal.getBytes());
  }
}
break;
{code}
The above code snippet seems to assume that sVal can be null, but it doesn't 
handle the case where value itself is null. If value is not null, it's unlikely 
that value.toString() returns null.

We treat the partition column value for the default partition of string types as 
null, not as "__HIVE_DEFAULT_PARTITION__" as Hive assumes. Thus, we actually hit 
the case where sVal is null.

I propose a harmless fix, as shown in the attached patch.
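A minimal, self-contained sketch of the kind of null-guard described above. FakeBytesColumnVector is a stand-in reduced to the fields the snippet touches, not Hive's real BytesColumnVector, and fillColumn is an illustrative helper, not the actual patch.

```java
// Sketch of a null-guard on 'value' itself, not just on value.toString().
public class NullSafeFill {
    // Stand-in mimicking the fields of Hive's BytesColumnVector used above.
    static class FakeBytesColumnVector {
        boolean noNulls = true;
        boolean[] isNull = new boolean[1];
        boolean isRepeating = false;
        byte[] filled;
        void fill(byte[] bytes) { filled = bytes; isRepeating = true; }
    }

    static void fillColumn(FakeBytesColumnVector bcv, Object value) {
        // A null partition-column value (the default-partition case) now
        // marks the repeating column entry as null instead of throwing an
        // NPE from value.toString().
        String sVal = (value == null) ? null : value.toString();
        if (sVal == null) {
            bcv.noNulls = false;
            bcv.isNull[0] = true;
            bcv.isRepeating = true;
        } else {
            bcv.fill(sVal.getBytes());
        }
    }

    public static void main(String[] args) {
        FakeBytesColumnVector bcv = new FakeBytesColumnVector();
        fillColumn(bcv, null); // previously the NPE path
        System.out.println("null handled: " + bcv.isNull[0]);
    }
}
```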



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Time to Remove Hive-on-Spark

2020-07-23 Thread Xuefu Zhang
Previous reasoning seemed to suggest a lack of user adoption. Now we are
concerned about ongoing maintenance effort. Both are valid considerations.
However, I think we should have ways to find out the answers. Therefore, I
suggest the following be carried out:

1. Send out the proposal (removing Hive on Spark) to users including
u...@hive.apache.org and get their feedback.
2. Ask if any developers on this mailing list are willing to take on the
maintenance effort.

I'm concerned about user impact because I still see issues being
reported on HoS from time to time. I'm more concerned about the future of
Hive if we narrow Hive's neutrality on execution engines, which may
force more Hive users to migrate to alternatives such as Spark SQL,
which is already eroding Hive's user base.

Being open and neutral used to be among Hive's most admired strengths.

Thanks,
Xuefu


On Wed, Jul 22, 2020 at 8:46 AM Alan Gates  wrote:

> An important point here is I don't believe David is proposing to remove
> Hive on Spark from the 2 or 3 lines, but only from trunk.  Continuing to
> support it in existing 2 and 3 lines makes sense, but since no one has
> maintained it on trunk for some time and it does not work with many of the
> newer features it should be removed from trunk.
>
> Alan.
>
> On Tue, Jul 21, 2020 at 4:10 PM Chao Sun  wrote:
>
> > Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very
> > large scale in production right now and I don't think we have any plan to
> > change it soon.
> >
> >
> >
> > On Tue, Jul 21, 2020 at 11:28 AM David  wrote:
> >
> > > Hello,
> > >
> > > Thanks for the feedback.
> > >
> > > Just a quick recap: I did propose this @dev and I received unanimous
> +1's
> > > from the community.  After a couple months, I created the PR.
> > >
> > > Certainly open to discussion, but there hasn't been any discussion thus
> > far
> > > because there have been no objections until this point.
> > >
> > > HoS has low adoption, heavy technical debt, and the manner in which its
> > > build process is set up is impeding some other work that is not even
> > related
> > > to HoS.
> > >
> > > We can deprecate in Hive 3.x and remove in Hive 4.x.  The plan would be
> > to
> > > use Tez moving forward.
> > >
> > > My point about the vendor's move to Tez is that HoS adoption is very
> low,
> > > it's only going lower, and while I don't know the specifics of it,
> there
> > > must be some migration plan in place there (i.e., it must be possible
> to
> > do
> > > it already).
> > >
> > > Thanks,
> > > David
> > >
> > > On Tue, Jul 21, 2020 at 12:23 PM Xuefu Zhang  wrote:
> > >
> > > > Hi David,
> > > >
> > > > While a vendor may not support a component in an open source project,
> > > > removing it or not is a decision by and for the community. I
> certainly
> > > > understand that the vendor you mentioned has contributed a great deal
> > > > (including my personal effort while working there), it's not up to
> the
> > > > vendor to make a call like what is proposed here.
> > > >
> > > > As a community, we should have gone through a thorough discussion and
> > > > reached a consensus before actually making such a big change, in my
> > > > opinion.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Tue, Jul 21, 2020 at 8:49 AM David  wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > Thanks for the input.
> > > > >
> > > > > FYI. Cloudera (Cloudera + Hortonworks) have removed HoS from their
> > > latest
> > > > > offering.
> > > > >
> > > > > "Tez is now the only supported execution engine, existing queries
> > that
> > > > > change execution mode to Spark or MapReduce within a session, for
> > > > example,
> > > > > fail."
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.cloudera.com/cdp/latest/upgrade-post/topics/ug_hive_configuration_changes.html
> > > > >
> > > > >
> > > > > So I don't know who will be supporting this feature moving forward,
> > but
> > > > > there has been a lot of work done to make this change as painless
> as
> > > > > possible.  Simply setting the engine to 'tez' and removing the
> > > > > HoS-related settings should address many use cases.

HIVE-19853 and Timestamp in Arrow

2020-07-21 Thread Xuefu Zhang
Hi all,

Recently I noticed that Hive serializes timestamps, which have no time zone,
in Arrow as microseconds in UTC. This seems a little counterintuitive on the
surface, so I'd like to understand it a bit more. From the JIRA [1], it
seems that the choice was made for Spark because Spark does that. However,
I wonder what we do if another system expects timestamp values in Arrow
to have no time zone. I'm asking because I hit exactly this issue.

Any insights would be appreciated.

Thanks,
Xuefu


Re: Time to Remove Hive-on-Spark

2020-07-21 Thread Xuefu Zhang
Hi David,

While a vendor may not support a component in an open source project,
removing it or not is a decision by and for the community. While I certainly
understand that the vendor you mentioned has contributed a great deal
(including my personal effort while working there), it's not up to the
vendor to make a call like what is proposed here.

As a community, we should have gone through a thorough discussion and
reached a consensus before actually making such a big change, in my opinion.

Thanks,
Xuefu

On Tue, Jul 21, 2020 at 8:49 AM David  wrote:

> Hey,
>
> Thanks for the input.
>
> FYI. Cloudera (Cloudera + Hortonworks) have removed HoS from their latest
> offering.
>
> "Tez is now the only supported execution engine, existing queries that
> change execution mode to Spark or MapReduce within a session, for example,
> fail."
>
>
> https://docs.cloudera.com/cdp/latest/upgrade-post/topics/ug_hive_configuration_changes.html
>
>
> So I don't know who will be supporting this feature moving forward, but
> there has been a lot of work done to make this change as painless as
> possible.  Simply setting the engine to 'tez' and removing the HoS-related
> settings should address many use cases.
>
> Thanks.
>
> On Tue, Jul 21, 2020 at 11:36 AM Xuefu Z  wrote:
>
> > Sorry for chiming in late. However, I don't think we should remove Hive
> on
> > Spark just because of a technical problem. This is rather a big decision
> > that we need to be careful about. There are users that will be left high
> > and dry by this move.
> >
> > If the community decides to desupport and eventually remove it, I think
> we
> > need to have a due process. We also need a deprecation plan if that's what
> > we decide to do. Before that, I'm -1 on this proposal.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Jul 21, 2020 at 7:57 AM David  wrote:
> >
> > > Hello Team,
> > >
> > > https://github.com/apache/hive/pull/1285
> > >
> > > Thanks.
> > >
> > > On Wed, Jun 3, 2020 at 11:49 PM Gopal V  wrote:
> > >
> > > >
> > > > +1
> > > >
> > > > Cheers,
> > > > Gopal
> > > >
> > > > On 6/3/20 7:48 PM, Jesus Camacho Rodriguez wrote:
> > > > > +1
> > > > >
> > > > > -Jesús
> > > > >
> > > > > On Wed, Jun 3, 2020 at 1:58 PM Alan Gates 
> > > wrote:
> > > > >
> > > > >> +1.
> > > > >>
> > > > >> Alan.
> > > > >>
> > > > >> On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
> > > > >>  wrote:
> > > > >>
> > > > >>> +1
> > > > >>>
> > > > >>>> On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan <
> > hashut...@apache.org>
> > > > >>> wrote:
> > > > >>>>
> > > > >>>> +1
> > > > >>>>
> > > > >>>> On Wed, Jun 3, 2020 at 1:23 PM David Mollitor <
> dam6...@gmail.com>
> > > > >> wrote:
> > > > >>>>
> > > > >>>>> Hello Gang,
> > > > >>>>>
> > > > >>>>> I have spent some time working on upgrading Avro (far less than
> > > > >> others):
> > > > >>>>>
> > > > >>>>> https://issues.apache.org/jira/browse/HIVE-21737
> > > > >>>>>
> > > > >>>>> This should be a relatively easy thing to do, but is blocked by
> > > > >>>>> Hive-on-Spark.  HoS has a weird thing where it downloads some
> > > > >>>>> cloud-storage-hosted file of Spark-Hadoop as part of its maven
> > run.
> > > > >>>>>
> > > > >>>>> Since HoS is not going to receive updates from the major
> vendors,
> > > is
> > > > >> it
> > > > >>>>> time to simply remove it?
> > > > >>>>>
> > > > >>>>> Tests are currently disabled:
> > > > >>>>> https://issues.apache.org/jira/browse/HIVE-23137
> > > > >>>>>
> > > > >>>>> Thanks.
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>


Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-07-30 Thread Xuefu Zhang
Congratulations, Vihang!

On Mon, Jul 30, 2018 at 10:53 AM, Andrew Sherman <
asher...@cloudera.com.invalid> wrote:

> Congratulations Vihang!
>
>
>
> On Mon, Jul 30, 2018 at 12:44 AM Peter Vary 
> wrote:
>
> > Congratulations Vihang!
> >
> > > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> > >
> > > Congratulations Vihang!
> > >
> > >> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan 
> > wrote:
> > >>
> > >> On behalf of the Hive PMC I am delighted to announce Vihang
> > Karajgaonkar
> > >> is joining Hive PMC.
> > >> Thanks Vihang for all your contributions till now. Looking forward to
> > many
> > >> more.
> > >>
> > >> Welcome, Vihang!
> > >>
> > >> Thanks,
> > >> Ashutosh
> > >
> >
> >
>


Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-07-30 Thread Xuefu Zhang
Congrats, Sahil!

On Mon, Jul 30, 2018 at 10:53 AM, Andrew Sherman <
asher...@cloudera.com.invalid> wrote:

> Congratulations Sahil!
>
> On Mon, Jul 30, 2018 at 7:29 AM Sahil Takiar 
> wrote:
>
> > Thanks!
> >
> > On Mon, Jul 30, 2018 at 2:44 AM, Peter Vary 
> > wrote:
> >
> > > Congratulations Sahil!
> > >
> > > > On Jul 29, 2018, at 22:32, Vineet Garg 
> wrote:
> > > >
> > > > Congratulations Sahil!
> > > >
> > > >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan <
> hashut...@apache.org>
> > > wrote:
> > > >>
> > > >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> > > >> joining Hive PMC.
> > > >> Thanks Sahil for all your contributions till now. Looking forward to
> > > many
> > > >> more.
> > > >>
> > > >> Welcome, Sahil!
> > > >>
> > > >> Thanks,
> > > >> Ashutosh
> > > >
> > >
> > >
> >
> >
> > --
> > Sahil Takiar
> > Software Engineer
> > takiar.sa...@gmail.com | (510) 673-0309
> >
>


Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-07-30 Thread Xuefu Zhang
Congratulations, Peter!

On Mon, Jul 30, 2018 at 12:11 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> Congrats Peter!
>
> On 7/30/18, 10:53 AM, "Andrew Sherman" 
> wrote:
>
> Congratulations Peter!
>
> On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg 
> wrote:
>
> > Congratulations Peter!
> >
> > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan <
> hashut...@apache.org>
> > wrote:
> > >
> > > On behalf of the Hive PMC I am delighted to announce Peter Vary is
> > joining
> > > Hive PMC.
> > > Thanks Peter for all your contributions till now. Looking forward
> to many
> > > more.
> > >
> > > Welcome, Peter!
> > >
> > > Thanks,
> > > Ashutosh
> >
> >
>
>
>


Re: [ANNOUNCE] New committer: Slim Bouguerra

2018-07-30 Thread Xuefu Zhang
congratulations!!!

On Mon, Jul 30, 2018 at 12:10 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> Congrats Slim!
>
> On 7/30/18, 10:53 AM, "Andrew Sherman" 
> wrote:
>
> Congratulations Slim!
>
> On Mon, Jul 30, 2018 at 12:46 AM Peter Vary  >
> wrote:
>
> > Congratulations Slim!
> >
> > > On Jul 30, 2018, at 02:00, Ashutosh Chauhan 
> > wrote:
> > >
> > > Apache Hive's Project Management Committee (PMC) has invited Slim
> > Bouguerra
> > > to become a committer, and we are pleased to announce that he has
> > accepted.
> > >
> > > Slim, welcome, thank you for your contributions, and we look
> forward your
> > > further interactions with the community!
> > >
> > > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> >
> >
>
>
>


Re: [ANNOUNCE] New PMC Member : Vineet Garg

2018-07-30 Thread Xuefu Zhang
Congratulations!

On Mon, Jul 30, 2018 at 12:10 PM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> Congrats Vineet!
>
> On 7/30/18, 10:53 AM, "Andrew Sherman" 
> wrote:
>
> Congratulations Vineet!
>
> On Mon, Jul 30, 2018 at 12:52 AM Deepak Jaiswal <
> djais...@hortonworks.com>
> wrote:
>
> > Congratulations Vineet!
> >
> > On 7/30/18, 12:45 AM, "Peter Vary" 
> wrote:
> >
> > Congratulations Vineet!
> >
> > > On Jul 30, 2018, at 01:59, Ashutosh Chauhan <
> hashut...@apache.org>
> > wrote:
> > >
> > > On behalf of the Hive PMC I am delighted to announce Vineet
> Garg is
> > joining
> > > Hive PMC.
> > > Thanks Vineet for all your contributions till now. Looking
> forward
> > to many
> > > more.
> > >
> > > Welcome, Vineet!
> > >
> > > Thanks,
> > > Ashutosh
> >
> >
> >
> >
>
>
>


Re: Integrating Yetus with Precommit job

2017-11-07 Thread Xuefu Zhang
+1 on the ideas.

On Tue, Nov 7, 2017 at 6:17 AM, Adam Szita  wrote:

> Thanks for all the replies.
>
> Vihang: Good idea on making everything green before turning this on. For
> this purpose I've filed a couple of jiras:
> -HIVE-17995  Run
> checkstyle on standalone-metastore module with proper configuration
> -HIVE-17996  Fix ASF
> headers
> -HIVE-17997  Add rat
> plugin and configuration to standalone metastore pom
>
> Sahil: there is an umbrella jira (HIVE-13503
> ) for test improvements,
> the Yetus integration itself is also a subtask of it. I think any further
> improvements on what Yetus features we want to enable should go here too.
>
> Adam
>
>
>
> On 6 November 2017 at 22:02, Sahil Takiar  wrote:
>
> > +1 - think this will be a great addition to Hive. Helping us catch issues
> > earlier, keeping the Hive code cleaner, etc. Getting the basic Yetus
> checks
> > to work seems like a great start, do we have follow JIRAs to get more
> YETUS
> > tests integrated - e.g. FindBugs?
> >
> > On Mon, Nov 6, 2017 at 10:29 AM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> > wrote:
> >
> > > Thanks Adam for this work. This is definitely useful and a good
> addition
> > to
> > > our test infrastructure.
> > >
> > > Can we fix the existing issues pointed by Yetus (in a separate JIRA) so
> > > that we have a +1 from yetus on the current code? Once that is done,
> > > committers can help keep it green as they review patches and merge it.
> > >
> > > Thanks,
> > > Vihang
> > >
> > > On Mon, Nov 6, 2017 at 9:04 AM, Thejas Nair 
> > wrote:
> > >
> > > > +1
> > > > Yes, I think this can help us catch many issues early on, it will be
> > very
> > > > useful!
> > > >
> > > >
> > > > On Mon, Nov 6, 2017 at 7:43 AM, Adam Szita 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > As a next step of test subsystem improvements we would like to have
> > the
> > > > > Yetus check integrated with the ptest framework. This means that
> > > > whenever a
> > > > > new patch is uploaded - along with the already existing Precommit
> > test
> > > > run
> > > > > - Hive's Yetus patch check script would be triggered also. This
> > script
> > > > runs
> > > > > checkstyle, findbugs, ASF license check, etc with and without the
> > > > submitted
> > > > > patch applied and reports the diffs (i.e. how many checkstyle
> > problems
> > > > does
> > > > > the patch introduce).
> > > > >
> > > > > It would be executed parallel to the ptest test execution and
> report
> > > back
> > > > > the results as a (another) jira comment to the issue in question.
> > > > > In the last days I've been working on this (HIVE-16748) and a patch
> > is
> > > > > ready to make this happen. A sample Yetus result comment is
> available
> > > at
> > > > > https://issues.apache.org/jira/browse/HIVE-16748?
> > > > > focusedCommentId=16218616=com.atlassian.jira.
> > > > > plugin.system.issuetabpanels:comment-tabpanel#comment-16218616
> > > > >
> > > > > We think this would be a useful tool for us developers and would
> like
> > > to
> > > > go
> > > > > ahead with this change, but we're also curious about your input in
> > this
> > > > > matter. Please let us know what you think about this change.
> > > > >
> > > > > Thanks,
> > > > > Adam
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Sahil Takiar
> > Software Engineer at Cloudera
> > takiar.sa...@gmail.com | (510) 673-0309
> >
>


Re: does anyone care about list bucketing stored as directories?

2017-10-08 Thread Xuefu Zhang
Lack of a response doesn't necessarily mean "don't care". Maybe you can
provide a good description of the problem and the proposed solution. Frankly,
I cannot make much sense of the previous email.

Thanks,
Xuefu

On Fri, Oct 6, 2017 at 5:05 PM, Sergey Shelukhin 
wrote:

> Looks like nobody does… I’ll file a ticket to remove it shortly.
>
> From: Sergey Shelukhin
> Date: Tuesday, October 3, 2017 at 12:59
> To: "u...@hive.apache.org" <u...@hive.apache.org>, "dev@hive.apache.org"
> <dev@hive.apache.org>
> Subject: does anyone care about list bucketing stored as directories?
>
> 1) There seem to be some bugs and limitations in LB (e.g. incorrect
> cleanup - https://issues.apache.org/jira/browse/HIVE-14886) and nobody
> appears to as much as watch JIRAs ;) Does anyone actually use this stuff?
> Should we nuke it in 3.0, and by 3.0 I mean I’ll remove it from master in a
> few weeks? :)
>
> 2) I actually wonder, on top of the same SQL syntax, wouldn’t it be much
> easier to add logic to partitioning to write skew values into partitions
> and non-skew values into a new type of default partition? It won’t affect
> nearly as many low level codepaths in obscure and unobvious ways, instead
> keeping all the logic in metastore and split generation, and would
> integrate with Hive features like PPD automatically.
> Esp. if we are ok with the same limitations - e.g. if you add a new skew
> value right now, I’m not sure what happens to the rows with that value
> already sitting in the non-skew directories, but I don’t expect anything
> reasonable...
>
>


[jira] [Created] (HIVE-17586) Make HS2 BackgroundOperationPool not fixed

2017-09-22 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-17586:
--

 Summary: Make HS2 BackgroundOperationPool not fixed
 Key: HIVE-17586
 URL: https://issues.apache.org/jira/browse/HIVE-17586
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Currently the thread pool for background asynchronous operations has a fixed 
size controlled by {{hive.server2.async.exec.threads}}. However, the thread 
factory supplied for this thread pool is {{ThreadFactoryWithGarbageCleanup}}, 
which creates ThreadWithGarbageCleanup. Since this is a fixed thread pool, the 
threads are actually never killed, defeating the purpose of garbage cleanup as 
noted in the thread class name. On the other hand, since these threads never go 
away, significant resources such as threadlocal variables (classloaders, 
hiveconfs, etc.) are held even when no operation is running. This can lead to 
escalated HS2 memory usage.

Ideally, the thread pool should not be fixed, allowing threads to die out so 
resources can be reclaimed. The existing config 
{{hive.server2.async.exec.threads}} would be treated as the max, and we can add 
a min for the thread pool, {{hive.server2.async.exec.min.threads}}. The default 
value for this config is -1, which keeps the existing behavior.
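The idea can be sketched with a plain java.util.concurrent.ThreadPoolExecutor. This is not the actual HS2 implementation: it only illustrates letting idle worker threads terminate so their per-thread state (threadlocal classloaders, hiveconfs) becomes reclaimable.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a background pool whose idle threads die out. maxThreads plays the
// role of hive.server2.async.exec.threads; a nonzero minimum (the proposed
// hive.server2.async.exec.min.threads) would need the floor threads exempted
// from the timeout, which plain ThreadPoolExecutor cannot express directly.
public class ElasticPoolSketch {
    static ThreadPoolExecutor newBackgroundPool(int maxThreads, long keepAliveSecs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, keepAliveSecs, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        // Let even core threads time out: after keepAliveSecs of idleness a
        // worker exits instead of parking forever with its threadlocals.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBackgroundPool(4, 1);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { }); // trivial "background operations"
        }
        Thread.sleep(3000); // wait well past the keep-alive
        System.out.println("live worker threads: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

After the keep-alive elapses, the idle workers are reaped and the pool size drops back toward zero, which is exactly the reclamation behavior a fixed pool prevents.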



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17548) ThriftCliService reports an inaccurate number of current sessions in the log message

2017-09-17 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-17548:
--

 Summary: ThriftCliService reports an inaccurate number of current 
sessions in the log message
 Key: HIVE-17548
 URL: https://issues.apache.org/jira/browse/HIVE-17548
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.1.0
Reporter: Xuefu Zhang


Currently ThriftCliService uses an atomic integer to keep track of the number 
of currently open sessions. It reports this through the following two log 
messages:
{code}
2017-09-18 04:14:31,722 INFO [HiveServer2-Handler-Pool: Thread-729979]: 
org.apache.hive.service.cli.thrift.ThriftCLIService: Opened a session: 
SessionHandle [99ec30d7-5c44-4a45-a8d6-0f0e7ecf4879], current sessions: 345
2017-09-18 04:14:41,926 INFO [HiveServer2-Handler-Pool: Thread-717542]: 
org.apache.hive.service.cli.thrift.ThriftCLIService: Closed session: 
SessionHandle [f38f7890-cba4-459c-872e-4c261b897e00], current sessions: 344
{code}
This assumes that all sessions are opened or closed through the Thrift API. This 
assumption isn't correct because sessions may be closed by the server, such as 
in the case of a timeout. Therefore, these log messages tend to over-report the 
number of open sessions.

To accurately report the number of outstanding sessions, the session manager 
should be consulted instead.
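The discrepancy can be sketched as follows. SessionManager, openSession, and getOpenSessionCount are illustrative names, not Hive's real API; the point is that the log line should report the manager's live-session count, which also reflects server-side closes such as idle timeouts, instead of a counter maintained only by the Thrift handlers.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the over-reporting bug and the proposed fix.
public class SessionCountSketch {
    static class SessionManager {
        private final Map<UUID, Object> open = new ConcurrentHashMap<>();
        UUID openSession() { UUID h = UUID.randomUUID(); open.put(h, new Object()); return h; }
        // Called from BOTH the Thrift close path and the server timeout path.
        void closeSession(UUID h) { open.remove(h); }
        // Single source of truth for "current sessions" in the log message.
        int getOpenSessionCount() { return open.size(); }
    }

    public static void main(String[] args) {
        SessionManager mgr = new SessionManager();
        AtomicInteger thriftCounter = new AtomicInteger(); // the buggy counter

        UUID a = mgr.openSession(); thriftCounter.incrementAndGet();
        UUID b = mgr.openSession(); thriftCounter.incrementAndGet();

        // The server times session 'a' out; the Thrift close handler never
        // runs, so the atomic counter is not decremented.
        mgr.closeSession(a);

        System.out.println("thrift counter: " + thriftCounter.get()      // over-reports
                + ", manager count: " + mgr.getOpenSessionCount());      // accurate
    }
}
```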






[jira] [Created] (HIVE-17507) Support Mesos for Hive on Spark

2017-09-11 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-17507:
--

 Summary: Support Mesos for Hive on Spark
 Key: HIVE-17507
 URL: https://issues.apache.org/jira/browse/HIVE-17507
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang


From the comment in HIVE-7292:
{quote}
I see the following case: I use Mesos DC/OS and Spark on Mesos. Because it's 
very convenient. But if I want to use Hive on Spark in Mesos DC/OS, I need 
special framework Apache Myriad to run YARN on Mesos. It's very cluttering 
because I run one Resource Manager on another Resource Manager, and it creates 
a lot of redundant abstraction levels.
And there are questions about that on the Internet (e.g. 
http://grokbase.com/t/hive/user/15997dye2q/hive-on-spark-on-mesos)
Can we create the new sub-task for this feature?
{quote}





Re: Review Request 61663: WebUI query plan graphs

2017-08-29 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61663/#review184121
---



Patch looks good. I didn't read the js files, but I hope they're okay. A few 
minor comments.


ql/src/java/org/apache/hadoop/hive/ql/QueryDisplay.java
Lines 127 (patched)
<https://reviews.apache.org/r/61663/#comment260180>

What happens if this is a map-only task?



ql/src/java/org/apache/hadoop/hive/ql/QueryDisplay.java
Lines 132 (patched)
<https://reviews.apache.org/r/61663/#comment260181>

It might be better if the null check is put in getCountersJson() method.



service/src/resources/hive-webapps/static/css/query-plan-graph.css
Lines 1 (patched)
<https://reviews.apache.org/r/61663/#comment260183>

Apache license header if possible.



service/src/resources/hive-webapps/static/js/query-plan-graph.js
Lines 1 (patched)
<https://reviews.apache.org/r/61663/#comment260182>

I think we need apache license header.



service/src/resources/hive-webapps/static/js/vis.min.js
Lines 1 (patched)
<https://reviews.apache.org/r/61663/#comment260184>

    Apache license header.


- Xuefu Zhang


On Aug. 16, 2017, 1:55 p.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61663/
> ---
> 
> (Updated Aug. 16, 2017, 1:55 p.m.)
> 
> 
> Review request for hive, Peter Vary and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below.
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info.
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/LogUtils.java 0a3e0c7201 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3c158a6692 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 4e7c80f184 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 4b6051485e 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryDisplay.java bf6cb91745 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> 3c0719717c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 1bd4db7805 
>   service/src/jamon/org/apache/hive/tmpl/QueryProfileTmpl.jamon ff7476ee02 
>   service/src/resources/hive-webapps/static/css/query-plan-graph.css 
> PRE-CREATION 
>   service/src/resources/hive-webapps/static/js/query-plan-graph.js 
> PRE-CREATION 
>   service/src/resources/hive-webapps/static/js/vis.min.js PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61663/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



[jira] [Created] (HIVE-17401) Hive session idle timeout doesn't function properly

2017-08-28 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-17401:
--

 Summary: Hive session idle timeout doesn't function properly
 Key: HIVE-17401
 URL: https://issues.apache.org/jira/browse/HIVE-17401
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


It's apparent in our production environment that HS2 leaks sessions, which has 
at least contributed to memory leaks in HS2. We further found that idle HS2 
sessions rarely get timed out and that the number of live sessions keeps 
increasing as time goes on. Eventually, HS2 becomes unresponsive and demands a 
restart.

Investigation shows that the session idle timeout doesn't work appropriately.





Re: Aug. 2017 Hive User Group Meeting

2017-08-21 Thread Xuefu Zhang
Dear Hive users and developers,

As a reminder, the next Hive User Group Meeting will take place this Thursday,
Aug. 24. The agenda is available on the event page (
https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/).

See you all there!

Thanks,
Xuefu

On Tue, Aug 1, 2017 at 7:18 PM, Xuefu Zhang <xu...@apache.org> wrote:

> Hi all,
>
> It's an honor to announce that Hive community is launching a Hive user
> group meeting in the bay area this month. The details can be found at
> https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/.
>
> We are inviting talk proposals from Hive users as well as developers at
> this time. We currently have 5 openings.
>
> Please let me know if you have any questions or suggestions.
>
> Thanks,
> Xuefu
>
>


Re: [DISCUSSION] WebUI query plan graphs

2017-08-10 Thread Xuefu Zhang
Hi Karen,

Thanks for reaching out. While your message doesn't seem to show any
images, I think the feature would be a great addition to Hive. (The Hive
community always welcomes contributions like this.)

Please feel free to create a JIRA for easier discussion and tracking.

Thanks again for your interest.

--Xuefu

On Thu, Aug 10, 2017 at 6:25 AM, Karen Coppage 
wrote:

> Hi all,
>
> I’m working on a feature of the Hive WebUI Query Plan tab that would
> provide the option to display the query plan as a nice graph (scroll down
> for screenshots). If you click on one of the graph’s stages, the plan for
> that stage appears as text below.
>
> Stages are color-coded if they have a status (Success, Error, Running), and
> the rest are grayed out. Coloring is based on status already available in
> the WebUI, under the Stages tab.
>
> There is an additional option to display stats for MapReduce tasks. This
> includes the job’s ID, tracking URL (where the logs are found), and mapper
> and reducer numbers/progress, among other info.
>
> The library I’m using for the graph is called vis.js (http://visjs.org/).
> It has an Apache license, and the only necessary file to be included from
> this library is about 700 KB.
>
> I tried to keep server-side changes minimal, and graph generation is taken
> care of by the client. Plans with more than a given number of stages
> (default: 25) won't be displayed in order to preserve resources.
>
> I’d love to hear any and all input from the community about this feature:
> do you think it’s useful, and is there anything important I’m missing?
>
> Thanks,
>
> Karen Coppage
>
> *
>
> A completely successful query:
>
> [image: Inline image 1]
>
>
> A MapReduce task selected, with MapReduce stats view on:
>
> [image: Inline image 2]
>
>
> Full MapReduce stats, lacking some information because the query was run in
> local mode:
>
> [image: Inline image 3]
>
>
> A non-MapReduce stage selected:
>
> [image: Inline image 4]
>
>
> Last stage running:
>
> [image: Inline image 5]
>
>
> Last stage returns error:
>
> [image: Inline image 6]
>


Aug. 2017 Hive User Group Meeting

2017-08-01 Thread Xuefu Zhang
Hi all,

It's an honor to announce that the Hive community is launching a Hive user
group meeting in the Bay Area this month. The details can be found at
https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/.

We are inviting talk proposals from Hive users as well as developers at
this time. We currently have 5 openings.

Please let me know if you have any questions or suggestions.

Thanks,
Xuefu


Re: [Announce] New committer: Peter Vary

2017-07-07 Thread Xuefu Zhang
Congratulations!

On Fri, Jul 7, 2017 at 4:17 AM, Adam Szita  wrote:

> Congrats all!
>
> On 7 July 2017 at 10:03, Zoltan Haindrich 
> wrote:
>
> > Congratulations Peter, Teddy, Deepesh, Vihang and Sahil!
> > It's great to see that the Hive community is growing!
> >
> > On 6 Jul 2017 02:52, Ashutosh Chauhan  wrote:
> > The Project Management Committee (PMC) for Apache Hive has invited Peter
> > Vary to become a committer and we are pleased to announce that he has
> > accepted.
> >
> > Welcome, Peter!
> >
> > Thanks,
> > Ashutosh
> >
> >
>


Re: [ANNOUNCE] New PMC Member : Matt McCline

2017-07-07 Thread Xuefu Zhang
Congratulations!

On Fri, Jul 7, 2017 at 8:27 AM, Eugene Koifman 
wrote:

> Congratulations!
>
> On 7/7/17, 1:04 AM, "Zoltan Haindrich"  wrote:
>
> Congrats Matt!
>
> On 7 Jul 2017 09:46, Peter Vary  wrote:
> Congratulations Matt! :)
>
> 2017. júl. 7. 0:34 ezt írta ("Jesus Camacho Rodriguez" <
> jcama...@apache.org
> >):
>
> > Congrats Matt!
> >
> > -Jesús
> >
> >
> >
> > On 7/6/17, 11:13 PM, "Lefty Leverenz" 
> wrote:
> >
> > >Congratulations Matt!  Well deserved.
> > >
> > >-- Lefty
> > >
> > >On Thu, Jul 6, 2017 at 11:31 AM, Ashutosh Chauhan <
> hashut...@apache.org>
> > >wrote:
> > >
> > >> On behalf of the Hive PMC I am delighted to announce Matt McCline
> is
> > >> joining Hive PMC.
> > >> Matt is a long time contributor in Hive and is focusing on
> vectorization
> > >> these days.
> > >>
> > >> Welcome, Matt!
> > >>
> > >> Thanks,
> > >> Ashutosh
> > >>
> >
> >
>
>
>
>


Re: [Announce] New committer: Teddy Choi

2017-07-07 Thread Xuefu Zhang
Congratulations!

On Fri, Jul 7, 2017 at 12:09 AM, Matthew McCline 
wrote:

> Congratulations Teddy!
>
>
>
> On Wed, Jul 5, 2017 at 5:53 PM -0700, "Ashutosh Chauhan" <
> hashut...@apache.org> wrote:
>
>
> The Project Management Committee (PMC) for Apache Hive has invited Teddy
> Choi to become a committer and we are pleased to announce that he has
> accepted.
>
> Welcome, Teddy!
>
> Thanks,
> Ashutosh
>
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-05 Thread Xuefu Zhang
I think Edward's concern is valid. While I voiced my support for this
proposal, which was more about the benefits to the whole Hadoop ecosystem, I
don't see equal benefits for Hive. Instead, it may even create more
overhead for Hive. I'd really like to take time to see what the roadblocks
are for other projects to use HMS as it is. The issue of Spark including
a Hive fork, which was brought up some time back, is certainly not one of
them.

Thanks,
Xuefu

On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
wrote:

> On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates  wrote:
>
> > On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo 
> > wrote:
> >
> > >
> > > We already have things in the meta-store not directly tied to language
> > > features. For example hive metastore has a "retention" property which
> is
> > > not actively in use by anything. In reality, we rarely say 'no' or -1
> to
> > > much. Which in part is why I believe our release process is grinding
> > > slower: we have so many things in flight I do not feel that any one
> > person
> > > can keep track. You are working on porting the metastore to hbase.
> > > https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or
> 'No'
> > > along the way? When I first noticed this I pointed out that someone has
> > > already ported the metastore to Cassandra
> > > https://github.com/riptano/brisk/blob/master/src/java/
> > > src/org/apache/cassandra/hadoop/hive/metastore/SchemaManager
> > Service.java,
> > > but there was more excitement/rationale for this multi-year approach
> > > using HBase, so I let everyone 'have at it'.
> > >
> > >
> > Your example and mine are not equivalent.  The HBase metastore is still a
> > Hive feature, even if some thought it not worth while.  That is different
> > than people bringing features that will never interest Hive or that Hive
> > could never use (e.g. Dain’s desire for the metastore to support Presto
> > style views).
> >
> > I forgot to mention the issue these would be non-Hive contributors have
> > with releases if they contribute their features to the metastore while
> it’s
> > inside Hive.  Is Hive going to do a release just to push out features in
> > the metastore that it doesn’t care about?
> >
> > You seem to be asserting that doing this doesn’t really help non-Hive
> based
> > systems that are using or would like to use the metastore.  But it is
> > interesting that people from three of those systems have commented in the
> > thread so far, and all are positive (Dmitrias from Impala, Dain from
> > Presto, and Sriharsha from the schema registry project).
> >
> >
> > > I am going to give a hypothetical but real world situation. Suppose I
> > want
> > > to add the statement "CREATE permanent macro xyz", this feature I
> believe
> > > would cross cut calcite, hive, and hive metastore. To build this
> feature
> > I
> > > would need to orchestrate the change across 3 separate groups of hive
> > > 'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
> > > releases. That is not counting if we run into some bug or misfeature
> > (maybe
> > > with Tez or something else) so that brings in 4-5 releases of upstream
> to
> > > add a feature to hive. This does not take into account normal processes
> > > mess ups. For example say you get the metastore done, but now the
> people
> > > doing the calcite/antlr suggest the feature have different syntax
> because
> > > they did not read the 3-4 linked tickets when the process started? Now,
> > you
> > > have to loop back around the process. Finding 1 person in 1 project to
> > > usher along the feature you want is difficult, having to find and clear
> > > time with 3 people across three projects is going to be a difficult
> along
> > > with then 'pushing' them all to kick out a release so you can finally
> use
> > > said feature.
> > >
> >
> > I partially agree with you.  On the reviews, JIRAs, etc. I don’t think it
> > adds much, if any, overhead.  Hive is a big project and no one person
> knows
> > all the code anymore.  If you wanted to add a permanent macros feature
> you
> > would need reviews from someone who knows the parser (probably
> Pengcheng),
> > people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
> > the metastore (me, Thejas, …).  And any large feature is going to be
> > implemented over multiple JIRAs, all of which are linkable regardless of
> > whether the JIRAs start with METASTORE- or HIVE-.   I also don’t think it
> > makes the feature disagreement any worse.  If the optimizer team
> absolutely
> > insists it has to have some feature and the metastore team insists that
> it
> > can’t have that feature you’re going to have to work through the issue
> > whether they all are in Hive or in two separate projects.
> >
> > Where I agree the split adds cost is releases.  Before your macro feature
> > could go live you need releases from each of the components.  And while
> in

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Xuefu Zhang
+1, sounds like a good idea!

On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:

> Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> This is a great opportunity for building a Metastore to not only address
> schemas for the data at rest but also for the data in motion. We have a
> SchemaRegistry (http://github.com/hortonworks/registry)  project that
> allows users to register schemas for data in motion and integrates with
> Kafka, Kinesis, Event Hubs, and other messaging queues. This will provide
> us with opportunity to integrate our apis with Hive Metastore and
> provide with one project that is truly a single metastore that can hold
> all schemas.
>
> Thanks,
> Harsha
>
> On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > Great, thanks Alan for putting all this in the email.
> > +1
> >
> > Allowing other components to continue to use the Metastore without the
> > need
> > to use Hive dependencies is a big plus for them. I agree with everything
> > you mention on the email.
> >
> > - Sergio
> >
> > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> >
> > > +1
> > >
> > > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > > reads metadata from a variety of sources (including JDBC databases,
> NoSQL
> > > databases such as Cassandra and Druid, and streaming systems), and if
> more
> > > of those sources choose to store their metadata in the metastore it
> will
> > > make our lives easier.
> > >
> > > Hive’s metastore has established a position as the place to go for
> > > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > > processed by Hive, so there are other parties using the metastore who
> > > justifiably would like to influence its direction. Opening up the
> metastore
> > > will help retain and extend this position.
> > >
> > > Julian
> > >
> > >
> > > > On 2017-06-30 10:00 (-0700), Dimitris <ts...@apache.org> wrote:
> > > >
> > > >
> > > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > > A few of us have been talking and come to the conclussion that it
> > > would be>
> > > > > a good thing to split out the Hive metastore into its own Apache
> > > project.>
> > > > > Below and in the linked wiki page we explain what we see as the
> > > advantages>
> > > > > to this and how we would go about it.>
> > > > > >
> > > > > Hive’s metastore has long been used by other projects in the
> Hadoop>
> > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> Spark,>
> > > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > Some,>
> > > > > like Impala and Presto can use it as their own metadata system with
> > > the>
> > > > > rest of Hive not present.>
> > > > > >
> > > > > This sharing is excellent for the ecosystem.  Together with HDFS it
> > > allows>
> > > > > users to use the tool of their choice while still accessing the
> same
> > > shared>
> > > > > data.  But having this shared metadata inside the Hive project
> limits
> > > the>
> > > > > ability of other projects to contribute to the metastore.  It also
> > > makes it>
> > > > > harder for new systems that have similar but not identical
> metadata>
> > > > > requirements (for example, stream processing systems on top of
> Apache>
> > > > > Kafka) to use Hive’s metastore.  This difficulty for other systems
> > > comes>
> > > > > out in two ways.  One, it is hard for non-Hive community members
> to>
> > > > > participate in the project.  Second, it adds operational cost since
> > > users>
> > > > > are forced to deploy all of the Hive jars just to get the
> metastore to
> > > work.>
> > > > > >
> > > > > Therefore we propose to split Hive’s metastore out into a separate
> > > Apache>
> > > > > project.  This new project will continue to support the same Thrift
> > > API as>
> > > > > the current metastore.  It will continue to focus on being a high>
> > > > > performance, fault tolerant, large scale, operational metastore for
> > > SQL>
> > > > > engines and other systems that want to store schema information
> about
> > > their>
> > > > > data.>
> > > > > >
> > > > > By making it a separate project we will enable other projects to
> join
> > > us in>
> > > > > innovating on the metastore.  It will simplify operations for
> non-Hive>
> > > > > users that want to use the metastore as they will no longer need to
> > > install>
> > > > > Hive just to get the metastore.  And it will attract new projects
> that>
> > > > > might otherwise feel the need to solve their metadata problems on
> > > their own.>
> > > > > >
> > > > > Any Hive PMC member or committer will be welcome to join the new
> > > project at>
> > > > > the same level.  We propose this project go straight to a top
> level>
> > > > > project.  Given that the initial PMC will be formed from
> experienced
> > > Hive>
> > > > > PMC members we do not believe incubation will be necessary.  (Note
> > > that the>
> > > > > 

[jira] [Created] (HIVE-16962) Better error msg for Hive on Spark in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16962:
--

 Summary: Better error msg for Hive on Spark in case user cancels 
query and closes session
 Key: HIVE-16962
 URL: https://issues.apache.org/jira/browse/HIVE-16962
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


When a user cancels a query and closes the session, Hive marks the query as 
failed. However, the error message is confusing. It still says:
{quote}
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark client. 
This is likely because the queue you assigned to does not have free resource at 
the moment to start the job. Please check your queue usage and try the query 
again later.
{quote}
followed by an InterruptedException.
Ideally, the error should clearly indicate that the user cancelled the 
execution.
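A minimal sketch of the improvement this report asks for — picking an error message that reflects the real cause instead of the generic queue message. All names here are illustrative, not Hive's actual API:

```java
import java.util.Objects;

public class SparkErrorMessages {
    /** Pick a message that reflects the real cause of the failure. */
    static String describeFailure(boolean userCancelled, Throwable cause) {
        if (userCancelled) {
            // Surface the actual reason instead of the misleading queue message.
            return "FAILED: query was cancelled by the user and the session was closed.";
        }
        return "FAILED: failed to create Spark client, likely because the assigned "
            + "queue has no free resources. Cause: " + Objects.toString(cause, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(true, null));
        System.out.println(describeFailure(false, new RuntimeException("timeout")));
    }
}
```

The point is simply to branch on the cancellation flag at the place where the failure message is composed, so the InterruptedException path no longer falls through to the queue-resource wording.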



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16961) Hive on Spark leaks spark application in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16961:
--

 Summary: Hive on Spark leaks spark application in case user 
cancels query and closes session
 Key: HIVE-16961
 URL: https://issues.apache.org/jira/browse/HIVE-16961
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


It was found that a Spark application is leaked when a user cancels a query and 
closes the session while Hive is waiting for the remote driver to connect back. 
This was found with asynchronous query execution, but it seemingly applies 
equally to synchronous submission when the session is abruptly closed. The 
leaked Spark application that runs the Spark driver connects back to Hive 
successfully and runs forever (until HS2 restarts), but receives no job 
submissions because the session is already closed. Ideally, Hive should reject 
the connection from the driver so that the driver will exit.
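The fix the description calls for — rejecting a driver's connection once its session is closed — could look roughly like this sketch. The class and method names are illustrative, not Hive's real classes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DriverRegistry {
    // Sessions that are still open and may accept a driver connection.
    private final ConcurrentMap<String, Boolean> openSessions = new ConcurrentHashMap<>();

    void sessionOpened(String sessionId) { openSessions.put(sessionId, Boolean.TRUE); }

    void sessionClosed(String sessionId) { openSessions.remove(sessionId); }

    /**
     * Called when a remote driver connects back. Returning false tells the
     * driver to shut down instead of lingering until HS2 restarts.
     */
    boolean acceptDriverConnection(String sessionId) {
        return openSessions.containsKey(sessionId);
    }

    public static void main(String[] args) {
        DriverRegistry registry = new DriverRegistry();
        registry.sessionOpened("s1");
        System.out.println(registry.acceptDriverConnection("s1")); // accepted
        registry.sessionClosed("s1");
        System.out.println(registry.acceptDriverConnection("s1")); // rejected: driver exits
    }
}
```

The key property is that the close happens before the driver's connect-back is accepted, so a cancelled session can never adopt a late-arriving driver.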





[jira] [Created] (HIVE-16854) SparkClientFactory is locked too aggressively

2017-06-07 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16854:
--

 Summary: SparkClientFactory is locked too aggressively
 Key: HIVE-16854
 URL: https://issues.apache.org/jira/browse/HIVE-16854
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.1.0
Reporter: Xuefu Zhang


Most methods in SparkClientFactory are synchronized on the SparkClientFactory 
singleton. However, some of them are very expensive, such as createClient(), 
which returns a SparkClientImpl instance. Creating a SparkClientImpl instance 
requires starting a remote driver that connects back to the RPCServer, which 
can take a long time, for example when the YARN queue is busy. When this 
happens, all pending calls on SparkClientFactory have to wait.

In our case, hive.spark.client.server.connect.timeout is set to 1 hour, which 
makes some queries wait for hours before starting.

The current implementation effectively serializes all remote driver launches: 
if one of them takes a long time, the ones behind it must wait.

HS2 stacktrace is attached for reference. It's based on earlier version of 
Hive, so the line numbers might be slightly off.
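A sketch of the finer-grained locking this report argues for: hold the factory-wide lock only for cheap bookkeeping, and run the expensive driver launch outside it so concurrent createClient() calls are not serialized. All names are illustrative, not Hive's actual API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClientFactorySketch {
    private final Object bookkeepingLock = new Object();
    private final AtomicInteger launched = new AtomicInteger();

    String createClient() throws InterruptedException {
        synchronized (bookkeepingLock) {
            // Cheap state update only: register the pending client.
            launched.incrementAndGet();
        }
        // Expensive step (stand-in for launching a remote driver and waiting
        // for it to connect back) runs OUTSIDE the factory-wide lock.
        Thread.sleep(10);
        return "client-" + launched.get();
    }

    int launchedCount() { return launched.get(); }

    public static void main(String[] args) throws Exception {
        ClientFactorySketch factory = new ClientFactorySketch();
        // Two concurrent creations overlap instead of queuing behind one lock.
        Thread t1 = new Thread(() -> { try { factory.createClient(); } catch (InterruptedException ignored) {} });
        Thread t2 = new Thread(() -> { try { factory.createClient(); } catch (InterruptedException ignored) {} });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(factory.launchedCount());
    }
}
```

With the coarse lock described in the report, the second launch would wait the full driver-start time (up to the 1-hour connect timeout) behind the first; with this split, only the shared bookkeeping is serialized.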



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16799) Control the max number of task for a stage in a spark job

2017-05-31 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16799:
--

 Summary: Control the max number of task for a stage in a spark job
 Key: HIVE-16799
 URL: https://issues.apache.org/jira/browse/HIVE-16799
 Project: Hive
  Issue Type: Improvement
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


HIVE-16552 gives admins an option to control the maximum number of tasks a Spark 
job may have. However, this may not be sufficient, as it tends to penalize 
jobs that have many stages while favoring jobs that have fewer stages. Ideally, 
we should also limit the number of tasks in a stage, which is closer to the 
maximum number of mappers or reducers in an MR job.
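The proposed per-stage check could sit alongside the whole-job cap from HIVE-16552 roughly as follows. The method, limits, and numbers are illustrative, not Hive's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

public class TaskLimitCheck {
    /**
     * Returns true only if every stage stays under its own task cap AND
     * the job as a whole stays under the total cap (the HIVE-16552 check).
     */
    static boolean withinLimits(List<Integer> tasksPerStage, int maxPerJob, int maxPerStage) {
        int total = 0;
        for (int tasks : tasksPerStage) {
            if (tasks > maxPerStage) {
                return false; // a single stage is too large
            }
            total += tasks;
        }
        return total <= maxPerJob; // whole-job cap
    }

    public static void main(String[] args) {
        // Two 400-task stages pass a per-stage cap of 500 ...
        System.out.println(withinLimits(Arrays.asList(400, 400), 1000, 500)); // true
        // ... while a single 800-task stage with a smaller total does not.
        System.out.println(withinLimits(Arrays.asList(800), 1000, 500));      // false
    }
}
```

This captures the asymmetry the description points out: a many-stage job can trip the job-wide cap even though each stage is modest, while the per-stage limit targets the actual analogue of a map or reduce wave.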





Re: [Announce] New PMC members

2017-05-26 Thread Xuefu Zhang
Many congratulations to Yongzhi, Daniel, Vaibhav, Sergio, Aihua, Chaoyu!
Well-deserved!

Thanks,
Xuefu

On Fri, May 26, 2017 at 7:19 AM, Jimmy Xiang  wrote:

> Congrats!!
>
> On Fri, May 26, 2017 at 2:52 AM, Adam Szita  wrote:
> > Congrats to all, well done!
> >
> > Adam
> >
> > On 26 May 2017 at 08:17, Anshuman Dwivedi 
> wrote:
> >
> >> Congrats !
> >>
> >> Regards,
> >> Anshuman Dwivedi
> >>
> >>
> >> -Peter Vary  wrote: -
> >> To: dev@hive.apache.org
> >> From: Peter Vary 
> >> Date: 05/26/2017 10:57AM
> >> Subject: Re: [Announce] New PMC members
> >>
> >> Wow!
> >> That's a spring shower of PMCs. :)
> >> Well deserved Yongzhi, Daniel, Vaibhav, Sergio, Aihua, Chaoyu!
> >>
> >> Congratulations to all of you!
> >>
> >> Peter
> >>
> >> 2017. máj. 26. 6:42 ezt írta ("Ashutosh Chauhan"  >):
> >>
> >> The Project Management Committee (PMC) for Apache Hive has invited
> Yongzhi
> >> Chen to become a PMC member and we are pleased to announce that he has
> >> accepted.
> >>
> >> Please join me in congratulating Yongzhi!
> >>
> >> Thanks,
> >> Ashutosh on behalf of Hive PMC
> >>
> >>
> >>
>


Welcome Rui Li to Hive PMC

2017-05-24 Thread Xuefu Zhang
Hi all,

It's an honor to announce that the Apache Hive PMC has recently voted to invite
Rui Li as a new Hive PMC member. Rui is a long-time Hive contributor and
committer, and has made significant contributions to Hive, especially in Hive
on Spark. Please join me in congratulating him; we look forward to the bigger
role he will play in the Apache Hive project.

Thanks,
Xuefu


Jimmy Xiang now a Hive PMC member

2017-05-24 Thread Xuefu Zhang
Hi all,

It's an honor to announce that the Apache Hive PMC has recently voted to invite
Jimmy Xiang as a new Hive PMC member. Please join me in congratulating him
and looking forward to the bigger role he will play in the Apache Hive
project.

Thanks,
Xuefu


Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-08 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review174202
---


Ship it!




Ship It!

- Xuefu Zhang


On May 8, 2017, 3:17 p.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated May 8, 2017, 3:17 p.m.)
> 
> 
> Review request for hive, pengcheng xiong and Xuefu Zhang.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
> PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 58b1c02 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   itests/hive-blobstore/src/test/queries/clientpositive/orc_format_part.q 
> 358eccd 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/orc_nonstd_partitions_loc.q
>  c462538 
>   itests/hive-blobstore/src/test/queries/clientpositive/rcfile_format_part.q 
> c563d3a 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/rcfile_nonstd_partitions_loc.q
>  d17c281 
>   itests/hive-blobstore/src/test/results/clientpositive/orc_format_part.q.out 
> 5d1319f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/orc_nonstd_partitions_loc.q.out
>  70e72f7 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_format_part.q.out
>  bed10ab 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_nonstd_partitions_loc.q.out
>  c6442f9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java ade1900 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1b556ac 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f8b55da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 01a652d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  38308c9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0cf9205 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 190b66b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ca639d3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 645ced9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> c3227c9 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java bda2050 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 7cdf2c3 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 68d98f5 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java 
> 5a31e61 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/parse/TestSQL11ReservedKeyWordsNegative.java
>  0dc6b19 
>   ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
>   ql/src/test/queries/clientnegative/serde_regex2.q a29bb9c 
>   ql/src/test/queries/clientnegative/serde_regex3.q 4e91f06 
>   ql/src/test/queries/clientpositive/create_like.q bd39731 
>   ql/src/test/queries/clientpositive/join43.q 12c45a6 
>   ql/src/test/queries/clientpositive/serde_regex.q e21c6e1 
>   ql/src/test/queries/clientpositive/timestamptz.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamptz_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamptz_2.q PRE-CREATION 
>   ql/src/test/results/clientnegative/serde_regex.q.out a1ec5ca 
>   ql/src/test/results/clientnegative/serde_regex2.q.out 374675d 
>   ql/src/test/results/clientnegative/serde_regex3.q.out dc0a0e2 
>   ql/src/test/results/clientpositive/create_like.q.out ff2e752 
>   ql/src/test/results/clientpositive/join43.q.out e8c7278 
>   ql/src/test/results/clientpositive/serde_regex.q.out 7bebb0c 
>   ql/src/t

Re: Review Request 50787: Add a timezone-aware timestamp

2017-05-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review174136
---




common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java
Lines 138 (patched)
<https://reviews.apache.org/r/50787/#comment247252>

Not sure if I understand this, but why can't we get seconds/nanos from 
date/timestamp and then convert to TimestampTZ? I assume this would be faster.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
Lines 204 (patched)
<https://reviews.apache.org/r/50787/#comment247253>

What does this imply?



serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestampTZ.java
Lines 32 (patched)
<https://reviews.apache.org/r/50787/#comment247254>

Can you also make a note about the source of the code, like 
TimeStampTZWritable?


- Xuefu Zhang


On May 3, 2017, 6:34 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated May 3, 2017, 6:34 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/TimestampTZ.java 
> PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 58b1c02 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   itests/hive-blobstore/src/test/queries/clientpositive/orc_format_part.q 
> 358eccd 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/orc_nonstd_partitions_loc.q
>  c462538 
>   itests/hive-blobstore/src/test/queries/clientpositive/rcfile_format_part.q 
> c563d3a 
>   
> itests/hive-blobstore/src/test/queries/clientpositive/rcfile_nonstd_partitions_loc.q
>  d17c281 
>   itests/hive-blobstore/src/test/results/clientpositive/orc_format_part.q.out 
> 5d1319f 
>   
> itests/hive-blobstore/src/test/results/clientpositive/orc_nonstd_partitions_loc.q.out
>  70e72f7 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_format_part.q.out
>  bed10ab 
>   
> itests/hive-blobstore/src/test/results/clientpositive/rcfile_nonstd_partitions_loc.q.out
>  c6442f9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java ade1900 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 8dc5f2e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f8b55da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 01a652d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  38308c9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0cf9205 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 0721b92 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g d98a663 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8598fae 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 8f8eab0 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java bda2050 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 7cdf2c3 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 68d98f5 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java 
> 5a31e61 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/parse/TestSQL11ReservedKeyWordsNegative.java
>  0dc6b19 
>   ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
>   ql/src/test/queries/clientnegative/serde_regex2.q a29bb9c 
>   ql/src/test/queries/clientnegative/serde_regex3.q 4e91f06 
>   ql/src/test/queries/clientpositive/create_like.q bd39731 
>   ql/src/test/queries/clientpositive/join43.q 12c45a6 
>   ql/s

Welcome new Hive committer, Zhihai Xu

2017-05-05 Thread Xuefu Zhang
Hi all,

I'm very pleased to announce that the Hive PMC has recently voted to offer
Zhihai a committership, which he has accepted. Please join me in congratulating
him on this recognition and thanking him for his contributions to Hive.

Regards,
Xuefu


Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-03 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/
---

(Updated May 3, 2017, 6:14 p.m.)


Review request for hive.


Bugs: HIVE-16552
https://issues.apache.org/jira/browse/HIVE-16552


Repository: hive-git


Description
---

See JIRA description


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84398c6 
  itests/src/test/resources/testconfiguration.properties 753f3a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
 dd73f3e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
0b224f2 
  ql/src/test/queries/clientnegative/spark_job_max_tasks.q PRE-CREATION 
  ql/src/test/results/clientnegative/spark/spark_job_max_tasks.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/58865/diff/4/

Changes: https://reviews.apache.org/r/58865/diff/3-4/


Testing
---

Test locally


Thanks,

Xuefu Zhang



Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-02 Thread Xuefu Zhang


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
> > Lines 132 (patched)
> > <https://reviews.apache.org/r/58865/diff/3/?file=1705971#file1705971line132>
> >
> > I think the log is unnecessary because the failure should already be 
> > logged in the monitor

This is not new code.


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java
> > Lines 135 (patched)
> > <https://reviews.apache.org/r/58865/diff/3/?file=1705971#file1705971line135>
> >
> > Same as above. Can we consolidate the logs a bit?

The job monitor prints it on the console, while the log here is written to hive.log.


> On May 3, 2017, 3:35 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
> > Lines 104 (patched)
> > <https://reviews.apache.org/r/58865/diff/3/?file=1705972#file1705972line104>
> >
> > Maybe I was being misleading. I mean we can compute the total task only 
> > once when the job first reaches RUNNING state, i.e. in the "if (!running)". 
> > At this point, the total count is determined and won't change.

Yeah. However, I'd like to keep the state transition to running first, before 
breaking out and returning rc=4. In fact, if we lose the transition, Hive 
goes into an unstable state. What you suggested was what I tried in the first 
place.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/#review173689
---


On May 2, 2017, 6:49 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58865/
> ---
> 
> (Updated May 2, 2017, 6:49 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16552
> https://issues.apache.org/jira/browse/HIVE-16552
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84398c6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  dd73f3e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 0b224f2 
> 
> 
> Diff: https://reviews.apache.org/r/58865/diff/3/
> 
> 
> Testing
> ---
> 
> Test locally
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-02 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/
---

(Updated May 2, 2017, 6:49 p.m.)


Review request for hive.


Bugs: HIVE-16552
https://issues.apache.org/jira/browse/HIVE-16552


Repository: hive-git


Description
---

See JIRA description


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84398c6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
 dd73f3e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
0b224f2 


Diff: https://reviews.apache.org/r/58865/diff/3/

Changes: https://reviews.apache.org/r/58865/diff/2-3/


Testing
---

Test locally


Thanks,

Xuefu Zhang



Re: Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-05-01 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/
---

(Updated May 1, 2017, 5:13 p.m.)


Review request for hive.


Changes
---

Updated patch to reflect Lefty's feedback.


Bugs: HIVE-16552
https://issues.apache.org/jira/browse/HIVE-16552


Repository: hive-git


Description
---

See JIRA description


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3ea824 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
 dd73f3e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
0b224f2 


Diff: https://reviews.apache.org/r/58865/diff/2/

Changes: https://reviews.apache.org/r/58865/diff/1-2/


Testing
---

Test locally


Thanks,

Xuefu Zhang



Review Request 58865: HIVE-16552: Limit the number of tasks a Spark job may contain

2017-04-28 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58865/
---

Review request for hive.


Bugs: HIVE-16552
https://issues.apache.org/jira/browse/HIVE-16552


Repository: hive-git


Description
---

See JIRA description


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3ea824 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 32a7730 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
 dd73f3e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
0b224f2 


Diff: https://reviews.apache.org/r/58865/diff/1/


Testing
---

Test locally


Thanks,

Xuefu Zhang



[jira] [Created] (HIVE-16552) Limit the number of tasks a Spark job may contain

2017-04-27 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16552:
--

 Summary: Limit the number of tasks a Spark job may contain
 Key: HIVE-16552
 URL: https://issues.apache.org/jira/browse/HIVE-16552
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


It's commonly desirable to block bad and big queries that take a lot of YARN 
resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
stop a query that invokes a Spark job that contains too many tasks. The 
proposal here is to introduce hive.spark.job.max.tasks with a default value of 
-1 (no limit), which an admin can set to block queries that trigger too many 
Spark tasks.

Please note that this control knob applies to a single Spark job, though it's 
possible that one query can trigger multiple Spark jobs (such as in the case of 
map-join). Nevertheless, the proposed approach is still helpful.
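
The proposed check can be sketched in isolation as follows. This is an
illustrative stand-in, not Hive's actual implementation (which, per the diffs
above, lives in the Spark job monitor); the class and method names here are
hypothetical.

```java
import java.util.List;

// Sketch of the proposed hive.spark.job.max.tasks behavior: sum the task
// counts of all stages in a Spark job and fail fast once a configured
// maximum is exceeded. A value of -1 means "no limit" (the proposed default).
public class TaskLimitCheck {
    static final int NO_LIMIT = -1;

    // Returns true if the job should be aborted for exceeding the limit.
    static boolean exceedsTaskLimit(List<Integer> tasksPerStage, int maxTasks) {
        if (maxTasks == NO_LIMIT) {
            return false;
        }
        int total = 0;
        for (int stageTasks : tasksPerStage) {
            total += stageTasks;
        }
        return total > maxTasks;
    }

    public static void main(String[] args) {
        List<Integer> stages = List.of(200, 150, 50);           // 400 tasks total
        System.out.println(exceedsTaskLimit(stages, NO_LIMIT)); // prints false
        System.out.println(exceedsTaskLimit(stages, 300));      // prints true
    }
}
```

When the limit is exceeded, the actual patch aborts the job and returns a
distinct error code (rc=4, per the review discussion above) rather than just
returning a boolean.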



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Review Request 57586: HIVE-16183: Fix potential thread safety issues with static variables

2017-03-15 Thread Xuefu Zhang
/apache/hadoop/hive/serde2/lazy/fast/StringToDouble.java 
f50b4fd 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
f4ac56f 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java 
14349fa 
  shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java 7270426 
  
shims/common/src/main/java/org/apache/hadoop/hive/io/HiveIOExceptionHandlerChain.java
 a58f1f2 
  
shims/common/src/main/java/org/apache/hadoop/hive/io/HiveIOExceptionHandlerUtil.java
 d972edb 
  shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java 
44f24b2 
  
storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java
 7a565dd 
  storage-api/src/java/org/apache/hadoop/hive/common/type/RandomTypeUtil.java 
8d950a2 
  testutils/src/java/org/apache/hive/testutils/jdbc/HiveBurnInClient.java 
41ade5f 


Diff: https://reviews.apache.org/r/57586/diff/3/

Changes: https://reviews.apache.org/r/57586/diff/2-3/


Testing
---

This relies on existing test cases.


Thanks,

Xuefu Zhang



Re: Review Request 57586: HIVE-16183: Fix potential thread safety issues with static variables

2017-03-15 Thread Xuefu Zhang
/apache/hadoop/hive/serde2/lazy/fast/StringToDouble.java 
f50b4fd 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
f4ac56f 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java 
14349fa 
  shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java 7270426 
  
shims/common/src/main/java/org/apache/hadoop/hive/io/HiveIOExceptionHandlerChain.java
 a58f1f2 
  
shims/common/src/main/java/org/apache/hadoop/hive/io/HiveIOExceptionHandlerUtil.java
 d972edb 
  shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java 
44f24b2 
  
storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java
 7a565dd 
  storage-api/src/java/org/apache/hadoop/hive/common/type/RandomTypeUtil.java 
8d950a2 
  testutils/src/java/org/apache/hive/testutils/jdbc/HiveBurnInClient.java 
41ade5f 


Diff: https://reviews.apache.org/r/57586/diff/2/

Changes: https://reviews.apache.org/r/57586/diff/1-2/


Testing
---

This relies on existing test cases.


Thanks,

Xuefu Zhang



Re: [ANNOUNCE] New PMC Member : Eugene Koifman

2017-03-15 Thread Xuefu Zhang
Congratulations, Eugene!

On Wed, Mar 15, 2017 at 7:31 AM, Sergio Pena 
wrote:

> Congratulations Eugene !!
>
> On Wed, Mar 15, 2017 at 5:41 AM, Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
> > Congratulations Eugene!
> >
> > Thanks
> > Prasanth
> >
> >
> >
> >
> > On Tue, Mar 14, 2017 at 10:02 PM -1000, "Zoltan Haindrich" <
> > zhaindr...@hortonworks.com> wrote:
> >
> >
> > Congrats Eugene!!
> >
> > On 15 Mar 2017 07:50, Peter Vary  wrote:
> > Congratulations! :)
> >
> > 2017. márc. 15. 7:05 ezt írta ("Vaibhav Gumashta" ):
> >
> > > Congrats Eugene!
> > >
> > >
> > > On 3/14/17, 11:03 PM, "Rajesh Balamohan"  wrote:
> > >
> > > >Congrats Eugene!! :)
> > > >
> > > >~Rajesh.B
> > > >
> > > >On Wed, Mar 15, 2017 at 11:21 AM, Pengcheng Xiong
> > > >wrote:
> > > >
> > > >> Congrats! Well deserved!
> > > >>
> > > >> Thanks.
> > > >> Pengcheng
> > > >>
> > > >> On Tue, Mar 14, 2017 at 10:39 PM, Ashutosh Chauhan
> > > >>
> > > >> wrote:
> > > >>
> > > >> > On behalf of the Hive PMC I am delighted to announce Eugene
> Koifman
> > is
> > > >> > joining Hive PMC.
> > > >> > Eugene is a long time contributor in Hive and is focusing on ACID
> > > >>support
> > > >> > areas these days.
> > > >> >
> > > >> > Welcome, Eugene!
> > > >> >
> > > >> > Thanks,
> > > >> > Ashutosh
> > > >> >
> > > >>
> > >
> > >
> >
> >
> >
>


Re: Review Request 57586: HIVE-16183: Fix potential thread safety issues with static variables

2017-03-14 Thread Xuefu Zhang


> On March 14, 2017, 6:18 p.m., Sergey Shelukhin wrote:
> > cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
> > Line 265 (original), 265 (patched)
> > <https://reviews.apache.org/r/57586/diff/1/?file=1663203#file1663203line265>
> >
> > not sure if removing static from methods is needed... I usually prefer 
> > to ADD static to methods if they don't depend on an instance :)
> > Non-binding

Given that "test" is changed to an instance variable, this method needs to be 
an instance method as well.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57586/#review168921
-------


On March 14, 2017, 4:32 a.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57586/
> ---
> 
> (Updated March 14, 2017, 4:32 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-16183
> https://issues.apache.org/jira/browse/HIVE-16183
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Please see JIRA description
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 7e6846d 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java 181f0d2 
>   cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java f1806a0 
>   cli/src/test/org/apache/hadoop/hive/cli/TestRCFileCat.java 11ceb31 
>   common/src/java/org/apache/hadoop/hive/common/LogUtils.java c2a0d9a 
>   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 926b4a6 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
> 9c30ee7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ArchiveUtils.java 6381a21 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 4ac25c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6693134 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> 5b0c2bf 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java
>  6383e8a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTable.java
>  9030e5f 
>   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 6582cdd 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java a1408e9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 7727114 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4995bdf 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d391164 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 369584b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java 
> 90b1dff 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 044d64c 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 0e67ea6 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/listbucketingpruner/ListBucketingPrunerUtils.java
>  4d3e74e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
>  93202c3 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 50eda15 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/VectorizerReason.java
>  e0a6198 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 36009bf 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> f175663 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java 01b5559 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractVectorDesc.java e85a418 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 0b49294 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java ca69697 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 9ae30ab 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorAppMasterEventDesc.java 
> 2e11321 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFileSinkDesc.java 325ac91 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorFilterDesc.java 6feed84 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorGroupByDesc.java f8554e2 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorLimitDesc.java c9bc45a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinDesc.java 3aa65d3 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorMapJoinInfo.java 9429785 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorPartitionDesc.java 4078c7d 
>

[jira] [Created] (HIVE-16196) UDFJson having thread-safety issues

2017-03-13 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16196:
--

 Summary: UDFJson having thread-safety issues
 Key: HIVE-16196
 URL: https://issues.apache.org/jira/browse/HIVE-16196
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Followup for HIVE-16183: there seem to be some concurrency issues in 
UDFJson.java, especially around static class variables.





Review Request 57586: HIVE-16183: Fix potential thread safety issues with static variables

2017-03-13 Thread Xuefu Zhang
/serde2/lazybinary/LazyBinaryUtils.java 
f4ac56f 
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java 
14349fa 
  shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java 7270426 
  
shims/common/src/main/java/org/apache/hadoop/hive/io/HiveIOExceptionHandlerChain.java
 a58f1f2 
  
shims/common/src/main/java/org/apache/hadoop/hive/io/HiveIOExceptionHandlerUtil.java
 d972edb 
  shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java 
44f24b2 
  
storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java
 7a565dd 
  storage-api/src/java/org/apache/hadoop/hive/common/type/RandomTypeUtil.java 
8d950a2 
  testutils/src/java/org/apache/hive/testutils/jdbc/HiveBurnInClient.java 
41ade5f 


Diff: https://reviews.apache.org/r/57586/diff/1/


Testing
---

This relies on existing test cases.


Thanks,

Xuefu Zhang



[jira] [Created] (HIVE-16183) Fix potential thread safety issues with static variables

2017-03-12 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16183:
--

 Summary: Fix potential thread safety issues with static variables
 Key: HIVE-16183
 URL: https://issues.apache.org/jira/browse/HIVE-16183
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Many concurrency issues have been found with respect to class static variable 
usage. Given that HS2 supports concurrent compilation and task execution, and 
that some backend engines (such as Spark) run multiple tasks in a single JVM, 
the traditional assumption (or mindset) of single-threaded execution needs to 
be abandoned.

The purpose of this JIRA is to do a global scan of static variables in the 
Hive code base and correct potential thread-safety issues. However, it's not 
meant to be exhaustive.
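
A representative instance of the hazard described here — not necessarily one
of the exact spots patched in this review — is a mutable static helper such as
java.text.SimpleDateFormat, which is not thread-safe. One common fix is to
wrap the shared instance in a ThreadLocal; this is a generic sketch, not a
claim about which Hive classes used which remedy:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadSafeFormat {
    // Unsafe pattern: a single mutable formatter shared by every thread.
    //   static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd");
    // SimpleDateFormat keeps internal parse/format state, so concurrent
    // callers can silently corrupt each other's results.

    // Safe pattern: one formatter per thread, still avoiding per-call allocation.
    private static final ThreadLocal<SimpleDateFormat> FMT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date d) {
        return FMT.get().format(d);
    }

    public static Date parse(String s) {
        try {
            return FMT.get().parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException(s, e);
        }
    }

    public static void main(String[] args) {
        // Concurrent use no longer corrupts the formatter's internal state.
        System.out.println(format(parse("2017-03-12"))); // prints 2017-03-12
    }
}
```

Other remedies seen in patches like this one include making the variable an
instance field (as in the RCFileCat change discussed above) or guarding the
shared state with synchronization.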





[jira] [Created] (HIVE-16179) HoS tasks may fail due to ArrayIndexOutOfBoundException in BinarySortableSerDe

2017-03-10 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16179:
--

 Summary: HoS tasks may fail due to ArrayIndexOutOfBoundException 
in BinarySortableSerDe
 Key: HIVE-16179
 URL: https://issues.apache.org/jira/browse/HIVE-16179
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Stacktrace:
{code}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error: Unable to deserialize reduce input key from 
x1x100x101x97x51x49x50x97x102x45x97x98x56x52x45x52x102x52x53x45x56x49x101x99x45x49x99x100x98x55x97x51x52x100x49x49x55x0x1x128x0x0x0x0x0x0x19x1x128x0x0x0x0x0x0x3x1x128x0x66x179x1x192x244x45x90x1x85x98x101x114x0x1x76x111x115x32x65x110x103x101x108x101x115x0x1x2x128x0x0x2x50x51x57x51x0x1x192x55x238x20x122x225x71x174x1x128x0x0x0x87x240x169x195x1x50x48x49x54x45x49x48x45x48x49x32x50x51x58x51x49x58x51x49x0x1x117x98x101x114x88x0x255
 with properties 
{columns=_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,
 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=, 
columns.types=string,bigint,bigint,date,int,varchar(50),varchar(255),decimal(12,2),double,bigint,string,varchar(255)}
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:339)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
at 
org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
at 
org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2004)
at 
org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2004)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
Error: Unable to deserialize reduce input key from 
x1x100x101x97x51x49x50x97x102x45x97x98x56x52x45x52x102x52x53x45x56x49x101x99x45x49x99x100x98x55x97x51x52x100x49x49x55x0x1x128x0x0x0x0x0x0x19x1x128x0x0x0x0x0x0x3x1x128x0x66x179x1x192x244x45x90x1x85x98x101x114x0x1x76x111x115x32x65x110x103x101x108x101x115x0x1x2x128x0x0x2x50x51x57x51x0x1x192x55x238x20x122x225x71x174x1x128x0x0x0x87x240x169x195x1x50x48x49x54x45x49x48x45x48x49x32x50x51x58x51x49x58x51x49x0x1x117x98x101x114x88x0x255
 with properties 
{columns=_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,
 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=, 
columns.types=string,bigint,bigint,date,int,varchar(50),varchar(255),decimal(12,2),double,bigint,string,varchar(255)}
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:311)
... 16 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:413)
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:190)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:309)
... 16 more
{code}

It seems to be a synchronization issue in BinarySortableSerDe.





[jira] [Created] (HIVE-16156) FileSinkOperator should delete existing output target when renaming

2017-03-09 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-16156:
--

 Summary: FileSinkOperator should delete existing output target 
when renaming
 Key: HIVE-16156
 URL: https://issues.apache.org/jira/browse/HIVE-16156
 Project: Hive
  Issue Type: Bug
  Components: Operators
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


If a task gets killed (for whatever reason) after it completes renaming the 
temp output to the final output during commit, subsequent task attempts will 
fail when renaming because the target output already exists. This can happen, 
however rarely.

Hive should check for the existence of the target output and delete it before 
renaming.
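
The proposed commit behavior can be sketched with java.nio.file standing in
for Hadoop's FileSystem API (the real fix would use FileSystem.exists/delete/
rename inside FileSinkOperator); names and paths here are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommitRename {
    // If a previous (killed) attempt already renamed its temp output to the
    // final target, delete the stale target first so the retry's rename succeeds.
    static void commit(Path tmpOutput, Path finalOutput) {
        try {
            Files.deleteIfExists(finalOutput);  // remove leftover from a killed attempt
            Files.move(tmpOutput, finalOutput); // now the rename cannot collide
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reproduces the failure scenario: attempt 1 committed and then got
    // killed; attempt 2 retries the commit against the now-existing target.
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("commit-demo");
            Path tmp = Files.writeString(dir.resolve("_tmp.part-0"), "attempt 2");
            Path target = Files.writeString(dir.resolve("part-0"), "attempt 1");
            commit(tmp, target); // a bare move here would throw FileAlreadyExistsException
            return Files.readString(target).equals("attempt 2");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

The delete-then-rename sequence is safe here because only one task attempt is
allowed to commit at a time; the stale target can only come from an attempt
that has already been killed.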





[jira] [Created] (HIVE-15893) Followup on HIVE-15671

2017-02-13 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15893:
--

 Summary: Followup on HIVE-15671
 Key: HIVE-15893
 URL: https://issues.apache.org/jira/browse/HIVE-15893
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.2.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


In HIVE-15671, we fixed a typo where server.connect.timeout was used in place 
of client.connect.timeout. This might solve some potential problems, but the 
original problem reported in HIVE-15671 might still exist. (Not sure if 
HIVE-15860 helps.) Here is the proposal suggested by Marcelo:
{quote}
bq. server detecting a driver problem after it has connected back to the server.

Hmm. That is definitely not any of the "connect" timeouts, which probably means 
it isn't configured and is just using netty's default (which is probably no 
timeout?). Would probably need something using 
io.netty.handler.timeout.IdleStateHandler, and also some periodic "ping" so 
that the connection isn't torn down without reason.
{quote}

We will use this JIRA to track the issue.
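
The quoted proposal combines two pieces: an idle detector on the connection
and a periodic ping that counts as activity. Netty's IdleStateHandler is the
real mechanism suggested; the stdlib sketch below only models the concept,
with a fake clock injected so the behavior is deterministic. All names here
are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Conceptual model of idle detection: record when traffic (including pings)
// was last seen, and declare the peer dead once nothing arrives within the
// timeout. In the real proposal, netty's IdleStateHandler fires the idle
// event and the ping keeps a healthy connection from being torn down.
public class IdleDetector {
    private final long timeoutMillis;
    private final AtomicLong now;        // injectable clock, for determinism
    private volatile long lastActivity;

    IdleDetector(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.now = new AtomicLong(startMillis);
        this.lastActivity = startMillis;
    }

    void tick(long millis) {             // advance the fake clock
        now.addAndGet(millis);
    }

    void onTraffic() {                   // any message, including periodic pings
        lastActivity = now.get();
    }

    boolean isIdle() {                   // checked periodically by a scheduler
        return now.get() - lastActivity > timeoutMillis;
    }

    public static void main(String[] args) {
        IdleDetector d = new IdleDetector(1000, 0);
        d.tick(600);
        d.onTraffic();                   // a ping keeps the connection alive
        d.tick(900);
        System.out.println(d.isIdle());  // prints false: 900 ms since last ping
        d.tick(200);
        System.out.println(d.isIdle());  // prints true: 1100 ms with no ping
    }
}
```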





[jira] [Created] (HIVE-15683) Measure performance impact on group by by HIVE-15580

2017-01-20 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15683:
--

 Summary: Measure performance impact on group by by HIVE-15580
 Key: HIVE-15683
 URL: https://issues.apache.org/jira/browse/HIVE-15683
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.2.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


HIVE-15580 changed the way data is shuffled for order by: instead of using 
Spark's groupByKey to shuffle data, Hive on Spark now uses 
repartitionAndSortWithinPartitions(), which generates (key, value) pairs 
instead of the original (key, value iterator). This might have some 
performance implications, but it's needed to get rid of the unbounded memory 
usage of {{groupByKey}}.

Here we'd like to compare group-by performance with and without HIVE-15580. If 
the impact is significant, we can provide a configuration that allows users to 
switch back to the original way of shuffling.

This work should ideally be done after HIVE-15682, as the optimization there 
should help the performance here as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15682) Eliminate the dummy iterator and optimize the per row based reducer-side processing

2017-01-20 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15682:
--

 Summary: Eliminate the dummy iterator and optimize the per row 
based reducer-side processing
 Key: HIVE-15682
 URL: https://issues.apache.org/jira/browse/HIVE-15682
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.2.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


HIVE-15580 introduced a dummy iterator per input row, which can be eliminated. 
This is because {{SparkReduceRecordHandler}} is able to handle single 
key-value pairs. We can refactor this part of the code 1. to remove the need 
for an iterator and 2. to optimize the code path for per-(key, value) based 
(instead of (key, value iterator)) processing. It would also be great if we 
could measure the performance after the optimizations and compare it to the 
performance prior to HIVE-15580.
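
The refactoring described here — feeding a single (key, value) pair to the
handler directly instead of wrapping the value in a one-element iterator — can
be illustrated in isolation. The class below is a hypothetical stand-in for
SparkReduceRecordHandler, not the real code:

```java
import java.util.Iterator;
import java.util.List;

public class RecordHandlerSketch {
    private int rowsProcessed = 0;

    // Original path: the caller wraps each value in a dummy one-element
    // iterator just to satisfy the (key, value-iterator) signature.
    void processRows(String key, Iterator<String> values) {
        while (values.hasNext()) {
            processRow(key, values.next());
        }
    }

    // Optimized path: accept the single pair directly, with no iterator
    // allocation and no hasNext/next calls per row.
    void processRow(String key, String value) {
        rowsProcessed++;
    }

    int rowsProcessed() {
        return rowsProcessed;
    }

    public static void main(String[] args) {
        RecordHandlerSketch h = new RecordHandlerSketch();
        h.processRows("k1", List.of("v1").iterator()); // dummy iterator per row
        h.processRow("k2", "v2");                      // direct per-pair call
        System.out.println(h.rowsProcessed());         // prints 2
    }
}
```

With millions of rows per reducer, removing the per-row wrapper avoids one
short-lived allocation and two virtual calls per record, which is the
measurable part this JIRA asks to quantify.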





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out
> 
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is 
> because the keys are the same for group by and order by.
> 
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile and how come the result 
> is not changed?
> 
> Xuefu Zhang wrote:
> MR group by is also sorted, so the order by is not needed and is
> eliminated during optimization. So, you see, the test didn't fail in the
> Jenkins result.
> 
> Chao Sun wrote:
> No, I mean the input query is changed, so the output should also be 
> changed. If you look at the MR output qfile, it still has
> ```
> PREHOOK: query: explain
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)a
> union all
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)b
> PREHOOK: type: QUERY
> POSTHOOK: query: explain
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)a
> union all
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)b
> POSTHOOK: type: QUERY
> ```
> which suggest the test is not triggered on the MR path. Anyway, maybe the 
> test is turned off for MR.

Yeah, got it. Maybe. I don't believe that's a blocker; we can file a follow-up 
JIRA for this. Do you have any other comments?


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---


On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> ---
> 
> (Updated Jan. 20, 2017, 6:07 p.m.)
> 
> 
> Review request for hive, Chao Sun and Rui Li.
> 
> 
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java 
> e128dd2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> eeb4443 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d57cac4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 
> 997ab7e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 66ffe5d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  0d31e5f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f 
>   ql/src/test/queries/clientpositive/union_top_level.q d93fe38 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83 
>   ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 
> 65a6e3e 
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4 
>   ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3 
>   ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f 
> 
> Diff: https://reviews.apache.org/r/55776/diff/
> 
> 
> Testing
> ---
> 
> All test passed
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out
> 
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is 
> because the keys are the same for group by and order by.
> 
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile and how come the result 
> is not changed?

MR group by is also sorted, so the order by is not needed and is eliminated 
during optimization. So, you see, the test didn't fail in the Jenkins result.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---


On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> ---
> 
> (Updated Jan. 20, 2017, 6:07 p.m.)
> 
> 
> Review request for hive, Chao Sun and Rui Li.
> 
> 
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java 
> e128dd2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
> eeb4443 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
>  d57cac4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 
> 997ab7e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 66ffe5d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  0d31e5f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f 
>   ql/src/test/queries/clientpositive/union_top_level.q d93fe38 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83 
>   ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 
> 65a6e3e 
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4 
>   ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3 
>   ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f 
> 
> Diff: https://reviews.apache.org/r/55776/diff/
> 
> 
> Testing
> ---
> 
> All test passed
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out

No. I verified that MR's plan and result don't change at all. This is because 
the keys are the same for group by and order by.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 
> > 31
> > <https://reviews.apache.org/r/55776/diff/1/?file=1610799#file1610799line31>
> >
> > Is it possible that `numPartitions` equals to 0?

No. If the partition number is zero, that means there are no partitions, so we 
would not even get here. Nevertheless, if it is set to 0, we take 1 instead.


> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 
> > 34
> > <https://reviews.apache.org/r/55776/diff/1/?file=1610799#file1610799line34>
> >
> > I wonder whether this also has some extra cost comparing to the 
> > original `groupByKey`, since it needs to sort all records by key in a 
> > single partition, right?

Well, we don't know yet which one performs better. 
repartitionAndSortWithinPartitions() brings an extra sort, but it eliminates 
the grouping done by groupByKey(). Also, groupByKey() has unbounded memory 
usage, which is the problem we are trying to solve, as described in the JIRA. 
We will follow up with performance testing, and may provide an option to choose 
between groupByKey(), which might be faster but has unlimited memory usage, and 
the new way, where memory usage is bounded.
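
The trade-off can be illustrated with a small standalone sketch (plain Java, no 
Spark; names and data are illustrative): hash-based grouping buffers every 
value of a key at once, while sort-based grouping pays an O(n log n) sort and 
then aggregates in a single scan, holding one value at a time.

```java
import java.util.*;

public class ShuffleSketch {
    // Eager grouping: analogous to groupByKey -- all values for a key are
    // buffered in memory at once (unbounded for a very large key group).
    static Map<String, List<Integer>> groupEagerly(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Sort-based grouping: analogous to repartitionAndSortWithinPartitions --
    // after sorting, each key's values are contiguous, so a single forward
    // scan can aggregate them while holding only one value at a time.
    static Map<String, Integer> sumBySortedScan(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());
        Map<String, Integer> sums = new LinkedHashMap<>();
        String curKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> p : sorted) {
            if (curKey != null && !curKey.equals(p.getKey())) {
                sums.put(curKey, sum);
                sum = 0;
            }
            curKey = p.getKey();
            sum += p.getValue();
        }
        if (curKey != null) {
            sums.put(curKey, sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
            new AbstractMap.SimpleEntry<>("b", 2),
            new AbstractMap.SimpleEntry<>("a", 1),
            new AbstractMap.SimpleEntry<>("b", 3),
            new AbstractMap.SimpleEntry<>("a", 4));
        System.out.println(groupEagerly(pairs));    // prints {a=[1, 4], b=[2, 3]}
        System.out.println(sumBySortedScan(pairs)); // prints {a=5, b=5}
    }
}
```

Either path produces the same aggregates; the difference is purely in how much 
of a key group must be resident in memory at once.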


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-------





Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/
---

Review request for hive, Chao Sun and Rui Li.


Bugs: HIVE-15580
https://issues.apache.org/jira/browse/HIVE-15580


Repository: hive-git


Description
---

See JIRA description.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
eeb4443 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java
 d57cac4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
66ffe5d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
0d31e5f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f 
  ql/src/test/queries/clientpositive/union_top_level.q d93fe38 
  ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83 
  ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e 
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4 
  ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3 
  ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f 

Diff: https://reviews.apache.org/r/55776/diff/


Testing
---

All tests passed


Thanks,

Xuefu Zhang



[jira] [Created] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-01-19 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15671:
--

 Summary: RPCServer.registerClient() erroneously uses server/client 
handshake timeout for connection timeout
 Key: HIVE-15671
 URL: https://issues.apache.org/jira/browse/HIVE-15671
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


{code}
  /**
   * Tells the RPC server to expect a connection from a new client.
   * ...
   */
  public Future registerClient(final String clientId, String secret,
  RpcDispatcher serverDispatcher) {
return registerClient(clientId, secret, serverDispatcher, 
config.getServerConnectTimeoutMs());
  }
{code}

config.getServerConnectTimeoutMs() returns the value of 
hive.spark.client.server.connect.timeout, which is meant as the timeout for the 
handshake between the Hive client and the remote Spark driver. Instead, the 
timeout used should be hive.spark.client.connect.timeout, which governs how 
long the remote Spark driver has to connect back to the Hive client.
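
A self-contained sketch of the mix-up (hypothetical names and values, not 
Hive's actual RpcConfiguration): two distinct timeouts exist, and the 
no-argument registerClient() path should forward the connect-back timeout 
rather than the handshake timeout.

```java
import java.util.HashMap;
import java.util.Map;

public class RpcTimeoutSketch {
    // Illustrative config values only; the real defaults live in Hive's config.
    static final Map<String, Long> conf = new HashMap<>();
    static {
        conf.put("hive.spark.client.server.connect.timeout", 90000L); // handshake
        conf.put("hive.spark.client.connect.timeout", 1000L);         // connect-back
    }

    static long getServerConnectTimeoutMs() {
        return conf.get("hive.spark.client.server.connect.timeout");
    }

    static long getConnectTimeoutMs() {
        return conf.get("hive.spark.client.connect.timeout");
    }

    // Buggy shape: forwards the handshake timeout for the connect-back wait.
    static long registerClientTimeoutBuggy() {
        return getServerConnectTimeoutMs();
    }

    // Fixed shape: forwards the connect timeout, per the JIRA description.
    static long registerClientTimeoutFixed() {
        return getConnectTimeoutMs();
    }

    public static void main(String[] args) {
        System.out.println(registerClientTimeoutBuggy()); // prints 90000
        System.out.println(registerClientTimeoutFixed()); // prints 1000
    }
}
```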



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15580) Replace Spark's groupByKey operator with something with bounded memory

2017-01-10 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15580:
--

 Summary: Replace Spark's groupByKey operator with something with 
bounded memory
 Key: HIVE-15580
 URL: https://issues.apache.org/jira/browse/HIVE-15580
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15543) Don't try to get memory/cores to decide parallelism when Spark dynamic allocation is enabled

2017-01-04 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15543:
--

 Summary: Don't try to get memory/cores to decide parallelism when 
Spark dynamic allocation is enabled
 Key: HIVE-15543
 URL: https://issues.apache.org/jira/browse/HIVE-15543
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 2.2.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Presently Hive tries to get the numbers of memory and cores from the Spark 
application and uses them to determine RS parallelism. However, this doesn't 
make sense when Spark dynamic allocation is enabled, because the current 
numbers don't represent the available computing resources, especially when the 
SparkContext is initially launched.

Thus, it makes sense not to do that when dynamic allocation is enabled.
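
A self-contained sketch of the proposed behavior (hypothetical names and 
numbers, not Hive code): when dynamic allocation is on, skip the 
resource-based estimate and fall back to a configured default.

```java
public class ParallelismSketch {
    // With static allocation, estimating parallelism from currently allocated
    // executors/cores is reasonable; with dynamic allocation those numbers can
    // be near zero at launch, so a configured default is used instead.
    static int reduceParallelism(boolean dynamicAllocation, int executors,
                                 int coresPerExecutor, int configuredDefault) {
        if (dynamicAllocation) {
            return configuredDefault;
        }
        return Math.max(1, executors * coresPerExecutor);
    }

    public static void main(String[] args) {
        System.out.println(reduceParallelism(false, 4, 2, 16)); // prints 8
        System.out.println(reduceParallelism(true, 0, 0, 16));  // prints 16
    }
}
```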



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark

2016-12-30 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15527:
--

 Summary: Memory usage is unbound in SortByShuffler for Spark
 Key: HIVE-15527
 URL: https://issues.apache.org/jira/browse/HIVE-15527
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


In SortByShuffler.java, an ArrayList is used to back the iterator for values 
that have the same key in the shuffled result produced by the Spark 
transformation sortByKey. It's possible for memory to be exhausted by a large 
key group.

{code}
@Override
public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
  // TODO: implement this by accumulating rows with the same key into a list.
  // Note that this list needs to be improved to prevent excessive memory
  // usage, but this can be done in a later phase.
  while (it.hasNext()) {
    Tuple2<HiveKey, BytesWritable> pair = it.next();
    if (curKey != null && !curKey.equals(pair._1())) {
      HiveKey key = curKey;
      List<BytesWritable> values = curValues;
      curKey = pair._1();
      curValues = new ArrayList<BytesWritable>();
      curValues.add(pair._2());
      return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
    }
    curKey = pair._1();
    curValues.add(pair._2());
  }
  if (curKey == null) {
    throw new NoSuchElementException();
  }
  // if we get here, this should be the last element we have
  HiveKey key = curKey;
  curKey = null;
  return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
}
{code}

Since the output from sortByKey is already sorted by key, it's possible to back 
the value iterable with the input iterator.
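
The idea can be sketched in a small, self-contained way (plain Java, no Spark 
or Hive dependencies; the names here are illustrative, not Hive's actual 
classes): with key-sorted input, each group's value iterator can pull straight 
from the input iterator, holding only one value at a time.

```java
import java.util.*;

public class SortedGroupIterator {
    private final Iterator<Map.Entry<String, Integer>> input;
    private Map.Entry<String, Integer> pending; // one-element lookahead

    public SortedGroupIterator(Iterator<Map.Entry<String, Integer>> input) {
        this.input = input;
        this.pending = input.hasNext() ? input.next() : null;
    }

    public boolean hasNext() {
        return pending != null;
    }

    // Returns the next key plus an iterator over that key's values that pulls
    // directly from the underlying input iterator -- no per-group ArrayList.
    public Map.Entry<String, Iterator<Integer>> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        final String key = pending.getKey();
        Iterator<Integer> values = new Iterator<Integer>() {
            public boolean hasNext() {
                return pending != null && pending.getKey().equals(key);
            }
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int v = pending.getValue();
                pending = input.hasNext() ? input.next() : null;
                return v;
            }
        };
        return new AbstractMap.SimpleEntry<>(key, values);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = Arrays.asList(
            new AbstractMap.SimpleEntry<>("a", 1),
            new AbstractMap.SimpleEntry<>("a", 2),
            new AbstractMap.SimpleEntry<>("b", 3));
        SortedGroupIterator groups = new SortedGroupIterator(sorted.iterator());
        while (groups.hasNext()) {
            Map.Entry<String, Iterator<Integer>> g = groups.next();
            int sum = 0;
            while (g.getValue().hasNext()) {
                sum += g.getValue().next();
            }
            System.out.println(g.getKey() + "=" + sum); // prints a=3 then b=3
        }
    }
}
```

The bounded-memory guarantee comes with a restriction: each group's values 
must be fully consumed before advancing to the next group, since both share 
the same underlying iterator.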



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Naveen Gangam has become a Hive Committer

2016-12-16 Thread Xuefu Zhang
Bcc: dev/user

Hi all,

It's my honor to announce that the Apache Hive PMC has voted on and approved
Naveen's committership. Please join me in congratulating him on his
contributions and achievements.

Regards,
Xuefu


Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-13 Thread Xuefu Zhang
Congratulations!

On Tue, Dec 13, 2016 at 6:51 PM, Prasanth Jayachandran  wrote:

> The Apache Hive PMC has voted to make Rajesh Balamohan a committer on the
> Apache Hive Project. Please join me in congratulating Rajesh.
>
> Congratulations Rajesh!
>
> Thanks
> Prasanth


[jira] [Created] (HIVE-15237) Propagate Spark job failure to Hive

2016-11-17 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-15237:
--

 Summary: Propagate Spark job failure to Hive
 Key: HIVE-15237
 URL: https://issues.apache.org/jira/browse/HIVE-15237
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.1.0
Reporter: Xuefu Zhang


If a Spark job failed for some reason, Hive doesn't get any additional error 
message, which makes it very hard for user to figure out why. Here is an 
example:
{code}
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: 
SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount 
[StageCost]
2016-11-17 21:32:53,134 Stage-0_0: 0/23 Stage-1_0: 0/28 
2016-11-17 21:32:55,156 Stage-0_0: 0(+1)/23 Stage-1_0: 0/28 
2016-11-17 21:32:57,167 Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
2016-11-17 21:33:00,216 Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
2016-11-17 21:33:03,251 Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
2016-11-17 21:33:06,286 Stage-0_0: 0(+4)/23 Stage-1_0: 0/28 
2016-11-17 21:33:09,308 Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
2016-11-17 21:33:12,332 Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
2016-11-17 21:33:13,338 Stage-0_0: 0(+21,-3)/23 Stage-1_0: 0/28 
2016-11-17 21:33:15,349 Stage-0_0: 0(+21,-5)/23 Stage-1_0: 0/28 
2016-11-17 21:33:16,358 Stage-0_0: 0(+18,-8)/23 Stage-1_0: 0/28 
2016-11-17 21:33:19,373 Stage-0_0: 0(+21,-8)/23 Stage-1_0: 0/28 
2016-11-17 21:33:22,400 Stage-0_0: 0(+18,-14)/23Stage-1_0: 0/28 
2016-11-17 21:33:23,404 Stage-0_0: 0(+15,-20)/23Stage-1_0: 0/28 
2016-11-17 21:33:24,408 Stage-0_0: 0(+12,-23)/23Stage-1_0: 0/28 
2016-11-17 21:33:25,417 Stage-0_0: 0(+9,-26)/23 Stage-1_0: 0/28 
2016-11-17 21:33:26,420 Stage-0_0: 0(+12,-26)/23Stage-1_0: 0/28 
2016-11-17 21:33:28,427 Stage-0_0: 0(+9,-29)/23 Stage-1_0: 0/28 
2016-11-17 21:33:29,432 Stage-0_0: 0(+12,-29)/23Stage-1_0: 0/28 
2016-11-17 21:33:31,444 Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
2016-11-17 21:33:34,464 Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
Status: Failed
FAILED: Execution Error, return code 3 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
{code}
It would be better if we can propagate Spark error to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 53166: HIVE-14910: Flaky test: TestSparkClient.testJobSubmission

2016-10-28 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53166/#review154132
---




spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 
(line 165)
<https://reviews.apache.org/r/53166/#comment223637>

Nit: could we avoid using single letter variable name for things other than 
integer iterators? Same below.


- Xuefu Zhang


On Oct. 26, 2016, 1:48 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53166/
> ---
> 
> (Updated Oct. 26, 2016, 1:48 p.m.)
> 
> 
> Review request for hive, Mohit Sabharwal, Siddharth Seth, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14910: Flaky test: TestSparkClient.testJobSubmission
> I ran into this problem today while investigating a flaky test. I think the 
> failure is coming from this race condition: the listener can be added to the 
> JobHandle only after the job has been submitted. So there is no guarantee 
> that every method of the listener will be invoked, some state changes may 
> have happened before the caller received the handler back.
> I propose a slight change in the API. We should add the listeners as an 
> argument of the submit method, so we can set them on the Handler before the 
> job itself is submitted. This way any status change should be signalled to 
> the listener.
> 
> 
> Diffs
> -
> 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java 
> 44aa255a8271894ed3e787c3e7d1323628db63c4 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 
> 17c8f40edd472682d5604f41980d06e60cc92893 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 
> 3e921a5d9b77966d368684ee7b6f1c861ac60e08 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> e2a30a76e0f7fe95d8a453f502311baa08abcbe2 
>   spark-client/src/test/java/org/apache/hive/spark/client/TestJobHandle.java 
> e8f352dce9f618573c2d79e9b8c59e19fad7298a 
>   
> spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java 
> b95cd7a05d44224b53bf2cef9170146b8b2eb4a8 
> 
> Diff: https://reviews.apache.org/r/53166/diff/
> 
> 
> Testing
> ---
> 
> Unit tests modified and tested.
> I also ran a simple query with HoS as the execution engine.
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>
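
The race described above can be shown with a deterministic, Spark-free sketch 
(hypothetical names, not the spark-client classes): if the first state change 
fires before the caller gets the handle back, a listener attached afterwards 
misses it, while a listener passed into submit() sees every transition.

```java
import java.util.*;
import java.util.function.Consumer;

public class ListenerRaceSketch {
    final List<Consumer<String>> listeners = new ArrayList<>();

    // Old API shape: the first state change fires before the caller can
    // attach a listener to the returned handle.
    ListenerRaceSketch submitThenListen(Consumer<String> lateListener) {
        fire("QUEUED");              // happens before the handle is returned
        listeners.add(lateListener); // too late to observe QUEUED
        fire("STARTED");
        return this;
    }

    // Proposed API shape: the listener is registered before any event fires.
    ListenerRaceSketch submitWithListener(Consumer<String> listener) {
        listeners.add(listener);
        fire("QUEUED");
        fire("STARTED");
        return this;
    }

    void fire(String state) {
        for (Consumer<String> l : listeners) {
            l.accept(state);
        }
    }

    public static void main(String[] args) {
        List<String> late = new ArrayList<>();
        new ListenerRaceSketch().submitThenListen(late::add);
        System.out.println(late);  // prints [STARTED] -- QUEUED was missed

        List<String> early = new ArrayList<>();
        new ListenerRaceSketch().submitWithListener(early::add);
        System.out.println(early); // prints [QUEUED, STARTED]
    }
}
```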



Re: behavior or insert overwrite with dynamic partitions

2016-10-17 Thread Xuefu Zhang
It seems right to me that an existing partition should be overwritten if
that partition gets any data, while older, untouched partitions should stay.
After all, we are overwriting certain partitions, not the whole table.

--Xuefu

On Mon, Oct 17, 2016 at 6:10 PM, Sergey Shelukhin 
wrote:

> What do you think this SHOULD do?
>
> > select key from src;
> 10
> 25
> 50
>
> > create table t(val int) partitioned by (pk int);
> > insert overwrite table t partition (pk)
>   select 0 as val, key from src where key < 30;
> > insert overwrite table t partition (pk)
>   select 1 as val, key from src where key > 20;
>
>
> > select val, pk from t;
> ?
>
>
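
Under the semantics Xuefu describes, the example above would end with pk=10 
keeping val 0 while pk=25 and pk=50 hold val 1. A tiny Java model of that 
per-partition overwrite behavior (a sketch of the stated expectation, not 
Hive code): each INSERT OVERWRITE ... PARTITION (pk) replaces only the 
partitions it actually writes, modeled as putAll on a partition-to-rows map.

```java
import java.util.*;

public class DynPartOverwrite {
    static SortedMap<Integer, List<Integer>> run() {
        SortedMap<Integer, List<Integer>> table = new TreeMap<>(); // pk -> vals

        // insert overwrite table t partition (pk)
        //   select 0 as val, key from src where key < 30;   -- keys 10, 25
        Map<Integer, List<Integer>> first = new TreeMap<>();
        first.put(10, Collections.singletonList(0));
        first.put(25, Collections.singletonList(0));
        table.putAll(first);

        // insert overwrite table t partition (pk)
        //   select 1 as val, key from src where key > 20;   -- keys 25, 50
        Map<Integer, List<Integer>> second = new TreeMap<>();
        second.put(25, Collections.singletonList(1)); // pk=25 is rewritten
        second.put(50, Collections.singletonList(1)); // pk=50 is new
        table.putAll(second);                         // pk=10 untouched, stays

        return table;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {10=[0], 25=[1], 50=[1]}
    }
}
```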


[jira] [Created] (HIVE-14885) Support PPD for nested columns

2016-10-04 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-14885:
--

 Summary: Support PPD for nested columns
 Key: HIVE-14885
 URL: https://issues.apache.org/jira/browse/HIVE-14885
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer, Serializers/Deserializers
Affects Versions: 2.1.0
Reporter: Xuefu Zhang


It looks like PPD doesn't work for nested columns, at least not for Parquet. 
For the given schema
{code}
hive> desc nested;
OK
a   int 
b   string  
c   struct<d:int,e:string>  
{code}
PPD works for a query like
{code}
select * from nested where a=1;
{code}
while NOT for
{code}
select * from nested where c.d=2;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-21 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review149942
---


Ship it!




Ship It!

- Xuefu Zhang


On Sept. 22, 2016, 4:05 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated Sept. 22, 2016, 4:05 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 0f9b036 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 93f093f 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java de74c3e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f28d33e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 7be628e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  ba41518 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 8b0db4a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 7ceb005 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 62bbcc6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9ba1865 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 82080eb 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java a718264 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 17b892c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java efae82d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 9cbc114 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 5808c90 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java a7551cb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java c961d14 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 570408a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 259fde8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
>   ql/src/test/queries/clientnegative/serde_regex2.q a29bb9c 
>   ql/src/test/queries/clientnegative/serde_regex3.q 4e91f06 
>   ql/src/test/queries/clientpositive/create_like.q bd39731 
>   ql/src/test/queries/clientpositive/join43.q 12c45a6 
>   ql/src/test/queries/clientpositive/serde_regex.q e21c6e1 
>   ql/src/test/queries/clientpositive/timestamptz.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamptz_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamptz_2.q PRE-CREATION 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_1.q.out acecbae 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_2.q.out 41e1c80 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_3.q.out 23e3403 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_4.q.out 3541ef6 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_5.q.out 177039c 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_6.q.out 668380f 
>   ql/src/test/results/clientnegative/serde_regex.q.out 7892bb2 
>   ql/src/test/results/clientnegative/serde_regex2.q.out 1ceb387 
>   ql/src/test/results/clientnegative/serde_regex3.q.out 028a24f 
>   ql/src/test/results/clientnegative/wrong_column_type.q.out 6ff90ea 
>   ql/src/test/results/clientpositive/create_like.q.out 0111c94 
>   ql/src/test/results/clientpositive/join43.q.out 127d5d0 
>   ql/src/test/results/clientpositive/serde_regex.q.out 7bebb0c 
>   ql/src/test/results/clientpositive/timestamptz.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/timestamptz_1.q.out PRE-CREATION 
>   ql/src/

Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-21 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review149940
---




ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java (line 406)
<https://reviews.apache.org/r/50787/#comment217724>

Nit: might be better to put timestamptz right after timestamp for visual 
ease, but not a big deal. Same below.



serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampTZWritable.java (line 
1)
<https://reviews.apache.org/r/50787/#comment217725>

License header please.


- Xuefu Zhang


On Sept. 22, 2016, 1:08 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated Sept. 22, 2016, 1:08 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 0f9b036 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 93f093f 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java de74c3e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f28d33e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 7be628e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  ba41518 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 8b0db4a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 7ceb005 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 62bbcc6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9ba1865 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 82080eb 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java a718264 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 17b892c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java efae82d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 9cbc114 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 5808c90 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java a7551cb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java c961d14 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 570408a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 259fde8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
>   ql/src/test/queries/clientnegative/serde_regex2.q a29bb9c 
>   ql/src/test/queries/clientnegative/serde_regex3.q 4e91f06 
>   ql/src/test/queries/clientpositive/create_like.q bd39731 
>   ql/src/test/queries/clientpositive/join43.q 12c45a6 
>   ql/src/test/queries/clientpositive/serde_regex.q e21c6e1 
>   ql/src/test/queries/clientpositive/timestamptz.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamptz_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamptz_2.q PRE-CREATION 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_1.q.out acecbae 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_2.q.out 41e1c80 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_3.q.out 23e3403 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_4.q.out 3541ef6 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_5.q.out 177039c 
>   ql/src/test/results/clientnegative/invalid_cast_from_binary_6.q.out 668380f 
>   ql/src/test/results/clientnegative/serde_regex.q.out 7892bb2 
>   ql/src/test/results/clientnegative/serde_regex2.q.out 1ceb387 
>   ql/src/test/results/clientnegative/serde_regex3.q.out 028a24f 
>   ql/src/test/results/

[jira] [Created] (HIVE-14617) NPE in UDF MapValues() if input is null

2016-08-24 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-14617:
--

 Summary: NPE in UDF MapValues() if input is null
 Key: HIVE-14617
 URL: https://issues.apache.org/jira/browse/HIVE-14617
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 2.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Job fails with error msg as follows:
{code}
Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{"ts":null,"_max_added_id":null,"identity_info":null,"vehicle_specs":null,"tracking_info":null,"color_info":null,"vehicle_traits":null,"detail_info":null,"_row_key":null,"_shard":null,"image_info":null,"vehicle_tags":null,"activation_info":null,"flavor_info":null,"sounds":null,"legacy_info":null,"images":null,"datestr":"2016-08-24"}
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{"ts":null,"_max_added_id":null,"identity_info":null,"vehicle_specs":null,"tracking_info":null,"color_info":null,"vehicle_traits":null,"detail_info":null,"_row_key":null,"_shard":null,"image_info":null,"vehicle_tags":null,"activation_info":null,"flavor_info":null,"sounds":null,"legacy_info":null,"images":null,"datestr":"2016-08-24"}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507) at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 8 
more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error 
evaluating map_values(vehicle_traits.vehicle_traits) at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at 
org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
 at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) 
... 9 more Caused by: java.lang.NullPointerException at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFMapValues.evaluate(GenericUDFMapValues.java:64)
 at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:185)
 at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
 at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
 at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77) 
... 15 more 
{code}
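
A simplified, hypothetical analog of the fix the stack trace calls for (not 
Hive's actual GenericUDFMapValues code, just the shape of the null handling): 
guard against a null input map before dereferencing it.

```java
import java.util.*;

public class MapValuesSketch {
    // Analog of map_values(): a null map yields null (SQL NULL) instead of
    // throwing a NullPointerException, as happens for the all-null row above.
    static List<Object> mapValues(Map<?, ?> input) {
        if (input == null) {
            return null;
        }
        return new ArrayList<Object>(input.values());
    }

    public static void main(String[] args) {
        System.out.println(mapValues(null)); // prints null, no NPE
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("x", 1);
        m.put("y", 2);
        System.out.println(mapValues(m));    // prints [1, 2]
    }
}
```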



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Question about Hive on Spark

2016-08-22 Thread Xuefu Zhang
That happens per session or if certain configs are changed in the session.

On Mon, Aug 22, 2016 at 5:53 PM, Tao Li  wrote:

> Hi,
>
> Looks like the Spark client (SparkClientImpl class) submits Sparks jobs to
> the YARN cluster by forking a process and kicking off spark-submit script.
> Are we provisioning new containers every time we submit a job? There could
> be a perf hit by doing that.
>
> Thanks.
>


Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-17 Thread Xuefu Zhang
Congrats, PengCheng!

On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
wrote:

> Welcome aboard Pengcheng! :)
>
> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>
> > Congratulations Pengcheng!
> >
> > -- Lefty
> >
> > On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
> > wrote:
> >
> >> >
> >> > Hello Hive community,
> >> >
> >> > I'm pleased to announce that Pengcheng Xiong has accepted the Apache
> >> Hive
> >> > PMC's
> >> > invitation, and is now our newest PMC member. Many thanks to Pengcheng
> >> for
> >> > all of his hard work.
> >> >
> >> > Please join me congratulating Pengcheng!
> >> >
> >> > Best,
> >> > Ashutosh
> >> > (On behalf of the Apache Hive PMC)
> >> >
> >>
> >
> >
>


Re: [Announce] New Hive Committer - Mohit Sabharwal

2016-07-04 Thread Xuefu Zhang
Congrats, Mohit!

--Xuefu

On Mon, Jul 4, 2016 at 10:09 AM, Sahil Takiar 
wrote:

> Congrats Mohit!
>
> On Sun, Jul 3, 2016 at 10:11 PM, Lefty Leverenz 
> wrote:
>
> > Congratulations Mohit!
> >
> > -- Lefty
> >
> > On Sun, Jul 3, 2016 at 11:50 PM, Alpesh Patel 
> > wrote:
> >
> > > Congrats
> > > On Jul 1, 2016 9:57 AM, "Szehon Ho"  wrote:
> > >
> > >> On behalf of the Apache Hive PMC, I'm pleased to announce that Mohit
> > >> Sabharwal has been voted a committer on the Apache Hive project.
> > >>
> > >> Please join me in congratulating Mohit !
> > >>
> > >> Thanks,
> > >> Szehon
> > >>
> > >
> >
>
>
>
> --
> Sahil Takiar
> Senior Software Engineer at LinkedIn
> takiar.sa...@gmail.com | (510) 673-0309
>


[jira] [Created] (HIVE-13873) Column pruning for nested fields

2016-05-26 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-13873:
--

 Summary: Column pruning for nested fields
 Key: HIVE-13873
 URL: https://issues.apache.org/jira/browse/HIVE-13873
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer
Reporter: Xuefu Zhang


Some columnar file formats, such as Parquet, store the fields of struct types 
column by column as well, using the encoding described in the Google Dremel 
paper. It is very common in big data for data to be stored in structs while 
queries need only a subset of the fields in those structs. However, Hive 
presently still reads the whole struct regardless of whether all fields are 
selected. Pruning unwanted sub-fields of structs (nested fields) at file-read 
time would therefore be a big performance boost for such scenarios.
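The idea above can be illustrated with a toy Java sketch (not Hive's or Parquet's actual implementation): rows hold a struct-valued column, and a query such as `SELECT address.city FROM t` only needs one sub-field, so a pruning reader would never materialize `address.zip` at all. The record layout and field names here are purely hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedPruningDemo {
    public static void main(String[] args) {
        // Hypothetical rows: each row has an "id" column and a struct-typed
        // "address" column with sub-fields "city" and "zip".
        List<Map<String, Object>> rows = List.of(
            Map.<String, Object>of("id", 1, "address", Map.of("city", "Oakland", "zip", "94601")),
            Map.<String, Object>of("id", 2, "address", Map.of("city", "Tucson", "zip", "85701")));

        // A pruning reader for SELECT address.city would extract only this
        // sub-field column, leaving address.zip unread on disk.
        List<Object> cities = rows.stream()
            .map(r -> ((Map<?, ?>) r.get("address")).get("city"))
            .collect(Collectors.toList());

        System.out.println(cities);
    }
}
```

With a columnar format that stores each struct sub-field as its own column chunk, this projection can be pushed down to the file reader instead of being applied after the whole struct is deserialized.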



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Error in Hive on Spark

2016-03-23 Thread Xuefu Zhang
Yes, it seems more viable to integrate your application with HS2 via
JDBC or Thrift rather than at the code level.

--Xuefu

On Tue, Mar 22, 2016 at 12:01 AM, Stana <st...@is-land.com.tw> wrote:

> Hi, Xuefu
>
> You are right.
> Maybe I should launch spark-submit by HS2 or Hive CLI ?
>
> Thanks a lot,
> Stana
>
>
> 2016-03-22 1:16 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
>
> > Stana,
> >
> > I'm not sure if I fully understand the problem. spark-submit is launched
> in
> > the same host as your application, which should be able to access
> > hive-exec.jar. Yarn cluster needs the jar also, but HS2 or Hive CLI will
> > take care of that. Since you are not using either of which, then, it's
> your
> > application's responsibility to make that happen.
> >
> > Did I missed anything else?
> >
> > Thanks,
> > Xuefu
> >
> > On Sun, Mar 20, 2016 at 11:18 PM, Stana <st...@is-land.com.tw> wrote:
> >
> > > Does anyone have suggestions in setting property of hive-exec-2.0.0.jar
> > > path in application?
> > > Something like
> > >
> > >
> >
> 'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> > >
> > >
> > >
> > > 2016-03-11 10:53 GMT+08:00 Stana <st...@is-land.com.tw>:
> > >
> > > > Thanks for reply
> > > >
> > > > I have set the property spark.home in my application. Otherwise the
> > > > application threw 'SPARK_HOME not found exception'.
> > > >
> > > > I found hive source code in SparkClientImpl.java:
> > > >
> > > > private Thread startDriver(final RpcServer rpcServer, final String
> > > > clientId, final String secret)
> > > >   throws IOException {
> > > > ...
> > > >
> > > > List argv = Lists.newArrayList();
> > > >
> > > > ...
> > > >
> > > > argv.add("--class");
> > > > argv.add(RemoteDriver.class.getName());
> > > >
> > > > String jar = "spark-internal";
> > > > if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
> > > > jar = SparkContext.jarOfClass(this.getClass()).get();
> > > > }
> > > > argv.add(jar);
> > > >
> > > > ...
> > > >
> > > > }
> > > >
> > > > When hive executed spark-submit , it generate the shell command with
> > > > --class org.apache.hive.spark.client.RemoteDriver ,and set jar path
> > with
> > > > SparkContext.jarOfClass(this.getClass()).get(). It will get the local
> > > path
> > > > of hive-exec-2.0.0.jar.
> > > >
> > > > In my situation, the application and yarn cluster are in different
> > > cluster.
> > > > When application executed spark-submit with local path of
> > > > hive-exec-2.0.0.jar to yarn cluster, there 's no hive-exec-2.0.0.jar
> in
> > > > yarn cluster. Then application threw the exception:
> > "hive-exec-2.0.0.jar
> > > >   does not exist ...".
> > > >
> > > > Can it be set property of hive-exec-2.0.0.jar path in application ?
> > > > Something like 'hiveConf.set("hive.remote.driver.jar",
> > > > "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> > > > If not, is it possible to achieve in the future version?
> > > >
> > > >
> > > >
> > > >
> > > > 2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
> > > >
> > > >> You can probably avoid the problem by set environment variable
> > > SPARK_HOME
> > > >> or JVM property spark.home that points to your spark installation.
> > > >>
> > > >> --Xuefu
> > > >>
> > > >> On Thu, Mar 10, 2016 at 3:11 AM, Stana <st...@is-land.com.tw>
> wrote:
> > > >>
> > > >> >  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1,
> and
> > > >> > executing org.apache.hadoop.hive.ql.Driver with java application.
> > > >> >
> > > >> > Following are my situations:
> > > >> > 1.Building spark 1.4.1 assembly jar without Hive .
> > > >> > 2.Uploading the spark assembly jar to the hadoop cluster.
> > > >> > 3.Executing the java 

Re: Error in Hive on Spark

2016-03-21 Thread Xuefu Zhang
Stana,

I'm not sure I fully understand the problem. spark-submit is launched on
the same host as your application, which should be able to access
hive-exec.jar. The Yarn cluster needs the jar as well, but HS2 or the Hive
CLI would take care of that. Since you are not using either of those, it's
your application's responsibility to make that happen.

Did I miss anything?

Thanks,
Xuefu

On Sun, Mar 20, 2016 at 11:18 PM, Stana <st...@is-land.com.tw> wrote:

> Does anyone have suggestions in setting property of hive-exec-2.0.0.jar
> path in application?
> Something like
>
> 'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
>
>
>
> 2016-03-11 10:53 GMT+08:00 Stana <st...@is-land.com.tw>:
>
> > Thanks for reply
> >
> > I have set the property spark.home in my application. Otherwise the
> > application threw 'SPARK_HOME not found exception'.
> >
> > I found hive source code in SparkClientImpl.java:
> >
> > private Thread startDriver(final RpcServer rpcServer, final String
> > clientId, final String secret)
> >   throws IOException {
> > ...
> >
> > List argv = Lists.newArrayList();
> >
> > ...
> >
> > argv.add("--class");
> > argv.add(RemoteDriver.class.getName());
> >
> > String jar = "spark-internal";
> > if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
> > jar = SparkContext.jarOfClass(this.getClass()).get();
> > }
> > argv.add(jar);
> >
> > ...
> >
> > }
> >
> > When hive executed spark-submit , it generate the shell command with
> > --class org.apache.hive.spark.client.RemoteDriver ,and set jar path with
> > SparkContext.jarOfClass(this.getClass()).get(). It will get the local
> path
> > of hive-exec-2.0.0.jar.
> >
> > In my situation, the application and yarn cluster are in different
> cluster.
> > When application executed spark-submit with local path of
> > hive-exec-2.0.0.jar to yarn cluster, there 's no hive-exec-2.0.0.jar in
> > yarn cluster. Then application threw the exception: "hive-exec-2.0.0.jar
> >   does not exist ...".
> >
> > Can it be set property of hive-exec-2.0.0.jar path in application ?
> > Something like 'hiveConf.set("hive.remote.driver.jar",
> > "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> > If not, is it possible to achieve in the future version?
> >
> >
> >
> >
> > 2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
> >
> >> You can probably avoid the problem by set environment variable
> SPARK_HOME
> >> or JVM property spark.home that points to your spark installation.
> >>
> >> --Xuefu
> >>
> >> On Thu, Mar 10, 2016 at 3:11 AM, Stana <st...@is-land.com.tw> wrote:
> >>
> >> >  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1, and
> >> > executing org.apache.hadoop.hive.ql.Driver with java application.
> >> >
> >> > Following are my situations:
> >> > 1.Building spark 1.4.1 assembly jar without Hive .
> >> > 2.Uploading the spark assembly jar to the hadoop cluster.
> >> > 3.Executing the java application with eclipse IDE in my client
> computer.
> >> >
> >> > The application went well and it submitted mr job to the yarn cluster
> >> > successfully when using " hiveConf.set("hive.execution.engine", "mr")
> >> > ",but it threw exceptions in spark-engine.
> >> >
> >> > Finally, i traced Hive source code and came to the conclusion:
> >> >
> >> > In my situation, SparkClientImpl class will generate the spark-submit
> >> > shell and executed it.
> >> > The shell command allocated  --class with RemoteDriver.class.getName()
> >> > and jar with SparkContext.jarOfClass(this.getClass()).get(), so that
> >> > my application threw the exception.
> >> >
> >> > Is it right? And how can I do to execute the application with
> >> > spark-engine successfully in my client computer ? Thanks a lot!
> >> >
> >> >
> >> > Java application code:
> >> >
> >> > public class TestHiveDriver {
> >> >
> >> > private static HiveConf hiveConf;
> >> > private static Driver driver;
> >> > private static CliSessionState ss;
> >> > public static void main(String[] args)

[jira] [Created] (HIVE-13276) Hive on Spark doesn't work when spark.master=local

2016-03-13 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-13276:
--

 Summary: Hive on Spark doesn't work when spark.master=local
 Key: HIVE-13276
 URL: https://issues.apache.org/jira/browse/HIVE-13276
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


The following problem occurs with latest Hive master and Spark 1.6.1. I'm using 
hive CLI on mac.

{code}
  set mapreduce.job.reduces=
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:991)
at 
org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:419)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:205)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:145)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:117)
at 
org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.execute(LocalHiveSparkClient.java:130)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:71)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:94)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:156)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1837)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1578)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1351)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1110)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: Execution Error, return code -101 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask. Could not initialize class 
org.apache.spark.rdd.RDDOperationScope$
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Error in Hive on Spark

2016-03-10 Thread Xuefu Zhang
You can probably avoid the problem by set environment variable SPARK_HOME
or JVM property spark.home that points to your spark installation.
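A minimal sketch of the second option mentioned above, setting the `spark.home` JVM property programmatically before Hive code runs (the installation path shown is hypothetical):

```java
public class SparkHomeSetup {
    public static void main(String[] args) {
        // Alternative 1: export SPARK_HOME=/opt/spark-1.4.1 in the shell
        // before launching the JVM (nothing to do in code).
        //
        // Alternative 2: set the spark.home system property in code, before
        // any Hive/Spark client classes read it.
        System.setProperty("spark.home", "/opt/spark-1.4.1");
        System.out.println(System.getProperty("spark.home"));
    }
}
```

Either way, the value should point at a local Spark installation directory so the client can locate bin/spark-submit.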

--Xuefu

On Thu, Mar 10, 2016 at 3:11 AM, Stana  wrote:

>  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1, and
> executing org.apache.hadoop.hive.ql.Driver with java application.
>
> Following are my situations:
> 1.Building spark 1.4.1 assembly jar without Hive .
> 2.Uploading the spark assembly jar to the hadoop cluster.
> 3.Executing the java application with eclipse IDE in my client computer.
>
> The application went well and it submitted mr job to the yarn cluster
> successfully when using " hiveConf.set("hive.execution.engine", "mr")
> ",but it threw exceptions in spark-engine.
>
> Finally, i traced Hive source code and came to the conclusion:
>
> In my situation, SparkClientImpl class will generate the spark-submit
> shell and executed it.
> The shell command allocated  --class with RemoteDriver.class.getName()
> and jar with SparkContext.jarOfClass(this.getClass()).get(), so that
> my application threw the exception.
>
> Is it right? And how can I do to execute the application with
> spark-engine successfully in my client computer ? Thanks a lot!
>
>
> Java application code:
>
> public class TestHiveDriver {
>
> private static HiveConf hiveConf;
> private static Driver driver;
> private static CliSessionState ss;
> public static void main(String[] args){
>
> String sql = "select * from hadoop0263_0 as a join
> hadoop0263_0 as b
> on (a.key = b.key)";
> ss = new CliSessionState(new HiveConf(SessionState.class));
> hiveConf = new HiveConf(Driver.class);
> hiveConf.set("fs.default.name", "hdfs://storm0:9000");
> hiveConf.set("yarn.resourcemanager.address",
> "storm0:8032");
> hiveConf.set("yarn.resourcemanager.scheduler.address",
> "storm0:8030");
>
> hiveConf.set("yarn.resourcemanager.resource-tracker.address","storm0:8031");
> hiveConf.set("yarn.resourcemanager.admin.address",
> "storm0:8033");
> hiveConf.set("mapreduce.framework.name", "yarn");
> hiveConf.set("mapreduce.johistory.address",
> "storm0:10020");
>
> hiveConf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://storm0:3306/stana_metastore");
>
> hiveConf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver");
> hiveConf.set("javax.jdo.option.ConnectionUserName",
> "root");
> hiveConf.set("javax.jdo.option.ConnectionPassword",
> "123456");
> hiveConf.setBoolean("hive.auto.convert.join",false);
> hiveConf.set("spark.yarn.jar",
> "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
> hiveConf.set("spark.home","target/spark");
> hiveConf.set("hive.execution.engine", "spark");
> hiveConf.set("hive.dbname", "default");
>
>
> driver = new Driver(hiveConf);
> SessionState.start(hiveConf);
>
> CommandProcessorResponse res = null;
> try {
> res = driver.run(sql);
> } catch (CommandNeedRetryException e) {
> // TODO Auto-generated catch block
> e.printStackTrace();
> }
>
> System.out.println("Response Code:" +
> res.getResponseCode());
> System.out.println("Error Message:" +
> res.getErrorMessage());
> System.out.println("SQL State:" + res.getSQLState());
>
> }
> }
>
>
>
>
> Exception of spark-engine:
>
> 16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
> argv:
> /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
> --properties-file
>
> /var/folders/vt/cjcdhms903x7brn1kbh558s4gn/T/spark-submit.7697089826296920539.properties
> --class org.apache.hive.spark.client.RemoteDriver
>
> /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
> --remote-host MacBook-Pro.local --remote-port 51331 --conf
> hive.spark.client.connect.timeout=1000 --conf
> hive.spark.client.server.connect.timeout=9 --conf
> hive.spark.client.channel.log.level=null --conf
> hive.spark.client.rpc.max.size=52428800 --conf
> hive.spark.client.rpc.threads=8 --conf
> hive.spark.client.secret.bits=256
> 16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
> 16/03/10 18:33:09 INFO SparkClientImpl:  client token: N/A
> 16/03/10 18:33:09 INFO SparkClientImpl:  diagnostics: N/A
> 16/03/10 18:33:09 INFO SparkClientImpl:  ApplicationMaster host:
> N/A
> 16/03/10 18:33:09 INFO SparkClientImpl:  ApplicationMaster RPC
> port: -1
> 16/03/10 18:33:09 INFO SparkClientImpl:  queue: default
> 16/03/10 18:33:09 

Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-09 Thread Xuefu Zhang
Congratulations, Wei!

On Wed, Mar 9, 2016 at 6:48 PM, Chao Sun  wrote:

> Congratulations!
>
> On Wed, Mar 9, 2016 at 6:44 PM, Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
> > Congratulations Wei!
> >
> > On Mar 9, 2016, at 8:43 PM, Sergey Shelukhin  > > wrote:
> >
> > Congrats!
> >
> > From: Szehon Ho >
> > Reply-To: "u...@hive.apache.org" <
> > u...@hive.apache.org>
> > Date: Wednesday, March 9, 2016 at 17:40
> > To: "u...@hive.apache.org" <
> > u...@hive.apache.org>
> > Cc: "dev@hive.apache.org" <
> dev@hive.apache.org
> > >, "w...@apache.org"
> <
> > w...@apache.org>
> > Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng
> >
> > Congratulations Wei!
> >
> > On Wed, Mar 9, 2016 at 5:26 PM, Vikram Dixit K   > vik...@apache.org>> wrote:
> > The Apache Hive PMC has voted to make Wei Zheng a committer on the Apache
> > Hive Project. Please join me in congratulating Wei.
> >
> > Thanks
> > Vikram.
> >
> >
> >
>


Re: Spark and HBase metastore jiras in 2.0 release

2016-02-17 Thread Xuefu Zhang
I don't think the 2.0 branch has every patch from spark-branch. However, master
does. I will take care of the spark-branch JIRA resolution versions.

Thanks,
Xuefu

On Tue, Feb 16, 2016 at 9:48 PM, Thejas Nair  wrote:

> The hbase-metastore branch jiras are all part of Hive 2.0.0 release,
> as no work was done in that branch after the merge into the master
> branch.
> I think we should add 2.0.0 as a fix version before closing them.
> Thoughts ?
>
> I am not sure if that is the case with spark-branch. Ideally, I think
> we should update the fix version whenever we merge the branch into
> master/branch-1 .
>
>
>
> On Tue, Feb 16, 2016 at 5:31 PM, Sergey Shelukhin
>  wrote:
> > Hi.
> > There is a number of JIRAs resolved in a variety of branches that are a
> > part of 2.0 release.
> > We need to close these; I’ll take care of closing the llap JIRAs; as far
> > as hbase-metastore-branch and spark-branch are concerned, any objection
> to
> > closing these? None of them look new. I will close tomorrow if no
> > objections.
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20%28
> >
> fixVersion%20%3D%20spark-branch%20OR%20fixVersion%20%3D%20hbase-metastore-b
> >
> ranch%29%20AND%20fixVersion%20not%20in%20%282.0.0%2C%202.1.0%29%20AND%20sta
> > tus%20%3D%20Resolved%20ORDER%20BY%20fixVersion%20ASC%2C%20resolved%20DESC
> >
>


Re: [ANNOUNCE] Apache Hive 2.0.0 Released

2016-02-16 Thread Xuefu Zhang
Congratulation, guys!!!

--Xuefu

On Tue, Feb 16, 2016 at 11:54 AM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Great news! Thanks Sergey for the effort.
>
> Thanks
> Prasanth
>
> > On Feb 16, 2016, at 1:44 PM, Sergey Shelukhin  wrote:
> >
> > The Apache Hive team is proud to announce the the release of Apache Hive
> > version 2.0.0.
> >
> > The Apache Hive (TM) data warehouse software facilitates querying and
> > managing large datasets residing in distributed storage. Built on top of
> > Apache Hadoop (TM), it provides:
> >
> > * Tools to enable easy data extract/transform/load (ETL)
> >
> > * A mechanism to impose structure on a variety of data formats
> >
> > * Access to files stored either directly in Apache HDFS (TM) or in other
> > data storage systems such as Apache HBase (TM)
> >
> > * Query execution via Apache Hadoop MapReduce and Apache Tez frameworks.
> >
> > For Hive release details and downloads, please visit:
> > https://hive.apache.org/downloads.html
> >
> > Hive 2.0.0 Release Notes are available here:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332641&projectId=12310843
> >
> > We would like to thank the many contributors who made this release
> > possible.
> >
> > Regards,
> >
> > The Apache Hive Team
> >
> >
> >
>
>


Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-11 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118939
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 
<https://reviews.apache.org/r/43176/#comment180247>

qq: So closing the file systems caused the issues seen in the test failures?


- Xuefu Zhang


On Feb. 11, 2016, 9:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 11, 2016, 9:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-11 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118941
---


Ship it!




Ship It!

- Xuefu Zhang


On Feb. 11, 2016, 9:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 11, 2016, 9:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118398
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 135)
<https://reviews.apache.org/r/43176/#comment179626>

I think the existence of a file/dir can be checked by calling FS.exists(). 
I usually avoid relying on exceptions to do anything useful other than 
handling the exceptional case itself. However, it's up to you.
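To illustrate the check-first style suggested above, here is a small sketch using `java.nio.file` as a stand-in for the Hadoop `FileSystem` API (whose analogous call is `FileSystem.exists(Path)`); the temp-directory usage is purely for demonstration:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class ExistsCheckDemo {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("overwrite-demo");

        // Preferred: an explicit existence check, rather than attempting the
        // operation and catching FileNotFoundException as control flow.
        if (Files.exists(dir)) {
            System.out.println("exists");
        }

        Files.delete(dir);
        System.out.println(Files.exists(dir));
    }
}
```

The same pattern applies in MoveTask: call exists() on the destination before listing or deleting it, and reserve the catch block for genuinely unexpected I/O failures.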



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 160)
<https://reviews.apache.org/r/43176/#comment179627>

it seems fs doesn't get closed on all paths. However, I understand that the 
problem existed before this change.


- Xuefu Zhang


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118462
---


Ship it!




Ship It!

- Xuefu Zhang


On Feb. 9, 2016, 8:55 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 9, 2016, 8:55 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118442
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 120)
<https://reviews.apache.org/r/43176/#comment179669>

Do we need the returned boolean? I don't see it being used.



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 171)
<https://reviews.apache.org/r/43176/#comment179670>

could we use FS.exists() to detect this instead of relying on exception 
handling? See the example below that checks the sourcePath.


- Xuefu Zhang


On Feb. 9, 2016, 6:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 9, 2016, 6:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: [VOTE] Apache Hive 2.0.0 Release Candidate 2

2016-02-09 Thread Xuefu Zhang
Some trouble I met:

1. Downloaded the source and built Hive with -Pdist.
2. Tried to run the Hive CLI and got the following error:

Exception in thread "main" java.lang.RuntimeException: Hive metastore
database is not initialized. Please use schematool (e.g. ./schematool
-initSchema -dbType ...) to create the schema. If needed, don't forget to
include the option to auto-create the underlying database in your JDBC
connection string (e.g. ?createDatabaseIfNotExist=true for mysql)


Is this expected? This used to work out of the box, with an embedded
metastore created automatically.

Thanks,
Xuefu

On Tue, Feb 9, 2016 at 10:15 AM, Alan Gates  wrote:

> FYI the URL for the candidate returns 404, the correct URL is
> http://people.apache.org/~sershe/hive-2.0.0-rc2/
>
> +1, checks signatures, did a build with a brand new maven repo, and ran a
> quick smoke test.
>
> Alan.
>
> Sergey Shelukhin 
> February 8, 2016 at 18:29
> Apache Hive 2.0.0 Release Candidate 2 is available here:
>
> http://people.apache.org/~sershe/hive-2.0.0-RC2/
>
>
> Maven artifacts are at
>
> https://repository.apache.org/content/repositories/orgapachehive-1044/
>
>
> Source tag for RC2 (github mirror) is:
> https://github.com/apache/hive/releases/tag/release-2.0.0-rc2
> (
> https://github.com/apache/hive/commit/ecccdda845a0a45de24463669847ddf37ad3
> c220)
>
> Voting will conclude in 72 hours.
>
> Hive PMC Members: Please test and vote.
>
>
>
> Thanks.
>
>
>


Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-04 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review117816
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 129)
<https://reviews.apache.org/r/43176/#comment179081>

The two nested try-catch blocks make the code hard to read.



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 135)
<https://reviews.apache.org/r/43176/#comment179079>

Should we check if it's a directory before calling listStatus() on it?



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 144)
<https://reviews.apache.org/r/43176/#comment179080>

Instead of doing this, should we explicitly check the existence of the 
destination?


On a high level, deleting files one by one is slower. Could we instead remember 
the original permissions and apply them to the new directory that replaces the 
old one?

- Xuefu Zhang


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



[jira] [Created] (HIVE-12951) Reduce Spark executor prewarm timeout to 5s

2016-01-27 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-12951:
--

 Summary: Reduce Spark executor prewarm timeout to 5s
 Key: HIVE-12951
 URL: https://issues.apache.org/jira/browse/HIVE-12951
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.2.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Currently it's set to 30s, which tends to be longer than needed. Reduce it to 
5s, accounting only for JVM startup time. (Eventually, we may want to make this 
configurable.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: January Hive User Group Meeting

2016-01-21 Thread Xuefu Zhang
For those who cannot attend in person, here is the webex info:

https://cloudera.webex.com/meet/xzhang
1-650-479-3208  Call-in toll number (US/Canada)
623 810 662 (access code)

Thanks,
Xuefu

On Wed, Jan 20, 2016 at 9:45 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:

> Hi all,
>
> As a reminder, the meeting will be held tomorrow as scheduled. Please
> refer to the meetup page[1] for details. Looking forward to meeting you all!
>
> Thanks,
> Xuefu
>
> [1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/
>
> On Wed, Dec 16, 2015 at 3:38 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>
>> Dear Hive users and developers,
>>
>> The Hive community is organizing a user group meeting[1] on January 21, 2016
>> at the Cloudera facility in Palo Alto, CA. This will be a great opportunity
>> for users and developers to find out what's happening in the community and
>> to share their experiences with Hive. I'd urge you to attend the meetup.
>> Please RSVP; the list will be closed a few days ahead of the event.
>>
>> At the same time, I'd like to solicit short talks (15 minutes max) from
>> users and developers. If you have a proposal, please let me or Thejas know.
>> Your participation is greatly appreciated.
>>
>> Sincerely,
>> Xuefu
>>
>> [1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/
>>
>
>


Re: January Hive User Group Meeting

2016-01-20 Thread Xuefu Zhang
Hi all,

As a reminder, the meeting will be held tomorrow as scheduled. Please refer
to the meetup page[1] for details. Looking forward to meeting you all!

Thanks,
Xuefu

[1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/

On Wed, Dec 16, 2015 at 3:38 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:

> Dear Hive users and developers,
>
> The Hive community is organizing a user group meeting[1] on January 21, 2016
> at the Cloudera facility in Palo Alto, CA. This will be a great opportunity
> for users and developers to find out what's happening in the community and
> to share their experiences with Hive. I'd urge you to attend the meetup.
> Please RSVP; the list will be closed a few days ahead of the event.
>
> At the same time, I'd like to solicit short talks (15 minutes max) from
> users and developers. If you have a proposal, please let me or Thejas know.
> Your participation is greatly appreciated.
>
> Sincerely,
> Xuefu
>
> [1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/
>


[jira] [Created] (HIVE-12828) Update Spark version to 1.6

2016-01-09 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-12828:
--

 Summary: Update Spark version to 1.6
 Key: HIVE-12828
 URL: https://issues.apache.org/jira/browse/HIVE-12828
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang








[jira] [Created] (HIVE-12811) Give the YARN application a more meaningful name than just "Hive on Spark"

2016-01-08 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-12811:
--

 Summary: Give the YARN application a more meaningful name than just 
"Hive on Spark"
 Key: HIVE-12811
 URL: https://issues.apache.org/jira/browse/HIVE-12811
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


MR uses the query as the application name. Hopefully this can be set via 
spark.app.name.





Re: Review Request 41582: HIVE-12713: Miscellaneous improvements in driver compile and execute logging

2015-12-21 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41582/#review111472
---



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 407)
<https://reviews.apache.org/r/41582/#comment171684>

To clarify, the above log redaction is needed because of the addition of 
this line, right?



ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java (line 185)
<https://reviews.apache.org/r/41582/#comment171683>

It might be better to check whether debug logging is enabled before building the message. Same below.
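The guard in question — checking the level before building a potentially expensive log message — looks like this as a generic sketch (using java.util.logging so the example is self-contained; ParseDriver itself goes through Hive's logging facade):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());
    static int dumpCalls = 0;  // counts how often the expensive dump actually runs

    // Stands in for costly work, such as rendering a full parse tree to text.
    static String expensiveDump(Object tree) {
        dumpCalls++;
        return "AST: " + tree;
    }

    static void logParseResult(Object tree) {
        // Guard so expensiveDump() only runs when the message will be emitted.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveDump(tree));
        }
    }
}
```

With the guard in place, the cost of rendering the parse tree is only paid when debug-level output is actually enabled.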


- Xuefu Zhang


On Dec. 20, 2015, 1:05 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41582/
> ---
> 
> (Updated Dec. 20, 2015, 1:05 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-12713
> https://issues.apache.org/jira/browse/HIVE-12713
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Patch improves the driver compile and execute logging as follows:
> 1. ensuring that only the redacted query is logged
> 2. removing redundant variable substitution in HS2 SQLOperation
> 3. logging the query and its compilation time without having to enable 
> PerfLogger debug, to help identify badly written queries that take a long 
> time to compile and may cause other, good queries to be queued 
> (HIVE-12516)
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 98ebd50 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  7cc0acf 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> e9206b9 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/cbo_rp_TestJdbcDriver2.java
>  c66f166 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithMr.java
>  d21571e 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithTez.java
>  8b21647 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 3d5f3b5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java c33bb66 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> d90dd0d 
> 
> Diff: https://reviews.apache.org/r/41582/diff/
> 
> 
> Testing
> ---
> 
> 1. Manual tests
> 2. Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 41582: HIVE-12713: Miscellaneous improvements in driver compile and execute logging

2015-12-21 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41582/#review111501
---

Ship it!


Ship It!

- Xuefu Zhang


On Dec. 21, 2015, 3:46 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41582/
> ---
> 
> (Updated Dec. 21, 2015, 3:46 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-12713
> https://issues.apache.org/jira/browse/HIVE-12713
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Patch improves the driver compile and execute logging as follows:
> 1. ensuring that only the redacted query is logged
> 2. removing redundant variable substitution in HS2 SQLOperation
> 3. logging the query and its compilation time without having to enable 
> PerfLogger debug, to help identify badly written queries that take a long 
> time to compile and may cause other, good queries to be queued 
> (HIVE-12516)
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 98ebd50 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  7cc0acf 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> e9206b9 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/cbo_rp_TestJdbcDriver2.java
>  c66f166 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithMr.java
>  d21571e 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithTez.java
>  8b21647 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 3d5f3b5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java c33bb66 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> d90dd0d 
> 
> Diff: https://reviews.apache.org/r/41582/diff/
> 
> 
> Testing
> ---
> 
> 1. Manual tests
> 2. Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



[jira] [Created] (HIVE-12708) Hive on Spark doesn't work with Kerberized HBase [Spark Branch]

2015-12-18 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-12708:
--

 Summary: Hive on Spark doesn't work with Kerberized HBase [Spark 
Branch]
 Key: HIVE-12708
 URL: https://issues.apache.org/jira/browse/HIVE-12708
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.1.0, 1.2.0, 2.0.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


The Spark application launcher (spark-submit) acquires an HBase delegation token 
on the Hive user's behalf when the application is launched. This mechanism 
doesn't work for long-running sessions and is not in line with what Hive does: 
Hive acquires the token automatically whenever a job needs it. The right 
approach would be for Spark to allow applications to dynamically add whatever 
tokens they need to the Spark context. While that needs work on the Spark side, 
we provide a workaround solution in Hive.





January Hive User Group Meeting

2015-12-16 Thread Xuefu Zhang
Dear Hive users and developers,

The Hive community is organizing a user group meeting[1] on January 21, 2016
at the Cloudera facility in Palo Alto, CA. This will be a great opportunity
for users and developers to find out what's happening in the community and
to share their experiences with Hive. I'd urge you to attend the meetup.
Please RSVP; the list will be closed a few days ahead of the event.

At the same time, I'd like to solicit short talks (15 minutes max) from
users and developers. If you have a proposal, please let me or Thejas know.
Your participation is greatly appreciated.

Sincerely,
Xuefu

[1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/


Re: January Hive User Group Meeting

2015-12-16 Thread Xuefu Zhang
Yeah, I can try to set up a WebEx for this. However, I'd encourage folks to
attend in person for the full live experience, especially those in the local
Bay Area.

Thanks,
Xuefu

On Wed, Dec 16, 2015 at 3:42 PM, Mich Talebzadeh <m...@peridale.co.uk>
wrote:

> Thanks for heads up.
>
>
>
> Will it be possible to remote to this meetings for live sessions?
>
>
>
> Regards,
>
>
>
>
>
> Mich Talebzadeh
>
>
> *From:* Xuefu Zhang [mailto:xzh...@cloudera.com]
> *Sent:* 16 December 2015 23:39
> *To:* u...@hive.apache.org; dev@hive.apache.org; Thejas M Nair <
> thejas.n...@yahoo.com>
> *Subject:* January Hive User Group Meeting
>
>
>
> Dear Hive users and developers,
>
> The Hive community is organizing a user group meeting[1] on January 21, 2016
> at the Cloudera facility in Palo Alto, CA. This will be a great opportunity
> for users and developers to find out what's happening in the community and
> to share their experiences with Hive. I'd urge you to attend the meetup.
> Please RSVP; the list will be closed a few days ahead of the event.
>
> At the same time, I'd like to solicit short talks (15 minutes max) from
> users and developers. If you have a proposal, please let me or Thejas know.
> Your participation is greatly appreciated.
>
> Sincerely,
>
> Xuefu
>
> [1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/
>

