Re: Tez job submissions failing when cluster is under-provisioned..

2016-03-10 Thread Gopal Vijayaraghavan

> This seems to be the YARN fair-scheduler reporting it this way,
> although Tez doesn't seem to handle it.

Pepperdata?

 
> I did come across HIVE-12957, in which the fix patch seems only to
> report the error better rather than doing anything about it.
...
> Now for my question: is this an expected failure case? Is there a bug I
> should know about in YARN scheduling, or am I misunderstanding the issue?

YARN is reporting negative headroom, which means your cluster believes it
is running with negative capacity for some strange reason.

That is a bug somewhere in the FairScheduler's internal state.

The issue was never reproduced after we switched over to the
CapacityScheduler, so it's in limbo.

Cheers,
Gopal




Re: SELECT without FROM

2016-03-10 Thread Dmitry Tolpeko
Thank you, Shannon!

On Fri, Mar 11, 2016 at 4:47 AM, Shannon Ladymon wrote:

> It looks like FROM was made optional in Hive 0.13.0 with HIVE-4144
> (thanks to Alan Gates for pointing us to the grammar file and Sushanth
> Sowmyan for helping track this down). A note has been added to the wiki
> about this.
>
> Dmitry, you said it didn't work in your Hive 0.13 version, but it seems
> like the patch was applied to 0.13.0. You might want to take a look at
> HIVE-4144 and see if that patch was applied to your version.
>
> -Shannon Ladymon


Re: Error in Hive on Spark

2016-03-10 Thread Stana
Thanks for the reply.

I have set the spark.home property in my application; otherwise the
application throws a 'SPARK_HOME not found' exception.

I found this code in the Hive source, in SparkClientImpl.java:

private Thread startDriver(final RpcServer rpcServer, final String clientId,
    final String secret) throws IOException {
  ...

  List<String> argv = Lists.newArrayList();

  ...

  argv.add("--class");
  argv.add(RemoteDriver.class.getName());

  String jar = "spark-internal";
  if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
    jar = SparkContext.jarOfClass(this.getClass()).get();
  }
  argv.add(jar);

  ...
}

When Hive executes spark-submit, it builds the shell command with --class
org.apache.hive.spark.client.RemoteDriver and sets the jar path from
SparkContext.jarOfClass(this.getClass()).get(), which returns the local path
of hive-exec-2.0.0.jar.
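
As a stand-alone illustration of that resolution step (a minimal sketch,
assuming only spark-core on the classpath; the class name is made up):

import org.apache.spark.SparkContext;

public class JarOfClassDemo {
    public static void main(String[] args) {
        // jarOfClass returns the client-local filesystem path of the jar
        // the given class was loaded from, which is why a client-side path
        // ends up embedded in the generated spark-submit command.
        scala.Option<String> jar = SparkContext.jarOfClass(JarOfClassDemo.class);
        System.out.println(jar.isDefined() ? jar.get() : "not loaded from a jar");
    }
}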

In my situation, the application and the YARN cluster are in different
clusters. When the application ran spark-submit with the local path of
hive-exec-2.0.0.jar against the YARN cluster, there was no
hive-exec-2.0.0.jar on the YARN cluster, so the application threw the
exception: "hive-exec-2.0.0.jar does not exist ...".

Can the path of hive-exec-2.0.0.jar be set as a property in the
application? Something like 'hiveConf.set("hive.remote.driver.jar",
"hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
If not, is it possible to achieve this in a future version?



2016-03-10 23:51 GMT+08:00 Xuefu Zhang :

> You can probably avoid the problem by setting the environment variable
> SPARK_HOME or the JVM property spark.home to point to your Spark
> installation.
>
> --Xuefu
>

Re: SELECT without FROM

2016-03-10 Thread Stephen Boesch
>> any database

Just as trivia:
I have not used Oracle for quite a while, but it traditionally does not.

AFAICT it is also not ANSI SQL.

2016-03-10 17:47 GMT-08:00 Shannon Ladymon:

> It looks like FROM was made optional in Hive 0.13.0 with HIVE-4144
> (thanks to Alan Gates for pointing us to the grammar file and Sushanth
> Sowmyan for helping track this down). A note has been added to the wiki
> about this.
>
> Dmitry, you said it didn't work in your Hive 0.13 version, but it seems
> like the patch was applied to 0.13.0. You might want to take a look at
> HIVE-4144 and see if that patch was applied to your version.
>
> -Shannon Ladymon


Re: SELECT without FROM

2016-03-10 Thread Shannon Ladymon
It looks like FROM was made optional in Hive 0.13.0 with HIVE-4144
(thanks to Alan Gates for pointing us to the grammar file and Sushanth
Sowmyan for helping track this down). A note has been added to the wiki
about this.

Dmitry, you said it didn't work in your Hive 0.13 version, but it seems
like the patch was applied to 0.13.0. You might want to take a look at
HIVE-4144 and see if that patch was applied to your version.
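
A quick way to check on a given installation is to probe HiveServer2 over
JDBC (a minimal sketch, assuming hive-jdbc on the classpath; the URL and
credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FromlessSelectProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL and credentials.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1+1")) {
            if (rs.next()) {
                System.out.println("SELECT without FROM works: " + rs.getInt(1));
            }
        }
        // On a build without HIVE-4144 this fails with a parse error instead.
    }
}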

-Shannon Ladymon

On Wed, Mar 9, 2016 at 2:13 AM, Mich Talebzadeh wrote:

> hm. Certainly it worked, if I recall correctly, on 0.14, 1.2.1 and now on 2.
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 9 March 2016 at 10:08, Dmitry Tolpeko wrote:
>
>> Not sure. It does not work in my Hive 0.13 version.
>>
>> On Wed, Mar 9, 2016 at 1:06 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> I believe it has always been there.
>>>
>>> On 9 March 2016 at 09:58, Dmitry Tolpeko wrote:
>>>
>>>> Mich,
>>>>
>>>> I know that. I just want to trace when it was added to Hive.
>>>>
>>>> Dmitry
>>>>
>>>> On Wed, Mar 9, 2016 at 12:56 PM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> AFAIK any database does that!
>>>>>
>>>>> 1> set nocount on
>>>>> 2> select @@version
>>>>> 3> select 1 + 1
>>>>> 4> go
>>>>>
>>>>> ---
>>>>> Adaptive Server Enterprise/15.7/EBF 21708 SMP SP110
>>>>> /P/x86_64/Enterprise Linux/ase157sp11x/3546/64-bit/FBO/Fri Nov  8 05:39:38
>>>>> 2013
>>>>> ---
>>>>>   2
>>>>>
>>>>> On 9 March 2016 at 09:50, Dmitry Tolpeko wrote:
>>>>>
>>>>>> I noticed that Hive allows you to execute SELECT without a FROM clause
>>>>>> (tested in Hive 0.14 and Hive 1.2.1):
>>>>>>
>>>>>> SELECT 1+1;
>>>>>>
>>>>>> In which version was it added (is there a JIRA)? I see that it is not
>>>>>> mentioned in the docs:
>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select
>>>>>>
>>>>>> So the question is whether it is official and will not be removed in
>>>>>> the future.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Dmitry


Tez job submissions failing when cluster is under-provisioned..

2016-03-10 Thread Gautam
Hello,

Ran into this today. We're seeing Tez jobs fail to submit when the cluster
is under high load. In particular, the split calculation falls over when it
sees # slots < 0. This seems to be the YARN fair-scheduler reporting it
this way, although Tez doesn't seem to handle it.

Vertex failed, vertexName=Map 1, vertexId=vertex_1457029908268_101939_1_00,
diagnostics=[Vertex vertex_1457029908268_101939_1_00 [Map 1] killed/failed
due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: upsight_clean_aggregate_data
initializer failed, vertex=vertex_1457029908268_101939_1_00 [Map 1],
java.lang.IllegalArgumentException: Illegal Capacity: -135

        at java.util.ArrayList.<init>(ArrayList.java:142)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:330)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)




I did come across HIVE-12957, in which the fix patch seems only to report
the error better rather than doing anything about it.

Now for my question: is this an expected failure case? Is there a bug I
should know about in YARN scheduling, or am I misunderstanding the issue?
It seems rather frivolous on Tez's part to give up when the cluster is
under high load instead of falling back to some sane default and adding
tasks to the queue.
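
As a side note on the "Illegal Capacity" part: java.util.ArrayList throws
exactly this when handed a negative initial capacity, so any split count
derived from scheduler headroom has to be clamped before it reaches a
collection. A minimal sketch of the failure and the obvious guard (not the
actual Hive/Tez code; the variable names are made up):

import java.util.ArrayList;
import java.util.List;

public class HeadroomGuard {
    public static void main(String[] args) {
        int headroom = -135;                // what a buggy scheduler may report
        // new ArrayList<String>(headroom) would throw
        // java.lang.IllegalArgumentException: Illegal Capacity: -135
        int usable = Math.max(headroom, 1); // clamp before sizing anything
        List<String> splits = new ArrayList<>(usable);
        System.out.println("sized split list with capacity " + usable);
    }
}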


-Gautam.


Re: ODBC drivers for Hive 2

2016-03-10 Thread Mich Talebzadeh
The problem with Tableau is that it tries to optimize the code itself,
which really does not work for something that uses generic ODBC 3. We used
to have this with Tableau connecting to Oracle TimesTen IMDB (which did not
have a dedicated ODBC driver in Tableau, so we had to use ODBC 2, and an
older version at that). I am sure that until Tableau knows how to optimize
queries for Hive it will not really work; whether it can optimize joins in
Hive and so on, at the moment I am not sure.

I checked Tableau connectivity, but they seem to suggest drivers from
Hortonworks and Cloudera, and also their proprietary Hadoop tools.

Cheers,

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 March 2016 at 23:49, Gopal Vijayaraghavan  wrote:

>
> > If yes, maybe one should think about an open source one, which is
> >reliable and supports a richer set of Odbc functionality.
>
> I had a similar thought last week, which ended up with me discovering that
> the hive/odbc folder is full of dead code.
>
> I'm going to rm -rvf odbc/ with
> https://issues.apache.org/jira/browse/HIVE-13234
>
> > I can see already some improvements that could be made especially for
> >visual analytic tools, such as Tableau or Spotfire.
>
> The ODBC capabilities bitsets need to be upgraded for something like
> Tableau's "Add to Context" to create a temporary table instead of running
> the full query every time, for instance.
>
> Cheers,
> Gopal
>
>
>


Re: ODBC drivers for Hive 2

2016-03-10 Thread Gopal Vijayaraghavan

> If yes, maybe one should think about an open source one, which is
>reliable and supports a richer set of Odbc functionality.

I had a similar thought last week, which ended up with me discovering that
the hive/odbc folder is full of dead code.

I'm going to rm -rvf odbc/ with
https://issues.apache.org/jira/browse/HIVE-13234

> I can see already some improvements that could be made especially for
>visual analytic tools, such as Tableau or Spotfire.

The ODBC capabilities bitsets need to be upgraded for something like
Tableau's "Add to Context" to create a temporary table instead of running
the full query every time, for instance.

Cheers,
Gopal




Re: ODBC drivers for Hive 2

2016-03-10 Thread Mich Talebzadeh
I guess these vendor drivers are variants of generic ODBC 3 drivers. Some,
like Progress DataDirect, do bespoke drivers. They used to be called Merant
drivers, made for Oracle, Sybase and so forth, and are now rebadged as
Progress DataDirect ODBC drivers.

The issue is that I don't intend to fetch data from Hive. I am interested
in the Hive metadata, which is actually a schema stored in an Oracle
database in my case. Also, the physical model in Hive does not show table
indexes or constraints. These certainly exist in the Hive schema, as below:

[image: Inline images 2]

But they are not shown in the PowerDesigner physical model.

Does anyone know of a tool that shows the full Hive schema, including the
storage types the tables are built on?
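
For what it's worth, much of that storage detail is reachable over plain
JDBC (a minimal sketch using DESCRIBE FORMATTED; the URL, credentials and
table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DescribeFormatted {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL; assumes hive-jdbc on the classpath.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement st = con.createStatement();
             // DESCRIBE FORMATTED reports the input/output format, SerDe,
             // bucketing and location that a generic ODBC model misses.
             ResultSet rs = st.executeQuery("DESCRIBE FORMATTED mytable")) {
            while (rs.next()) {
                System.out.printf("%s\t%s\t%s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
            }
        }
    }
}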

Thanks


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 March 2016 at 21:03, Jörn Franke  wrote:

>
> Just out of curiosity: what is the code base for the odbc drivers by
> Hortonworks, cloudera & co? Did they develop them on their own?
>
> If yes, maybe one should think about an open source one, which is reliable
> and supports a richer set of Odbc functionality.
>
> Especially in the light of Orc,parquet, llap, tez and spark on Hive the
> odbc driver has actually now some use cases for interactive analytics. I
> can see already some improvements that could be made especially for visual
> analytic tools, such as Tableau or Spotfire.
>


Re: Hive Cli ORC table read error with limit option

2016-03-10 Thread Prasanth Jayachandran
Alternatively, you can send the orcfiledump output for the empty ORC file
from the broken partition.

Thanks
Prasanth







Re: Hive Cli ORC table read error with limit option

2016-03-10 Thread Prasanth Jayachandran
Could you attach the empty ORC files from one of the broken partitions
somewhere? I can run some tests on them to see why it's happening.

Thanks
Prasanth

On Mar 8, 2016, at 12:02 AM, Biswajit Nayak <biswa...@altiscale.com> wrote:

Both the parameters are set to false by default.

hive> set hive.optimize.index.filter;
hive.optimize.index.filter=false
hive> set hive.orc.splits.include.file.footer;
hive.orc.splits.include.file.footer=false
hive>

>>> I suspect this might be related to having 0 row files in the buckets not
>>> having any recorded schema.

Yes, there are a few files with 0 rows, but the query works with other
partitions (which also have 0-row files). Out of 30 partitions (for a
month), 3-4 partitions have this issue. Even reloading the data does not
change anything. The query works fine in MR now, but has the issue in Tez.



On Tue, Mar 8, 2016 at 2:43 AM, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> cvarchar(2)
...
> Num Buckets: 7

I suspect this might be related to having 0 row files in the buckets not
having any recorded schema.

You can also experiment with hive.optimize.index.filter=false, to see if
the zero row case is artificially produced via predicate push-down.


That shouldn't be a problem unless you've turned on
hive.orc.splits.include.file.footer=true (recommended to be false).

Your row-locations don't actually match any Apache source jar in my
builds, are there any other patches to consider?

Cheers,
Gopal






Re: Hive alter table concatenate loses data - can parquet help?

2016-03-10 Thread Prasanth Jayachandran
After Hive 1.2.1, one patch related to alter table concatenation went in:
https://issues.apache.org/jira/browse/HIVE-12450

I am not sure if it's related, though. Could you please file a bug for
this? It would be great if you can attach a small enough repro for the
issue; I can verify it and provide a fix if it is a bug.

Thanks
Prasanth

On Mar 8, 2016, at 5:52 AM, Marcin Tustin <mtus...@handybook.com> wrote:

Hi Mich,

DDL as below.

Hi Prasanth,

Hive version as reported by Hortonworks is 1.2.1.2.3.

Thanks,
Marcin


CREATE TABLE ``(
  `col1` string,
  `col2` bigint,
  `col3` string,
  `col4` string,
  `col4` string,
  `col5` bigint,
  `col6` string,
  `col7` string,
  `col8` string,
  `col9` string,
  `col10` boolean,
  `col11` boolean,
  `col12` string,
  `metadata` struct<...>,
  `col14` string,
  `col15` bigint,
  `col16` double,
  `col17` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://reporting-handy/'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2800',
  'numRows'='297263',
  'rawDataSize'='454748401',
  'totalSize'='31310353',
  'transient_lastDdlTime'='1457437204')

Time taken: 1.062 seconds, Fetched: 34 row(s)

On Tue, Mar 8, 2016 at 4:29 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Hi,

Can you please provide the DDL for this table ("show create table ...")?

Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 7 March 2016 at 23:25, Marcin Tustin <mtus...@handybook.com> wrote:
Hi All,

Following on from our Parquet vs ORC discussion, today I observed Hive's
alter table ... concatenate command remove rows from an ORC-formatted table.

1. Has anyone else observed this (fuller description below)? And
2. How do Parquet users handle the file fragmentation issue?

Description of the problem:

Today I ran a query to count rows by date. Relevant days below:
2016-02-28 16866
2016-03-06 219
2016-03-07 2863
I then ran concatenation on that table. Rerunning the same query resulted in:

2016-02-28 16866
2016-03-06 219
2016-03-07 1158

Note reduced count for 2016-03-07

I then ran concatenation a second time, and the query a third time:
2016-02-28 16344
2016-03-06 219
2016-03-07 1158

Now the count for 2016-02-28 is reduced.

This doesn't look like an elimination of duplicates occurring by design - these 
didn't all happen on the first run of concatenation. It looks like 
concatenation just kind of loses data.






Re: ODBC drivers for Hive 2

2016-03-10 Thread Jörn Franke

Just out of curiosity: what is the code base for the ODBC drivers from
Hortonworks, Cloudera & co? Did they develop them on their own?

If yes, maybe one should think about an open-source one, which is reliable
and supports a richer set of ODBC functionality.

Especially in light of ORC, Parquet, LLAP, Tez and Spark on Hive, the ODBC
driver now actually has some use cases for interactive analytics. I can see
already some improvements that could be made, especially for visual
analytics tools such as Tableau or Spotfire.

> On 10 Mar 2016, at 21:49, Toby Allsopp  wrote:
> 
> I've had the best luck with the Hortonworks driver (32-bit Windows). The 
> Cloudera and Microsoft ones have seemed flaky (crashes, some SQL not 
> supported). I haven't tried the Data Direct driver.
> 
> Cheers,
> Toby.
> 


Re: ODBC drivers for Hive 2

2016-03-10 Thread Toby Allsopp
I've had the best luck with the Hortonworks driver (32-bit Windows). The
Cloudera and Microsoft ones have seemed flaky (crashes, some SQL not
supported). I haven't tried the Data Direct driver.

Cheers,
Toby.

On Fri, Mar 11, 2016 at 4:46 AM, Mich Talebzadeh 
wrote:

> Hi,
>
> The best ODBC driver that I have found to work with Hive 2 is the Progress
> DataDirect driver for Hive (ODBC 3 compliant).
>
> I tried the Cloudera one; very shaky (although that was on Hive 1.2.1).
> Tried the Microsoft ones, but they hang. Just to be clear, I am using
> 64-bit drivers.
>
> This is not a direct data fetch from Hive tables. I have used PowerDesigner
> to create a physical model from the Hive schema/database using an ODBC 3
> connection (PowerDesigner does not have Hive in its list of databases, so
> ODBC 3 is the choice).
>
> I then intend to create a logical model from this physical model.
>
> Does anyone have better suggestions for Hive ODBC drivers?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Wei Zheng
Thanks folks! I'm happy to be part of the community and able to contribute!

Thanks,
Wei

From: Eugene Koifman <ekoif...@hortonworks.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, March 10, 2016 at 10:25
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>, "w...@apache.org" <w...@apache.org>
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Congratulations!




Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Eugene Koifman
Congratulations!

From: Hari Sivarama Subramaniyan <hsubramani...@hortonworks.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, March 10, 2016 at 10:02 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>, "w...@apache.org" <w...@apache.org>
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Congrats Wei.

Thanks
Hari





Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Hari Sivarama Subramaniyan
Congrats Wei.


Thanks

Hari


From: Pengcheng Xiong 
Sent: Thursday, March 10, 2016 9:50 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org; w...@apache.org
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Big congrats Wei!

Pengcheng




Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Pengcheng Xiong
Big congrats Wei!

Pengcheng

On Thu, Mar 10, 2016 at 9:25 AM, Vijay K.N  wrote:

> Congrats Wei Zheng!!


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Vijay K.N
Congrats Wei Zheng!!
On Mar 10, 2016 6:57 AM, "Vikram Dixit K"  wrote:

> The Apache Hive PMC has voted to make Wei Zheng a committer on the Apache
> Hive Project. Please join me in congratulating Wei.
>
> Thanks
> Vikram.
>


what are the exact performance criteria when you are comparing pig and hive?

2016-03-10 Thread dhruv kapatel
Hi,

I've gone through the https://issues.apache.org/jira/browse/HIVE-396
benchmarks. My question is: can we generalize that Hive will be faster in
all use cases, e.g.:

   - weblogs
   - xml
   - json

Or does Hive's performance depend on the use case and the type of data
being processed?



-- 


With regards,
Kapatel Dhruv V


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Jesus Camacho Rodriguez
Congrats Wei!


From: Madhu Thalakola <madhu8...@yahoo.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, Madhu Thalakola <madhu8...@yahoo.com>
Date: Thursday, March 10, 2016 at 2:47 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>, "w...@apache.org" <w...@apache.org>
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Congratulations Wei Zheng

Thanks,
Madhu
Help ever, Hurt never






Re: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Takahiko Saito
Congrats, Wei!

On Thu, Mar 10, 2016 at 6:49 AM, Daniel Lopes wrote:

> \o/


-- 
Takahiko Saito


ODBC drivers for Hive 2

2016-03-10 Thread Mich Talebzadeh
Hi,

The best ODBC driver that I have found to work with Hive 2 is the Progress
DataDirect driver for Hive (ODBC 3 compliant).

I tried the Cloudera one; very shaky (although that was on Hive 1.2.1).
Tried the Microsoft ones, but they hang. Just to be clear, I am using
64-bit drivers.

This is not a direct data fetch from Hive tables. I have used PowerDesigner
to create a physical model from the Hive schema/database using an ODBC 3
connection (PowerDesigner does not have Hive in its list of databases, so
ODBC 3 is the choice).

I then intend to create a logical model from this physical model.

Does anyone have better suggestions for Hive ODBC drivers?

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Re: impala like formatting of tables inside Hive cli

2016-03-10 Thread Mich Talebzadeh
Try the Beeline tool to connect to HiveServer2.

It may not be that close aesthetically.


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 March 2016 at 15:09, Awhan Patnaik  wrote:

> Is there a setting that will yield nicely formatted tables as in Impala? I
> am attaching an example of what I mean.
>


impala like formatting of tables inside Hive cli

2016-03-10 Thread Awhan Patnaik
Is there a setting that will yield nicely formatted tables as in Impala? I
am attaching an example of what I mean.


Re: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Daniel Lopes
\o/

On Thu, Mar 10, 2016 at 11:31 AM, 谭成灶 wrote:

> Congratulations, Wei !
>
> --
> From: Madhu Thalakola
> Sent: 2016/3/10 21:47
> To: user@hive.apache.org
> Cc: d...@hive.apache.org; w...@apache.org
> Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng
>
> Congratulations Wei Zheng
>
> Thanks,
> Madhu
> Help ever, Hurt never


-- 
Daniel Lopes, B.Eng
Data Scientist - BankFacil
CREA/SP 5069410560
Mob +55 (18) 99764-2733
Ph +55 (11) 3522-8009
http://about.me/dannyeuu

Av. Nova Independência, 956, São Paulo, SP
Bairro Brooklin Paulista
CEP 04570-001
https://www.bankfacil.com.br


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread 谭成灶
Congratulations, Wei !


From: Madhu Thalakola
Sent: 2016/3/10 21:47
To: user@hive.apache.org
Cc: d...@hive.apache.org; w...@apache.org
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Congratulations Wei Zheng

Thanks,
Madhu
Help ever, Hurt never






Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Madhu Thalakola
Congratulations Wei Zheng

Thanks,
Madhu
Help ever, Hurt never



  

Hive StreamingAPI leaves table in not consistent state

2016-03-10 Thread Igor Kuzmenko
Hello, I'm using Hortonworks Data Platform 2.3.4, which includes Apache
Hive 1.2.1 and Apache Storm 0.10.
I've built a Storm topology using the Hive Bolt, which ultimately uses the
Hive Streaming API to stream data into a Hive table.
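
For context, the Hive Bolt boils down to the standard streaming flow below
(a minimal sketch, assuming the hive-hcatalog-streaming dependency; the
metastore URI, partition value and column names are made-up placeholders).
A client that dies before commit() or close() leaves open transaction
batches behind, which may be related to the symptoms described further down.

import java.util.Arrays;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingSketch {
    public static void main(String[] args) throws Exception {
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "cdr1",
                Arrays.asList("20160310"));                   // dt partition value
        StreamingConnection conn = endPoint.newConnection(true); // create partition if missing
        DelimitedInputWriter writer = new DelimitedInputWriter(
                new String[]{"telcoId", "payload"}, ",", endPoint);
        TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
        batch.beginNextTransaction();
        batch.write("42,hello".getBytes());                   // one delimited record
        batch.commit();                                       // or abort() on failure
        batch.close();
        conn.close();
    }
}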
In Hive I've created a transactional table:

CREATE EXTERNAL TABLE cdr1 (
  ...
)
PARTITIONED BY (dt INT)
CLUSTERED BY (telcoId) INTO 5 buckets
STORED AS ORC
LOCATION '/data/sorm3/cdr/cdr1'
TBLPROPERTIES ("transactional"="true")


Hive settings:


hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1


When I run my Storm topology, it fails with an OutOfMemoryException. The
Storm exception doesn't bother me; it was just a test. But after the
topology fails, my Hive table is not consistent.
A simple select from the table leads to an exception:

SELECT COUNT(*) FROM cdr1
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1,
vertexId=vertex_1453891518300_0098_1_00, diagnostics=[Task failed,
taskId=task_1453891518300_0098_1_00_00, diagnostics=[TaskAttempt 0
failed, info=[Error: Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: java.io.IOException: java.io.EOFException

Caused by: java.io.IOException: java.io.EOFException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
        ... 19 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:370)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:460)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
        ... 20 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:0, Vertex vertex_1453891518300_0098_1_00 [Map 1] killed/failed
due to:OWN_TASK_FAILURE]
ERROR : Vertex killed, vertexName=Reducer 2,
vertexId=vertex_1453891518300_0098_1_01, diagnostics=[Vertex received Kill
while in RUNNING state., Vertex did not succeed due to
OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex
vertex_1453891518300_0098_1_01 [Reducer 2] killed/failed due
to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1
killedVertices:1



Compaction fails with same exception:

2016-03-10 13:20:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.io.EOFException: Cannot seek after EOF
        at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1488)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:368)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:460)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1362)
        at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:565)
        at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:544)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)



Looking through the files created by streaming, I found several zero-sized
ORC files. These files probably lead to the exception.

Is this normal for a Hive transactional table? How can I prevent this
behavior?


Error in Hive on Spark

2016-03-10 Thread Stana
I am trying out Hive on Spark with Hive 2.0.0 and Spark 1.4.1, executing
org.apache.hadoop.hive.ql.Driver from a Java application.

My situation is as follows:
1. Building the Spark 1.4.1 assembly jar without Hive.
2. Uploading the Spark assembly jar to the Hadoop cluster.
3. Executing the Java application from the Eclipse IDE on my client computer.

The application went well and submitted the MR job to the YARN cluster
successfully when using hiveConf.set("hive.execution.engine", "mr"), but
it threw exceptions with the Spark engine.

Finally, I traced the Hive source code and came to this conclusion:

In my situation, the SparkClientImpl class generates the spark-submit
shell command and executes it.
The shell command sets --class to RemoteDriver.class.getName() and the jar
to SparkContext.jarOfClass(this.getClass()).get(), which is why my
application threw the exception.

Is that right? And what can I do to execute the application with the Spark
engine successfully from my client computer? Thanks a lot!


Java application code:

public class TestHiveDriver {

    private static HiveConf hiveConf;
    private static Driver driver;
    private static CliSessionState ss;

    public static void main(String[] args) {

        String sql = "select * from hadoop0263_0 as a join hadoop0263_0 as b"
                + " on (a.key = b.key)";
        ss = new CliSessionState(new HiveConf(SessionState.class));
        hiveConf = new HiveConf(Driver.class);
        hiveConf.set("fs.default.name", "hdfs://storm0:9000");
        hiveConf.set("yarn.resourcemanager.address", "storm0:8032");
        hiveConf.set("yarn.resourcemanager.scheduler.address", "storm0:8030");
        hiveConf.set("yarn.resourcemanager.resource-tracker.address", "storm0:8031");
        hiveConf.set("yarn.resourcemanager.admin.address", "storm0:8033");
        hiveConf.set("mapreduce.framework.name", "yarn");
        hiveConf.set("mapreduce.jobhistory.address", "storm0:10020");
        hiveConf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://storm0:3306/stana_metastore");
        hiveConf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        hiveConf.set("javax.jdo.option.ConnectionUserName", "root");
        hiveConf.set("javax.jdo.option.ConnectionPassword", "123456");
        hiveConf.setBoolean("hive.auto.convert.join", false);
        hiveConf.set("spark.yarn.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
        hiveConf.set("spark.home", "target/spark");
        hiveConf.set("hive.execution.engine", "spark");
        hiveConf.set("hive.dbname", "default");

        driver = new Driver(hiveConf);
        SessionState.start(hiveConf);

        CommandProcessorResponse res = null;
        try {
            res = driver.run(sql);
        } catch (CommandNeedRetryException e) {
            e.printStackTrace();
        }

        System.out.println("Response Code:" + res.getResponseCode());
        System.out.println("Error Message:" + res.getErrorMessage());
        System.out.println("SQL State:" + res.getSQLState());
    }
}




Exception from the Spark engine:

16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
argv: 
/Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
--properties-file
/var/folders/vt/cjcdhms903x7brn1kbh558s4gn/T/spark-submit.7697089826296920539.properties
--class org.apache.hive.spark.client.RemoteDriver
/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
--remote-host MacBook-Pro.local --remote-port 51331 --conf
hive.spark.client.connect.timeout=1000 --conf
hive.spark.client.server.connect.timeout=9 --conf
hive.spark.client.channel.log.level=null --conf
hive.spark.client.rpc.max.size=52428800 --conf
hive.spark.client.rpc.threads=8 --conf
hive.spark.client.secret.bits=256
16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
16/03/10 18:33:09 INFO SparkClientImpl:  client token: N/A
16/03/10 18:33:09 INFO SparkClientImpl:  diagnostics: N/A
16/03/10 18:33:09 INFO SparkClientImpl:  ApplicationMaster host: N/A
16/03/10 18:33:09 INFO SparkClientImpl:  ApplicationMaster RPC port: -1
16/03/10 18:33:09 INFO SparkClientImpl:  queue: default
16/03/10 18:33:09 INFO SparkClientImpl:  start time: 1457180833494
16/03/10 18:33:09 INFO SparkClientImpl:  final status: UNDEFINED
16/03/10 18:33:09 INFO SparkClientImpl:  tracking URL:
http://storm0:8088/proxy/application_1457002628102_0043/
16/03/10 18:33:09 INFO SparkClientImpl:  user: stana
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33

Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Mich Talebzadeh
Best wishes

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 March 2016 at 10:48, Loïc Chanel  wrote:

> Congratulations !


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Loïc Chanel
Congratulations !

Loïc CHANEL
System & virtualization engineer
TO - XaaS Ind - Worldline (Villeurbanne, France)

2016-03-10 11:44 GMT+01:00 Jeff Zhang :

> Congratulations, Wei !


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Jeff Zhang
Congratulations, Wei !

On Thu, Mar 10, 2016 at 3:27 PM, Lefty Leverenz 
wrote:

> Congratulations!
>
> -- Lefty


-- 
Best Regards

Jeff Zhang