[jira] [Created] (HIVE-14771) Add a warning to the jenkins report for slow tests

2016-09-15 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14771:
-

 Summary: Add a warning to the jenkins report for slow tests
 Key: HIVE-14771
 URL: https://issues.apache.org/jira/browse/HIVE-14771
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth


Initially this would flag new tests that are slow, or existing tests that have slowed down a lot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14770) too many locks acquired?

2016-09-15 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-14770:
-

 Summary: too many locks acquired?
 Key: HIVE-14770
 URL: https://issues.apache.org/jira/browse/HIVE-14770
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Eugene Koifman
Assignee: Eugene Koifman


need to verify

UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze() has
{noformat}
if (inputIsPartitioned(inputs)) {
  // In order to avoid locking the entire write table we need to replace the single WriteEntity
  // with a WriteEntity for each partition
  outputs.clear();
  for (ReadEntity input : inputs) {
    if (input.getTyp() == Entity.Type.PARTITION) {
      WriteEntity.WriteType writeType = deleting() ?
          WriteEntity.WriteType.DELETE : WriteEntity.WriteType.UPDATE;
      outputs.add(new WriteEntity(input.getPartition(), writeType));
    }
  }
} else {
{noformat}

but this seems to assume that each partition read is also written

shouldn't this check isWritten()?  see HIVE-11848
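A minimal sketch of the suggested guard, using stand-in classes rather than Hive's real ReadEntity/WriteEntity (the isWritten() flag below is a hypothetical model of the tracking referenced in HIVE-11848, not the actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed fix: only partitions that are actually written
// should produce a write lock entry; read-only partitions are skipped.
public class PartitionLockSketch {
    static class ReadEntity {
        final String partition;
        final boolean written;
        ReadEntity(String partition, boolean written) {
            this.partition = partition;
            this.written = written;
        }
        boolean isWritten() { return written; }
    }

    static List<String> writeEntitiesFor(List<ReadEntity> inputs) {
        List<String> outputs = new ArrayList<>();
        for (ReadEntity input : inputs) {
            if (input.isWritten()) { // guard: do not lock partitions that are only read
                outputs.add(input.partition);
            }
        }
        return outputs;
    }
}
```

With such a guard, a partition that is only read would no longer acquire a write lock, which is the behavior the report questions.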



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] hive pull request #102: HIVE-14309. Fix naming of classes in ORC module.

2016-09-15 Thread omalley
GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/102

HIVE-14309. Fix naming of classes in ORC module.

Renames the classes in Hive from org.apache.orc.* to org.apache.hive.orc.*

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-14309

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #102


commit 62407b2d21000d11e2579f878245049799f22f73
Author: Owen O'Malley 
Date:   2016-09-16T00:54:49Z

HIVE-14309. Fix naming of classes in ORC module.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-14769) TestVectorRowObject test takes >12 mins

2016-09-15 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14769:


 Summary: TestVectorRowObject test takes >12 mins
 Key: HIVE-14769
 URL: https://issues.apache.org/jira/browse/HIVE-14769
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


TestVectorRowObject loops over 100K * 100K iterations. The test runs much 
faster with a 10K loop, which should be sufficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 51940: Add a new UDTF ExplodeByNumber

2016-09-15 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51940/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

HIVE-14768


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a854f9f 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplodeByNumber.java PRE-CREATION 
  ql/src/test/queries/clientpositive/udtf_explode_number.q PRE-CREATION 
  ql/src/test/results/clientpositive/udtf_explode_number.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/51940/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-14768) Add a new UDTF ExplodeByNumber

2016-09-15 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-14768:
--

 Summary: Add a new UDTF ExplodeByNumber
 Key: HIVE-14768
 URL: https://issues.apache.org/jira/browse/HIVE-14768
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


For implementing INTERSECT ALL and EXCEPT ALL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Load performance with partitioned table

2016-09-15 Thread Jörn Franke
What is your hardware setup?
Are the bloom filters necessary on all columns? They usually only make sense 
for non-numeric columns. Updating bloom filters takes time and should be avoided 
where they do not make sense.
Can you provide an example of the data and the select queries that you execute 
on them?
Do you use compression on the tables? If so which?
What are the exact times and data volumes?

> On 15 Sep 2016, at 19:56, naveen mahadevuni  wrote:
> 
> Hi,
> 
> I'm using the ORC format for our table storage. The table has a timestamp
> column (say TS) and 25 other columns. The other ORC properties we are using
> are storage index and bloom filters. We are loading 100 million records into
> this table on a 4-node cluster.
> 
> Our source table is a text table with CSV format. In the source table
> timestamp values come as BIGINT. In the INSERT SELECT, we use function
> "from_unixtime(sourceTable.TS)" to convert the BIGINT values to timestamp
> in the target ORC table. So the first INSERT SELECT in to non-partitioned
> table looks like this
> 
> 1) INSERT INTO TARGET SELECT from_unixtime(ts), col1, col2... from SOURCE.
> 
> I wanted to test by partitioning the table by date derived from this
> timestamp, so I used "to_date(from_unixtime(TS))" in the new INSERT SELECT
> with dynamic partitioning. The second one is
> 
> 2) INSERT INTO TARGET PARTITION(datecol) SELECT from_unixtime(ts), col1,
> col2... to_date(from_unixtime(ts)) as datecol from SOURCE.
> 
> The load time increased by 50% from 1 to 2. I understand the second
> statement involves creating many more partition directories and files.
> 
> Is there any way we can improve the load time? In the second INSERT SELECT,
> will the result of the expression "from_unixtime(ts)" be reused in
> "to_date(from_unixtime(ts))"?
> 
> Thanks,
> Naveen


[jira] [Created] (HIVE-14767) Migrate slow MiniMr tests to faster options

2016-09-15 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14767:


 Summary: Migrate slow MiniMr tests to faster options
 Key: HIVE-14767
 URL: https://issues.apache.org/jira/browse/HIVE-14767
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


After analyzing the latest test results at 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1200/testReport/org.apache.hadoop.hive.cli/
there are many MiniMr tests that are individually slow. There are also some 
tests that do not have to run in MiniMr; some tests exercise HDFS-related 
features and can be migrated to MiniLlap (some are already in MiniSpark). 
Since we have now removed hadoop20, some MiniMr tests can be moved to 
TestCliDriver. We should keep the absolute minimum number of tests in MiniMr 
and move the other slow tests to MiniLlap (Spark already has most of these tests).

These are the individual test runtimes for minimr tests
{code}
QFiles  TestMinimrCliDriver elapsed time (Build #1055)
infer_bucket_sort_reducers_power_two.q  356.211
bucket5.q   257.589
infer_bucket_sort_bucketed_table.q  251.557
bucketizedhiveinputformat.q 249.755
infer_bucket_sort_map_operators.q   249.484
index_bitmap_auto.q 212.9
infer_bucket_sort_dyn_part.q204.401
skewjoin_onesideskew.q  198.428
index_bitmap3.q 194.221
truncate_column_buckets.q   193.966
auto_sortmerge_join_16.q171.094
schemeAuthority.q   170.839
leftsemijoin_mr.q   150.493
join_acid_non_acid.q143.232
empty_dir_in_table.q136.463
bucketmapjoin6.q129.786
reduce_deduplicate.q116.455
exchgpartition2lel.q115.678
bucket_many.q   104.316
infer_bucket_sort_merge.q   100.976
quotedid_smb.q  95.801
external_table_with_space_in_location_path.q80.265
schemeAuthority2.q  72.64
root_dir_external_table.q   72.171
parallel_orderby.q  71.662
disable_merge_for_bucketing.q   69.25
bucket_num_reducers.q   67.618
remote_script.q 67.029
groupby2.q  66.72
bucket4.q   66.02
udf_using.q 64.985
file_with_header_footer.q   42.986
infer_bucket_sort_num_buckets.q 38.795
scriptfile1.q   35.317
bucket6.q   34.963
non_native_window_udf.q 34.129
join1.q 33.831
bucketmapjoin7.q33.71
bucket_num_reducers2.q  30.645
list_bucket_dml_10.q25.751
insert_dir_distcp.q 25.723
uber_reduce.q   16.481
load_hdfs_file_with_space_in_the_name.q 3.899
table_nonprintable.q2.097
load_fs2.q  1.947
input16_cc.q0.916
temp_table_external.q   0.751
import_exported_table.q 0.508
scriptfile1_win.q   0.119
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14766) ObjectStore.initialize() needs retry mechanisms in case of connection failures

2016-09-15 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14766:
---

 Summary: ObjectStore.initialize() needs retry mechanisms in case 
of connection failures
 Key: HIVE-14766
 URL: https://issues.apache.org/jira/browse/HIVE-14766
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


RetryingHMSHandler handles retries for most HMSHandler calls. However, one area 
where we do not have retries is the instantiation of ObjectStore itself. The 
lack of retries here means that a flaky DB connection around the time the 
metastore is started can yield an unresponsive metastore.
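A generic sketch of the kind of retry wrapper such an instantiation could use. The helper name, attempt count, and backoff policy below are illustrative assumptions, not Hive's actual RetryingHMSHandler API:

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper: re-attempt a fallible initialization a bounded
// number of times, with a growing pause between attempts.
public class RetryingInit {
    public static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // e.g. a flaky DB connection during metastore startup
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt); // linear backoff between attempts
                }
            }
        }
        throw last; // all attempts exhausted; surface the final failure
    }
}
```

Wrapping the ObjectStore construction in something of this shape would let a transient connection failure at startup heal instead of leaving the metastore unresponsive.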



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14765) metrics - gauge overwritten messages

2016-09-15 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-14765:
---

 Summary: metrics - gauge overwritten messages
 Key: HIVE-14765
 URL: https://issues.apache.org/jira/browse/HIVE-14765
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


{noformat}
2016-09-14T21:09:55,553 WARN  [HiveServer2-HttpHandler-Pool: Thread-48]: 
metrics2.CodahaleMetrics (CodahaleMetrics.java:addGauge(304)) - A Gauge with 
name [init_total_count_dbs] already exists.  The old gauge will be overwritten, 
but this is not recommended
2016-09-14T21:09:55,553 WARN  [HiveServer2-HttpHandler-Pool: Thread-48]: 
metrics2.CodahaleMetrics (CodahaleMetrics.java:addGauge(304)) - A Gauge with 
name [init_total_count_tables] already exists.  The old gauge will be 
overwritten, but this is not recommended
2016-09-14T21:09:55,554 WARN  [HiveServer2-HttpHandler-Pool: Thread-48]: 
metrics2.CodahaleMetrics (CodahaleMetrics.java:addGauge(304)) - A Gauge with 
name [init_total_count_partitions] already exists.  The old gauge will be 
overwritten, but this is not recommended
{noformat}

Might have something to do with metastore being a threadlocal (just shooting in 
the dark)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Load performance with partitioned table

2016-09-15 Thread naveen mahadevuni
Hi,

I'm using the ORC format for our table storage. The table has a timestamp
column (say TS) and 25 other columns. The other ORC properties we are using
are storage index and bloom filters. We are loading 100 million records into
this table on a 4-node cluster.

Our source table is a text table with CSV format. In the source table
timestamp values come as BIGINT. In the INSERT SELECT, we use function
"from_unixtime(sourceTable.TS)" to convert the BIGINT values to timestamp
in the target ORC table. So the first INSERT SELECT in to non-partitioned
table looks like this

1) INSERT INTO TARGET SELECT from_unixtime(ts), col1, col2... from SOURCE.

I wanted to test by partitioning the table by date derived from this
timestamp, so I used "to_date(from_unixtime(TS))" in the new INSERT SELECT
with dynamic partitioning. The second one is

2) INSERT INTO TARGET PARTITION(datecol) SELECT from_unixtime(ts), col1,
col2... to_date(from_unixtime(ts)) as datecol from SOURCE.

The load time increased by 50% from 1 to 2. I understand the second
statement involves creating many more partition directories and files.

Is there any way we can improve the load time? In the second INSERT SELECT,
will the result of the expression "from_unixtime(ts)" be reused in
"to_date(from_unixtime(ts))"?

Thanks,
Naveen


Re: Review Request 51895: HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-15 Thread Gabor Szadovszky

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51895/
---

(Updated Sept. 15, 2016, 3:27 p.m.)


Review request for hive, Chaoyu Tang, Naveen Gangam, and Barna Zsombor Klara.


Changes
---

Fix of unit test failure: previously there was a 10s delay after invoking 
protocol.endSession. After my fix there is no such delay if no text is printed 
on the streams of the process; therefore, the unit tests failed because the 
session was not closed properly between tests. Added code to wait for the 
session to end.
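The "wait for the session to end" idea can be sketched generically as polling a condition against a deadline instead of sleeping a fixed 10s. This is an illustrative sketch, not the actual patch; the class and method names are assumptions:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Poll a close-condition with a short interval until it holds or a deadline
// passes; returns whether the session was observed closed in time.
public class AwaitSessionEnd {
    public static boolean await(BooleanSupplier sessionClosed, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!sessionClosed.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // timed out; session still open
            }
            Thread.sleep(10); // short poll interval instead of one long fixed delay
        }
        return true;
    }
}
```

A test can then wait only as long as the session actually takes to close, which avoids both the flaky failure and the fixed delay.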


Repository: hive-git


Description
---

HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream closed"


Diffs (updated)
-

  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java e8ca42aa22f0b312e009bea19e39adc8bd31e2b4 

Diff: https://reviews.apache.org/r/51895/diff/


Testing
---

As the modification is related to logging and the Spark job submission, it 
would require too much effort to create unit tests.

Tested manually by "hijacking" the $SPARK_HOME/bin/spark-submit script to 
reproduce the following scenarios:
- The submit process does not exit after the RemoteDriver stopped
  - Generating some output for less time than the actual redirector timeout
  - Generating output for more time than the actual redirector timeout
- The submit process ends properly after the RemoteDriver stopped

Expected behavior: After ending the actual session the client exits immediately 
(beeline). All the stdout/stderr of the RemoteDriver are captured properly in 
the hive.log until the redirector timeout.


Thanks,

Gabor Szadovszky



[GitHub] hive pull request #91: HIVE-14249: Add simple materialized views with manual...

2016-09-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/91


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Review Request 51895: HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-15 Thread Barna Zsombor Klara


> On Sept. 15, 2016, 9:20 a.m., Barna Zsombor Klara wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java,
> >  line 708
> > 
> >
> > Wouldn't lineBuilder.indexOf(String.valueOf('\n')) work as well?
> 
> Gabor Szadovszky wrote:
> We have to search for '\n' all the time, so it might be worth having a bit 
> more complex code for performance. What do you think?

Good point. We can keep it like this I think.


- Barna Zsombor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51895/#review149035
---


On Sept. 14, 2016, 4:54 p.m., Gabor Szadovszky wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51895/
> ---
> 
> (Updated Sept. 14, 2016, 4:54 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Naveen Gangam, and Barna Zsombor Klara.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream 
> closed"
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> e8ca42aa22f0b312e009bea19e39adc8bd31e2b4 
> 
> Diff: https://reviews.apache.org/r/51895/diff/
> 
> 
> Testing
> ---
> 
> As the modification result is related to logging and the spark job submission 
> it would require too much efforts to create unit tests.
> 
> Tested manually by "highjacking" $SPARK_HOME/bin/spark-submit script to 
> reproduce the following scenarios:
> - The submit process does not exit after the RemoteDriver stopped
>   - Generating some output for less time than the actual redirector timeout
>   - Generating output for more time than the actual redirector timeout
> - The submit process ends properly after the RemoteDriver stopped
> 
> Expected behavior: After ending the actual session the client exits 
> immediately (beeline). All the stdout/stderr of the RemoteDriver are captured 
> properly in the hive.log until the redirector timeout.
> 
> 
> Thanks,
> 
> Gabor Szadovszky
> 
>



Re: Review Request 51895: HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-15 Thread Gabor Szadovszky


> On Sept. 15, 2016, 9:20 a.m., Barna Zsombor Klara wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java,
> >  line 687
> > 
> >
> > Since we have 2 redirectors maybe also log out which one we are in.

The log contains the name of the actual thread (stdout-redir-1 or 
stderr-redir-1).


> On Sept. 15, 2016, 9:20 a.m., Barna Zsombor Klara wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java,
> >  line 695
> > 
> >
> > Since we have 2 redirectors maybe also log out which one we are in.

The log contains the name of the actual thread (stdout-redir-1 or 
stderr-redir-1).


> On Sept. 15, 2016, 9:20 a.m., Barna Zsombor Klara wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java,
> >  line 708
> > 
> >
> > Wouldn't lineBuilder.indexOf(String.valueOf('\n')) work as well?

We have to search for '\n' all the time, so it might be worth having a bit more 
complex code for performance. What do you think?


> On Sept. 15, 2016, 9:20 a.m., Barna Zsombor Klara wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java,
> >  line 671
> > 
> >
> > What would happen if the child process is killed while we are inside 
> > this while loop (so after the BufferedReader#ready check)? Wouldn't we get 
> > a stream closed exception on line 674?

That's a good point. I'll add code to the catch clause of run() so that in any 
case we flush the remaining lines from the buffer.


- Gabor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51895/#review149035
---


On Sept. 14, 2016, 4:54 p.m., Gabor Szadovszky wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51895/
> ---
> 
> (Updated Sept. 14, 2016, 4:54 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Naveen Gangam, and Barna Zsombor Klara.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream 
> closed"
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> e8ca42aa22f0b312e009bea19e39adc8bd31e2b4 
> 
> Diff: https://reviews.apache.org/r/51895/diff/
> 
> 
> Testing
> ---
> 
> As the modification result is related to logging and the spark job submission 
> it would require too much efforts to create unit tests.
> 
> Tested manually by "highjacking" $SPARK_HOME/bin/spark-submit script to 
> reproduce the following scenarios:
> - The submit process does not exit after the RemoteDriver stopped
>   - Generating some output for less time than the actual redirector timeout
>   - Generating output for more time than the actual redirector timeout
> - The submit process ends properly after the RemoteDriver stopped
> 
> Expected behavior: After ending the actual session the client exits 
> immediately (beeline). All the stdout/stderr of the RemoteDriver are captured 
> properly in the hive.log until the redirector timeout.
> 
> 
> Thanks,
> 
> Gabor Szadovszky
> 
>



Re: Review Request 51895: HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-15 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51895/#review149035
---



LGTM, if you take care of my issues/open questions.


spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
(line 662)


What would happen if the child process is killed while we are inside this 
while loop (so after the BufferedReader#ready check)? Wouldn't we get a stream 
closed exception on line 674?



spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
(line 676)


Since we have 2 redirectors maybe also log out which one we are in.



spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
(line 684)


Since we have 2 redirectors maybe also log out which one we are in.



spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
(line 697)


Wouldn't lineBuilder.indexOf(String.valueOf('\n')) work as well?


- Barna Zsombor Klara


On Sept. 14, 2016, 4:54 p.m., Gabor Szadovszky wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51895/
> ---
> 
> (Updated Sept. 14, 2016, 4:54 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Naveen Gangam, and Barna Zsombor Klara.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14714 - Finishing Hive on Spark causes "java.io.IOException: Stream 
> closed"
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> e8ca42aa22f0b312e009bea19e39adc8bd31e2b4 
> 
> Diff: https://reviews.apache.org/r/51895/diff/
> 
> 
> Testing
> ---
> 
> As the modification result is related to logging and the spark job submission 
> it would require too much efforts to create unit tests.
> 
> Tested manually by "highjacking" $SPARK_HOME/bin/spark-submit script to 
> reproduce the following scenarios:
> - The submit process does not exit after the RemoteDriver stopped
>   - Generating some output for less time than the actual redirector timeout
>   - Generating output for more time than the actual redirector timeout
> - The submit process ends properly after the RemoteDriver stopped
> 
> Expected behavior: After ending the actual session the client exits 
> immediately (beeline). All the stdout/stderr of the RemoteDriver are captured 
> properly in the hive.log until the redirector timeout.
> 
> 
> Thanks,
> 
> Gabor Szadovszky
> 
>



[jira] [Created] (HIVE-14764) Enabling "hive.metastore.metrics.enabled" throws OOM in HiveMetastore

2016-09-15 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-14764:
---

 Summary: Enabling "hive.metastore.metrics.enabled" throws OOM in 
HiveMetastore
 Key: HIVE-14764
 URL: https://issues.apache.org/jira/browse/HIVE-14764
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Minor


After running some queries with metrics enabled, metastore starts throwing the 
following messages.

{noformat}
Caused by: java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:433)
    at com.mysql.jdbc.PreparedStatement.getInstance(PreparedStatement.java:877)
    at com.mysql.jdbc.ConnectionImpl.clientPrepareStatement(ConnectionImpl.java:1489)
    at com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4343)
    at com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4242)
    at com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
    at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:350)
    at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:295)
    at org.datanucleus.store.rdbms.scostore.JoinListStore.listIterator(JoinListStore.java:761)
    ... 36 more
Nested Throwables StackTrace:
java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:433)
    at com.mysql.jdbc.PreparedStatement.getInstance(PreparedStatement.java:877)
    at com.mysql.jdbc.ConnectionImpl.clientPrepareStatement(ConnectionImpl.java:1489)
    at com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4343)
    at com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4242)
    at com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
    at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:350)
    at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:295)
    at org.datanucleus.store.rdbms.scostore.JoinListStore.listIterator(JoinListStore.java:761)
    at org.datanucleus.store.rdbms.scostore.AbstractListStore.listIterator(AbstractListStore.java:93)
    at org.datanucleus.store.rdbms.scostore.AbstractListStore.iterator(AbstractListStore.java:83)
    at org.datanucleus.store.types.wrappers.backed.List.loadFromStore(List.java:264)
    at org.datanucleus.store.types.wrappers.backed.List.iterator(List.java:492)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToFieldSchemas(ObjectStore.java:1199)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1266)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1281)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToTable(ObjectStore.java:1138)
    at org.apache.hadoop.hive.metastore.ObjectStore.ensureGetTable(ObjectStore.java:2651)
    at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:6141)
{noformat}

HiveMetastore uses start/end functions to open and close scopes in 
MetricsFactory. In some places in HiveMetastore the function names passed to 
start and end do not match, causing a gradual memory leak in the metastore when 
metrics are enabled.
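The leak mechanism can be illustrated with a toy scope tracker (an assumption-based model, not Hive's actual MetricsFactory API): a scope opened under one name is never closed if the end call uses a different name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: open scopes are keyed by function name, so an endScope call with
// a mismatched name removes nothing and the entry lingers forever.
public class ScopeTracker {
    private final Map<String, Long> openScopes = new HashMap<>();

    public void startScope(String function) {
        openScopes.put(function, System.nanoTime()); // record scope start
    }

    public void endScope(String function) {
        openScopes.remove(function); // a mismatched name removes nothing
    }

    public int openCount() {
        return openScopes.size();
    }
}
```

For example, startScope("get_table") followed by endScope("get_tables") leaves the first entry behind; repeated over many calls, the accumulated entries are the gradual leak described above.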



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14763) Exchange Partition with a table that already contains it

2016-09-15 Thread Konstantinos Kallas (JIRA)
Konstantinos Kallas created HIVE-14763:
--

 Summary: Exchange Partition with a table that already contains it
 Key: HIVE-14763
 URL: https://issues.apache.org/jira/browse/HIVE-14763
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Konstantinos Kallas
Priority: Minor


The EXCHANGE PARTITION command fails to execute if the partition already exists 
in the target table. However, this is not the expected behavior of a command 
named *EXCHANGE* partition.
+Proposed Improvement+
Change the command to actually exchange the partition between the target and the 
source table, meaning that afterwards the source table should have the 
destination table's partition and vice versa.
[Documentation EXCHANGE PARTITION | https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hive ACID table error

2016-09-15 Thread naveen mahadevuni
Looks like this has been addressed in HIVE-11716.

Thanks,
Naveen

On Thu, Sep 15, 2016 at 5:41 AM, Eugene Koifman 
wrote:

> There should be a full stack trace somewhere either in the client side log
> on the job logs.
> "serious error" is usually ORC complaining about some sort of data
> corruption.
>
> On 9/14/16, 11:16 AM, "naveen mahadevuni"  wrote:
>
> >Hi Wei,
> >I'm using the hive shell.
> >
> >Thanks,
> >Naveen
> >
> >On Wed, Sep 14, 2016 at 8:01 PM, Wei Zheng 
> wrote:
> >
> >> Hi Naveen,
> >>
> >> Which client are you using? Beeline?
> >>
> >> Thanks,
> >> Wei
> >>
> >> On 9/14/16, 18:25, "naveen mahadevuni"  wrote:
> >>
> >> Hi,
> >>
> >> I'm using Hive 1.2. From a non-ACID hive session, I performed the
> >> following operations and Hive reports 'serious problem'.
> >>
> >> CREATE TABLE test5(
> >>   i int,
> >>   j int)
> >> CLUSTERED BY (i) INTO 8 BUCKETS
> >> STORED AS ORC
> >> TBLPROPERTIES ('transactional'='true');
> >>
> >> insert into test5 values(1,2);
> >> insert into test5 values(3,4);
> >>
> >> select * from test5; -- Fails reporting serious problem.
> >>
> >> *-->Failed with exception
> >> java.io.IOException:java.lang.RuntimeException: serious problem*
> >>
> >> Hive documents "Reading/writing to an ACID table from a non-ACID
> >> session is
> >> not allowed.". Can a better message be reported rather than 'serious
> >> problem'?
> >>
> >> Thanks,
> >> Naveen
> >>
> >>
> >>
>
>