[jira] [Commented] (HIVE-9580) Server returns incorrect result from JOIN ON VARCHAR columns

2015-04-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496203#comment-14496203
 ] 

Aihua Xu commented on HIVE-9580:


Attached a new patch to fix the testCliDriver_mapjoin_decimal unit test failure. 
The other failures seem unrelated.

 Server returns incorrect result from JOIN ON VARCHAR columns
 

 Key: HIVE-9580
 URL: https://issues.apache.org/jira/browse/HIVE-9580
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Mike
Assignee: Aihua Xu
 Attachments: HIVE-9580.patch


 The database erroneously returns rows when joining two tables which each 
 contain a VARCHAR column and the join's ON condition uses the equality 
 operator on the VARCHAR columns.
 The following JDBC method exhibits the problem:
   static void joinIssue() 
   throws SQLException {
   
   String sql;
   int rowsAffected;
   ResultSet rs;
   Statement stmt = con.createStatement();
   String table1_Name = "blahtab1";
   String table1A_Name = "blahtab1A";
   String table1B_Name = "blahtab1B";
   String table2_Name = "blahtab2";
   
   try {
   sql = "drop table " + table1_Name;
   System.out.println("\nsql=" + sql);
   rowsAffected = stmt.executeUpdate(sql);
   }
   catch (SQLException se) {
   System.out.println("Drop table error: " + se.getMessage());
   }
   try {
   sql = "CREATE TABLE " + table1_Name + " (" +
   "VCHARCOL VARCHAR(10) " +
   ",INTEGERCOL INT " +
   ")";
   System.out.println("\nsql=" + sql);
   rowsAffected = stmt.executeUpdate(sql);
   }
   catch (SQLException se) {
   System.out.println("create table error: " + se.getMessage());
   }
   
   sql = "insert into " + table1_Name + " values ('jklmnopqrs', 99)";
   System.out.println("\nsql=" + sql);
   stmt.executeUpdate(sql);
   
   System.out.println("===");
   
   try {
   sql = "drop table " + table1A_Name;
   System.out.println("\nsql=" + sql);
   rowsAffected = stmt.executeUpdate(sql);
   }
   catch (SQLException se) {
   System.out.println("Drop table error: " + se.getMessage());
   }
   try {
   sql = "CREATE TABLE " + table1A_Name + " (" +
   "VCHARCOL VARCHAR(10) " +
   ")";
   System.out.println("\nsql=" + sql);
   rowsAffected = stmt.executeUpdate(sql);
   }
   catch (SQLException se) {
   System.out.println("create table error: " + se.getMessage());
   }
   
   sql = "insert into " + table1A_Name + " values ('jklmnopqrs')";
   System.out.println("\nsql=" + sql);
   stmt.executeUpdate(sql);
   
   System.out.println("===");
   
   try {
   sql = "drop table " + table1B_Name;
   System.out.println("\nsql=" + sql);
   rowsAffected = stmt.executeUpdate(sql);
   }
   catch (SQLException se) {
   System.out.println("Drop table error: " + se.getMessage());
   }
   try {
   sql = "CREATE TABLE " + table1B_Name + " (" +
   "VCHARCOL VARCHAR(11) " +
   ",INTEGERCOL INT " +
   ")";
   System.out.println("\nsql=" + sql);
   rowsAffected = stmt.executeUpdate(sql);
   }
   catch (SQLException se) {
   System.out.println("create table error: " + se.getMessage());
   }
   
   sql = "insert into " + table1B_Name + " values ('jklmnopqrs', 99)";
 

[jira] [Commented] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-04-15 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496092#comment-14496092
 ] 

Niels Basjes commented on HIVE-9252:


After the initial patch I no longer see anything happening. What is the status?

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch


 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom SerDe (I expect usually only the Deserializer 
 will be needed).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this, you can define (and share!) a Hive table on top of a custom 
 file format without needing the IT operations people to deploy a custom 
 SerDe jar file on all nodes.
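 For illustration, a table definition under the proposed grammar might look 
 like this (the table, handler class, and jar path are made-up examples of the 
 not-yet-implemented syntax):
 {code}
 CREATE EXTERNAL TABLE weblogs (line STRING)
 STORED BY 'com.example.hive.WeblogStorageHandler'
 USING JAR 'hdfs:///libs/weblog-serde.jar';
 {code}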



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496097#comment-14496097
 ] 

Hive QA commented on HIVE-10306:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725477/HIVE-10306.4.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8694 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
TestOperationLoggingAPIBase - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3441/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3441/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3441/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725477 - PreCommit-HIVE-TRUNK-Build

 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 -

 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10306.1.patch, HIVE-10306.2.patch, 
 HIVE-10306.3.patch, HIVE-10306.4.patch


 We need to print the Tez summary when hive.server2.logging.level = 
 PERFORMANCE. We introduced this parameter via HIVE-10119.
 The logging-level param is only relevant to HS2, so for hive-cli users 
 hive.tez.exec.print.summary still makes sense. We can check the log-level 
 param as well in the places where we check the value of 
 hive.tez.exec.print.summary, i.e., treat hive.tez.exec.print.summary=true if 
 log.level = PERFORMANCE.
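 As a sketch, the intended behavior could be exercised like this (property 
 names are taken from the description above; the exact names in the final 
 patch may differ):
 {code}
 -- Beeline/HS2 session: request performance-level logging;
 -- the Tez summary should then print without the explicit flag.
 set hive.server2.logging.level=PERFORMANCE;

 -- hive-cli: the existing flag still controls the summary.
 set hive.tez.exec.print.summary=true;
 {code}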



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9917) After HIVE-3454 is done, make int to timestamp conversion configurable

2015-04-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-9917:
---
Attachment: HIVE-9917.patch

 After HIVE-3454 is done, make int to timestamp conversion configurable
 --

 Key: HIVE-9917
 URL: https://issues.apache.org/jira/browse/HIVE-9917
 Project: Hive
  Issue Type: Improvement
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-9917.patch


 After HIVE-3454 is fixed, we will have the correct behavior when converting 
 int to timestamp. Since customers have relied on the incorrect behavior for 
 so long, it is better to make the conversion configurable: in one release it 
 will default to the old/inconsistent way, and the next release will default 
 to the new/consistent way. After that we can deprecate the old behavior.
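 As a hypothetical illustration of such a switch (the property name here is 
 illustrative, not necessarily what the patch introduces):
 {code}
 -- Release N: defaults to the old, inconsistent conversion
 set hive.int.timestamp.conversion.in.seconds=false;

 -- Release N+1: defaults to the consistent, seconds-based conversion
 set hive.int.timestamp.conversion.in.seconds=true;
 select cast(1240000000 as timestamp);  -- value interpreted as seconds since epoch
 {code}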



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-15 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10036:

Labels: orcfile  (was: )

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch


 The ORC writer keeps multiple output streams for each column. Each output 
 stream is allocated a fixed-size ByteBuffer (configurable, default 256K). For 
 a big table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved: several hundred files may be open and writing at 
 the same time (the same problem affects FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be made there, but the problem is 
 that reducing the buffer size causes worse compression and more IO in the 
 read path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded 
 by the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I measured a 2x performance gain with this fix. 
 Solving OOM for ORC completely may take a lot of effort, but this is 
 definitely low-hanging fruit. 
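 A minimal sketch of the idea (an illustrative class, not Hive's actual ORC
 writer code): start each stream's buffer small and let it double on demand,
 capped at the configured buffer size, so idle streams stop pinning 256K each:

```java
import java.nio.ByteBuffer;

// Illustrative sketch: a per-stream buffer that grows geometrically on
// demand, bounded by the configured ORC buffer size.
class GrowableStreamBuffer {
    private final int maxCapacity;  // e.g. the configured 256K ORC buffer size
    private ByteBuffer buffer;

    GrowableStreamBuffer(int initialCapacity, int maxCapacity) {
        this.maxCapacity = maxCapacity;
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    void write(byte[] data) {
        if (buffer.remaining() < data.length) {
            int needed = buffer.position() + data.length;
            if (needed > maxCapacity) {
                // In the real writer this is where the stream would spill/flush.
                throw new IllegalStateException("buffer full; flush required");
            }
            // Double the capacity (at least enough for the write), capped.
            int newCap = Math.min(Math.max(buffer.capacity() * 2, needed), maxCapacity);
            ByteBuffer bigger = ByteBuffer.allocate(newCap);
            buffer.flip();
            bigger.put(buffer);
            buffer = bigger;
        }
        buffer.put(data);
    }

    int capacity() { return buffer.capacity(); }
    int size() { return buffer.position(); }
}
```

 Streams that only ever hold a few bytes stay at the initial capacity; only
 the hot streams grow toward the configured bound.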



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10331) ORC : Is null SARG filters out all row groups written in old ORC format

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496153#comment-14496153
 ] 

Hive QA commented on HIVE-10331:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725481/HIVE-10331.02.patch

{color:red}ERROR:{color} -1 due to 50 failed/errored test(s), 8688 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.ql.io.orc.TestColumnStatistics.testHasNull
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter2
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testReadFormat_0_11[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testReadFormat_0_11[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithoutNull
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplex
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplexOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testSerdeStatsOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testStringAndBinaryStatistics
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testBetween
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testDateWritableEqualsBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testDateWritableInBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testDecimalEqualsBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testDecimalInBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testDoubleEqualsBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testDoubleInBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testEquals
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testIn
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testIntEqualsBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testIntInBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testIsNull
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testLessThan
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testLessThanEquals
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testNullsInBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testStringEqualsBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testStringInBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testTimestampEqualsBloomFilter
org.apache.hadoop.hive.ql.io.orc.TestRecordReaderImpl.testTimestampInBloomFilter

[jira] [Updated] (HIVE-10342) Nested parenthesis for derived table in from clause - is not working

2015-04-15 Thread sanjiv singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sanjiv singh updated HIVE-10342:

Description: 
Hi All,

Nested parentheses around a derived table in the FROM clause are not working.

The following query with a derived table works perfectly in Hive:

select count ( * ) 
from (select distinct em_last_name, em_first_name, em_d_date
   from employee
   UNION ALL
  select distinct cu_last_name, cu_first_name, cu_d_date
   from customer
   UNION ALL
  select distinct cl_last_name, cl_first_name, cl_d_date
   from client
) cool_cust;

When I added additional parentheses enclosing each derived table, it failed in 
parsing. It seems the Hive ANTLR grammar does not accept such syntax. 

Failed Query :
###

select count ( * ) 
from ((select distinct em_last_name, em_first_name, em_d_date
   from employee)
   UNION ALL
  (select distinct cu_last_name, cu_first_name, cu_d_date
   from customer)
   UNION ALL
  (select distinct cl_last_name, cl_first_name, cl_d_date
   from client)
) cool_cust;

Exception  :
##

NoViableAltException(283@[147:5: ( ( Identifier LPAREN )=> 
partitionedTableFunction | tableSource | subQuerySource | virtualTableSource )])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:3625)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1814)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1471)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:42804)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.singleSelectStatement(HiveParser.java:40229)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:39914)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:39851)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:38904)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:38780)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1514)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1052)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:389)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1067)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1129)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:41 cannot recognize input near '(' '(' 'SELECT' 
in from source
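
A common workaround until the grammar accepts the extra parentheses is to wrap 
each parenthesized branch in an explicitly aliased subquery (untested sketch 
against the tables above):

{code}
select count(*)
from (select * from (select distinct em_last_name, em_first_name, em_d_date
                     from employee) e
      UNION ALL
      select * from (select distinct cu_last_name, cu_first_name, cu_d_date
                     from customer) c
      UNION ALL
      select * from (select distinct cl_last_name, cl_first_name, cl_d_date
                     from client) cl
) cool_cust;
{code}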



[jira] [Commented] (HIVE-9580) Server returns incorrect result from JOIN ON VARCHAR columns

2015-04-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496208#comment-14496208
 ] 

Aihua Xu commented on HIVE-9580:


[~szehon] Can you help review the code change?


[jira] [Commented] (HIVE-10288) Cannot call permanent UDFs

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496252#comment-14496252
 ] 

Hive QA commented on HIVE-10288:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725492/HIVE-10288.1.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8689 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3443/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3443/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3443/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725492 - PreCommit-HIVE-TRUNK-Build

 Cannot call permanent UDFs
 --

 Key: HIVE-10288
 URL: https://issues.apache.org/jira/browse/HIVE-10288
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Nezih Yigitbasi
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10288.1.patch, HIVE-10288.patch


 Just pulled the trunk and built the Hive binary. If I create a permanent UDF, 
 exit the CLI, then reopen the CLI and call the UDF, it fails with the 
 exception below. However, the call succeeds if I call the UDF right after 
 registering it (without exiting the CLI). The call also succeeds with the 
 apache-hive-1.0.0 release.
 {code}
 2015-04-13 17:04:54,004 INFO  org.apache.hadoop.hive.ql.log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=parse 
 start=1428969893115 end=1428969894004 duration=889 
 from=org.apache.hadoop.hive.ql.Driver
 2015-04-13 17:04:54,007 DEBUG org.apache.hadoop.hive.ql.Driver 
 (Driver.java:recordValidTxns(939)) - Encoding valid txns info 
 9223372036854775807:
 2015-04-13 17:04:54,007 INFO  org.apache.hadoop.hive.ql.log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=semanticAnalyze 
 from=org.apache.hadoop.hive.ql.Driver
 2015-04-13 17:04:54,052 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:analyzeInternal(9997)) - Starting Semantic Analysis
 2015-04-13 17:04:54,053 DEBUG org.apache.hadoop.hive.ql.exec.FunctionRegistry 
 (FunctionRegistry.java:getGenericUDAFResolver(942)) - Looking up GenericUDAF: 
 hour_now
 2015-04-13 17:04:54,053 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:genResolvedParseTree(9980)) - Completed phase 1 of 
 Semantic Analysis
 2015-04-13 17:04:54,053 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:getMetaData(1530)) - Get metadata for source tables
 2015-04-13 17:04:54,054 INFO  

[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics

2015-04-15 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496376#comment-14496376
 ] 

Sushanth Sowmyan commented on HIVE-10228:
-

RB link : https://reviews.apache.org/r/33224/

 Changes to Hive Export/Import/DropTable/DropPartition to support replication 
 semantics
 --

 Key: HIVE-10228
 URL: https://issues.apache.org/jira/browse/HIVE-10228
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-10228.2.patch, HIVE-10228.3.patch, HIVE-10228.patch


 We need to update a couple of hive commands to support replication semantics. 
 To wit, we need the following:
 EXPORT ... [FOR [METADATA] REPLICATION(“comment”)]
 Export will now support an extra optional clause to tell it that this export 
 is being prepared for the purpose of replication. There is also an additional 
 optional clause here, that allows for the export to be a metadata-only 
 export, to handle cases of capturing the diff for alter statements, for 
 example.
 Also, if done for replication, the absence of a table, or a table being a 
 view/offline table/non-native table, is not considered an error; instead, it 
 will result in a successful no-op.
 IMPORT ... (as normal) – but handles new semantics 
 No syntax changes for import, but import will have to change to handle all 
 the possible permutations of export dumps. Also, import will have to ensure 
 that it updates the object only if the update being imported is not older 
 than the state of the object. Also, import currently does not work with the 
 dbname.tablename kind of specification; this should be fixed.
 DROP TABLE ... FOR REPLICATION('eventid')
 Drop Table now has an additional clause, to specify that this drop table is 
 being done for replication purposes, and that the drop should not actually 
 drop the table if the table is newer than the specified event id.
 ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')
 Similarly, Drop Partition also has an equivalent change to Drop Table.
 =
 In addition, we introduce a new property repl.last.id, which when tagged on 
 to table properties or partition properties on a replication-destination, 
 holds the effective state identifier of the object.
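 Putting the proposed clauses together, example statements might look like 
 this (paths, table names, comments, and event ids are made up):
 {code}
 EXPORT TABLE sales TO '/repl/staging/sales' FOR METADATA REPLICATION('alter diff');
 IMPORT TABLE sales FROM '/repl/staging/sales';
 DROP TABLE sales FOR REPLICATION('451');
 ALTER TABLE sales DROP PARTITION (dt='2015-04-15') FOR REPLICATION('451');
 {code}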



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10324) Hive metatool should take table_param_key to allow for changes to avro serde's schema url key

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496425#comment-14496425
 ] 

Hive QA commented on HIVE-10324:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725504/HIVE-10324.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8688 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.metastore.TestHiveMetaTool.testUpdateFSRootLocation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3444/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3444/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3444/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725504 - PreCommit-HIVE-TRUNK-Build

 Hive metatool should take table_param_key to allow for changes to avro 
 serde's schema url key
 -

 Key: HIVE-10324
 URL: https://issues.apache.org/jira/browse/HIVE-10324
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Ferdinand Xu
 Attachments: HIVE-10324.patch, HIVE-10324.patch.WIP


 HIVE-3443 added support to change the serdeParams via the 'metatool 
 updateLocation' command.
 However, in avro it is possible to specify the schema via the tableParams:
 {noformat}
 CREATE  TABLE `testavro`(
   `test` string COMMENT 'from deserializer')
 ROW FORMAT SERDE 
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
 STORED AS INPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
 OUTPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'avro.schema.url'='hdfs://namenode:8020/tmp/test.avsc', 
   'kite.compression.type'='snappy', 
   'transient_lastDdlTime'='1427996456')
 {noformat}
 Hence, for those tables 'metatool updateLocation' will not help.
 This is necessary in cases like upgrading the NameNode to HA, where the 
 absolute paths have changed.
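As a hedged sketch of what a table_param_key-aware metatool update could do, the following rewrites the authority of a URI stored in a table parameter; the helper name and the `nameservice1` authority are assumptions for illustration:

```python
from urllib.parse import urlparse, urlunparse

def update_param_location(tbl_params, param_key, old_authority, new_authority):
    """Rewrite the host:port authority of a URI held in a table parameter,
    e.g. avro.schema.url after a NameNode HA migration."""
    url = tbl_params.get(param_key)
    if url is None:
        return tbl_params
    parts = urlparse(url)
    if parts.netloc == old_authority:
        tbl_params[param_key] = urlunparse(parts._replace(netloc=new_authority))
    return tbl_params

params = {"avro.schema.url": "hdfs://namenode:8020/tmp/test.avsc"}
update_param_location(params, "avro.schema.url", "namenode:8020", "nameservice1")
# params["avro.schema.url"] is now "hdfs://nameservice1/tmp/test.avsc"
```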





[jira] [Updated] (HIVE-10307) Support to use number literals in partition column

2015-04-15 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10307:
---
Attachment: HIVE-10307.1.patch

Fixed for failed tests.

 Support to use number literals in partition column
 --

 Key: HIVE-10307
 URL: https://issues.apache.org/jira/browse/HIVE-10307
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-10307.1.patch, HIVE-10307.patch


 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with a postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but not when they are used as a 
 partition column value. For a partitioned table like:
 create table partcoltypenum (key int, value string) partitioned by (tint 
 tinyint, sint smallint, bint bigint);
 insert into partcoltypenum partition (tint=100Y, sint=1S, 
 bint=1000L) select key, value from src limit 30;
 Queries like select, describe, and drop partition do not work. For example,
 select * from partcoltypenum where tint=100Y and sint=1S and 
 bint=1000L;
 does not return any rows.
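One way to picture the needed normalization is to strip the type suffix before comparing partition values; this helper is a hypothetical sketch, not Hive's parser logic:

```python
def strip_numeric_suffix(literal):
    """Normalize a suffixed Hive number literal (100Y, 1S, 1000L, 0.1BD)
    to its bare numeric text, as a partition-value comparison would need.
    Check the two-letter BD suffix before the one-letter ones."""
    for suffix in ("BD", "Y", "S", "L"):
        if literal.upper().endswith(suffix):
            return literal[: -len(suffix)]
    return literal

assert strip_numeric_suffix("100Y") == "100"
assert strip_numeric_suffix("1000L") == "1000"
assert strip_numeric_suffix("0.1BD") == "0.1"
assert strip_numeric_suffix("42") == "42"   # unsuffixed literals pass through
```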





[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory

2015-04-15 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496403#comment-14496403
 ] 

Sushanth Sowmyan commented on HIVE-5672:


Adding additional doc note here : 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
 needs to be updated to note that delimiters are not currently supported for 
non-LOCAL writes, and once this patch goes in, we should note which version 
fixed that in that doc.

 Insert with custom separator not supported for non-local directory
 --

 Key: HIVE-5672
 URL: https://issues.apache.org/jira/browse/HIVE-5672
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 1.0.0
Reporter: Romain Rigaux
Assignee: Nemon Lou
 Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch


 https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local 
 directories don't seem to be supported:
 {code}
 insert overwrite directory '/tmp/test-02'
 row format delimited
 FIELDS TERMINATED BY ':'
 select description FROM sample_07
 {code}
 {code}
 Error while compiling statement: FAILED: ParseException line 2:0 cannot 
 recognize input near 'row' 'format' 'delimited' in select clause
 {code}
 This works (with 'local'):
 {code}
 insert overwrite local directory '/tmp/test-02'
 row format delimited
 FIELDS TERMINATED BY ':'
 select code, description FROM sample_07
 {code}





[jira] [Updated] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics

2015-04-15 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10228:

Description: 
We need to update a couple of hive commands to support replication semantics. 
To wit, we need the following:

EXPORT ... [FOR [METADATA] REPLICATION("comment")]

Export will now support an extra optional clause to tell it that this export is 
being prepared for the purpose of replication. There is also an additional 
optional clause here, that allows for the export to be a metadata-only export, 
to handle cases of capturing the diff for alter statements, for example.

Also, if done for replication, the non-presence of a table, or a table being a 
view/offline table/non-native table is not considered an error, and instead, 
will result in a successful no-op.

IMPORT ... (as normal) – but handles new semantics 

No syntax changes for import, but import will have to change to be able to 
handle all the permutations of export dumps possible. Also, import will have to 
ensure that it should update the object only if the update being imported is 
not older than the state of the object. Also, import currently does not work 
with the dbname.tablename style of specification; this should be fixed to work.

DROP TABLE ... FOR REPLICATION('eventid')

Drop Table now has an additional clause, to specify that this drop table is 
being done for replication purposes, and that the drop should not actually drop 
the table if the table is newer than the event id specified.

ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')

Similarly, Drop Partition also has an equivalent change to Drop Table.

=

In addition, we introduce a new property repl.last.id, which when tagged on 
to table properties or partition properties on a replication-destination, holds 
the effective state identifier of the object.

  was:
We need to update a couple of hive commands to support replication semantics. 
To wit, we need the following:

EXPORT ... [FOR [METADATA] REPLICATION("comment")]

Export will now support an extra optional clause to tell it that this export is 
being prepared for the purpose of replication. There is also an additional 
optional clause here, that allows for the export to be a metadata-only export, 
to handle cases of capturing the diff for alter statements, for example.

Also, if done for replication, the non-presence of a table, or a table being a 
view/offline table/non-native table is not considered an error, and instead, 
will result in a successful no-op.

IMPORT ... (as normal) – but handles new semantics 

No syntax changes for import, but import will have to change to be able to 
handle all the permutations of export dumps possible. Also, import will have to 
ensure that it should update the object only if the update being imported is 
not older than the state of the object.

DROP TABLE ... FOR REPLICATION('eventid')

Drop Table now has an additional clause, to specify that this drop table is 
being done for replication purposes, and that the drop should not actually drop 
the table if the table is newer than the event id specified.

ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')

Similarly, Drop Partition also has an equivalent change to Drop Table.

=

In addition, we introduce a new property repl.last.id, which when tagged on 
to table properties or partition properties on a replication-destination, holds 
the effective state identifier of the object.


 Changes to Hive Export/Import/DropTable/DropPartition to support replication 
 semantics
 --

 Key: HIVE-10228
 URL: https://issues.apache.org/jira/browse/HIVE-10228
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-10228.2.patch, HIVE-10228.3.patch, HIVE-10228.patch


 We need to update a couple of hive commands to support replication semantics. 
 To wit, we need the following:
 EXPORT ... [FOR [METADATA] REPLICATION("comment")]
 Export will now support an extra optional clause to tell it that this export 
 is being prepared for the purpose of replication. There is also an additional 
 optional clause here, that allows for the export to be a metadata-only 
 export, to handle cases of capturing the diff for alter statements, for 
 example.
 Also, if done for replication, the non-presence of a table, or a table being 
 a view/offline table/non-native table is not considered an error, and 
 instead, will result in a successful no-op.
 IMPORT ... (as normal) – but handles new semantics 
 No syntax changes for import, but import will have to change to be able to 
 handle all the permutations of export dumps possible. Also, import will have 

[jira] [Updated] (HIVE-10310) Support GROUPING() in HIVE

2015-04-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10310:
--
Summary: Support GROUPING() in HIVE  (was: Support GROUPING() and 
GROUP_ID() in HIVE)

 Support GROUPING() in HIVE
 --

 Key: HIVE-10310
 URL: https://issues.apache.org/jira/browse/HIVE-10310
 Project: Hive
  Issue Type: New Feature
  Components: Parser, SQL
Reporter: sanjiv singh
Priority: Minor

 I have lots of queries using the GROUPING() function that fail on Hive, just 
 because GROUPING() is not supported in Hive. See the query below:
 SELECT fact_1_id,
fact_2_id,
GROUPING(fact_1_id) AS f1g, 
GROUPING(fact_2_id) AS f2g
 FROM   dimension_tab
 GROUP BY CUBE (fact_1_id, fact_2_id)
 ORDER BY fact_1_id, fact_2_id;
 To run all such queries in Hive, they need to be transformed to Hive 
 syntax. See the transformed query below, which is Hive-compatible; the 
 equivalent has been derived using a CASE statement.
 SELECT fact_1_id,
fact_2_id,
        (case when (GROUPING__ID & 1) = 0 then 1 else 0 end) as f1g,
        (case when (GROUPING__ID & 2) = 0 then 1 else 0 end) as f2g
 FROM   dimension_tab
 GROUP BY fact_1_id, fact_2_id WITH CUBE
 ORDER BY fact_1_id, fact_2_id;
 It would be great if GROUPING() were implemented in Hive. I see two ways to 
 do it:
 1) Handle it at the parser level.
 2) Add a GROUPING() aggregate function to Hive (recommended).
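The CASE rewrite above tests bits of GROUPING__ID; a minimal sketch of that bit test, mirroring the ticket's mapping (bit clear yields 1), is:

```python
def grouping_from_id(grouping_id, column_bit):
    """Mirror the CASE rewrite from the report: the emulated GROUPING(col)
    is 1 when the column's bit in GROUPING__ID is 0, else 0. The bit
    semantics here follow the ticket's workaround, not a general guarantee
    across Hive versions."""
    return 1 if (grouping_id & column_bit) == 0 else 0

# fact_1_id maps to bit 1 and fact_2_id to bit 2, as in the rewritten query.
assert grouping_from_id(0b10, 1) == 1   # bit 1 clear -> 1
assert grouping_from_id(0b01, 1) == 0   # bit 1 set   -> 0
```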





[jira] [Commented] (HIVE-10288) Cannot call permanent UDFs

2015-04-15 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496468#comment-14496468
 ] 

Nezih Yigitbasi commented on HIVE-10288:


Thanks [~jdere] and [~chinnalalam] for the quick turnaround. I also verified 
the patch with several tests and it seems to solve this issue.

 Cannot call permanent UDFs
 --

 Key: HIVE-10288
 URL: https://issues.apache.org/jira/browse/HIVE-10288
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Nezih Yigitbasi
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10288.1.patch, HIVE-10288.patch


 Just pulled the trunk and built the hive binary. If I create a permanent udf 
 and exit the cli, and then open the cli and try calling the udf it fails with 
 the exception below. However, the call succeeds if I call the udf right after 
 registering the permanent udf (without exiting the cli). The call also 
 succeeds with the apache-hive-1.0.0 release.
 {code}
 2015-04-13 17:04:54,004 INFO  org.apache.hadoop.hive.ql.log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=parse 
 start=1428969893115 end=1428969894004 duration=889 
 from=org.apache.hadoop.hive.ql.Driver
 2015-04-13 17:04:54,007 DEBUG org.apache.hadoop.hive.ql.Driver 
 (Driver.java:recordValidTxns(939)) - Encoding valid txns info 
 9223372036854775807:
 2015-04-13 17:04:54,007 INFO  org.apache.hadoop.hive.ql.log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=semanticAnalyze 
 from=org.apache.hadoop.hive.ql.Driver
 2015-04-13 17:04:54,052 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:analyzeInternal(9997)) - Starting Semantic Analysis
 2015-04-13 17:04:54,053 DEBUG org.apache.hadoop.hive.ql.exec.FunctionRegistry 
 (FunctionRegistry.java:getGenericUDAFResolver(942)) - Looking up GenericUDAF: 
 hour_now
 2015-04-13 17:04:54,053 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:genResolvedParseTree(9980)) - Completed phase 1 of 
 Semantic Analysis
 2015-04-13 17:04:54,053 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:getMetaData(1530)) - Get metadata for source tables
 2015-04-13 17:04:54,054 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore 
 (HiveMetaStore.java:logInfo(744)) - 0: get_table : db=default tbl=test_table
 2015-04-13 17:04:54,054 INFO  
 org.apache.hadoop.hive.metastore.HiveMetaStore.audit 
 (HiveMetaStore.java:logAuditEvent(369)) - ugi=nyigitbasi   ip=unknown-ip-addr 
  cmd=get_table : db=default tbl=test_table
 2015-04-13 17:04:54,054 DEBUG org.apache.hadoop.hive.metastore.ObjectStore 
 (ObjectStore.java:debugLog(6776)) - Open transaction: count = 1, isActive = 
 true at:
   
 org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:927)
 2015-04-13 17:04:54,054 DEBUG org.apache.hadoop.hive.metastore.ObjectStore 
 (ObjectStore.java:debugLog(6776)) - Open transaction: count = 2, isActive = 
 true at:
   
 org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:990)
 2015-04-13 17:04:54,104 DEBUG org.apache.hadoop.hive.metastore.ObjectStore 
 (ObjectStore.java:debugLog(6776)) - Commit transaction: count = 1, isActive = 
 true at:
   
 org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:998)
 2015-04-13 17:04:54,232 DEBUG org.apache.hadoop.hive.metastore.ObjectStore 
 (ObjectStore.java:debugLog(6776)) - Commit transaction: count = 0, isActive = 
 true at:
   
 org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:929)
 2015-04-13 17:04:54,242 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:getMetaData(1682)) - Get metadata for subqueries
 2015-04-13 17:04:54,247 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:getMetaData(1706)) - Get metadata for destination 
 tables
 2015-04-13 17:04:54,256 INFO  org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:genResolvedParseTree(9984)) - Completed getting 
 MetaData in Semantic Analysis
 2015-04-13 17:04:54,259 INFO  
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer 
 (CalcitePlanner.java:canHandleAstForCbo(369)) - Not invoking CBO because the 
 statement has too few joins
 2015-04-13 17:04:54,344 DEBUG 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
 (LazySimpleSerDe.java:initialize(135)) - 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: 
 columnNames=[_c0, _c1] columnTypes=[int, int] separator=[[B@6e6d4780] 
 nullstring=\N lastColumnTakesRest=false timestampFormats=null
 2015-04-13 17:04:54,406 DEBUG org.apache.hadoop.hive.ql.parse.CalcitePlanner 
 (SemanticAnalyzer.java:genTablePlan(9458)) - Created Table Plan for 
 test_table TS[0]
 2015-04-13 17:04:54,410 DEBUG 

[jira] [Updated] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL

2015-04-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10239:
---
Attachment: HIVE-10239.0.patch

Re-uploading patch to start jenkins tests.

 Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and 
 PostgreSQL
 

 Key: HIVE-10239
 URL: https://issues.apache.org/jira/browse/HIVE-10239
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
 Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, 
 HIVE-10239.0.patch, HIVE-10239.patch


 Need to create DB-implementation specific scripts to use the framework 
 introduced in HIVE-9800 to have any metastore schema changes tested across 
 all supported databases.





[jira] [Updated] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-04-15 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated HIVE-10319:
---
Attachment: HIVE-10319.patch

 Hive CLI startup takes a long time with a large number of databases
 ---

 Key: HIVE-10319
 URL: https://issues.apache.org/jira/browse/HIVE-10319
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.0.0
Reporter: Nezih Yigitbasi
Assignee: Nezih Yigitbasi
 Attachments: HIVE-10319.patch


 The Hive CLI takes a long time to start when there is a large number of 
 databases in the DW. I think the root cause is the way permanent UDFs are 
 loaded from the metastore. When I looked at the logs and the source code I 
 see that at startup Hive first gets all the databases from the metastore and 
 then for each database it makes a metastore call to get the permanent 
 functions for that database [see Hive.java | 
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
  So the number of metastore calls made is on the order of the number of 
 databases. In production we have several hundred databases, so Hive makes 
 several hundred RPC calls during startup, taking 30+ seconds.
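A toy model of the startup pattern described above, with a stand-in `get_functions` RPC, shows why the cost scales with the number of databases:

```python
def load_permanent_udfs(databases, get_functions):
    """One metastore RPC per database: cost grows linearly with the
    number of databases, which is the startup slowdown described above."""
    funcs = {}
    for db in databases:          # N databases -> N round trips
        funcs[db] = get_functions(db)
    return funcs

calls = []
def fake_get_functions(db):
    """Stand-in for the metastore call; records each invocation."""
    calls.append(db)
    return []

load_permanent_udfs([f"db{i}" for i in range(300)], fake_get_functions)
assert len(calls) == 300  # several hundred RPCs at CLI startup
```

A single aggregated metastore call returning all functions at once would make the count constant instead of linear.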





[jira] [Resolved] (HIVE-10341) CBO (Calcite Return Path): TraitSets not correctly propagated in HiveSortExchange causes Assertion error

2015-04-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-10341.
-
Resolution: Fixed

Committed to branch. Thanks, Jesus!

 CBO (Calcite Return Path): TraitSets not correctly propagated in 
 HiveSortExchange causes Assertion error
 

 Key: HIVE-10341
 URL: https://issues.apache.org/jira/browse/HIVE-10341
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: cbo-branch
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: cbo-branch

 Attachments: HIVE-10341.cbo.patch


 When return path is on ({{hive.cbo.returnpath.hiveop=true}}), the TraitSets 
 are not correctly set up by HiveSortExchange. For instance, 
 correlationoptimizer14.q produces the following exception:
 {noformat}
 Unexpected exception java.lang.AssertionError: traits=NONE.[], collation=[0]
  at org.apache.calcite.rel.core.SortExchange.init(SortExchange.java:63)
  at 
 org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortExchange.init(HiveSortExchange.java:18)
  at 
 org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortExchange.create(HiveSortExchange.java:39)
  at 
 org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveInsertExchange4JoinRule.onMatch(HiveInsertExchange4JoinRule.java:95)
  at 
 org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:326)
  at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:515)
  at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392)
  at 
 org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:255)
  at 
 org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
  at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
  at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
 ...
 {noformat}





[jira] [Commented] (HIVE-10270) Cannot use Decimal constants less than 0.1BD

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496529#comment-14496529
 ] 

Hive QA commented on HIVE-10270:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725506/HIVE-10270.4.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8690 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.serde2.binarysortable.TestBinarySortableFast.testBinarySortableFast
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3445/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3445/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3445/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725506 - PreCommit-HIVE-TRUNK-Build

 Cannot use Decimal constants less than 0.1BD
 

 Key: HIVE-10270
 URL: https://issues.apache.org/jira/browse/HIVE-10270
 Project: Hive
  Issue Type: Bug
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10270.1.patch, HIVE-10270.2.patch, 
 HIVE-10270.3.patch, HIVE-10270.4.patch


 {noformat}
 hive> select 0.09765625BD;
 FAILED: IllegalArgumentException Decimal scale must be less than or equal to 
 precision
 {noformat}
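One plausible reading of the failing invariant (not necessarily Hive's exact computation) is that a naive precision count drops the leading fractional zeros that the scale still includes:

```python
def naive_precision_scale(text):
    """Derive (precision, scale) from a decimal literal the naive way:
    scale = digits after the point, precision = significant digits.
    For values below 0.1 the leading fractional zeros inflate scale
    past precision, tripping the 'scale <= precision' check."""
    int_part, _, frac_part = text.partition(".")
    scale = len(frac_part)
    significant = (int_part + frac_part).lstrip("0") or "0"
    precision = len(significant)
    return precision, scale

p, s = naive_precision_scale("0.09765625")
assert (p, s) == (7, 8)   # scale > precision -> rejected
p, s = naive_precision_scale("0.5")
assert (p, s) == (1, 1)   # fine
```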





[jira] [Commented] (HIVE-9923) No clear message when from is missing

2015-04-15 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496451#comment-14496451
 ] 

Yongzhi Chen commented on HIVE-9923:


The NullPointerException stack is:
{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:40882)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40059)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:39929)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1574)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1093)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:396)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1116)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:110)
... 27 more

{noformat}
It is from HiveParser.java:
{noformat}
   if ( state.backtracking==0 ) 
{(s!=null?((CommonTree)s.tree):null).getChild(1).replaceChildren(0, 0, 
(i!=null?((CommonTree)i.tree):null));}
{noformat}
When there is no FROM keyword, getChild(1) returns null and the exception is 
thrown.
When inserting with a select statement, a FROM clause should be required, not 
optional. Change the parser to error out before reaching getChild(1).
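The proposed guard can be sketched as follows; the function and its tree representation are hypothetical simplifications of HiveParser's generated code:

```python
def rewrite_insert(select_children, insert_node):
    """Sketch of the guard: the original code calls select.getChild(1)
    .replaceChildren(...) unconditionally; with no FROM clause there is
    no second child, so raise a clear parse error instead of an NPE."""
    if len(select_children) < 2 or select_children[1] is None:
        raise SyntaxError("FROM clause is required in INSERT ... SELECT")
    select_children[1] = insert_node
    return select_children

# Without a FROM subtree the rewrite now fails with a readable message.
try:
    rewrite_insert(["select-exprs"], "from-clause")
except SyntaxError as e:
    assert "FROM" in str(e)
```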


 No clear message when from is missing
 ---

 Key: HIVE-9923
 URL: https://issues.apache.org/jira/browse/HIVE-9923
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Jeff Zhang
Assignee: Yongzhi Chen
 Attachments: HIVE-9923.1.patch


 For the following sql, from is missing but it throw NPE which is not clear 
 for user.
 {code}
 hive> insert overwrite directory '/tmp/hive-3' select sb1.name, sb2.age 
 student_bucketed sb1 join student_bucketed sb2 on sb1.name=sb2.name;
 FAILED: NullPointerException null
 {code}





[jira] [Updated] (HIVE-9923) No clear message when from is missing

2015-04-15 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9923:
---
Attachment: HIVE-9923.1.patch

 No clear message when from is missing
 ---

 Key: HIVE-9923
 URL: https://issues.apache.org/jira/browse/HIVE-9923
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Jeff Zhang
Assignee: Yongzhi Chen
 Attachments: HIVE-9923.1.patch


 For the following sql, from is missing but it throw NPE which is not clear 
 for user.
 {code}
 hive> insert overwrite directory '/tmp/hive-3' select sb1.name, sb2.age 
 student_bucketed sb1 join student_bucketed sb2 on sb1.name=sb2.name;
 FAILED: NullPointerException null
 {code}





[jira] [Commented] (HIVE-8136) Reduce table locking

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496651#comment-14496651
 ] 

Hive QA commented on HIVE-8136:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725511/HIVE-8136.1.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8689 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3446/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3446/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3446/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725511 - PreCommit-HIVE-TRUNK-Build

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.1.patch, HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock when they are atomic, such as setting a table's location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.





[jira] [Updated] (HIVE-10329) Hadoop reflectionutils has issues

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10329:

Summary: Hadoop reflectionutils has issues  (was: LLAP: Hadoop 
reflectionutils has issues)

 Hadoop reflectionutils has issues
 -

 Key: HIVE-10329
 URL: https://issues.apache.org/jira/browse/HIVE-10329
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10329.patch


 1) The constructor cache leaks classes and their attendant static overhead 
 forever.
 2) The class cache inside the conf, used when getting JobConfigurable 
 classes, has an epic lock.
 Both bugs are filed in Hadoop but will hardly ever be fixed at this rate. 
 This version avoids both problems.





[jira] [Commented] (HIVE-10331) ORC : Is null SARG filters out all row groups written in old ORC format

2015-04-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496571#comment-14496571
 ] 

Prasanth Jayachandran commented on HIVE-10331:
--

Actually there is more to this issue: you might need to set the hasNull default 
back to false, because setNull() explicitly changes it to true whenever a null 
is encountered for a column, which is correct. The wrong part is not the 
initialization but the condition when hasNull is missing.
Can you change the initialization of hasNull back to the old one, and add an 
else branch with a hasHasNull() check that returns true when the hasNull 
protobuf field is missing?
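The suggested handling can be sketched in a self-contained form. `ColumnStats` below is only a stand-in for ORC's protobuf `ColumnStatistics`, and the method names mimic (but are not) the real generated API:

```java
// Stand-in for ORC's protobuf ColumnStatistics; a null field models a
// missing optional protobuf field (as written by old ORC versions).
class ColumnStats {
    private final Boolean hasNull;

    ColumnStats(Boolean hasNull) { this.hasNull = hasNull; }

    boolean hasHasNull() { return hasNull != null; }
    boolean getHasNull() { return hasNull; }
}

class HasNullSketch {
    // If the hasNull field is present, trust the writer; if it is
    // missing (old ORC format), conservatively assume nulls may exist
    // so IS NULL predicates do not filter out the row group.
    static boolean mayContainNull(ColumnStats stats) {
        return stats.hasHasNull() ? stats.getHasNull() : true;
    }

    public static void main(String[] args) {
        System.out.println(mayContainNull(new ColumnStats(null)));  // old writer
        System.out.println(mayContainNull(new ColumnStats(false))); // new writer
    }
}
```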

 ORC : Is null SARG filters out all row groups written in old ORC format
 ---

 Key: HIVE-10331
 URL: https://issues.apache.org/jira/browse/HIVE-10331
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch, HIVE-10331.02.patch


 Queries are returning wrong results as all row groups gets filtered out and 
 no rows get scanned.
 {code}
 SELECT 
   count(*)
 FROM
 store_sales
 WHERE
 ss_addr_sk IS NULL
 {code}
 With hive.optimize.index.filter disabled we get the correct results.
 In pickRowGroups, stats show that hasNull_ is false, while the row group 
 actually has nulls.
 The same query runs fine for newly loaded ORC tables.





[jira] [Commented] (HIVE-10340) Enable ORC test for timezone reading from old format

2015-04-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496632#comment-14496632
 ] 

Sergey Shelukhin commented on HIVE-10340:
-

+1

 Enable ORC test for timezone reading from old format
 

 Key: HIVE-10340
 URL: https://issues.apache.org/jira/browse/HIVE-10340
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Trivial
 Attachments: HIVE-10340.1.patch


 As a part of HIVE-8746 I added a test for reading timezone data from old ORC 
 format that was unintentionally disabled. Re-enable the test.





[jira] [Commented] (HIVE-9710) HiveServer2 should support cookie based authentication, when using HTTP transport.

2015-04-15 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496690#comment-14496690
 ] 

Vaibhav Gumashta commented on HIVE-9710:


+1. 

Thanks for patiently iterating [~hsubramaniyan].

 HiveServer2 should support cookie based authentication, when using HTTP 
 transport.
 --

 Key: HIVE-9710
 URL: https://issues.apache.org/jira/browse/HIVE-9710
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 1.2.0
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9710.1.patch, HIVE-9710.2.patch, HIVE-9710.3.patch, 
 HIVE-9710.4.patch, HIVE-9710.5.patch, HIVE-9710.6.patch, HIVE-9710.7.patch, 
 HIVE-9710.8.patch


 HiveServer2 should generate cookies and validate the client cookie send to it 
 so that it need not perform User/Password or a Kerberos based authentication 
 on each HTTP request. 





[jira] [Commented] (HIVE-9580) Server returns incorrect result from JOIN ON VARCHAR columns

2015-04-15 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496681#comment-14496681
 ] 

Szehon Ho commented on HIVE-9580:
-

Hi Aihua, looks like this works: you are making all the varchars (and even 
chars) in the join comparison use the maximum length to avoid this issue.

But I'm not too familiar with this code; I think [~jdere] is the varchar 
expert, so forwarding to him to take a look as well.

 Server returns incorrect result from JOIN ON VARCHAR columns
 

 Key: HIVE-9580
 URL: https://issues.apache.org/jira/browse/HIVE-9580
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Mike
Assignee: Aihua Xu
 Attachments: HIVE-9580.patch


 The database erroneously returns rows when joining two tables which each 
 contain a VARCHAR column and the join's ON condition uses the equality 
 operator on the VARCHAR columns.
 The following JDBC method exhibits the problem:
   static void joinIssue() throws SQLException {
       String sql;
       int rowsAffected;
       ResultSet rs;
       Statement stmt = con.createStatement();
       String table1_Name = "blahtab1";
       String table1A_Name = "blahtab1A";
       String table1B_Name = "blahtab1B";
       String table2_Name = "blahtab2";

       try {
           sql = "drop table " + table1_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1_Name + "(" +
                 "VCHARCOL VARCHAR(10) " +
                 ",INTEGERCOL INT " +
                 ") ";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }

       sql = "insert into " + table1_Name + " values ('jklmnopqrs', 99)";
       System.out.println("\nsql=" + sql);
       stmt.executeUpdate(sql);

       System.out.println("===");

       try {
           sql = "drop table " + table1A_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1A_Name + "(" +
                 "VCHARCOL VARCHAR(10) " +
                 ") ";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }

       sql = "insert into " + table1A_Name + " values ('jklmnopqrs')";
       System.out.println("\nsql=" + sql);
       stmt.executeUpdate(sql);

       System.out.println("===");

       try {
           sql = "drop table " + table1B_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1B_Name + "(" +
                 "VCHARCOL VARCHAR(11) " +
                 ",INTEGERCOL INT " +
                 ") ";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
   

[jira] [Updated] (HIVE-10269) HiveMetaStore.java:[6089,29] cannot find symbol class JvmPauseMonitor

2015-04-15 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-10269:

Fix Version/s: 1.2.0

 HiveMetaStore.java:[6089,29] cannot find symbol class JvmPauseMonitor
 -

 Key: HIVE-10269
 URL: https://issues.apache.org/jira/browse/HIVE-10269
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Gabor Liptak
Assignee: Ferdinand Xu
 Fix For: 1.2.0

 Attachments: HIVE-10269.patch


 Compiling trunk fails when building based on instructions in
 https://cwiki.apache.org/confluence/display/Hive/HowToContribute
 $ git status
 On branch trunk
 Your branch is up-to-date with 'origin/trunk'.
 nothing to commit, working directory clean
 $ mvn clean install -DskipTests -Phadoop-1
 ...[ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure: Compilation failure:
 [ERROR] 
 /tmp/hive/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[6089,29]
  cannot find symbol
 [ERROR] symbol:   class JvmPauseMonitor
 [ERROR] location: package org.apache.hadoop.util
 [ERROR] 
 /tmp/hive/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[6090,35]
  cannot find symbol
 [ERROR] symbol:   class JvmPauseMonitor
 [ERROR] location: package org.apache.hadoop.util
 [ERROR] -> [Help 1]
 [ERROR] 
 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
 switch.
 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
 [ERROR] 
 [ERROR] For more information about the errors and possible solutions, please 
 read the following articles:
 [ERROR] [Help 1] 
 http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
 [ERROR] 
 [ERROR] After correcting the problems, you can resume the build with the 
 command
 [ERROR]   mvn <goals> -rf :hive-metastore





[jira] [Updated] (HIVE-10269) HiveMetaStore.java:[6089,29] cannot find symbol class JvmPauseMonitor

2015-04-15 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-10269:

Affects Version/s: 1.2.0

 HiveMetaStore.java:[6089,29] cannot find symbol class JvmPauseMonitor
 -

 Key: HIVE-10269
 URL: https://issues.apache.org/jira/browse/HIVE-10269
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Gabor Liptak
Assignee: Ferdinand Xu
 Fix For: 1.2.0

 Attachments: HIVE-10269.patch


 Compiling trunk fails when building based on instructions in
 https://cwiki.apache.org/confluence/display/Hive/HowToContribute
 $ git status
 On branch trunk
 Your branch is up-to-date with 'origin/trunk'.
 nothing to commit, working directory clean
 $ mvn clean install -DskipTests -Phadoop-1
 ...[ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-metastore: Compilation failure: Compilation failure:
 [ERROR] 
 /tmp/hive/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[6089,29]
  cannot find symbol
 [ERROR] symbol:   class JvmPauseMonitor
 [ERROR] location: package org.apache.hadoop.util
 [ERROR] 
 /tmp/hive/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:[6090,35]
  cannot find symbol
 [ERROR] symbol:   class JvmPauseMonitor
 [ERROR] location: package org.apache.hadoop.util
 [ERROR] -> [Help 1]
 [ERROR] 
 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
 switch.
 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
 [ERROR] 
 [ERROR] For more information about the errors and possible solutions, please 
 read the following articles:
 [ERROR] [Help 1] 
 http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
 [ERROR] 
 [ERROR] After correcting the problems, you can resume the build with the 
 command
 [ERROR]   mvn <goals> -rf :hive-metastore





[jira] [Updated] (HIVE-10344) CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc rather than construction function

2015-04-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10344:
---
Summary: CBO (Calcite Return Path): Use newInstance to create 
ExprNodeGenericFuncDesc rather than construction function  (was: Use 
newInstance to create ExprNodeGenericFuncDesc rather than construction function)

 CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc 
 rather than construction function
 --

 Key: HIVE-10344
 URL: https://issues.apache.org/jira/browse/HIVE-10344
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0


 ExprNodeGenericFuncDesc is now created using a constructor, which skips the 
 initialization step genericUDF.initializeAndFoldConstants compared with 
 using the newInstance method. If the initialization step is skipped, some 
 configuration parameters are not included in the serialization, which 
 produces wrong results/errors.
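The constructor-vs-factory distinction described above can be illustrated generically; this is not Hive's actual class, just the pattern:

```java
// Generic sketch: a factory method guarantees the initialization step
// that a bare constructor lets callers skip.
class FuncDesc {
    boolean initialized = false;

    FuncDesc() { }                    // bare constructor: no initialization

    static FuncDesc newInstance() {   // factory: always initializes
        FuncDesc d = new FuncDesc();
        d.initialize();               // stands in for initializeAndFoldConstants
        return d;
    }

    void initialize() { initialized = true; }

    public static void main(String[] args) {
        System.out.println(new FuncDesc().initialized);         // step skipped
        System.out.println(FuncDesc.newInstance().initialized); // step ran
    }
}
```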





[jira] [Updated] (HIVE-10332) CBO (Calcite Return Path): Use SortExchange rather than LogicalExchange for HiveOpConverter

2015-04-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10332:
---
Summary: CBO (Calcite Return Path): Use SortExchange rather than 
LogicalExchange for HiveOpConverter  (was: Use SortExchange rather than 
LogicalExchange for HiveOpConverter)

 CBO (Calcite Return Path): Use SortExchange rather than LogicalExchange for 
 HiveOpConverter
 ---

 Key: HIVE-10332
 URL: https://issues.apache.org/jira/browse/HIVE-10332
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: cbo-branch

 Attachments: HIVE-10332.01.patch


 Right now HiveSortExchange extends SortExchange extends Exchange. 
 LogicalExchange extends Exchange. LogicalExchange is expected in 
 HiveOpConverter but HiveSortExchange is created. After discussion, we plan to 
 change LogicalExchange to HiveSortExchange.





[jira] [Commented] (HIVE-10344) CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc rather than construction function

2015-04-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496884#comment-14496884
 ] 

Pengcheng Xiong commented on HIVE-10344:


[~jpullokkaran], after this patch, cbo_simple_select passes when the return 
path is turned on.

 CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc 
 rather than construction function
 --

 Key: HIVE-10344
 URL: https://issues.apache.org/jira/browse/HIVE-10344
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-10344.01.patch


 ExprNodeGenericFuncDesc is now created using a constructor, which skips the 
 initialization step genericUDF.initializeAndFoldConstants compared with 
 using the newInstance method. If the initialization step is skipped, some 
 configuration parameters are not included in the serialization, which 
 produces wrong results/errors.





[jira] [Updated] (HIVE-10270) Cannot use Decimal constants less than 0.1BD

2015-04-15 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10270:
--
Attachment: HIVE-10270.5.patch

The TestBinarySortableFast failure was due to the fact that HIVE-9937 had 
duplicated some BinarySortableSerDe serialization logic, including for decimals.
For patch v5 I have refactored it so that BinarySortableSerDe and 
BinarySortableSerializeWrite both call into the same common logic, and updated 
the tests for TestBinarySortableFast, similar to TestBinarySortableSerDe.

 Cannot use Decimal constants less than 0.1BD
 

 Key: HIVE-10270
 URL: https://issues.apache.org/jira/browse/HIVE-10270
 Project: Hive
  Issue Type: Bug
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10270.1.patch, HIVE-10270.2.patch, 
 HIVE-10270.3.patch, HIVE-10270.4.patch, HIVE-10270.5.patch


 {noformat}
 hive> select 0.09765625BD;
 FAILED: IllegalArgumentException Decimal scale must be less than or equal to 
 precision
 {noformat}
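The failure can be reproduced outside Hive: for constants below 0.1, `java.math.BigDecimal` reports a scale larger than its precision, and Hive's decimal type required scale to be at most precision, hence the error above:

```java
import java.math.BigDecimal;

class DecimalScaleDemo {
    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("0.09765625");
        // unscaled value is 9765625 (7 digits), but there are
        // 8 digits after the decimal point, so scale > precision
        System.out.println(d.precision()); // 7
        System.out.println(d.scale());     // 8
    }
}
```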





[jira] [Updated] (HIVE-9617) UDF from_utc_timestamp throws NPE if the second argument is null

2015-04-15 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-9617:
-
Fix Version/s: 1.2.0

 UDF from_utc_timestamp throws NPE if the second argument is null
 

 Key: HIVE-9617
 URL: https://issues.apache.org/jira/browse/HIVE-9617
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-9617.1.patch, HIVE-9617.2.patch


 UDF from_utc_timestamp throws NPE if the second argument is null
 {code}
 select from_utc_timestamp('2015-02-06 10:30:00', cast(null as string));
 FAILED: NullPointerException null
 {code}
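A minimal sketch of the usual fix for this class of bug, assuming standard SQL NULL semantics; this is a generic pattern, not the actual patch:

```java
// Null-safe evaluate sketch: SQL semantics say NULL in, NULL out,
// so guard both arguments before doing any conversion work.
class NullSafeUdfSketch {
    static Object evaluate(Object timestamp, Object timezone) {
        if (timestamp == null || timezone == null) {
            return null; // instead of dereferencing and throwing an NPE
        }
        // the real UDF would perform the timezone conversion here
        return timestamp;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2015-02-06 10:30:00", null)); // null
    }
}
```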





[jira] [Commented] (HIVE-10344) CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc rather than construction function

2015-04-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496924#comment-14496924
 ] 

Laljo John Pullokkaran commented on HIVE-10344:
---

[~ashutoshc] Could you review and check this in to CBO branch?


 CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc 
 rather than construction function
 --

 Key: HIVE-10344
 URL: https://issues.apache.org/jira/browse/HIVE-10344
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-10344.01.patch


 ExprNodeGenericFuncDesc is now created using a constructor, which skips the 
 initialization step genericUDF.initializeAndFoldConstants compared with 
 using the newInstance method. If the initialization step is skipped, some 
 configuration parameters are not included in the serialization, which 
 produces wrong results/errors.





[jira] [Commented] (HIVE-9917) After HIVE-3454 is done, make int to timestamp conversion configurable

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496784#comment-14496784
 ] 

Hive QA commented on HIVE-9917:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725574/HIVE-9917.patch

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 8692 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3447/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3447/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3447/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725574 - PreCommit-HIVE-TRUNK-Build

 After HIVE-3454 is done, make int to timestamp conversion configurable
 --

 Key: HIVE-9917
 URL: https://issues.apache.org/jira/browse/HIVE-9917
 Project: Hive
  Issue Type: Improvement
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-9917.patch


 After HIVE-3454 is fixed, we will have the correct behavior when converting 
 int to timestamp. Since customers have been relying on the incorrect behavior 
 for so long, it is better to make it configurable so that in one release it 
 defaults to the old/inconsistent way and the next release defaults to the 
 new/consistent way. After that we will deprecate the old behavior.
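A hedged sketch of what such a compatibility switch might look like; the flag and the exact conversion semantics below are illustrative only, since the issue does not specify them:

```java
// Illustrative only: gate two int->timestamp interpretations behind a
// boolean that a config property would supply, so one release can
// default to the old behavior and the next to the new one.
class IntToTimestampSketch {
    static long toEpochMillis(long value, boolean intIsSeconds) {
        // hypothetical: old path read the int as millis,
        // new path reads it as seconds since epoch
        return intIsSeconds ? value * 1000L : value;
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis(5, true));  // 5000
        System.out.println(toEpochMillis(5, false)); // 5
    }
}
```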





[jira] [Commented] (HIVE-9580) Server returns incorrect result from JOIN ON VARCHAR columns

2015-04-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496785#comment-14496785
 ] 

Aihua Xu commented on HIVE-9580:


That's right. For the key comparison, it will call a UDF to do the key 
conversion if we are comparing different types; or, I think, we should pick 
the common type as the key type when type conversion is not needed for the 
data types, including char or varchar with different lengths.
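The common-type idea above can be sketched trivially: for two varchar join keys, the common type takes the maximum declared length, so equal strings compare equal regardless of declared length. This is illustrative, not Hive's actual type-resolution code:

```java
// Sketch: the common key type for VARCHAR(m) vs VARCHAR(n) is
// VARCHAR(max(m, n)), wide enough to hold either side's values.
class VarcharJoinKeySketch {
    static int commonVarcharLength(int leftLen, int rightLen) {
        return Math.max(leftLen, rightLen);
    }

    public static void main(String[] args) {
        // joining VARCHAR(10) with VARCHAR(11), as in the repro below
        System.out.println(commonVarcharLength(10, 11)); // 11
    }
}
```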

 Server returns incorrect result from JOIN ON VARCHAR columns
 

 Key: HIVE-9580
 URL: https://issues.apache.org/jira/browse/HIVE-9580
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Mike
Assignee: Aihua Xu
 Attachments: HIVE-9580.patch


 The database erroneously returns rows when joining two tables which each 
 contain a VARCHAR column and the join's ON condition uses the equality 
 operator on the VARCHAR columns.
 The following JDBC method exhibits the problem:
   static void joinIssue() throws SQLException {
       String sql;
       int rowsAffected;
       ResultSet rs;
       Statement stmt = con.createStatement();
       String table1_Name = "blahtab1";
       String table1A_Name = "blahtab1A";
       String table1B_Name = "blahtab1B";
       String table2_Name = "blahtab2";

       try {
           sql = "drop table " + table1_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1_Name + "(" +
                 "VCHARCOL VARCHAR(10) " +
                 ",INTEGERCOL INT " +
                 ") ";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }

       sql = "insert into " + table1_Name + " values ('jklmnopqrs', 99)";
       System.out.println("\nsql=" + sql);
       stmt.executeUpdate(sql);

       System.out.println("===");

       try {
           sql = "drop table " + table1A_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1A_Name + "(" +
                 "VCHARCOL VARCHAR(10) " +
                 ") ";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }

       sql = "insert into " + table1A_Name + " values ('jklmnopqrs')";
       System.out.println("\nsql=" + sql);
       stmt.executeUpdate(sql);

       System.out.println("===");

       try {
           sql = "drop table " + table1B_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1B_Name + "(" +
                 "VCHARCOL VARCHAR(11) " +
                 ",INTEGERCOL INT " +
                 ") ";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
   

[jira] [Updated] (HIVE-10343) CBO (Calcite Return Path): Parameterize algorithm cost model

2015-04-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10343:
--
Attachment: HIVE-10343.patch

 CBO (Calcite Return Path): Parameterize algorithm cost model
 

 Key: HIVE-10343
 URL: https://issues.apache.org/jira/browse/HIVE-10343
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 1.2.0

 Attachments: HIVE-10343.patch








[jira] [Commented] (HIVE-10284) enable container reuse for grace hash join

2015-04-15 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496750#comment-14496750
 ] 

Matt McCline commented on HIVE-10284:
-

I'm not sure what is going on here.

I think we are probably forming the vector expression writers incorrectly in 
the new code we added.  I need to go study the code and think.

 enable container reuse for grace hash join 
 ---

 Key: HIVE-10284
 URL: https://issues.apache.org/jira/browse/HIVE-10284
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Wei Zheng
 Attachments: HIVE-10284.1.patch, HIVE-10284.2.patch, 
 HIVE-10284.3.patch, HIVE-10284.4.patch, HIVE-10284.5.patch, 
 HIVE-10284.6.patch, HIVE-10284.7.patch








[jira] [Commented] (HIVE-10324) Hive metatool should take table_param_key to allow for changes to avro serde's schema url key

2015-04-15 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496694#comment-14496694
 ] 

Szehon Ho commented on HIVE-10324:
--

Thanks Ferdinand for taking care of this.  Can we keep the update of any 
property that matches a StorageDescriptor property, and just add another method 
for Table properties?  I am afraid that somebody might be using this, unless we 
can confirm that the StorageDescriptor property is never used.

 Hive metatool should take table_param_key to allow for changes to avro 
 serde's schema url key
 -

 Key: HIVE-10324
 URL: https://issues.apache.org/jira/browse/HIVE-10324
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Ferdinand Xu
 Attachments: HIVE-10324.patch, HIVE-10324.patch.WIP


 HIVE-3443 added support to change the serdeParams from 'metatool 
 updateLocation' command.
 However, in avro it is possible to specify the schema via the tableParams:
 {noformat}
 CREATE  TABLE `testavro`(
   `test` string COMMENT 'from deserializer')
 ROW FORMAT SERDE 
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
 STORED AS INPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
 OUTPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'avro.schema.url'='hdfs://namenode:8020/tmp/test.avsc', 
   'kite.compression.type'='snappy', 
   'transient_lastDdlTime'='1427996456')
 {noformat}
 Hence for those tables 'metatool updateLocation' will not help.
 This is necessary in cases like upgrading the namenode to HA, where the 
 absolute paths have changed.
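
The rewrite the metatool needs to perform on the table property can be sketched in a few lines. This is illustrative only: the real fix lives in Hive's metatool, and the new prefix `hdfs://nameservice1` is a hypothetical HA nameservice, not taken from the issue.

```python
def update_avro_schema_url(table_params, old_prefix, new_prefix):
    """Rewrite the avro.schema.url table property when a namenode
    location changes (e.g. after an HA upgrade). Illustrative sketch,
    not the metatool's actual implementation."""
    key = "avro.schema.url"
    url = table_params.get(key)
    if url is not None and url.startswith(old_prefix):
        table_params[key] = new_prefix + url[len(old_prefix):]
    return table_params

# Table properties as in the CREATE TABLE above:
params = {"avro.schema.url": "hdfs://namenode:8020/tmp/test.avsc"}
update_avro_schema_url(params, "hdfs://namenode:8020", "hdfs://nameservice1")
```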



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10273) Union with partition tables which have no data fails with NPE

2015-04-15 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10273:
--
Attachment: HIVE-10273.6.patch

Updated to latest trunk. Test failures unrelated.

 Union with partition tables which have no data fails with NPE
 -

 Key: HIVE-10273
 URL: https://issues.apache.org/jira/browse/HIVE-10273
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.2.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10273.1.patch, HIVE-10273.2.patch, 
 HIVE-10273.3.patch, HIVE-10273.4.patch, HIVE-10273.5.patch, HIVE-10273.6.patch


 As shown in the test case in the patch below, when we have partitioned tables 
 which have no data, we fail with an NPE with the following stack trace:
 {code}
 NullPointerException null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:357)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:321)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:307)
   at 
 org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
   at 
 org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
   at 
 org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
   at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:847)
   at 
 org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(TezCompiler.java:468)
   at 
 org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:223)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10170)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
   at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager

2015-04-15 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10233:
--
Attachment: HIVE-10233-WIP-2.patch

 Hive on LLAP: Memory manager
 

 Key: HIVE-10233
 URL: https://issues.apache.org/jira/browse/HIVE-10233
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP.patch


 We need a memory manager in llap/tez to manage the usage of memory across 
 threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10350) CBO: With hive.cbo.costmodel.extended enabled IO cost is negative

2015-04-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10350:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-9132

 CBO: With hive.cbo.costmodel.extended enabled IO cost is negative
 -

 Key: HIVE-10350
 URL: https://issues.apache.org/jira/browse/HIVE-10350
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0


 Not an overflow, but parallelism ends up being -1 because it uses the number of buckets.
 {code}
  final int parallelism = RelMetadataQuery.splitCount(join) == null
   ? 1 : RelMetadataQuery.splitCount(join);
 {code}
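
The sign flip can be reproduced with a toy IO term (a hypothetical formula for illustration, not the actual HiveCostModel arithmetic): when splitCount returns a non-positive bucket count such as -1, any cost that scales with parallelism goes negative, and clamping non-positive counts to 1 avoids it.

```python
def mapjoin_io_cost(bytes_streamed, parallelism):
    # Simplified stand-in for the extended cost model's IO term:
    # cost scales with the data each parallel task must read.
    return bytes_streamed * parallelism

# A bucket count of -1 drives the IO cost negative,
# which makes MAP_JOIN look artificially cheap:
assert mapjoin_io_cost(1_000_000, -1) < 0

# Clamping non-positive split counts to 1 keeps the cost positive:
split_count = -1
parallelism = split_count if split_count > 0 else 1
assert mapjoin_io_cost(1_000_000, parallelism) > 0
```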
 {code}
 2015-04-13 18:19:09,154 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - COMMON_JOIN cost: {1600892.857142857 
 rows, 2.4463782008994658E7 cpu, 8.54445445875E10 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - MAP_JOIN cost: {1600892.857142857 
 rows, 1601785.714285714 cpu, -1698787.48 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(72)) - MAP_JOIN selected
 2015-04-13 18:19:09,157 DEBUG [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:apply(862)) - Plan After Join Reordering:
 HiveSort(fetch=[100]): rowcount = 6006.726049749041, cumulative cost = 
 {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 
 io}, id = 3000
   HiveSort(sort0=[$0], dir0=[ASC]): rowcount = 6006.726049749041, cumulative 
 cost = {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, 
 -1.1757664816220238E9 io}, id = 2998
 HiveProject(customer_id=[$4], customername=[concat($9, ', ', $8)]): 
 rowcount = 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3136
   HiveJoin(condition=[=($1, $5)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.557820341269841E7 rows, 
 5.557840182539682E7 cpu, -4299694.122023809 io}]): rowcount = 
 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3132
 HiveJoin(condition=[=($0, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.7498805E7 rows, 5.9419605E7 cpu, 
 -1.15248E9 io}]): rowcount = 5.5578005E7, cumulative cost = {5.7498805E7 
 rows, 5.9419605E7 cpu, -1.15248E9 io}, id = 3100
   HiveProject(sr_cdemo_sk=[$4]): rowcount = 5.5578005E7, cumulative 
 cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2992
 HiveTableScan(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2878
   HiveProject(cd_demo_sk=[$0]): rowcount = 1920800.0, cumulative cost 
 = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2978
 HiveTableScan(table=[[tpcds_bin_orc_200.customer_demographics]]): 
 rowcount = 1920800.0, cumulative cost = {0}, id = 2868
 HiveJoin(condition=[=($10, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1787.9365079365077 rows, 1790.15873015873 
 cpu, -8000.0 io}]): rowcount = 198.4126984126984, cumulative cost = 
 {1611666.507936508 rows, 1619761.5873015872 cpu, -1.89867875E7 io}, id = 3130
   HiveJoin(condition=[=($0, $4)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{8985.714285714286 rows, 16185.714285714286 
 cpu, -1.728E7 io}]): rowcount = 1785.7142857142856, cumulative cost = 
 {1609878.5714285714 rows, 1617971.4285714284 cpu, -1.89787875E7 io}, id = 3128
 HiveProject(hd_demo_sk=[$0], hd_income_band_sk=[$1]): rowcount = 
 7200.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2982
   
 HiveTableScan(table=[[tpcds_bin_orc_200.household_demographics]]): rowcount = 
 7200.0, cumulative cost = {0}, id = 2871
 HiveJoin(condition=[=($3, $6)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 
 cpu, -1698787.48 io}]): rowcount = 1785.7142857142856, cumulative 
 cost = {1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 
 io}, id = 3105
   HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
 c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
 c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 
 cpu, 0.0 io}, id = 2970
 HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount 
 = 160.0, cumulative cost = {0}, id = 2862
   HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount = 
 892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2974
 HiveFilter(condition=[=($6, 'Hopewell')]): rowcount = 
 892.8571428571428, 

[jira] [Updated] (HIVE-10350) CBO: With hive.cbo.costmodel.extended enabled IO cost is negative

2015-04-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10350:
---
Attachment: HIVE-10331.01.patch

 CBO: With hive.cbo.costmodel.extended enabled IO cost is negative
 -

 Key: HIVE-10350
 URL: https://issues.apache.org/jira/browse/HIVE-10350
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch


 Not an overflow, but parallelism ends up being -1 because it uses the number of buckets.
 {code}
  final int parallelism = RelMetadataQuery.splitCount(join) == null
   ? 1 : RelMetadataQuery.splitCount(join);
 {code}
 {code}
 2015-04-13 18:19:09,154 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - COMMON_JOIN cost: {1600892.857142857 
 rows, 2.4463782008994658E7 cpu, 8.54445445875E10 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - MAP_JOIN cost: {1600892.857142857 
 rows, 1601785.714285714 cpu, -1698787.48 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(72)) - MAP_JOIN selected
 2015-04-13 18:19:09,157 DEBUG [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:apply(862)) - Plan After Join Reordering:
 HiveSort(fetch=[100]): rowcount = 6006.726049749041, cumulative cost = 
 {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 
 io}, id = 3000
   HiveSort(sort0=[$0], dir0=[ASC]): rowcount = 6006.726049749041, cumulative 
 cost = {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, 
 -1.1757664816220238E9 io}, id = 2998
 HiveProject(customer_id=[$4], customername=[concat($9, ', ', $8)]): 
 rowcount = 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3136
   HiveJoin(condition=[=($1, $5)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.557820341269841E7 rows, 
 5.557840182539682E7 cpu, -4299694.122023809 io}]): rowcount = 
 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3132
 HiveJoin(condition=[=($0, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.7498805E7 rows, 5.9419605E7 cpu, 
 -1.15248E9 io}]): rowcount = 5.5578005E7, cumulative cost = {5.7498805E7 
 rows, 5.9419605E7 cpu, -1.15248E9 io}, id = 3100
   HiveProject(sr_cdemo_sk=[$4]): rowcount = 5.5578005E7, cumulative 
 cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2992
 HiveTableScan(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2878
   HiveProject(cd_demo_sk=[$0]): rowcount = 1920800.0, cumulative cost 
 = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2978
 HiveTableScan(table=[[tpcds_bin_orc_200.customer_demographics]]): 
 rowcount = 1920800.0, cumulative cost = {0}, id = 2868
 HiveJoin(condition=[=($10, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1787.9365079365077 rows, 1790.15873015873 
 cpu, -8000.0 io}]): rowcount = 198.4126984126984, cumulative cost = 
 {1611666.507936508 rows, 1619761.5873015872 cpu, -1.89867875E7 io}, id = 3130
   HiveJoin(condition=[=($0, $4)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{8985.714285714286 rows, 16185.714285714286 
 cpu, -1.728E7 io}]): rowcount = 1785.7142857142856, cumulative cost = 
 {1609878.5714285714 rows, 1617971.4285714284 cpu, -1.89787875E7 io}, id = 3128
 HiveProject(hd_demo_sk=[$0], hd_income_band_sk=[$1]): rowcount = 
 7200.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2982
   
 HiveTableScan(table=[[tpcds_bin_orc_200.household_demographics]]): rowcount = 
 7200.0, cumulative cost = {0}, id = 2871
 HiveJoin(condition=[=($3, $6)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 
 cpu, -1698787.48 io}]): rowcount = 1785.7142857142856, cumulative 
 cost = {1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 
 io}, id = 3105
   HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
 c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
 c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 
 cpu, 0.0 io}, id = 2970
 HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount 
 = 160.0, cumulative cost = {0}, id = 2862
   HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount = 
 892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2974
 HiveFilter(condition=[=($6, 'Hopewell')]): rowcount = 
 

[jira] [Updated] (HIVE-10346) Tez on HBase has problems with settings again

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10346:

Attachment: HIVE-10346.patch

 Tez on HBase has problems with settings again
 -

 Key: HIVE-10346
 URL: https://issues.apache.org/jira/browse/HIVE-10346
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10346.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10346) Tez on HBase has problems with settings again

2015-04-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497089#comment-14497089
 ] 

Sergey Shelukhin commented on HIVE-10346:
-

[~hagleitn] can you please review?

 Tez on HBase has problems with settings again
 -

 Key: HIVE-10346
 URL: https://issues.apache.org/jira/browse/HIVE-10346
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10346.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015

2015-04-15 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-10347:
-
Attachment: HIVE-10347.patch

Attaching to run precommit tests.

 Merge spark to trunk 4/15/2015
 --

 Key: HIVE-10347
 URL: https://issues.apache.org/jira/browse/HIVE-10347
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-10347.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10028) LLAP: Create a fixed size execution queue for daemons

2015-04-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10028:
-
Attachment: HIVE-10028.2.patch

 LLAP: Create a fixed size execution queue for daemons
 -

 Key: HIVE-10028
 URL: https://issues.apache.org/jira/browse/HIVE-10028
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Prasanth Jayachandran
 Fix For: llap

 Attachments: HIVE-10028.1.patch, HIVE-10028.2.patch


 Currently, this is unbounded. This should be a configurable size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10307) Support to use number literals in partition column

2015-04-15 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497157#comment-14497157
 ] 

Chaoyu Tang commented on HIVE-10307:


The failed tests seem unrelated to this patch. Thanks

 Support to use number literals in partition column
 --

 Key: HIVE-10307
 URL: https://issues.apache.org/jira/browse/HIVE-10307
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-10307.1.patch, HIVE-10307.patch


 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with a postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but do not when they are used as 
 partition column values. For a partitioned table like:
 create table partcoltypenum (key int, value string) partitioned by (tint 
 tinyint, sint smallint, bint bigint);
 insert into partcoltypenum partition (tint=100Y, sint=1S, 
 bint=1000L) select key, value from src limit 30;
 Queries like select, describe and drop partition do not work. For example,
 select * from partcoltypenum where tint=100Y and sint=1S and 
 bint=1000L;
 does not return any rows.
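
One way to see what the fix must do is to normalize the literal before comparing partition values: strip the type postfix so that '100Y' matches the stored value '100'. This is a hypothetical helper for illustration, not Hive's parser code.

```python
import re

def strip_numeric_postfix(literal):
    """Drop a trailing Y/S/L/BD type postfix from a Hive number
    literal, e.g. '100Y' -> '100'. Illustrative sketch only."""
    m = re.fullmatch(r"(-?\d+)(Y|S|L|BD)?", literal, re.IGNORECASE)
    if not m:
        raise ValueError("not a number literal: %r" % literal)
    return m.group(1)

# The partition values from the insert above:
assert strip_numeric_postfix("100Y") == "100"
assert strip_numeric_postfix("1S") == "1"
assert strip_numeric_postfix("1000L") == "1000"
```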



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10350) CBO: With hive.cbo.costmodel.extended enabled IO cost is negative

2015-04-15 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497228#comment-14497228
 ] 

Mostafa Mokhtar commented on HIVE-10350:


[~jcamachorodriguez] [~jpullokkaran]
Can you please take a look?

 CBO: With hive.cbo.costmodel.extended enabled IO cost is negative
 -

 Key: HIVE-10350
 URL: https://issues.apache.org/jira/browse/HIVE-10350
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch


 Not an overflow, but parallelism ends up being -1 because it uses the number of buckets.
 {code}
  final int parallelism = RelMetadataQuery.splitCount(join) == null
   ? 1 : RelMetadataQuery.splitCount(join);
 {code}
 {code}
 2015-04-13 18:19:09,154 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - COMMON_JOIN cost: {1600892.857142857 
 rows, 2.4463782008994658E7 cpu, 8.54445445875E10 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - MAP_JOIN cost: {1600892.857142857 
 rows, 1601785.714285714 cpu, -1698787.48 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(72)) - MAP_JOIN selected
 2015-04-13 18:19:09,157 DEBUG [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:apply(862)) - Plan After Join Reordering:
 HiveSort(fetch=[100]): rowcount = 6006.726049749041, cumulative cost = 
 {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 
 io}, id = 3000
   HiveSort(sort0=[$0], dir0=[ASC]): rowcount = 6006.726049749041, cumulative 
 cost = {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, 
 -1.1757664816220238E9 io}, id = 2998
 HiveProject(customer_id=[$4], customername=[concat($9, ', ', $8)]): 
 rowcount = 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3136
   HiveJoin(condition=[=($1, $5)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.557820341269841E7 rows, 
 5.557840182539682E7 cpu, -4299694.122023809 io}]): rowcount = 
 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3132
 HiveJoin(condition=[=($0, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.7498805E7 rows, 5.9419605E7 cpu, 
 -1.15248E9 io}]): rowcount = 5.5578005E7, cumulative cost = {5.7498805E7 
 rows, 5.9419605E7 cpu, -1.15248E9 io}, id = 3100
   HiveProject(sr_cdemo_sk=[$4]): rowcount = 5.5578005E7, cumulative 
 cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2992
 HiveTableScan(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2878
   HiveProject(cd_demo_sk=[$0]): rowcount = 1920800.0, cumulative cost 
 = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2978
 HiveTableScan(table=[[tpcds_bin_orc_200.customer_demographics]]): 
 rowcount = 1920800.0, cumulative cost = {0}, id = 2868
 HiveJoin(condition=[=($10, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1787.9365079365077 rows, 1790.15873015873 
 cpu, -8000.0 io}]): rowcount = 198.4126984126984, cumulative cost = 
 {1611666.507936508 rows, 1619761.5873015872 cpu, -1.89867875E7 io}, id = 3130
   HiveJoin(condition=[=($0, $4)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{8985.714285714286 rows, 16185.714285714286 
 cpu, -1.728E7 io}]): rowcount = 1785.7142857142856, cumulative cost = 
 {1609878.5714285714 rows, 1617971.4285714284 cpu, -1.89787875E7 io}, id = 3128
 HiveProject(hd_demo_sk=[$0], hd_income_band_sk=[$1]): rowcount = 
 7200.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2982
   
 HiveTableScan(table=[[tpcds_bin_orc_200.household_demographics]]): rowcount = 
 7200.0, cumulative cost = {0}, id = 2871
 HiveJoin(condition=[=($3, $6)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 
 cpu, -1698787.48 io}]): rowcount = 1785.7142857142856, cumulative 
 cost = {1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 
 io}, id = 3105
   HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
 c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
 c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 
 cpu, 0.0 io}, id = 2970
 HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount 
 = 160.0, cumulative cost = {0}, id = 2862
   HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount = 
 892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id 

[jira] [Commented] (HIVE-9923) No clear message when from is missing

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497327#comment-14497327
 ] 

Hive QA commented on HIVE-9923:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725615/HIVE-9923.1.patch

{color:red}ERROR:{color} -1 due to 57 failed/errored test(s), 8690 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_timestamp_literal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_add_months
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_bitwise_shiftleft
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_bitwise_shiftright
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_bitwise_shiftrightunsigned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_cbrt
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_date_add
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_date_sub
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_decode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_factorial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_format_number
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_from_utc_timestamp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_get_json_object
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_last_day
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_levenshtein
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_months_between
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_soundex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_to_utc_timestamp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_trunc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_stack
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ptf_negative_DistributeByOrderBy
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ptf_negative_PartitionBySortBy
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_select_star_suffix
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_select_udtf_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_missing_from
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_timestamp_literal
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_add_months_error_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_add_months_error_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_last_day_error_1

[jira] [Commented] (HIVE-10290) Add negative test case to modify a non-existent config value when hive security authorization is enabled.

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497107#comment-14497107
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-10290:
--

[~thejas] Is it possible to get this in?

Thanks
Hari

 Add negative test case to modify a non-existent config value when hive 
 security authorization is enabled.
 -

 Key: HIVE-10290
 URL: https://issues.apache.org/jira/browse/HIVE-10290
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10290.1.patch


 We need to have a test case to cover the following scenario when hive 
 security authorization is enabled:
 {code}
 set hive.exec.reduce.max=1;
 Query returned non-zero code: 1, cause: hive configuration 
 hive.exec.reduce.max does not exists.
 {code}
 This is important for ease of use, and we need to prevent a future code 
 change/regression from converting the above test case into a permission-denied 
 error. 
 I.e., the output below is not desirable:
 {code}
 set hive.exec.reduce.max=1;
 Error: Error while processing statement: Cannot modify hive.exec.reduce.max 
 at runtime. It is not in list of params that are allowed to be modified at 
 runtime (state=42000,code=1)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10348) LLAP: merge trunk to branch 2015-04-15

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10348.
-
   Resolution: Fixed
Fix Version/s: llap

 LLAP: merge trunk to branch 2015-04-15
 --

 Key: HIVE-10348
 URL: https://issues.apache.org/jira/browse/HIVE-10348
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10331) ORC : Is null SARG filters out all row groups written in old ORC format

2015-04-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10331:
---
Issue Type: Bug  (was: Sub-task)
Parent: (was: HIVE-9132)

 ORC : Is null SARG filters out all row groups written in old ORC format
 ---

 Key: HIVE-10331
 URL: https://issues.apache.org/jira/browse/HIVE-10331
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch, HIVE-10331.02.patch


 Queries are returning wrong results as all row groups get filtered out and 
 no rows get scanned.
 {code}
 SELECT 
   count(*)
 FROM
 store_sales
 WHERE
 ss_addr_sk IS NULL
 {code}
 With hive.optimize.index.filter disabled we get the correct results.
 In pickRowGroups, the stats show that hasNull_ is false, while the row group 
 actually has nulls.
 The same query runs fine for newly loaded ORC tables.
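
The pruning decision can be sketched as follows (a simplified illustration; the real logic is in ORC's row-group picking, not this helper). An IS NULL predicate keeps a row group only when its statistics report a null, so stale hasNull=false stats from old-format files wrongly discard every group.

```python
def keep_row_group_for_is_null(stats_has_null):
    # An IS NULL predicate can only match rows in a row group
    # whose column statistics report at least one null value.
    return stats_has_null

# Old-format ORC wrote hasNull=false for a row group that
# actually contains nulls, so the group is wrongly skipped:
written_has_null = False   # stale statistic in the file
actual_has_null = True     # what the data really contains
assert keep_row_group_for_is_null(written_has_null) is False
assert actual_has_null     # ...yet the group held matching rows
```

Disabling hive.optimize.index.filter bypasses this pruning entirely, which is why it restores correct results.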



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


 joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 
 cpu, -1698787.48 io}]): rowcount = 1785.7142857142856, cumulative 
 cost = {1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 
 io}, id = 3105
   HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
 c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
 c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 
 cpu, 0.0 io}, id = 2970
 HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount 
 = 160.0, cumulative cost = {0}, id = 2862
   HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount = 
 892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2974
 HiveFilter(condition=[=($6, 'Hopewell')]): rowcount = 
 

[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-04-15 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497116#comment-14497116
 ] 

Szehon Ho commented on HIVE-10304:
--

Done editing these sections with new information and links on Beeline/HS2, as 
well as deprecation warnings for HiveCLI; feel free to check. Thanks, Lefty, for 
the links.

 Add deprecation message to HiveCLI
 --

 Key: HIVE-10304
 URL: https://issues.apache.org/jira/browse/HIVE-10304
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch


 As Beeline is now the recommended command-line tool for Hive, we should add a 
 message to HiveCLI indicating that it is deprecated and redirecting users to 
 Beeline.  
 This is not a suggestion to remove HiveCLI for now, just a helpful pointer 
 so users know to focus their attention on Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10335) LLAP: IndexOutOfBound in MapJoinOperator

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10335.
-
Resolution: Not A Problem

 LLAP: IndexOutOfBound in MapJoinOperator
 

 Key: HIVE-10335
 URL: https://issues.apache.org/jira/browse/HIVE-10335
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap


 {code}
 2015-04-14 13:57:55,889 
 [TezTaskRunner_attempt_1428572510173_0173_2_03_14_0(container_1_0173_01_66_sseth_20150414135750_7a7c2f4f-5f2d-4645-b833-677621f087bd:2_Map
  1_14_0)] ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unexpected 
 exception: Index: 0, Size: 0
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:653)
 at java.util.ArrayList.get(ArrayList.java:429)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.unwrap(UnwrapRowContainer.java:79)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:62)
 at 
 org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:33)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
 at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:283)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.flushOutput(VectorMapJoinOperator.java:232)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.closeOp(VectorMapJoinOperator.java:240)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:630)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:348)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:332)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}
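The trace shows `ArrayList.get(0)` called on an empty list inside `UnwrapRowContainer.unwrap`. A minimal guard of the kind below would avoid the exception; it is only an illustration (the issue was ultimately closed without a code change), not the actual Hive fix.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative bounds guard: return null for an empty container instead
// of letting ArrayList.rangeCheck throw IndexOutOfBoundsException.
public final class FirstRowSketch {
    public static Object firstOrNull(List<Object> rows) {
        return rows.isEmpty() ? null : rows.get(0);
    }

    public static void main(String[] args) {
        List<Object> empty = new ArrayList<>();
        System.out.println(firstOrNull(empty)); // null instead of an exception
    }
}
```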



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10335) LLAP: IndexOutOfBound in MapJoinOperator

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10335.
-
Resolution: Done

 LLAP: IndexOutOfBound in MapJoinOperator
 

 Key: HIVE-10335
 URL: https://issues.apache.org/jira/browse/HIVE-10335
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-10335) LLAP: IndexOutOfBound in MapJoinOperator

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reopened HIVE-10335:
-

 LLAP: IndexOutOfBound in MapJoinOperator
 

 Key: HIVE-10335
 URL: https://issues.apache.org/jira/browse/HIVE-10335
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10306:
-
Attachment: HIVE-10306.4.patch

 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 -

 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10306.1.patch, HIVE-10306.2.patch, 
 HIVE-10306.3.patch, HIVE-10306.4.patch


 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 We introduced this parameter via HIVE-10119.
 The logging param for levels is only relevant to HS2, so for hive-cli users 
 hive.tez.exec.print.summary still makes sense. We can check the log-level 
 param as well in the places where we check the value of 
 hive.tez.exec.print.summary; i.e., consider hive.tez.exec.print.summary=true 
 if log.level = PERFORMANCE.
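The behavior described above reduces to a single predicate. The method below is a hedged sketch; the boolean flag and string stand in for the real HiveConf accessors, which are assumed rather than quoted.

```java
// Sketch: treat the summary as enabled when either the explicit config
// flag is set or the HS2 logging level is PERFORMANCE.
public final class SummarySketch {
    public static boolean shouldPrintSummary(boolean printSummaryFlag, String hs2LogLevel) {
        // equalsIgnoreCase on the constant is null-safe for hs2LogLevel.
        return printSummaryFlag || "PERFORMANCE".equalsIgnoreCase(hs2LogLevel);
    }

    public static void main(String[] args) {
        System.out.println(shouldPrintSummary(false, "PERFORMANCE")); // true
        System.out.println(shouldPrintSummary(false, "EXECUTION"));   // false
        System.out.println(shouldPrintSummary(true, null));           // true
    }
}
```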



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10344) CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc rather than construction function

2015-04-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497137#comment-14497137
 ] 

Ashutosh Chauhan commented on HIVE-10344:
-

+1

 CBO (Calcite Return Path): Use newInstance to create ExprNodeGenericFuncDesc 
 rather than construction function
 --

 Key: HIVE-10344
 URL: https://issues.apache.org/jira/browse/HIVE-10344
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-10344.01.patch


 ExprNodeGenericFuncDesc is now created using a constructor, which skips the 
 initialization step genericUDF.initializeAndFoldConstants compared with 
 using the newInstance method. If the initialization step is skipped, some 
 configuration parameters are not included in the serialization, which 
 produces wrong results or errors.
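A toy illustration of the difference (not the real ExprNodeGenericFuncDesc API): a factory method can run an initialization step that plain construction skips, so directly constructed instances are missing state that later serialization depends on.

```java
// Minimal stand-in for the factory-vs-constructor distinction. The
// "initialized" flag plays the role of state set up by
// genericUDF.initializeAndFoldConstants in the real class.
public final class FuncDescSketch {
    private boolean initialized;

    public FuncDescSketch() { } // direct construction: initialization skipped

    public static FuncDescSketch newInstance() {
        FuncDescSketch d = new FuncDescSketch();
        d.initialized = true;   // factory runs the initialization step
        return d;
    }

    public boolean isInitialized() {
        return initialized;
    }

    public static void main(String[] args) {
        System.out.println(new FuncDescSketch().isInitialized()); // false
        System.out.println(FuncDescSketch.newInstance().isInitialized()); // true
    }
}
```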



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10306:
-
Attachment: (was: HIVE-10306.4.patch)

 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 -

 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10306.1.patch, HIVE-10306.2.patch, 
 HIVE-10306.3.patch


 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 We introduced this parameter via HIVE-10119.
 The logging param for levels is only relevant to HS2, so for hive-cli users 
 the hive.tez.exec.print.summary still makes sense. We can check for log-level 
 param as well, in places we are checking value of 
 hive.tez.exec.print.summary. Ie, consider hive.tez.exec.print.summary=true if 
 log.level = PERFORMANCE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10029) LLAP: Scheduling of work from different queries within the daemon

2015-04-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497279#comment-14497279
 ] 

Prasanth Jayachandran commented on HIVE-10029:
--

[~seth.siddha...@gmail.com] This should be covered by HIVE-10028 patch right?

 LLAP: Scheduling of work from different queries within the daemon
 -

 Key: HIVE-10029
 URL: https://issues.apache.org/jira/browse/HIVE-10029
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
 Fix For: llap


 The current implementation is a simple queue - whichever query wins the race 
 to submit work to a daemon will execute first.
 A policy around this may be useful - potentially fair share, or a 
 first-query-in-gets-all-slots approach.
 Also, the priority associated with work within a query should be considered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10349) overflow in stats

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10349:

Description: 
Discovered while running q17 in LLAP.

{noformat}
Reducer 2 
Execution mode: llap
Reduce Operator Tree:
  Merge Join Operator
condition map:
 Inner Join 0 to 1
keys:
  0 _col28 (type: int), _col27 (type: int)
  1 cs_bill_customer_sk (type: int), cs_item_sk (type: int)
outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
_col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
Statistics: Num rows: 1047651367827495040 Data size: 
9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
Map Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 _col22 (type: int)
1 d_date_sk (type: int)
  outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
_col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82, _col86
  input vertices:
1 Map 7
  Statistics: Num rows: 1152416529588199552 Data size: 
9223372036854775807 Basic stats: COMPLETE Column stats: NONE

{noformat}

Data size overflows and the row count also looks wrong. I wonder if this is why 
it generates 1009 reducers for this stage on 6 containers.

  was:
Discovered while running q17 in LLAP.

{noformat}
Reducer 2 
Execution mode: llap
Reduce Operator Tree:
  Merge Join Operator
condition map:
 Inner Join 0 to 1
keys:
  0 _col28 (type: int), _col27 (type: int)
  1 cs_bill_customer_sk (type: int), cs_item_sk (type: int)
outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
_col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
Statistics: Num rows: 1047651367827495040 Data size: 
9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
Map Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 _col22 (type: int)
1 d_date_sk (type: int)
  outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
_col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82, _col86
  input vertices:
1 Map 7
  Statistics: Num rows: 1152416529588199552 Data size: 
9223372036854775807 Basic stats: COMPLETE Column stats: NONE

{noformat}

Data size overflows and row count also looks wrong. I wonder if this is why it 
generates 1009 reducers for this stage


 overflow in stats
 -

 Key: HIVE-10349
 URL: https://issues.apache.org/jira/browse/HIVE-10349
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran

 Discovered while running q17 in LLAP.
 {noformat}
 Reducer 2 
 Execution mode: llap
 Reduce Operator Tree:
   Merge Join Operator
 condition map:
  Inner Join 0 to 1
 keys:
   0 _col28 (type: int), _col27 (type: int)
   1 cs_bill_customer_sk (type: int), cs_item_sk (type: int)
 outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
 _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
 Statistics: Num rows: 1047651367827495040 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
 Map Join Operator
   condition map:
Inner Join 0 to 1
   keys:
 0 _col22 (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col2, _col6, _col8, _col9, 
 _col22, _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, 
 _col82, _col86
   input vertices:
 1 Map 7
   Statistics: Num rows: 1152416529588199552 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: NONE
 {noformat}
 Data size overflows and row count also looks wrong. I wonder if this is why 
 it generates 1009 reducers for this stage on 6 containers
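The Data size of 9223372036854775807 is exactly Long.MAX_VALUE, which is consistent with unchecked long multiplication during stats annotation. Below is a hedged sketch of overflow-safe multiplication that saturates instead of wrapping; this is an assumption about the cause, not the committed fix.

```java
// Saturating multiply: on overflow, clamp to Long.MAX_VALUE so the
// estimate stays monotonic instead of wrapping to garbage.
public final class StatsMathSketch {
    public static long saturatingMultiply(long a, long b) {
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        // 4e9 * 4e9 = 1.6e19 overflows a signed 64-bit long (max ~9.22e18).
        System.out.println(saturatingMultiply(4_000_000_000L, 4_000_000_000L)); // 9223372036854775807
        System.out.println(saturatingMultiply(3, 4)); // 12
    }
}
```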



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10307) Support to use number literals in partition column

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497132#comment-14497132
 ] 

Hive QA commented on HIVE-10307:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725594/HIVE-10307.1.patch

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 8687 tests 
executed
*Failed tests:*
{noformat}
TestHBaseNegativeCliDriver - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3449/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3449/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3449/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725594 - PreCommit-HIVE-TRUNK-Build

 Support to use number literals in partition column
 --

 Key: HIVE-10307
 URL: https://issues.apache.org/jira/browse/HIVE-10307
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-10307.1.patch, HIVE-10307.patch


 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with a postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but not when they are used as a 
 partition column value. For a partitioned table like:
 create table partcoltypenum (key int, value string) partitioned by (tint 
 tinyint, sint smallint, bint bigint);
 insert into partcoltypenum partition (tint=100Y, sint=1S, 
 bint=1000L) select key, value from src limit 30;
 queries like select, describe, and drop partition do not work. For example,
 select * from partcoltypenum where tint=100Y and sint=1S and 
 bint=1000L;
 does not return any rows.
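One way to read the report is that the type-suffixed literal text is compared against the stored partition value verbatim. A hypothetical normalization step (the method name and regex are illustrative, not Hive's implementation) would strip the suffix before comparing:

```java
// Hypothetical sketch: drop a trailing numeric type suffix (Y, S, L, BD,
// case-insensitive) so "100Y" and the stored partition value "100"
// compare equal. BD must be tried before the single-letter suffixes.
public final class PartitionLiteralSketch {
    public static String stripNumericSuffix(String literal) {
        return literal.replaceAll("(?i)(BD|[YSL])$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNumericSuffix("100Y"));  // 100
        System.out.println(stripNumericSuffix("1.5BD")); // 1.5
        System.out.println(stripNumericSuffix("42"));    // 42
    }
}
```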



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10350) CBO: With hive.cbo.costmodel.extended enabled IO cost is negative

2015-04-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10350:
---
Description: 
Not an overflow, but parallelism ends up being -1 because it uses the number of buckets:
{code}
 final int parallelism = RelMetadataQuery.splitCount(join) == null
  ? 1 : RelMetadataQuery.splitCount(join);
{code}


{code}
2015-04-13 18:19:09,154 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(62)) - COMMON_JOIN cost: {1600892.857142857 
rows, 2.4463782008994658E7 cpu, 8.54445445875E10 io}
2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(62)) - MAP_JOIN cost: {1600892.857142857 rows, 
1601785.714285714 cpu, -1698787.48 io}
2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(72)) - MAP_JOIN selected
2015-04-13 18:19:09,157 DEBUG [main]: parse.CalcitePlanner 
(CalcitePlanner.java:apply(862)) - Plan After Join Reordering:
HiveSort(fetch=[100]): rowcount = 6006.726049749041, cumulative cost = 
{1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, 
id = 3000
  HiveSort(sort0=[$0], dir0=[ASC]): rowcount = 6006.726049749041, cumulative 
cost = {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, 
-1.1757664816220238E9 io}, id = 2998
HiveProject(customer_id=[$4], customername=[concat($9, ', ', $8)]): 
rowcount = 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3136
  HiveJoin(condition=[=($1, $5)], joinType=[inner], 
joinAlgorithm=[map_join], cost=[{5.557820341269841E7 rows, 5.557840182539682E7 
cpu, -4299694.122023809 io}]): rowcount = 6006.726049749041, cumulative cost = 
{1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, 
id = 3132
HiveJoin(condition=[=($0, $1)], joinType=[inner], 
joinAlgorithm=[map_join], cost=[{5.7498805E7 rows, 5.9419605E7 cpu, -1.15248E9 
io}]): rowcount = 5.5578005E7, cumulative cost = {5.7498805E7 rows, 5.9419605E7 
cpu, -1.15248E9 io}, id = 3100
  HiveProject(sr_cdemo_sk=[$4]): rowcount = 5.5578005E7, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2992
HiveTableScan(table=[[tpcds_bin_orc_200.store_returns]]): rowcount 
= 5.5578005E7, cumulative cost = {0}, id = 2878
  HiveProject(cd_demo_sk=[$0]): rowcount = 1920800.0, cumulative cost = 
{0.0 rows, 0.0 cpu, 0.0 io}, id = 2978
HiveTableScan(table=[[tpcds_bin_orc_200.customer_demographics]]): 
rowcount = 1920800.0, cumulative cost = {0}, id = 2868
HiveJoin(condition=[=($10, $1)], joinType=[inner], 
joinAlgorithm=[map_join], cost=[{1787.9365079365077 rows, 1790.15873015873 cpu, 
-8000.0 io}]): rowcount = 198.4126984126984, cumulative cost = 
{1611666.507936508 rows, 1619761.5873015872 cpu, -1.89867875E7 io}, id = 3130
  HiveJoin(condition=[=($0, $4)], joinType=[inner], 
joinAlgorithm=[map_join], cost=[{8985.714285714286 rows, 16185.714285714286 
cpu, -1.728E7 io}]): rowcount = 1785.7142857142856, cumulative cost = 
{1609878.5714285714 rows, 1617971.4285714284 cpu, -1.89787875E7 io}, id = 3128
HiveProject(hd_demo_sk=[$0], hd_income_band_sk=[$1]): rowcount = 
7200.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2982
  
HiveTableScan(table=[[tpcds_bin_orc_200.household_demographics]]): rowcount = 
7200.0, cumulative cost = {0}, id = 2871
HiveJoin(condition=[=($3, $6)], joinType=[inner], 
joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 cpu, 
-1698787.48 io}]): rowcount = 1785.7142857142856, cumulative cost = 
{1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 io}, id = 
3105
  HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 cpu, 
0.0 io}, id = 2970
HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount = 
160.0, cumulative cost = {0}, id = 2862
  HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount = 
892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2974
HiveFilter(condition=[=($6, 'Hopewell')]): rowcount = 
892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2972
  HiveTableScan(table=[[tpcds_bin_orc_200.customer_address]]): 
rowcount = 80.0, cumulative cost = {0}, id = 2864
  HiveProject(ib_income_band_sk=[$0], ib_lower_bound=[$1], 
ib_upper_bound=[$2]): rowcount = 2.2223, cumulative cost = {0.0 
rows, 0.0 cpu, 0.0 io}, id = 2988
HiveFilter(condition=[AND(=($1, 32287), =($2, +(32287, 
5)))]): rowcount = 2.2223, cumulative cost = {0.0 rows, 0.0 
cpu, 0.0 io}, id = 2986
  

[jira] [Commented] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-15 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497148#comment-14497148
 ] 

Thejas M Nair commented on HIVE-10306:
--

It might be ptest2 that is expecting the file to be present. Try changing the 
name to TestOperationLoggingAPITestBase.


 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 -

 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10306.1.patch, HIVE-10306.2.patch, 
 HIVE-10306.3.patch, HIVE-10306.4.patch


 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 We introduced this parameter via HIVE-10119.
 The logging param for levels is only relevant to HS2, so for hive-cli users 
 hive.tez.exec.print.summary still makes sense. We can check the log-level 
 param as well in the places where we check the value of 
 hive.tez.exec.print.summary, i.e., consider hive.tez.exec.print.summary=true 
 if log.level = PERFORMANCE.
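The proposed check can be sketched as follows; the class and method names here are illustrative stand-ins, not actual Hive APIs:

```java
public class PrintSummaryCheck {
    // Treat hive.tez.exec.print.summary as effectively true when the
    // HS2 logging level is PERFORMANCE, as the description proposes.
    static boolean shouldPrintSummary(boolean printSummaryFlag, String hs2LogLevel) {
        // Either the explicit flag or the PERFORMANCE log level enables the summary.
        return printSummaryFlag || "PERFORMANCE".equalsIgnoreCase(hs2LogLevel);
    }

    public static void main(String[] args) {
        System.out.println(shouldPrintSummary(false, "PERFORMANCE")); // true
        System.out.println(shouldPrintSummary(false, "EXECUTION"));   // false
    }
}
```

This keeps hive.tez.exec.print.summary meaningful for hive-cli users while letting the HS2-only log level imply it.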



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10331) ORC : Is null SARG filters out all row groups written in old ORC format

2015-04-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10331:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-9132

 ORC : Is null SARG filters out all row groups written in old ORC format
 ---

 Key: HIVE-10331
 URL: https://issues.apache.org/jira/browse/HIVE-10331
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch, HIVE-10331.02.patch


 Queries are returning wrong results as all row groups get filtered out and 
 no rows get scanned.
 {code}
 SELECT 
   count(*)
 FROM
 store_sales
 WHERE
 ss_addr_sk IS NULL
 {code}
 With hive.optimize.index.filter disabled we get the correct results.
 In pickRowGroups, the stats show that hasNull_ is false, while the row group 
 actually contains nulls.
 The same query runs fine for newly loaded ORC tables.
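The pruning logic at the heart of this bug can be sketched as follows; this is a minimal stand-in, not the real ORC reader code. A row group may only be skipped for an IS NULL predicate when its statistics report no nulls, so a writer that wrongly records hasNull_ = false causes every group to be skipped:

```java
public class IsNullPruning {
    // Minimal stand-in for per-row-group column statistics.
    static final class ColStats {
        final boolean hasNull;
        ColStats(boolean hasNull) { this.hasNull = hasNull; }
    }

    // An IS NULL predicate can only match rows in groups whose stats
    // report at least one null; hasNull == false lets the reader skip the group.
    static boolean mayContainMatch(ColStats stats) {
        return stats.hasNull;
    }

    public static void main(String[] args) {
        // If an old writer recorded hasNull = false for a group that really
        // holds nulls, the group is wrongly skipped and rows are lost.
        System.out.println(mayContainMatch(new ColStats(false))); // group skipped
        System.out.println(mayContainMatch(new ColStats(true)));  // group scanned
    }
}
```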



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10028) LLAP: Create a fixed size execution queue for daemons

2015-04-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497266#comment-14497266
 ] 

Prasanth Jayachandran commented on HIVE-10028:
--

[~seth.siddha...@gmail.com] Very useful comments! Fixed them all in the new 
patch. Can you take a look again?

 LLAP: Create a fixed size execution queue for daemons
 -

 Key: HIVE-10028
 URL: https://issues.apache.org/jira/browse/HIVE-10028
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Prasanth Jayachandran
 Fix For: llap

 Attachments: HIVE-10028.1.patch, HIVE-10028.2.patch


 Currently, this is unbounded. This should be a configurable size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10284) enable container reuse for grace hash join

2015-04-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-10284:
-
Attachment: HIVE-10284.8.patch

Upload patch 8 for testing

 enable container reuse for grace hash join 
 ---

 Key: HIVE-10284
 URL: https://issues.apache.org/jira/browse/HIVE-10284
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Wei Zheng
 Attachments: HIVE-10284.1.patch, HIVE-10284.2.patch, 
 HIVE-10284.3.patch, HIVE-10284.4.patch, HIVE-10284.5.patch, 
 HIVE-10284.6.patch, HIVE-10284.7.patch, HIVE-10284.8.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8306) Map join sizing done by auto.convert.join.noconditionaltask.size doesn't take into account Hash table overhead and results in OOM

2015-04-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved HIVE-8306.
---
Resolution: Fixed

Resolving since there is now the Hybrid Hybrid Grace hash table, which should 
handle underestimates gracefully.

 Map join sizing done by auto.convert.join.noconditionaltask.size doesn't take 
 into account Hash table overhead and results in OOM
 -

 Key: HIVE-8306
 URL: https://issues.apache.org/jira/browse/HIVE-8306
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: query64_oom_trim.txt


 When hive.auto.convert.join.noconditionaltask = true, we check 
 noconditionaltask.size, and if the sum of the table sizes in the map join is 
 less than noconditionaltask.size, the plan generates a map join. The issue is 
 that this calculation doesn't take into account the overhead introduced by 
 the different HashTable implementations; as a result, if the sum of input 
 sizes is smaller than the noconditionaltask size by a small margin, queries 
 will hit OOM.
 TPC-DS query 64 is a good example of this issue, as the noconditionaltask 
 size is set to 1,280,000,000 while the sum of inputs is 1,012,379,321, which 
 is about 20% smaller than the expected size.
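The sizing decision described above can be sketched as follows; the 1.3 overhead factor is purely illustrative, not a value taken from Hive:

```java
public class MapJoinSizing {
    // Hypothetical overhead factor: in-memory hash table footprint per byte
    // of serialized input. The 1.3 value is illustrative only.
    static final double HASH_TABLE_OVERHEAD = 1.3;

    // Naive check from the report: compares raw input size to the threshold.
    static boolean fitsAsMapJoin(long sumOfInputSizes, long noConditionalTaskSize) {
        return sumOfInputSizes < noConditionalTaskSize;
    }

    // Accounts for hash-table overhead before comparing.
    static boolean fitsWithOverhead(long sumOfInputSizes, long noConditionalTaskSize) {
        return (long) (sumOfInputSizes * HASH_TABLE_OVERHEAD) < noConditionalTaskSize;
    }

    public static void main(String[] args) {
        long threshold = 1_280_000_000L; // noconditionaltask.size from the report
        long inputs = 1_012_379_321L;    // sum of input sizes from the report
        System.out.println(fitsAsMapJoin(inputs, threshold));    // true: map join chosen
        System.out.println(fitsWithOverhead(inputs, threshold)); // false: would fall back
    }
}
```

With the raw comparison the query 64 inputs fit under the threshold, but once any realistic overhead is applied they no longer do, which is exactly the OOM scenario described.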
 
 Vertex
 {code}
Map 28 - Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 14 
 (BROADCAST_EDGE), Map 15 (BROADCAST_EDGE), Map 16 (BROADCAST_EDGE), Map 24 
 (BROADCAST_EDGE), Map 26 (BROADCAST_EDGE), Map 30 (BROADCAST_EDGE), Map 31 
 (BROADCAST_EDGE), Map 32 (BROADCAST_EDGE), Map 39 (BROADCAST_EDGE), Map 40 
 (BROADCAST_EDGE), Map 43 (BROADCAST_EDGE), Map 45 (BROADCAST_EDGE), Map 5 
 (BROADCAST_EDGE)
 {code}
 Exception
 {code}
 , TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:169)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at 
 org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:206)
   at 
 org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:182)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeKey(MapJoinBytesTableContainer.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:200)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:267)
   at 
 org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:114)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:184)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:210)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1036)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:186)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:164)
   at 
 

[jira] [Commented] (HIVE-10343) CBO (Calcite Return Path): Parameterize algorithm cost model

2015-04-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497467#comment-14497467
 ] 

Lefty Leverenz commented on HIVE-10343:
---

Doc note:  I added two labels (TODOC-CBO and TODOC1.2) because commit r1673948 
went to the cbo branch but this jira is marked with Fix Version 1.2.0.

The patch adds seven configuration parameters to HiveConf.java, so they need to 
be documented in the wiki for release 1.2.0 or whenever the cbo branch gets 
merged to trunk.  Another parameter is removed (*hive.cbo.costmodel.extended* 
which came from HIVE-10040).

* hive.cbo.costmodel.extended
* hive.cbo.costmodel.cpu
* hive.cbo.costmodel.network
* hive.cbo.costmodel.local.fs.write
* hive.cbo.costmodel.local.fs.read
* hive.cbo.costmodel.hdfs.write
* hive.cbo.costmodel.hdfs.read


 CBO (Calcite Return Path): Parameterize algorithm cost model
 

 Key: HIVE-10343
 URL: https://issues.apache.org/jira/browse/HIVE-10343
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
  Labels: TODOC-CBO, TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10343.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10349) overflow in stats

2015-04-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497412#comment-14497412
 ] 

Sergey Shelukhin commented on HIVE-10349:
-

[~hagleitn] [~prasanth_j] [~mmokhtar] [~gopalv] Many queries in TPCDS suffer 
from the problem where there are 1000s of reducers (sometimes, 3 stages of 
600-700 reducers each); this is running on 6 nodes with ~6 slots each. Not sure 
if this is caused just by the stats problem or whether there are other problems 
with the physical optimizer, but it seems like a big perf issue that we should 
address soon.

 overflow in stats
 -

 Key: HIVE-10349
 URL: https://issues.apache.org/jira/browse/HIVE-10349
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran

 Discovered while running q17 in LLAP.
 {noformat}
 Reducer 2 
 Execution mode: llap
 Reduce Operator Tree:
   Merge Join Operator
 condition map:
  Inner Join 0 to 1
 keys:
   0 _col28 (type: int), _col27 (type: int)
   1 cs_bill_customer_sk (type: int), cs_item_sk (type: int)
 outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
 _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
 Statistics: Num rows: 1047651367827495040 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
 Map Join Operator
   condition map:
Inner Join 0 to 1
   keys:
 0 _col22 (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col2, _col6, _col8, _col9, 
 _col22, _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, 
 _col82, _col86
   input vertices:
 1 Map 7
   Statistics: Num rows: 1152416529588199552 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: NONE
 {noformat}
 Data size overflows and the row count also looks wrong. I wonder if this is 
 why it generates 1009 reducers for this stage on 6 machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497423#comment-14497423
 ] 

Hive QA commented on HIVE-10319:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725625/HIVE-10319.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3452/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3452/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3452/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3452/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'metastore/scripts/upgrade/derby/hive-schema-1.2.0.derby.sql'
Reverted 'metastore/scripts/upgrade/derby/upgrade-1.1.0-to-1.2.0.derby.sql'
Reverted 'metastore/scripts/upgrade/oracle/hive-schema-1.2.0.oracle.sql'
Reverted 'metastore/scripts/upgrade/oracle/upgrade-1.1.0-to-1.2.0.oracle.sql'
Reverted 
'metastore/scripts/upgrade/postgres/upgrade-1.1.0-to-1.2.0.postgres.sql'
Reverted 'metastore/scripts/upgrade/postgres/hive-schema-1.2.0.postgres.sql'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
testutils/metastore/dbs/derby testutils/metastore/dbs/oracle 
testutils/metastore/dbs/postgres jdbc/target metastore/target 
metastore/scripts/upgrade/derby/022-HIVE-10239.derby.sql 
metastore/scripts/upgrade/oracle/022-HIVE-10239.oracle.sql 
metastore/scripts/upgrade/postgres/022-HIVE-10239.postgres.sql itests/target 
itests/thirdparty itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target service/target contrib/target serde/target beeline/target 
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
U    ql/src/test/results/clientnegative/udf_next_day_error_1.q.out
U    ql/src/test/results/clientnegative/udf_add_months_error_1.q.out
U    ql/src/test/results/clientnegative/udf_next_day_error_2.q.out
U    ql/src/test/results/clientnegative/udf_last_day_error_1.q.out
U    ql/src/test/results/clientpositive/spark/vector_elt.q.out
U    ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out
U    ql/src/test/results/clientpositive/spark/join8.q.out
U    ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out
U    ql/src/test/results/clientpositive/spark/auto_join8.q.out
U    ql/src/test/results/clientpositive/annotate_stats_select.q.out
U    ql/src/test/results/clientpositive/udf4.q.out
U    ql/src/test/results/clientpositive/udf_isnull_isnotnull.q.out
U    ql/src/test/results/clientpositive/decimal_udf.q.out
U    ql/src/test/results/clientpositive/udf_hour.q.out
U    ql/src/test/results/clientpositive/udf_if.q.out
U    ql/src/test/results/clientpositive/input8.q.out
U

[jira] [Commented] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497422#comment-14497422
 ] 

Hive QA commented on HIVE-10239:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725621/HIVE-10239.0.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8690 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3451/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3451/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3451/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725621 - PreCommit-HIVE-TRUNK-Build

 Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and 
 PostgreSQL
 

 Key: HIVE-10239
 URL: https://issues.apache.org/jira/browse/HIVE-10239
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
 Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, 
 HIVE-10239.0.patch, HIVE-10239.patch


 Need to create DB-implementation specific scripts to use the framework 
 introduced in HIVE-9800 to have any metastore schema changes tested across 
 all supported databases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9015) Constant Folding optimizer doesn't handle expressions involving null

2015-04-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9015:
---
Component/s: Logical Optimizer

 Constant Folding optimizer doesn't handle expressions involving null
 

 Key: HIVE-9015
 URL: https://issues.apache.org/jira/browse/HIVE-9015
 Project: Hive
  Issue Type: Task
  Components: Logical Optimizer
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0


 Expressions which are guaranteed to evaluate to {{null}} aren't folded by 
 the optimizer yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10302) Cache small tables in memory [Spark Branch]

2015-04-15 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10302:
---
Attachment: HIVE-10302.spark-1.patch

 Cache small tables in memory [Spark Branch]
 ---

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.spark-1.patch


 If we can cache small tables in executor memory, we could save some time in 
 loading them from HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9917) After HIVE-3454 is done, make int to timestamp conversion configurable

2015-04-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497393#comment-14497393
 ] 

Aihua Xu commented on HIVE-9917:


Somehow the vector_between_in test case baselines were not updated. Uploading a 
new patch to fix the test cases.

 After HIVE-3454 is done, make int to timestamp conversion configurable
 --

 Key: HIVE-9917
 URL: https://issues.apache.org/jira/browse/HIVE-9917
 Project: Hive
  Issue Type: Improvement
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-9917.patch


 After HIVE-3454 is fixed, we will have the correct behavior when converting 
 int to timestamp. Since customers have been relying on the incorrect behavior 
 for so long, it is better to make it configurable so that one release 
 defaults to the old/inconsistent way and the next release defaults to the 
 new/consistent way. After that we can deprecate the old behavior.
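A minimal sketch of such a configurable conversion is below; the boolean flag and the exact old/new semantics (value-as-milliseconds vs. value-as-seconds) are assumptions for illustration, not the actual Hive behavior:

```java
public class IntToTimestampConversion {
    // Hypothetical flag standing in for the proposed config option.
    // Assumed semantics for illustration: the legacy behavior reads the
    // integer as epoch milliseconds, the fixed behavior as epoch seconds.
    static long toEpochMillis(long value, boolean legacyBehavior) {
        return legacyBehavior ? value : value * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis(1_000L, true));  // 1000
        System.out.println(toEpochMillis(1_000L, false)); // 1000000
    }
}
```

Flipping the default of such a flag between releases gives users one release of warning before the new semantics become the only behavior.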



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9917) After HIVE-3454 is done, make int to timestamp conversion configurable

2015-04-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-9917:
---
Attachment: HIVE-9917.patch

 After HIVE-3454 is done, make int to timestamp conversion configurable
 --

 Key: HIVE-9917
 URL: https://issues.apache.org/jira/browse/HIVE-9917
 Project: Hive
  Issue Type: Improvement
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-9917.patch


 After HIVE-3454 is fixed, we will have the correct behavior when converting 
 int to timestamp. Since customers have been relying on the incorrect behavior 
 for so long, it is better to make it configurable so that one release 
 defaults to the old/inconsistent way and the next release defaults to the 
 new/consistent way. After that we can deprecate the old behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9917) After HIVE-3454 is done, make int to timestamp conversion configurable

2015-04-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-9917:
---
Attachment: (was: HIVE-9917.patch)

 After HIVE-3454 is done, make int to timestamp conversion configurable
 --

 Key: HIVE-9917
 URL: https://issues.apache.org/jira/browse/HIVE-9917
 Project: Hive
  Issue Type: Improvement
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-9917.patch


 After HIVE-3454 is fixed, we will have the correct behavior when converting 
 int to timestamp. Since customers have been relying on the incorrect behavior 
 for so long, it is better to make it configurable so that one release 
 defaults to the old/inconsistent way and the next release defaults to the 
 new/consistent way. After that we can deprecate the old behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10350) CBO: Use total size instead of bucket count to determine number of splits parallelism

2015-04-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10350:
--
Attachment: HIVE-10350.2.patch

 CBO: Use total size instead of bucket count to determine number of splits  
 parallelism 
 

 Key: HIVE-10350
 URL: https://issues.apache.org/jira/browse/HIVE-10350
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch, HIVE-10350.2.patch


 Not an overflow, but parallelism ends up being -1 because it uses the number 
 of buckets:
 {code}
  final int parallelism = RelMetadataQuery.splitCount(join) == null
   ? 1 : RelMetadataQuery.splitCount(join);
 {code}
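The snippet above only guards against null, not against a negative split count; a sketch of the size-based alternative suggested by the title follows (the method names and the bytes-per-task figure are illustrative, not Hive's actual values):

```java
public class SplitParallelism {
    // Reported behavior: splitCount can come back -1 (derived from bucket
    // count), which the null-check alone does not guard against.
    static int parallelismFromSplitCount(Integer splitCount) {
        return splitCount == null ? 1 : splitCount;
    }

    // Hypothetical alternative: derive parallelism from the total data size
    // and a target bytes-per-task figure, with a floor of 1.
    static int parallelismFromSize(long totalBytes, long bytesPerTask) {
        return (int) Math.max(1L, (totalBytes + bytesPerTask - 1) / bytesPerTask);
    }

    public static void main(String[] args) {
        System.out.println(parallelismFromSplitCount(-1));                        // -1: the bug
        System.out.println(parallelismFromSize(1_000_000_000L, 256_000_000L));    // 4
    }
}
```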
 {code}
 2015-04-13 18:19:09,154 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - COMMON_JOIN cost: {1600892.857142857 
 rows, 2.4463782008994658E7 cpu, 8.54445445875E10 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - MAP_JOIN cost: {1600892.857142857 
 rows, 1601785.714285714 cpu, -1698787.48 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(72)) - MAP_JOIN selected
 2015-04-13 18:19:09,157 DEBUG [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:apply(862)) - Plan After Join Reordering:
 HiveSort(fetch=[100]): rowcount = 6006.726049749041, cumulative cost = 
 {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 
 io}, id = 3000
   HiveSort(sort0=[$0], dir0=[ASC]): rowcount = 6006.726049749041, cumulative 
 cost = {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, 
 -1.1757664816220238E9 io}, id = 2998
 HiveProject(customer_id=[$4], customername=[concat($9, ', ', $8)]): 
 rowcount = 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3136
   HiveJoin(condition=[=($1, $5)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.557820341269841E7 rows, 
 5.557840182539682E7 cpu, -4299694.122023809 io}]): rowcount = 
 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3132
 HiveJoin(condition=[=($0, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.7498805E7 rows, 5.9419605E7 cpu, 
 -1.15248E9 io}]): rowcount = 5.5578005E7, cumulative cost = {5.7498805E7 
 rows, 5.9419605E7 cpu, -1.15248E9 io}, id = 3100
   HiveProject(sr_cdemo_sk=[$4]): rowcount = 5.5578005E7, cumulative 
 cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2992
 HiveTableScan(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2878
   HiveProject(cd_demo_sk=[$0]): rowcount = 1920800.0, cumulative cost 
 = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2978
 HiveTableScan(table=[[tpcds_bin_orc_200.customer_demographics]]): 
 rowcount = 1920800.0, cumulative cost = {0}, id = 2868
 HiveJoin(condition=[=($10, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1787.9365079365077 rows, 1790.15873015873 
 cpu, -8000.0 io}]): rowcount = 198.4126984126984, cumulative cost = 
 {1611666.507936508 rows, 1619761.5873015872 cpu, -1.89867875E7 io}, id = 3130
   HiveJoin(condition=[=($0, $4)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{8985.714285714286 rows, 16185.714285714286 
 cpu, -1.728E7 io}]): rowcount = 1785.7142857142856, cumulative cost = 
 {1609878.5714285714 rows, 1617971.4285714284 cpu, -1.89787875E7 io}, id = 3128
 HiveProject(hd_demo_sk=[$0], hd_income_band_sk=[$1]): rowcount = 
 7200.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2982
   
 HiveTableScan(table=[[tpcds_bin_orc_200.household_demographics]]): rowcount = 
 7200.0, cumulative cost = {0}, id = 2871
 HiveJoin(condition=[=($3, $6)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 
 cpu, -1698787.48 io}]): rowcount = 1785.7142857142856, cumulative 
 cost = {1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 
 io}, id = 3105
   HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
 c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
 c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 
 cpu, 0.0 io}, id = 2970
 HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount 
 = 160.0, cumulative cost = {0}, id = 2862
   HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount = 
 892.8571428571428, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 

[jira] [Commented] (HIVE-10350) CBO: Use total size instead of bucket count to determine number of splits parallelism

2015-04-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497444#comment-14497444
 ] 

Laljo John Pullokkaran commented on HIVE-10350:
---

[~mmokhtar] I have uploaded a refined patch. Try it out.

 CBO: Use total size instead of bucket count to determine number of splits  
 parallelism 
 

 Key: HIVE-10350
 URL: https://issues.apache.org/jira/browse/HIVE-10350
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.0

 Attachments: HIVE-10331.01.patch, HIVE-10350.2.patch


 Not an overflow, but parallelism ends up being -1 because it uses the number 
 of buckets:
 {code}
  final int parallelism = RelMetadataQuery.splitCount(join) == null
   ? 1 : RelMetadataQuery.splitCount(join);
 {code}
 {code}
 2015-04-13 18:19:09,154 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - COMMON_JOIN cost: {1600892.857142857 
 rows, 2.4463782008994658E7 cpu, 8.54445445875E10 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(62)) - MAP_JOIN cost: {1600892.857142857 
 rows, 1601785.714285714 cpu, -1698787.48 io}
 2015-04-13 18:19:09,155 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(72)) - MAP_JOIN selected
 2015-04-13 18:19:09,157 DEBUG [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:apply(862)) - Plan After Join Reordering:
 HiveSort(fetch=[100]): rowcount = 6006.726049749041, cumulative cost = 
 {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, -1.1757664816220238E9 
 io}, id = 3000
   HiveSort(sort0=[$0], dir0=[ASC]): rowcount = 6006.726049749041, cumulative 
 cost = {1.1468867492063493E8 rows, 1.166177684126984E8 cpu, 
 -1.1757664816220238E9 io}, id = 2998
 HiveProject(customer_id=[$4], customername=[concat($9, ', ', $8)]): 
 rowcount = 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3136
   HiveJoin(condition=[=($1, $5)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.557820341269841E7 rows, 
 5.557840182539682E7 cpu, -4299694.122023809 io}]): rowcount = 
 6006.726049749041, cumulative cost = {1.1468867492063493E8 rows, 
 1.166177684126984E8 cpu, -1.1757664816220238E9 io}, id = 3132
 HiveJoin(condition=[=($0, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{5.7498805E7 rows, 5.9419605E7 cpu, 
 -1.15248E9 io}]): rowcount = 5.5578005E7, cumulative cost = {5.7498805E7 
 rows, 5.9419605E7 cpu, -1.15248E9 io}, id = 3100
   HiveProject(sr_cdemo_sk=[$4]): rowcount = 5.5578005E7, cumulative 
 cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2992
 HiveTableScan(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2878
   HiveProject(cd_demo_sk=[$0]): rowcount = 1920800.0, cumulative cost 
 = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2978
 HiveTableScan(table=[[tpcds_bin_orc_200.customer_demographics]]): 
 rowcount = 1920800.0, cumulative cost = {0}, id = 2868
 HiveJoin(condition=[=($10, $1)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1787.9365079365077 rows, 1790.15873015873 
 cpu, -8000.0 io}]): rowcount = 198.4126984126984, cumulative cost = 
 {1611666.507936508 rows, 1619761.5873015872 cpu, -1.89867875E7 io}, id = 3130
   HiveJoin(condition=[=($0, $4)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{8985.714285714286 rows, 16185.714285714286 
 cpu, -1.728E7 io}]): rowcount = 1785.7142857142856, cumulative cost = 
 {1609878.5714285714 rows, 1617971.4285714284 cpu, -1.89787875E7 io}, id = 3128
 HiveProject(hd_demo_sk=[$0], hd_income_band_sk=[$1]): rowcount = 
 7200.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2982
   
 HiveTableScan(table=[[tpcds_bin_orc_200.household_demographics]]): rowcount = 
 7200.0, cumulative cost = {0}, id = 2871
 HiveJoin(condition=[=($3, $6)], joinType=[inner], 
 joinAlgorithm=[map_join], cost=[{1600892.857142857 rows, 1601785.714285714 
 cpu, -1698787.48 io}]): rowcount = 1785.7142857142856, cumulative 
 cost = {1600892.857142857 rows, 1601785.714285714 cpu, -1698787.48 
 io}, id = 3105
   HiveProject(c_customer_id=[$1], c_current_cdemo_sk=[$2], 
 c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], c_first_name=[$8], 
 c_last_name=[$9]): rowcount = 160.0, cumulative cost = {0.0 rows, 0.0 
 cpu, 0.0 io}, id = 2970
 HiveTableScan(table=[[tpcds_bin_orc_200.customer]]): rowcount 
 = 160.0, cumulative cost = {0}, id = 2862
   HiveProject(ca_address_sk=[$0], ca_city=[$6]): rowcount 

[jira] [Commented] (HIVE-10356) LLAP: query80 fails with vectorization cast issue

2015-04-15 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497442#comment-14497442
 ] 

Matt McCline commented on HIVE-10356:
-

Looks like HIVE-10244.

 LLAP: query80 fails with vectorization cast issue 
 --

 Key: HIVE-10356
 URL: https://issues.apache.org/jira/browse/HIVE-10356
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline

 Reducer 6 fails:
 {noformat}
 Error: Failure while running task:java.lang.RuntimeException: 
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134524737E8
 \N\N01.2847032699693155E96.41569738480791E7-5.956161019898126E8
 \N\N04.682909323885761E82.288924051203157E7-5.995957665973593E7
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:332)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134524737E8
 \N\N01.2847032699693155E96.41569738480791E7-5.956161019898126E8
 \N\N04.682909323885761E82.288924051203157E7-5.995957665973593E7
   at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:267)
   at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:254)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
   ... 14 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134524737E8
 \N\N01.2847032699693155E96.41569738480791E7-5.956161019898126E8
 \N\N04.682909323885761E82.288924051203157E7-5.995957665973593E7
   at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectors(ReduceRecordSource.java:394)
   at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:252)
   ... 16 more
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to 
 org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:94)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeGroupBatches.processBatch(VectorGroupByOperator.java:729)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:878)
   at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectors(ReduceRecordSource.java:378)
   ... 17 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1428572510173_0231_1_24 [Reducer 5] killed/failed due to:null]Vertex 
 killed, vertexName=Reducer 6, vertexId=vertex_1428572510173_0231_1_25, 
 diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
 other vertex failed. failedTasks:0, Vertex vertex_1428572510173_0231_1_25 
 [Reducer 6] killed/failed due to:null]DAG failed due to vertex failure. 
 failedVertices:1 killedVertices:1
 {noformat}
 How to repro: run query80 on scale factor 200. I might look tomorrow to see 
 if this is specific to LLAP or not
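The root cause in the trace is an unchecked downcast of a column vector in the group-key copy path. A generic sketch of the defensive pattern, using stand-in classes (Hive's real hierarchy lives in org.apache.hadoop.hive.ql.exec.vector and differs in detail), that replaces the bare ClassCastException with a descriptive error:

```java
public class VectorCastGuard {
    // Stand-ins for illustration only; not Hive's actual classes.
    static class ColumnVector {}
    static class DoubleColumnVector extends ColumnVector {}
    static class BytesColumnVector extends ColumnVector {}

    // Fail with a clear message when the batch's column type does not match
    // what the caller expects, instead of a bare ClassCastException.
    static BytesColumnVector expectBytes(ColumnVector col, int colIndex) {
        if (!(col instanceof BytesColumnVector)) {
            throw new IllegalStateException("Column " + colIndex
                + " expected BytesColumnVector but was "
                + col.getClass().getSimpleName());
        }
        return (BytesColumnVector) col;
    }

    public static void main(String[] args) {
        System.out.println(expectBytes(new BytesColumnVector(), 0).getClass().getSimpleName());
        try {
            expectBytes(new DoubleColumnVector(), 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```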



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10324) Hive metatool should take table_param_key to allow for changes to avro serde's schema url key

2015-04-15 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10324:

Attachment: HIVE-10324.1.patch

Thanks [~szehon] for your review. Updated the patch to address backwards compatibility.

 Hive metatool should take table_param_key to allow for changes to avro 
 serde's schema url key
 -

 Key: HIVE-10324
 URL: https://issues.apache.org/jira/browse/HIVE-10324
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Ferdinand Xu
 Attachments: HIVE-10324.1.patch, HIVE-10324.patch, 
 HIVE-10324.patch.WIP


 HIVE-3443 added support to change the serdeParams from 'metatool 
 updateLocation' command.
 However, in avro it is possible to specify the schema via the tableParams:
 {noformat}
 CREATE  TABLE `testavro`(
   `test` string COMMENT 'from deserializer')
 ROW FORMAT SERDE 
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
 STORED AS INPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
 OUTPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'avro.schema.url'='hdfs://namenode:8020/tmp/test.avsc', 
   'kite.compression.type'='snappy', 
   'transient_lastDdlTime'='1427996456')
 {noformat}
 Hence for those tables the 'metatool updateLocation' will not help.
 This is necessary in case like upgrade the namenode to HA where the absolute 
 paths have changed.
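Until metatool covers tableParams, the per-table workaround is to rewrite the property with ALTER TABLE ... SET TBLPROPERTIES. A sketch under that assumption; it only builds the DDL string (executing it against HiveServer2, e.g. over JDBC, is omitted), and the new authority `nameservice1` is a placeholder for the HA nameservice:

```java
import java.util.Objects;

public class AvroSchemaUrlFix {
    // Rewrite the HDFS authority inside avro.schema.url, e.g.
    // hdfs://namenode:8020/tmp/test.avsc -> hdfs://nameservice1/tmp/test.avsc
    public static String alterSchemaUrl(String table, String oldAuthority,
                                        String newAuthority, String currentUrl) {
        String updated = currentUrl.replace(oldAuthority, newAuthority);
        return "ALTER TABLE " + Objects.requireNonNull(table)
            + " SET TBLPROPERTIES ('avro.schema.url'='" + updated + "')";
    }

    public static void main(String[] args) {
        System.out.println(alterSchemaUrl("testavro", "namenode:8020", "nameservice1",
            "hdfs://namenode:8020/tmp/test.avsc"));
    }
}
```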



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-04-15 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497585#comment-14497585
 ] 

Ferdinand Xu commented on HIVE-9252:


Sorry, I don't have cycles to work on this jira currently. It's on my TODO 
list. I will work on it soon. Thank you!

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch


 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom Serde (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share !!!) a Hive table on top of a custom 
 fileformat without the need to let the IT operations people deploy a custom 
 SerDe jar file on all nodes.
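Until a USING JAR clause like the one proposed exists for tables, the session-scoped workaround is to issue ADD JAR before the CREATE TABLE. A sketch of that two-statement sequence; the jar path and SerDe class name are placeholders, and the helper only assembles the statements:

```java
import java.util.Arrays;
import java.util.List;

public class SerdeJarWorkaround {
    // Return the statements a client must run, in order: register the SerDe
    // jar for this session, then create the table that references it.
    public static List<String> statements(String jarUri, String createTableDdl) {
        return Arrays.asList("ADD JAR " + jarUri, createTableDdl);
    }

    public static void main(String[] args) {
        for (String s : statements("hdfs:///libs/my-serde.jar",
                "CREATE EXTERNAL TABLE t (line STRING) ROW FORMAT SERDE 'com.example.MySerDe'")) {
            System.out.println(s);
        }
    }
}
```

The drawback, as the description notes, is exactly what the proposal fixes: every consumer of the table must repeat the ADD JAR step, rather than the jar being linked to the table definition itself.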



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10302) Cache small tables in memory [Spark Branch]

2015-04-15 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497390#comment-14497390
 ] 

Jimmy Xiang commented on HIVE-10302:


The patch is on RB: https://reviews.apache.org/r/33251/

 Cache small tables in memory [Spark Branch]
 ---

 Key: HIVE-10302
 URL: https://issues.apache.org/jira/browse/HIVE-10302
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10302.spark-1.patch


 If we can cache small tables in executor memory, we could save some time in 
 loading them from HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9015) Constant Folding optimizer doesn't handle expressions involving null

2015-04-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-9015.

   Resolution: Fixed
Fix Version/s: 1.2.0

Fixed via HIVE-9645

 Constant Folding optimizer doesn't handle expressions involving null
 

 Key: HIVE-9015
 URL: https://issues.apache.org/jira/browse/HIVE-9015
 Project: Hive
  Issue Type: Task
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0


 Expressions which are guaranteed to evaluate to {{null}} aren't folded by 
 optimizer yet.
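A toy illustration of the missing fold, not Hive's actual optimizer code: under SQL semantics, arithmetic with a NULL operand is guaranteed to be NULL, so the optimizer may replace the whole expression with a NULL literal. Here a constant operand is modeled as a nullable Integer, with Java null standing in for SQL NULL:

```java
public class NullFold {
    // If either constant operand is SQL NULL, the addition folds to NULL
    // without evaluating anything else.
    static Integer foldPlus(Integer left, Integer right) {
        if (left == null || right == null) {
            return null; // NULL + x -> NULL
        }
        return left + right;
    }

    public static void main(String[] args) {
        System.out.println(foldPlus(null, 1)); // null
        System.out.println(foldPlus(2, 3));    // 5
    }
}
```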



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10270) Cannot use Decimal constants less than 0.1BD

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497481#comment-14497481
 ] 

Hive QA commented on HIVE-10270:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725676/HIVE-10270.5.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 8691 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3454/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3454/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3454/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725676 - PreCommit-HIVE-TRUNK-Build

 Cannot use Decimal constants less than 0.1BD
 

 Key: HIVE-10270
 URL: https://issues.apache.org/jira/browse/HIVE-10270
 Project: Hive
  Issue Type: Bug
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10270.1.patch, HIVE-10270.2.patch, 
 HIVE-10270.3.patch, HIVE-10270.4.patch, HIVE-10270.5.patch


 {noformat}
 hive> select 0.09765625BD;
 FAILED: IllegalArgumentException Decimal scale must be less than or equal to 
 precision
 {noformat}
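The arithmetic behind the error: the leading zero of 0.09765625 is not a significant digit, so the unscaled value 9765625 has precision 7 while the scale is 8, and a "scale must be less than or equal to precision" check rejects it. `java.math.BigDecimal` shows the same numbers:

```java
import java.math.BigDecimal;

public class DecimalPrecision {
    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("0.09765625");
        // precision counts significant digits of the unscaled value 9765625
        System.out.println(d.precision()); // 7
        // scale counts digits after the decimal point
        System.out.println(d.scale());     // 8
    }
}
```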



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10324) Hive metatool should take table_param_key to allow for changes to avro serde's schema url key

2015-04-15 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497557#comment-14497557
 ] 

Szehon Ho commented on HIVE-10324:
--

Thanks! +1

 Hive metatool should take table_param_key to allow for changes to avro 
 serde's schema url key
 -

 Key: HIVE-10324
 URL: https://issues.apache.org/jira/browse/HIVE-10324
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Ferdinand Xu
 Attachments: HIVE-10324.1.patch, HIVE-10324.patch, 
 HIVE-10324.patch.WIP


 HIVE-3443 added support to change the serdeParams from 'metatool 
 updateLocation' command.
 However, in avro it is possible to specify the schema via the tableParams:
 {noformat}
 CREATE  TABLE `testavro`(
   `test` string COMMENT 'from deserializer')
 ROW FORMAT SERDE 
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
 STORED AS INPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
 OUTPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'avro.schema.url'='hdfs://namenode:8020/tmp/test.avsc', 
   'kite.compression.type'='snappy', 
   'transient_lastDdlTime'='1427996456')
 {noformat}
 Hence for those tables the 'metatool updateLocation' will not help.
 This is necessary in case like upgrade the namenode to HA where the absolute 
 paths have changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10306:
-
Attachment: HIVE-10306.4.patch

 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 -

 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10306.1.patch, HIVE-10306.2.patch, 
 HIVE-10306.3.patch, HIVE-10306.4.patch


 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 We introduced this parameter via HIVE-10119.
 The logging param for levels is only relevant to HS2, so for hive-cli users 
 the hive.tez.exec.print.summary still makes sense. We can check for log-level 
 param as well, in places we are checking value of 
 hive.tez.exec.print.summary. Ie, consider hive.tez.exec.print.summary=true if 
 log.level = PERFORMANCE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10306:
-
Attachment: (was: HIVE-10306.4.patch)

 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 -

 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10306.1.patch, HIVE-10306.2.patch, 
 HIVE-10306.3.patch


 We need to print tez summary when hive.server2.logging.level = PERFORMANCE. 
 We introduced this parameter via HIVE-10119.
 The logging param for levels is only relevant to HS2, so for hive-cli users 
 the hive.tez.exec.print.summary still makes sense. We can check for log-level 
 param as well, in places we are checking value of 
 hive.tez.exec.print.summary. Ie, consider hive.tez.exec.print.summary=true if 
 log.level = PERFORMANCE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]

2015-04-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-10040:
--
Labels:   (was: TODOC-CBO)

 CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
 --

 Key: HIVE-10040
 URL: https://issues.apache.org/jira/browse/HIVE-10040
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: cbo-branch
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: cbo-branch

 Attachments: HIVE-10040.01.cbo.patch, HIVE-10040.02.cbo.patch, 
 HIVE-10040.03.cbo.patch, HIVE-10040.cbo.patch


 We should be able to deal with cost models in a modular way. Thus, the cost 
 model should be integrated within a Calcite MD provider that is pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]

2015-04-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497469#comment-14497469
 ] 

Lefty Leverenz commented on HIVE-10040:
---

No doc needed:  HIVE-10343 removed *hive.cbo.costmodel.extended* so I'm 
removing the TODOC-CBO label.

 CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
 --

 Key: HIVE-10040
 URL: https://issues.apache.org/jira/browse/HIVE-10040
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: cbo-branch
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: cbo-branch

 Attachments: HIVE-10040.01.cbo.patch, HIVE-10040.02.cbo.patch, 
 HIVE-10040.03.cbo.patch, HIVE-10040.cbo.patch


 We should be able to deal with cost models in a modular way. Thus, the cost 
 model should be integrated within a Calcite MD provider that is pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10346) Tez on HBase has problems with settings again

2015-04-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497533#comment-14497533
 ] 

Hive QA commented on HIVE-10346:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725700/HIVE-10346.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8690 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3455/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3455/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3455/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725700 - PreCommit-HIVE-TRUNK-Build

 Tez on HBase has problems with settings again
 -

 Key: HIVE-10346
 URL: https://issues.apache.org/jira/browse/HIVE-10346
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10346.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9710) HiveServer2 should support cookie based authentication, when using HTTP transport.

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496983#comment-14496983
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-9710:
-

[~vgumashta] Thanks for reviewing the change. I am adding the follow-up jira 
HIVE-10345 to cover the test case you raised.

Thanks
Hari

 HiveServer2 should support cookie based authentication, when using HTTP 
 transport.
 --

 Key: HIVE-9710
 URL: https://issues.apache.org/jira/browse/HIVE-9710
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 1.2.0
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9710.1.patch, HIVE-9710.2.patch, HIVE-9710.3.patch, 
 HIVE-9710.4.patch, HIVE-9710.5.patch, HIVE-9710.6.patch, HIVE-9710.7.patch, 
 HIVE-9710.8.patch


 HiveServer2 should generate cookies and validate the client cookie send to it 
 so that it need not perform User/Password or a Kerberos based authentication 
 on each HTTP request. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10345) Add test case to ensure client sends credentials in non-ssl mode when HS2 sends a secure cookie

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10345:
-
Description: 
We need to add test cases to cover these scenarios.
_
Client |  HS2 Cookie   |  Expected Behavior
SSL   |  Secured |  Client replays, server validates the cookie.
SSL   |   Unsecured|   Client replays, server validates the cookie.
No SSL |   UnSecured|  Client replays, server validates the cookie.
No SSL |  Secured  |  Client should send back credentials since 
cookie
 |  |  replay will not be transmitted 
back to the server.

  was:
We need to add test cases to cover these scenarios.
_
Client |  HS2 Cookie   |  Expected Behavior
___| _ |___
SSL   |  Secured |  Client replays, server validates the cookie.
SSL   |   Unsecured|   Client replays, server validates the cookie.
No SSL |   UnSecured|  Client replays, server validates the cookie.
No SSL |  Secured  |  Client should send back credentials since 
cookie
 |  |  replay will not be transmitted 
back to the server.


 Add test case to ensure client sends credentials in non-ssl mode when HS2 
 sends a secure cookie
 ---

 Key: HIVE-10345
 URL: https://issues.apache.org/jira/browse/HIVE-10345
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan

 We need to add test cases to cover these scenarios.
 _
 Client |  HS2 Cookie   |  Expected Behavior
 SSL   |  Secured |  Client replays, server validates the 
 cookie.
 SSL   |   Unsecured|   Client replays, server validates the 
 cookie.
 No SSL |   UnSecured|  Client replays, server validates the cookie.
 No SSL |  Secured  |  Client should send back credentials since 
 cookie
  |  |  replay will not be transmitted 
 back to the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10345) Add test case to ensure client sends credentials in non-ssl mode when HS2 sends a secure cookie

2015-04-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10345:
-
Description: 
We need to add test cases to cover these scenarios.
Client |  HS2 Cookie |  Expected Behavior
SSL    |  Secured    |  Client replays, server validates the cookie.
SSL    |  Unsecured  |  Client replays, server validates the cookie.
No SSL |  Unsecured  |  Client replays, server validates the cookie.
No SSL |  Secured    |  Client should send back credentials since cookie replay will not be transmitted back to the server.


  was:
We need to add test cases to cover these scenarios.
_
Client |  HS2 Cookie   |  Expected Behavior
SSL   |  Secured |  Client replays, server validates the cookie.
SSL   |   Unsecured|   Client replays, server validates the cookie.
No SSL |   UnSecured|  Client replays, server validates the cookie.
No SSL |  Secured  |  Client should send back credentials since 
cookie
 |  |  replay will not be transmitted 
back to the server.


 Add test case to ensure client sends credentials in non-ssl mode when HS2 
 sends a secure cookie
 ---

 Key: HIVE-10345
 URL: https://issues.apache.org/jira/browse/HIVE-10345
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan

 We need to add test cases to cover these scenarios.
 _
 Client |  HS2 Cookie  |  Expected Behavior
 SSL    |  Secured     |  Client replays, server validates the cookie.
 SSL    |  Unsecured   |  Client replays, server validates the cookie.
 No SSL |  Unsecured   |  Client replays, server validates the cookie.
 No SSL |  Secured     |  Client should send back credentials since cookie replay will not be transmitted back to the server.
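The decision table above reduces to a single predicate. As a minimal sketch (class and method names here are illustrative, not part of the actual HiveServer2 client API):

```java
// Illustrative sketch of the cookie decision table; not actual Hive client code.
public class CookieAuthDecision {

    // A cookie marked "Secure" is only transmitted over SSL/TLS. So the one
    // case where the client cannot replay the cookie, and must fall back to
    // sending full credentials, is a non-SSL client with a secure cookie.
    public static boolean mustSendCredentials(boolean clientUsesSsl, boolean cookieIsSecure) {
        return !clientUsesSsl && cookieIsSecure;
    }

    public static void main(String[] args) {
        System.out.println(mustSendCredentials(true, true));   // false: cookie replayed
        System.out.println(mustSendCredentials(true, false));  // false: cookie replayed
        System.out.println(mustSendCredentials(false, false)); // false: cookie replayed
        System.out.println(mustSendCredentials(false, true));  // true: credentials sent
    }
}
```

The proposed test cases would exercise exactly these four combinations.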





[jira] [Commented] (HIVE-9580) Server returns incorrect result from JOIN ON VARCHAR columns

2015-04-15 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497008#comment-14497008
 ] 

Jason Dere commented on HIVE-9580:
--

I think this looks fine. I would just say to make sure there are tests to cover 
the types that would get affected by this change (char/varchar/decimal joins), 
which it looks like there already are.

 Server returns incorrect result from JOIN ON VARCHAR columns
 

 Key: HIVE-9580
 URL: https://issues.apache.org/jira/browse/HIVE-9580
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Mike
Assignee: Aihua Xu
 Attachments: HIVE-9580.patch


 The database erroneously returns rows when joining two tables which each 
 contain a VARCHAR column and the join's ON condition uses the equality 
 operator on the VARCHAR columns.
 The following JDBC method exhibits the problem:
   static void joinIssue() throws SQLException {

       String sql;
       int rowsAffected;
       ResultSet rs;
       Statement stmt = con.createStatement();
       String table1_Name = "blahtab1";
       String table1A_Name = "blahtab1A";
       String table1B_Name = "blahtab1B";
       String table2_Name = "blahtab2";

       try {
           sql = "drop table " + table1_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1_Name + "(" +
                 "VCHARCOL VARCHAR(10) " +
                 ",INTEGERCOL INT " +
                 ")";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }

       sql = "insert into " + table1_Name + " values ('jklmnopqrs', 99)";
       System.out.println("\nsql=" + sql);
       stmt.executeUpdate(sql);

       System.out.println("===");

       try {
           sql = "drop table " + table1A_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1A_Name + "(" +
                 "VCHARCOL VARCHAR(10) " +
                 ")";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }

       sql = "insert into " + table1A_Name + " values ('jklmnopqrs')";
       System.out.println("\nsql=" + sql);
       stmt.executeUpdate(sql);

       System.out.println("===");

       try {
           sql = "drop table " + table1B_Name;
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("Drop table error:" + se.getMessage());
       }
       try {
           sql = "CREATE TABLE " + table1B_Name + "(" +
                 "VCHARCOL VARCHAR(11) " +
                 ",INTEGERCOL INT " +
                 ")";
           System.out.println("\nsql=" + sql);
           rowsAffected = stmt.executeUpdate(sql);
       }
       catch (SQLException se) {
           println("create table error:" + se.getMessage());
       }
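 The archived message is truncated before the join itself. A hypothetical query of the kind the report describes, using the tables created above (the exact join statement is an assumption, since it does not survive in the archive), might look like:

```sql
-- Hypothetical repro query (not from the original report): join the
-- VARCHAR(10) and VARCHAR(11) columns that both hold 'jklmnopqrs'.
SELECT t1.VCHARCOL, t1b.INTEGERCOL
FROM blahtab1 t1
JOIN blahtab1B t1b ON t1.VCHARCOL = t1b.VCHARCOL;
```

Per the report, a join of this shape returns incorrect rows on the affected versions.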
   
   

[jira] [Assigned] (HIVE-10335) LLAP: IndexOutOfBound in MapJoinOperator

2015-04-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10335:
---

Assignee: Sergey Shelukhin

 LLAP: IndexOutOfBound in MapJoinOperator
 

 Key: HIVE-10335
 URL: https://issues.apache.org/jira/browse/HIVE-10335
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap


 {code}
 2015-04-14 13:57:55,889 [TezTaskRunner_attempt_1428572510173_0173_2_03_14_0(container_1_0173_01_66_sseth_20150414135750_7a7c2f4f-5f2d-4645-b833-677621f087bd:2_Map 1_14_0)] ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unexpected exception: Index: 0, Size: 0
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:653)
 at java.util.ArrayList.get(ArrayList.java:429)
 at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.unwrap(UnwrapRowContainer.java:79)
 at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:62)
 at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:33)
 at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
 at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
 at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
 at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:283)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
 at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.flushOutput(VectorMapJoinOperator.java:232)
 at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.closeOp(VectorMapJoinOperator.java:240)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:630)
 at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:348)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:332)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {code}




