from:"Shreepadma Venugopalan \(JIRA\)"

[jira] [Commented] (HIVE-6701) Analyze table compute statistics for decimal columns.

2014-03-21 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943626#comment-13943626
 ] 

Shreepadma Venugopalan commented on HIVE-6701:
--

The extra unused fields were added in HIVE-1362 precisely to avoid upgrading the 
schema.

 Analyze table compute statistics for decimal columns.
 -

 Key: HIVE-6701
 URL: https://issues.apache.org/jira/browse/HIVE-6701
 Project: Hive
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Sergey Shelukhin
 Attachments: HIVE-6701.02.patch, HIVE-6701.1.patch


 Analyze table should compute statistics for decimal columns as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-19 Thread Shreepadma Venugopalan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876140#comment-13876140
]

Shreepadma Venugopalan commented on HIVE-6157:
--

Currently, the API fetches statistics for a given column.
hive.stats.fetch.column.stats fetches stats for all columns for all partitions
in all tables. Bad idea. HIVE-4301 was filed to support a bulk fetch API so
that stats for all columns for all partitions in multiple tables can be fetched
with a single call. Feel free to pick up HIVE-4301.

Fetching column stats slower than the 101 during rush hour
--

Key: HIVE-6157
URL: https://issues.apache.org/jira/browse/HIVE-6157
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
Attachments: HIVE-6157.prelim.patch

hive.stats.fetch.column.stats controls whether the column stats for a table
are fetched during explain (in Tez: during query planning). On my setup (1
table 4000 partitions, 24 columns) the time spent in semantic analyze goes
from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent
fetching column stats...
The reason is probably that the APIs force you to make separate metastore
calls for each column in each partition. That's probably the first thing that
has to change. The question is if in addition to that we need to cache this
in the client or store the stats as a single blob in the database to further
cut down on the time. However, the way it stands right now column stats seem
unusable.
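
As a rough illustration of the numbers in the description above, the sketch
below estimates how many metastore calls the reported setup would issue and
the average cost per call. It assumes, as the report only suspects, one call
per column per partition; the variable names are illustrative.

```python
# Figures quoted in the report.
partitions = 4000
columns = 24
seconds_fetching_stats = 65  # ~66 s with the flag on, ~1 s with it off

# Assumption (the report says "probably"): one metastore call
# per column per partition.
calls = partitions * columns
ms_per_call = seconds_fetching_stats / calls * 1000

print(f"{calls} calls, ~{ms_per_call:.2f} ms per call")
```

Under that assumption each call is cheap on its own, which suggests the cost
comes from the number of round trips rather than from any single lookup.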

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HIVE-5780) Add the missing declaration of HIVE_CLI_SERVICE_PROTOCOL_V4 in TCLIService.thrift

2013-11-07 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816940#comment-13816940
 ] 

Shreepadma Venugopalan commented on HIVE-5780:
--

Regenerating the thrift bindings fails without this patch. Thanks for putting 
this patch together.

 Add the missing declaration of HIVE_CLI_SERVICE_PROTOCOL_V4 in 
 TCLIService.thrift
 -

 Key: HIVE-5780
 URL: https://issues.apache.org/jira/browse/HIVE-5780
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-5780.1.patch


 TCLIService.thrift was updated as part of HIVE-5355. The new enum 
 HIVE_CLI_SERVICE_PROTOCOL_V4 is referenced in the file, but its declaration is 
 missing.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-21 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801070#comment-13801070
 ] 

Shreepadma Venugopalan commented on HIVE-4957:
--

Thanks, Brock!

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch


 Normally, increasing the number of bit vectors increases calculation accuracy. 
 For example,
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally gives better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But a larger number of bit vectors also makes the query run slower. Beyond 50 
 bit vectors, accuracy no longer improves, but memory usage keeps growing, and 
 Hive crashes if the number is too large. Hive currently does not prevent a 
 user from passing a ridiculously large number of bit vectors in a 
 'compute_stats' query.
 One example:
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}
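
 One way such a restriction could look is sketched below in Python (not
 Hive's Java code). The cap of 1024 and the function name are assumptions
 made for illustration; the description above does not state the limit the
 patch enforces.

```python
MAX_BIT_VECTORS = 1024  # hypothetical cap, chosen only for this sketch


def validate_num_bit_vectors(requested: int, limit: int = MAX_BIT_VECTORS) -> int:
    """Return the requested count if it is within bounds, else raise ValueError."""
    if requested <= 0:
        raise ValueError(f"number of bit vectors must be positive, got {requested}")
    if requested > limit:
        raise ValueError(
            f"number of bit vectors {requested} exceeds the maximum of {limit}"
        )
    return requested


print(validate_num_bit_vectors(40))
```

 Rejecting an oversized value up front fails the query with a clear message
 instead of letting a task run out of Java heap space.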




[jira] [Created] (HIVE-5536) Incorrect Operation Name is passed to hookcontext

2013-10-14 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-5536:


 Summary: Incorrect Operation Name is passed to hookcontext
 Key: HIVE-5536
 URL: https://issues.apache.org/jira/browse/HIVE-5536
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0, 0.12.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


HS2 passes an incorrect operation name to hookcontext. 




[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-04 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786375#comment-13786375
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

Thank you Brock!

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.13.0

 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch, 
 HIVE-4669.4.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-02 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Status: Open  (was: Patch Available)

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-02 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Status: Patch Available  (was: Open)

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-02 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Attachment: HIVE-4669.3.patch

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch


 Make username available to the semantic analyzer hooks.




[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-02 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784352#comment-13784352
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

Attached a new patch with the changes. I'm not sure the patch needed any 
modification other than removing {noformat} this.userName = userName {noformat} 
from Driver.java.


 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-02 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Attachment: HIVE-4669.4.patch

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch, 
 HIVE-4669.4.patch


 Make username available to the semantic analyzer hooks.




[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-02 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784371#comment-13784371
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

No worries. Let's make sure the new code is clean. Uploaded a new patch.

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch, HIVE-4669.3.patch, 
 HIVE-4669.4.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-01 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4957:
-

Attachment: HIVE-4957.2.patch

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch






[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-10-01 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783358#comment-13783358
 ] 

Shreepadma Venugopalan commented on HIVE-4957:
--

New patch addresses review comments.

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch






[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-01 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Status: Open  (was: Patch Available)

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-01 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Status: Patch Available  (was: Open)

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch


 Make username available to the semantic analyzer hooks.




[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-01 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Attachment: HIVE-4669.2.patch

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch


 Make username available to the semantic analyzer hooks.




[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-10-01 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783433#comment-13783433
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

Attached new patch rebased to the tip of trunk.

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch, HIVE-4669.2.patch


 Make username available to the semantic analyzer hooks.




[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-09-26 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779271#comment-13779271
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

Is there anything else needed from my side?

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch


 Make username available to the semantic analyzer hooks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-09-26 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779269#comment-13779269
 ] 

Shreepadma Venugopalan commented on HIVE-4670:
--

Is there anything else needed from my side? 

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.
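
 The behavior requested above can be sketched as follows. This is a minimal
 Python illustration, not the HiveServer2 code; the function name and the
 example principal are hypothetical, and escaped '@' characters inside a
 name are not handled.

```python
def strip_realm(principal: str) -> str:
    """Drop the '@REALM' suffix from a Kerberos principal, if present."""
    name, _sep, _realm = principal.partition("@")
    return name


# Hypothetical principal used only for demonstration.
print(strip_realm("hive_user@EXAMPLE.COM"))
```

 A principal that carries no realm is returned unchanged, so the function is
 safe to apply to names that have already been shortened.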


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-25 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778224#comment-13778224
 ] 

Shreepadma Venugopalan commented on HIVE-4629:
--

I'm able to apply the patch with -p0 to the tip of trunk. I've re-attached the 
patch to trigger a run.

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch, HIVE-4629.2.patch, 
 HIVE-4629-no_thrift.1.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.
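
 To illustrate how a client might use such an API to report progress, here
 is a minimal Python sketch of incremental log polling. FakeOperation,
 fetch_logs, and poll_new_logs are hypothetical names invented for this
 example and are not part of HiveServer2.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeOperation:
    """Stand-in for an asynchronously running query (hypothetical class)."""
    log_lines: List[str] = field(default_factory=list)

    def fetch_logs(self, start: int) -> List[str]:
        """Return the log lines appended since index `start`."""
        return self.log_lines[start:]


def poll_new_logs(op: FakeOperation, seen: int) -> Tuple[List[str], int]:
    """Fetch only unseen log lines and return them with the updated offset."""
    new_lines = op.fetch_logs(seen)
    return new_lines, seen + len(new_lines)


# Simulate a query that emits progress lines between two client polls.
op = FakeOperation()
op.log_lines.append("Stage-1 map = 0%,  reduce = 0%")
first, seen = poll_new_logs(op, 0)
op.log_lines.append("Stage-1 map = 100%,  reduce = 0%")
second, seen = poll_new_logs(op, seen)
print(first, second, seen)
```

 Tracking an offset on the client side means each poll transfers only the
 lines produced since the previous one.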


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-25 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Attachment: HIVE-4629.2.patch

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch, HIVE-4629.2.patch, 
 HIVE-4629-no_thrift.1.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776701#comment-13776701
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

My apologies for not responding earlier. We need this for integrating Sentry 
with Hive.



 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch


 Make username available to the semantic analyzer hooks.


[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776709#comment-13776709
 ] 

Shreepadma Venugopalan commented on HIVE-4670:
--

Apologies for not responding sooner. We need this for integrating Sentry with 
Hive. Users of Sentry prefer to specify the username without the realm when 
granting privileges.

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Status: Patch Available  (was: In Progress)

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Attachment: HIVE-4629.1.patch

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Attachment: HIVE-4629-no_thrift.1.patch

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776969#comment-13776969
 ] 

Shreepadma Venugopalan commented on HIVE-4629:
--

Review board: https://reviews.apache.org/r/14326/

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Attachment: HIVE-4629.1.patch

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch



[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-09-24 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Attachment: (was: HIVE-4629.1.patch)

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch



[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-5330:
-

Attachment: HIVE-5330.1.patch

 Pass query text and IPAddress to SemanticAnalyzerHooks
 --

 Key: HIVE-5330
 URL: https://issues.apache.org/jira/browse/HIVE-5330
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-5330.1.patch


 Today, semantic analyzer hooks don't have the IP address of the client or the 
 query text available. Adding these pieces of information to the semantic 
 analyzer hook will make auditing useful and meaningful.


[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-5330:
-

Status: Patch Available  (was: Open)

 Pass query text and IPAddress to SemanticAnalyzerHooks
 --

 Key: HIVE-5330
 URL: https://issues.apache.org/jira/browse/HIVE-5330
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-5330.1.patch



[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-5330:
-

  Component/s: SQL
Affects Version/s: 0.11.0

 Pass query text and IPAddress to SemanticAnalyzerHooks
 --

 Key: HIVE-5330
 URL: https://issues.apache.org/jira/browse/HIVE-5330
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


[jira] [Created] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks

2013-09-20 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-5330:


 Summary: Pass query text and IPAddress to SemanticAnalyzerHooks
 Key: HIVE-5330
 URL: https://issues.apache.org/jira/browse/HIVE-5330
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


Today, semantic analyzer hooks don't have the IP address of the client or the 
query text available. Adding these pieces of information to the semantic 
analyzer hook will make auditing useful and meaningful.


[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-5330:
-

Attachment: (was: HIVE-5330.1.patch)

 Pass query text and IPAddress to SemanticAnalyzerHooks
 --

 Key: HIVE-5330
 URL: https://issues.apache.org/jira/browse/HIVE-5330
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


[jira] [Updated] (HIVE-5330) Pass query text and IPAddress to SemanticAnalyzerHooks

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-5330:
-

Status: Open  (was: Patch Available)

 Pass query text and IPAddress to SemanticAnalyzerHooks
 --

 Key: HIVE-5330
 URL: https://issues.apache.org/jira/browse/HIVE-5330
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


[jira] [Commented] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773351#comment-13773351
 ] 

Shreepadma Venugopalan commented on HIVE-4957:
--

RB: https://reviews.apache.org/r/14250/

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan

 Normally, increasing the number of bit vectors increases calculation accuracy. 
 For example,
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally gives better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But a larger number of bit vectors also makes the query run slower. Once the 
 number of bit vectors exceeds 50, adding more no longer improves accuracy, but 
 it still increases memory usage and can crash Hive if the number is too large. 
 Hive currently doesn't prevent the user from passing a ridiculously large 
 number of bit vectors in a 'compute_stats' query.
 One example:
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}
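The restriction asked for above amounts to validating the second argument before any bit vectors are allocated. The following is a minimal sketch of such a check, not the attached patch: the class name, the method name, and the cap of 1024 are assumptions made for illustration. The report itself only states that accuracy stops improving beyond roughly 50 bit vectors.

```java
// Hypothetical argument check for the number of bit vectors (illustration only).
final class BitVectorLimit {
    // Assumed upper bound; the real limit chosen by Hive may differ.
    static final int MAX_BIT_VECTORS = 1024;

    // Returns the requested value if it is within bounds, otherwise fails
    // before any memory proportional to the request is allocated.
    static int checkNumBitVectors(int requested) {
        if (requested < 1 || requested > MAX_BIT_VECTORS) {
            throw new IllegalArgumentException(
                "Number of bit vectors must be between 1 and " + MAX_BIT_VECTORS
                    + ", got " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(checkNumBitVectors(40));
        try {
            checkNumBitVectors(1_000_000_000);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at argument validation turns a task-level "Java heap space" error into an immediate, readable message for the user.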


[jira] [Updated] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4957:
-

Attachment: HIVE-4957.1.patch

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4957.1.patch



[jira] [Updated] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-09-20 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4957:
-

Status: Patch Available  (was: In Progress)

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan


[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-11 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764658#comment-13764658
 ] 

Shreepadma Venugopalan commented on HIVE-5272:
--

Thanks, Prasanth. The code in question assumes, incorrectly, that the 
validation done later by the SemanticAnalyzer is sufficient to raise an invalid 
column error. But it looks like the IndexOutOfBoundsException occurs before 
that validation runs. I think we can either fix the if condition in 
getTableColumnType() or, alternatively, perform the validation early. One of 
the reasons for deferring the validation was to piggyback on the existing logic 
later during semantic analysis and avoid duplicating work. But the patch you 
have put together looks simple enough. 

 Column statistics on a invalid column name results in 
 IndexOutOfBoundsException
 ---

 Key: HIVE-5272
 URL: https://issues.apache.org/jira/browse/HIVE-5272
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: statistics
 Fix For: 0.13.0

 Attachments: HIVE-5272.txt


 When an invalid column name is specified for column statistics, an 
 IndexOutOfBoundsException is thrown. 
 {code}hive> analyze table customer_staging compute statistics for columns 
 c_first_name, invalid_name, c_customer_sk;
 FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code}
 If the invalid column name appears first or last, then 
 INVALID_COLUMN_REFERENCE is thrown at the query planning stage. But if the 
 invalid column name appears somewhere in the middle of the column list, then 
 IndexOutOfBoundsException is thrown at the semantic analysis step. The problem 
 is with the getTableColumnType() and getPartitionColumnType() methods. The 
 following segment 
 {code}for (int i = 0; i < numCols; i++) {
   colName = colNames.get(i);
   for (FieldSchema col : cols) {
     if (colName.equalsIgnoreCase(col.getName())) {
       colTypes.add(i, new String(col.getType()));
     }
   }
 }{code}
 is the reason for it. If the invalid column name appears in the middle of the 
 column list, the equalsIgnoreCase() check never matches it, so nothing is 
 added to colTypes for that index and i is still incremented. The next matching 
 column then calls colTypes.add(i, ...) with an index larger than the current 
 size of the list, which results in the exception. 
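The failure described above can be reproduced outside Hive. The following is a minimal, self-contained sketch: the class name ColumnTypeLookup, the use of a Map in place of the FieldSchema list, and the error text are assumptions made for illustration, and this is not the code in HIVE-5272.txt. It shows the positional add() failing, and one possible way to reject an unknown name as soon as the lookup misses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the column type lookup (illustration only).
final class ColumnTypeLookup {

    // Mirrors the quoted loop: add(i, ...) assumes every earlier name matched.
    static List<String> positionalLookup(List<String> colNames, Map<String, String> schema) {
        List<String> colTypes = new ArrayList<>();
        for (int i = 0; i < colNames.size(); i++) {
            for (Map.Entry<String, String> col : schema.entrySet()) {
                if (colNames.get(i).equalsIgnoreCase(col.getKey())) {
                    // Throws IndexOutOfBoundsException when i > colTypes.size().
                    colTypes.add(i, col.getValue());
                }
            }
        }
        return colTypes;
    }

    // Alternative: fail on the first requested name that has no match.
    static List<String> validatingLookup(List<String> colNames, Map<String, String> schema) {
        List<String> colTypes = new ArrayList<>();
        for (String colName : colNames) {
            String type = null;
            for (Map.Entry<String, String> col : schema.entrySet()) {
                if (colName.equalsIgnoreCase(col.getKey())) {
                    type = col.getValue();
                    break;
                }
            }
            if (type == null) {
                throw new IllegalArgumentException("Invalid column reference: " + colName);
            }
            colTypes.add(type);
        }
        return colTypes;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("c_customer_sk", "int");
        schema.put("c_first_name", "string");
        List<String> names = Arrays.asList("c_first_name", "invalid_name", "c_customer_sk");
        try {
            positionalLookup(names, schema);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("positional lookup failed: " + e.getMessage());
        }
        try {
            validatingLookup(names, schema);
        } catch (IllegalArgumentException e) {
            System.out.println("validating lookup failed: " + e.getMessage());
        }
    }
}
```

Appending with add(type) instead of add(i, type) removes the dependence on earlier names having matched, and the explicit null check gives the user the name of the offending column.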


[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-11 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764688#comment-13764688
 ] 

Shreepadma Venugopalan commented on HIVE-5272:
--

In case I wasn't clear, I'm +1 on it.

 Column statistics on a invalid column name results in 
 IndexOutOfBoundsException
 ---

 Key: HIVE-5272
 URL: https://issues.apache.org/jira/browse/HIVE-5272
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: statistics
 Fix For: 0.13.0

 Attachments: HIVE-5272.txt



[jira] [Commented] (HIVE-5240) Column statistics on a partitioned column should fail early with proper error message

2013-09-10 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763363#comment-13763363
 ] 

Shreepadma Venugopalan commented on HIVE-5240:
--

Thanks, [~ashutoshc]. 

 Column statistics on a partitioned column should fail early with proper error 
 message
 -

 Key: HIVE-5240
 URL: https://issues.apache.org/jira/browse/HIVE-5240
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: statistics
 Fix For: 0.12.0

 Attachments: HIVE-5240.txt


 When computing column statistics on a partitioned table, if one of the 
 requested columns is the partition column, an IndexOutOfBoundsException is 
 thrown. 
 The following analyze query throws IndexOutOfBoundsException during the 
 semantic analysis phase:
 {code}hive> analyze table qlog_1m_part partition(year=5) compute statistics 
 for columns year,month,week,type;
 FAILED: IndexOutOfBoundsException Index: 1, Size: 0{code} 
 If the partition column is specified last, as below, the same exception is 
 thrown at runtime:
 {code}hive> analyze table qlog_1m_part partition(year=5) compute statistics 
 for columns month,week,type,year;
 Hadoop job information for null: number of mappers: 0; number of reducers: 0
 2013-09-06 18:05:06,587 null map = 0%,  reduce = 100%
 Ended Job = job_local861862820_0001
 Execution completed successfully
 Mapred Local Task Succeeded . Convert the Join into MapJoin
 java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
   at java.util.LinkedList.entry(LinkedList.java:365)
   at java.util.LinkedList.get(LinkedList.java:315)
   at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.constructColumnStatsFromPackedRow(ColumnStatsTask.java:262)
   at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistPartitionStats(ColumnStatsTask.java:302)
   at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:345)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1407)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1187)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1017)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 {code}
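"Fail early with a proper error message" could look like the check below, run before any job is launched. This is a sketch under stated assumptions, not the code in HIVE-5240.txt: the class name, the method name, and the message wording are hypothetical, and the column lists are passed in as plain strings rather than Hive metadata objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check for an analyze ... for columns request (illustration only).
final class PartitionColumnCheck {

    // Rejects the request if any requested column is also a partition column.
    static void rejectPartitionColumns(List<String> requested, List<String> partitionCols) {
        for (String col : requested) {
            for (String partCol : partitionCols) {
                if (col.equalsIgnoreCase(partCol)) {
                    throw new IllegalArgumentException(
                        "Column statistics are not supported on partition column: " + col);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> partitionCols = Arrays.asList("year");
        try {
            rejectPartitionColumns(Arrays.asList("month", "week", "type", "year"), partitionCols);
        } catch (IllegalArgumentException e) {
            System.out.println("FAILED: " + e.getMessage());
        }
    }
}
```

Because the check does not depend on where the partition column sits in the list, both orderings shown in the report would fail the same way, at analysis time.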


[jira] [Commented] (HIVE-5240) Column statistics on a partitioned column should fail early with proper error message

2013-09-09 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13762536#comment-13762536
 ] 

Shreepadma Venugopalan commented on HIVE-5240:
--

There is already a JIRA for this issue - HIVE-4426. However, HIVE-4426 aims to 
allow stats collection on the partitioning key. I think this can be useful. 
I'll be able to start working on HIVE-4426 next week. Let me know if there's 
interest. Thanks!

 Column statistics on a partitioned column should fail early with proper error 
 message
 -

 Key: HIVE-5240
 URL: https://issues.apache.org/jira/browse/HIVE-5240
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: statistics
 Fix For: 0.12.0

 Attachments: HIVE-5240.txt



[jira] [Commented] (HIVE-1719) Move RegexSerDe out of hive-contrib and over to hive-serde

2013-08-22 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748033#comment-13748033
 ] 

Shreepadma Venugopalan commented on HIVE-1719:
--

It was left in contrib so that we don't break backwards compatibility for 
existing users.

 Move RegexSerDe out of hive-contrib and over to hive-serde
 --

 Key: HIVE-1719
 URL: https://issues.apache.org/jira/browse/HIVE-1719
 Project: Hive
  Issue Type: Task
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3141.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.4.patch, HIVE-1719.3.patch, 
 HIVE-1719.D3249.1.patch


 RegexSerDe is as much a part of the standard Hive distribution as the other 
 SerDes currently in hive-serde. I think we should move it over to the 
 hive-serde module so that users don't have to go to the added effort of 
 manually registering the contrib jar before using it.


[jira] [Assigned] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-07-30 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4957:


Assignee: Shreepadma Venugopalan

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan


[jira] [Work started] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-07-30 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4957 started by Shreepadma Venugopalan.

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan

 normally increase number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally get better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But larger number of bit vectors also cause query run slower. When number of 
 bit vectors over 50, it won't help to increase accuracy anymore. But it still 
 increase memory usage, and crash Hive if number if too huge. Current Hive 
 doesn't prevent user use ridiculous large number of bit vectors in 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}
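
 One way to guard against the failure described above is to validate the
 requested bit-vector count before any memory is allocated. The following is a
 minimal sketch, not the committed patch: the class name, the method name, and
 the cap of 1024 are assumptions chosen for illustration.

```java
/** Hypothetical guard; the names and the cap are assumptions, not Hive's code. */
final class BitVectorLimit {
    /** Assumed upper bound on the number of bit vectors a query may request. */
    static final int MAX_BIT_VECTORS = 1024;

    /** Returns the requested count if it is within bounds, otherwise throws. */
    static int checkNumBitVectors(int requested) {
        if (requested <= 0 || requested > MAX_BIT_VECTORS) {
            throw new IllegalArgumentException(
                "Number of bit vectors must be between 1 and " + MAX_BIT_VECTORS
                    + ", got " + requested);
        }
        return requested;
    }
}
```

 Rejecting the value up front turns a task-level "Java heap space" failure into
 an immediate, readable error for the user.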


[jira] [Commented] (HIVE-4669) Make username available to semantic analyzer hooks

2013-07-07 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701665#comment-13701665
 ] 

Shreepadma Venugopalan commented on HIVE-4669:
--

Ping :)

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch


 Make username available to the semantic analyzer hooks.


[jira] [Commented] (HIVE-4549) JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM

2013-06-07 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678217#comment-13678217
 ] 

Shreepadma Venugopalan commented on HIVE-4549:
--

+1 (non-committer). LGTM.

 JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM
 --

 Key: HIVE-4549
 URL: https://issues.apache.org/jira/browse/HIVE-4549
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.10.0
 Environment: Hive 0.10
Reporter: Johndee Burks
Assignee: Prasad Mujumdar
Priority: Trivial
  Labels: newbie
 Fix For: 0.12.0

 Attachments: HIVE-4549-1.patch


 The ResultSet returned by HiveDatabaseMetadata.getTables has the metadata 
 columns TABLE_CAT, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, REMARKS. The second 
 column name is not compliant with the JDBC standard 
 (http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getSchemas()):
  the column name should be TABLE_SCHEM instead of TABLE_SCHEMA.
 Suggested fix in Hive 
 (org.apache.hive.service.cli.operation.GetTablesOperation.java): change from
 private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() 
 .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") 
 .addStringColumn("TABLE_SCHEMA", "Schema name.") 
 .addStringColumn("TABLE_NAME", "Table name.") 
 .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", etc.") 
 .addStringColumn("REMARKS", "Comments about the table.");
 to
 private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() 
 .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") 
 .addStringColumn("TABLE_SCHEM", "Schema name.") 
 .addStringColumn("TABLE_NAME", "Table name.") 
 .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", etc.") 
 .addStringColumn("REMARKS", "Comments about the table.");


[jira] [Updated] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-06-07 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4670:
-

Attachment: HIVE-4670.3.patch

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.
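
 The realm stripping described above can be sketched as follows. This is an
 illustration only, not the code from the attached patches: the class and
 method names are hypothetical, and the sketch ignores escaped '@' characters
 inside a principal name.

```java
/** Hypothetical helper; not taken from the HIVE-4670 patches. */
final class KerberosNames {
    /** Drops a trailing "@REALM" from a principal and leaves the rest untouched. */
    static String stripRealm(String principal) {
        int at = principal.lastIndexOf('@');
        // No '@' means there is no realm part to remove.
        return at < 0 ? principal : principal.substring(0, at);
    }
}
```

 For a principal such as hive/host.example.com@EXAMPLE.COM, the helper returns
 hive/host.example.com.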


[jira] [Commented] (HIVE-4588) Support session level hooks for HiveServer2

2013-06-07 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678515#comment-13678515
 ] 

Shreepadma Venugopalan commented on HIVE-4588:
--

+1 (non-binding), LGTM.

 Support session level hooks for HiveServer2
 ---

 Key: HIVE-4588
 URL: https://issues.apache.org/jira/browse/HIVE-4588
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4588-1.patch


 Support session level hooks for HiveServer2. The configured hooks will get 
 executed at the beginning of each new session.
 This is useful for auditing connections, possibly tuning the session level 
 properties, etc.


[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-06 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676744#comment-13676744
 ] 

Shreepadma Venugopalan commented on HIVE-4561:
--

[~clarkyzl]:  My suggestion is to use Long.Min/Long.Max value instead of a null 
value. The code that looks at column stats can use the min/max in conjunction 
with other stats such as number of rows etc. to infer that the values are 
initialization values for min/max and not true values that represent the bounds 
on the column.
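
A rough sketch of that suggestion, under stated assumptions: the class and
method names are hypothetical, and initializing the low bound to
Long.MAX_VALUE and the high bound to Long.MIN_VALUE is one reading of the
comment above, not code from the patch.

```java
/** Hypothetical accumulator illustrating sentinel-initialized min/max stats. */
final class LongRangeStats {
    private long low = Long.MAX_VALUE;   // sentinel until a value is seen
    private long high = Long.MIN_VALUE;  // sentinel until a value is seen
    private long numValues = 0;

    /** Folds one non-null column value into the running bounds. */
    void update(long value) {
        low = Math.min(low, value);
        high = Math.max(high, value);
        numValues++;
    }

    /** True only when the bounds reflect real data rather than the sentinels. */
    boolean hasBounds() {
        return numValues > 0;
    }

    long low() { return low; }

    long high() { return high; }
}
```

A consumer of the stats checks hasBounds() (the row-count signal) before
trusting low() and high(), so no NULL is needed to mark "unknown".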

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch


 If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; 
 if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
 hive (default)> create table src_test (price double);
 hive (default)> load data local inpath './test.txt' into table src_test;
 hive (default)> select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default)> analyze table src_test compute statistics for columns price;
 mysql> select * from TAB_COL_STATS \G;
                  CS_ID: 16
                DB_NAME: default
             TABLE_NAME: src_test
            COLUMN_NAME: price
            COLUMN_TYPE: double
                 TBL_ID: 2586
         LONG_LOW_VALUE: 0
        LONG_HIGH_VALUE: 0
       DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
      DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
              NUM_NULLS: 0
          NUM_DISTINCTS: 1
            AVG_COL_LEN: 0.
            MAX_COL_LEN: 0
              NUM_TRUES: 0
             NUM_FALSES: 0
          LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)


[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-06-06 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677319#comment-13677319
 ] 

Shreepadma Venugopalan commented on HIVE-4675:
--

+1 to the proposal.

 Create new parallel unit test environment
 -

 Key: HIVE-4675
 URL: https://issues.apache.org/jira/browse/HIVE-4675
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Brock Noland
Assignee: Brock Noland

 The current ptest tool is great, but it has the following limitations:
 -Requires an NFS filer
 -Unless the NFS filer is dedicated, ptests can easily become IO bound
 -Investigating failures is troublesome because the source directory for 
 the failure is not saved
 -Ignoring or isolating tests is not supported
 -No unit tests for the ptest framework exist
 It'd be great to have a ptest tool that addresses these limitations.


[jira] [Updated] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-06-06 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4670:
-

Status: Patch Available  (was: In Progress)

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.


[jira] [Work started] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-06-06 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4670 started by Shreepadma Venugopalan.

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.


[jira] [Updated] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-06-06 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4670:
-

Attachment: HIVE-4670.2.patch

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.


[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-06-06 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677740#comment-13677740
 ] 

Shreepadma Venugopalan commented on HIVE-4670:
--

https://reviews.apache.org/r/11705/

 Authentication module should pass the instance part of the Kerberos principle
 -

 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4670.2.patch


 When Kerberos authentication is enabled for HiveServer2, the thrift SASL 
 layer passes instance@realm from the principal. It should instead strip the 
 realm and pass just the instance part of the principal.


[jira] [Updated] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4657:
-

Attachment: HIVE-4657.1.patch

 HCatalog checkstyle violation after HIVE-2670 
 --

 Key: HIVE-4657
 URL: https://issues.apache.org/jira/browse/HIVE-4657
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Shreepadma Venugopalan
 Attachments: HIVE-4657.1.patch


 After HIVE-2670 was committed, I see the following error,
 {noformat}
 checkstyle:
  [echo] hcatalog
 [checkstyle] Running Checkstyle 5.5 on 416 files
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
   [for] hcatalog: The following error occurred while executing this line:
   [for] /Users/vshree/work/repositories/hive15/build.xml:310: The 
 following error occurred while executing this line:
   [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: 
 The following error occurred while executing this line:
   [for] 
 /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32:
  Got 3 errors and 0 warnings.
 BUILD FAILED
 /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 
 of 11 iterations failed.
 {noformat}


[jira] [Updated] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4657:
-

Status: Patch Available  (was: Open)

 HCatalog checkstyle violation after HIVE-2670 
 --

 Key: HIVE-4657
 URL: https://issues.apache.org/jira/browse/HIVE-4657
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Shreepadma Venugopalan
 Attachments: HIVE-4657.1.patch


 After HIVE-2670 was committed, I see the following error,
 {noformat}
 checkstyle:
  [echo] hcatalog
 [checkstyle] Running Checkstyle 5.5 on 416 files
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
   [for] hcatalog: The following error occurred while executing this line:
   [for] /Users/vshree/work/repositories/hive15/build.xml:310: The 
 following error occurred while executing this line:
   [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: 
 The following error occurred while executing this line:
   [for] 
 /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32:
  Got 3 errors and 0 warnings.
 BUILD FAILED
 /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 
 of 11 iterations failed.
 {noformat}


[jira] [Commented] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676190#comment-13676190
 ] 

Shreepadma Venugopalan commented on HIVE-4657:
--

This fixes the build which is currently broken.

 HCatalog checkstyle violation after HIVE-2670 
 --

 Key: HIVE-4657
 URL: https://issues.apache.org/jira/browse/HIVE-4657
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Shreepadma Venugopalan
 Attachments: HIVE-4657.1.patch


 After HIVE-2670 was committed, I see the following error,
 {noformat}
 checkstyle:
  [echo] hcatalog
 [checkstyle] Running Checkstyle 5.5 on 416 files
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
 [checkstyle] 
 /Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1:
  Line does not match expected header line of '\W*or more contributor license 
 agreements.  See the NOTICE file$'.
   [for] hcatalog: The following error occurred while executing this line:
   [for] /Users/vshree/work/repositories/hive15/build.xml:310: The 
 following error occurred while executing this line:
   [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: 
 The following error occurred while executing this line:
   [for] 
 /Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32:
  Got 3 errors and 0 warnings.
 BUILD FAILED
 /Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 
 of 11 iterations failed.
 {noformat}


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676365#comment-13676365
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

[~ashutoshc]: I've updated the .q files in the patches. Thanks!

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch, HIVE-4435.2.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.
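
 For readers unfamiliar with the term, a textbook pairwise independent family
 is h(x) = (a*x + b) mod p, with p prime and a, b drawn at random. The sketch
 below illustrates that family only; it is not the hash function from the
 attached patches, and the class name and the choice of p = 2^31 - 1 are
 assumptions.

```java
import java.util.Random;

/** Hypothetical illustration of the (a*x + b) mod p hash family. */
final class PairwiseHash {
    /** 2^31 - 1, a Mersenne prime; keeps a*x + b within the range of a long. */
    static final long P = 2147483647L;

    private final long a; // drawn from [1, P-1]
    private final long b; // drawn from [0, P-1]

    PairwiseHash(Random rng) {
        this.a = 1 + (long) rng.nextInt((int) (P - 1));
        this.b = rng.nextInt((int) P);
    }

    /** Maps x into [0, P). Inputs are first reduced modulo P. */
    long hash(long x) {
        long r = Math.floorMod(x, P);
        return (a * r + b) % P;
    }
}
```

 Because a is non-zero and p is prime, distinct keys below p never collide, and
 a run of consecutive keys is spread across the range instead of staying
 clustered, which is the property a monotonically increasing key column needs.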


[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Status: Patch Available  (was: Open)

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch, HIVE-4435.2.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.


[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Attachment: HIVE-4435.2.patch

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch, HIVE-4435.2.patch


 The current implementation of Flajolet-Martin estimator to estimate the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.


[jira] [Commented] (HIVE-4641) Support post execution/fetch hook for HiveServer2

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676367#comment-13676367
 ] 

Shreepadma Venugopalan commented on HIVE-4641:
--

Enforcing security on a per row basis could be one use of such a hook. The hook 
can be used in other ways to apply custom transformations to the result set 
before returning to the client.

 Support post execution/fetch hook for HiveServer2
 -

 Key: HIVE-4641
 URL: https://issues.apache.org/jira/browse/HIVE-4641
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Support post execution/fetch hook that is invoked prior to returning results 
 to the client. This can be used to filter results to enforce a specific 
 security policy before returning the result set to the client.
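
 A sketch of what such a hook could look like. Everything here is hypothetical:
 the PostFetchHook interface, its run method, and the RowFilterHook class are
 invented for illustration and are not an existing Hive API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical hook contract, invoked after fetch and before results are returned. */
interface PostFetchHook {
    List<Object[]> run(String userName, List<Object[]> rows);
}

/** Hypothetical hook that keeps only the rows the given user is allowed to see. */
final class RowFilterHook implements PostFetchHook {
    private final BiPredicate<String, Object[]> visible;

    RowFilterHook(BiPredicate<String, Object[]> visible) {
        this.visible = visible;
    }

    @Override
    public List<Object[]> run(String userName, List<Object[]> rows) {
        List<Object[]> filtered = new ArrayList<>();
        for (Object[] row : rows) {
            if (visible.test(userName, row)) {
                filtered.add(row);
            }
        }
        return filtered;
    }
}
```

 The same contract would also cover other custom transformations of the result
 set, since the hook returns the rows that are sent on to the client.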


[jira] [Commented] (HIVE-4641) Support post execution/fetch hook for HiveServer2

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676521#comment-13676521
 ] 

Shreepadma Venugopalan commented on HIVE-4641:
--

This is a general purpose hook and is not specific to any feature. Hive has 
hooks at various stages of compilation and execution (pre semantic analysis, 
post semantic analysis, pre execution, etc.), but lacks a post execution/post 
fetch hook. This JIRA just adds that.

 Support post execution/fetch hook for HiveServer2
 -

 Key: HIVE-4641
 URL: https://issues.apache.org/jira/browse/HIVE-4641
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Support post execution/fetch hook that is invoked prior to returning results 
 to the client. This can be used to filter results before returning the result 
 set to the client.


[jira] [Created] (HIVE-4669) Make username available to semantic analyzer hooks

2013-06-05 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-4669:


 Summary: Make username available to semantic analyzer hooks
 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


Make username available to the semantic analyzer hooks.


[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Status: Patch Available  (was: In Progress)

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch


 Make username available to the semantic analyzer hooks.


[jira] [Updated] (HIVE-4669) Make username available to semantic analyzer hooks

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4669:
-

Attachment: HIVE-4669.1.patch

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch


 Make username available to the semantic analyzer hooks.


[jira] [Work started] (HIVE-4669) Make username available to semantic analyzer hooks

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4669 started by Shreepadma Venugopalan.

 Make username available to semantic analyzer hooks
 --

 Key: HIVE-4669
 URL: https://issues.apache.org/jira/browse/HIVE-4669
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4669.1.patch


 Make username available to the semantic analyzer hooks.


[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676598#comment-13676598
 ] 

Shreepadma Venugopalan commented on HIVE-4561:
--

[~clarkyzl]: I'm not sure I understand the fix here. Can you please elaborate 
on what it means to leave it empty in the ColumnStatsTask? Thanks!

 Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch


 If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; 
 if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
 hive (default)> create table src_test (price double);
 hive (default)> load data local inpath './test.txt' into table src_test;
 hive (default)> select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default)> analyze table src_test compute statistics for columns price;
 mysql> select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)
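
The symptom above is what you would see if a running minimum and maximum both started at 0.0. That is an assumption about the cause, not something the report confirms, and the class below is a hypothetical illustration rather than Hive's statistics code. It contrasts zero-initialized bounds with bounds initialized to the infinities:

```java
// Illustrative only: reproduces the reported symptom with a zero-initialized
// running min/max. Not taken from Hive; the class name is hypothetical.
final class DoubleRange {
    private DoubleRange() {}

    // Suspect pattern: both bounds start at 0.0, so an all-positive column
    // never lowers "low" past 0.0 and an all-negative one never raises "high".
    static double[] zeroInitialized(double[] values) {
        double low = 0.0;
        double high = 0.0;
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }

    // Safer pattern: start from the infinities so the first value seen
    // always replaces the initial bound.
    static double[] infinityInitialized(double[] values) {
        double low = Double.POSITIVE_INFINITY;
        double high = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }
}
```

With the values 1.0, 2.0, 3.0 from the report, the first method returns a low value of 0.0 and the second returns 1.0.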


[jira] [Created] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principle

2013-06-05 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-4670:


 Summary: Authentication module should pass the instance part of 
the Kerberos principle
 Key: HIVE-4670
 URL: https://issues.apache.org/jira/browse/HIVE-4670
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


When Kerberos authentication is enabled for HiveServer2, the Thrift SASL layer 
passes instance@realm from the principal. It should instead strip the realm and 
pass just the instance part of the principal.
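
A minimal sketch of the realm stripping described above. The class and method names are hypothetical and this is not the HiveServer2 code path; it also ignores escaped '@' characters, which a full Kerberos name parser would have to handle.

```java
// Hypothetical helper: drops the "@REALM" suffix from a Kerberos principal
// string. Simplified; does not handle backslash-escaped '@' characters.
final class KerberosNames {
    private KerberosNames() {}

    static String stripRealm(String principal) {
        int at = principal.lastIndexOf('@');
        // No realm present: return the name unchanged.
        return at < 0 ? principal : principal.substring(0, at);
    }
}
```

For example, `KerberosNames.stripRealm("instance@realm")` yields `instance`, which is the value the description says should be passed on.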


[jira] [Created] (HIVE-4657) HCatalog checkstyle violation after HIVE-2670

2013-06-04 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-4657:


 Summary: HCatalog checkstyle violation after HIVE-2670 
 Key: HIVE-4657
 URL: https://issues.apache.org/jira/browse/HIVE-4657
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Shreepadma Venugopalan


After HIVE-2670 was committed, I see the following checkstyle error:

{noformat}
checkstyle:
 [echo] hcatalog
[checkstyle] Running Checkstyle 5.5 on 416 files
[checkstyle] 
/Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm:1:
 Line does not match expected header line of '\W*or more contributor license 
agreements.  See the NOTICE file$'.
[checkstyle] 
/Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf:1:
 Line does not match expected header line of '\W*or more contributor license 
agreements.  See the NOTICE file$'.
[checkstyle] 
/Users/vshree/work/repositories/hive15/hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf:1:
 Line does not match expected header line of '\W*or more contributor license 
agreements.  See the NOTICE file$'.
  [for] hcatalog: The following error occurred while executing this line:
  [for] /Users/vshree/work/repositories/hive15/build.xml:310: The following 
error occurred while executing this line:
  [for] /Users/vshree/work/repositories/hive15/hcatalog/build.xml:109: The 
following error occurred while executing this line:
  [for] 
/Users/vshree/work/repositories/hive15/hcatalog/build-support/ant/checkstyle.xml:32:
 Got 3 errors and 0 warnings.

BUILD FAILED
/Users/vshree/work/repositories/hive15/build.xml:308: Keepgoing execution: 2 of 
11 iterations failed.
{noformat}


[jira] [Commented] (HIVE-4648) Add ability to set hadoop conf overrides in JDBC for HiveServer2

2013-06-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673367#comment-13673367
 ] 

Shreepadma Venugopalan commented on HIVE-4648:
--

[~harisekhon]: Any config variable that can be set or unset through the command 
line can also be set or unset through JDBC. To do so, execute a statement of 
the form set config.var=value. For example, to set the scratch dir, you can do 
the following in JDBC:

{noformat}
statement.execute("set hive.exec.scratchdir=/tmp/mydir");
{noformat}

Note that this property is set for the particular JDBC connection. 
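
A slightly fuller sketch of the same idea, assuming a HiveServer2 JDBC connection has been opened elsewhere. `SessionConf` and its methods are hypothetical helpers, not part of Hive; only the `java.sql` calls are standard JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

// Hypothetical helper (not part of Hive): issues "set key=value" commands
// over an existing JDBC connection.
final class SessionConf {
    private SessionConf() {}

    // Builds the text of one set command,
    // e.g. "set hive.exec.scratchdir=/tmp/mydir".
    static String setCommand(String key, String value) {
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalArgumentException("config key must not be empty");
        }
        return "set " + key.trim() + "=" + value;
    }

    // Executes one set command per override on the given connection.
    // The settings are expected to apply to that connection only.
    static void apply(Connection conn, Map<String, String> overrides) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (Map.Entry<String, String> e : overrides.entrySet()) {
                stmt.execute(setCommand(e.getKey(), e.getValue()));
            }
        }
    }
}
```

Calling `SessionConf.apply(conn, overrides)` right after opening the connection, or at any later point, changes the settings for that connection's session.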

 Add ability to set hadoop conf overrides in JDBC for HiveServer2
 

 Key: HIVE-4648
 URL: https://issues.apache.org/jira/browse/HIVE-4648
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC
Affects Versions: 0.10.0
Reporter: Hari Sekhon

 It's possible in BeeLine to specify set command overrides of hadoop config 
 variables, but I haven't seen any example code of how to do this in JDBC with 
 HiveServer2.
 We need an ability to specify hadoop conf overrides on a per-session basis or 
 even halfway through the session. See this Hive ticket for some background:
 https://issues.apache.org/jira/browse/HIVE-4644


[jira] [Commented] (HIVE-4648) Add ability to set hadoop conf overrides in JDBC for HiveServer2

2013-06-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673369#comment-13673369
 ] 

Shreepadma Venugopalan commented on HIVE-4648:
--

Please note that setting hive.exec.scratchdir is just an example of doing sets 
through JDBC.

 Add ability to set hadoop conf overrides in JDBC for HiveServer2
 

 Key: HIVE-4648
 URL: https://issues.apache.org/jira/browse/HIVE-4648
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC
Affects Versions: 0.10.0
Reporter: Hari Sekhon

 It's possible in BeeLine to specify set command overrides of hadoop config 
 variables, but I haven't seen any example code of how to do this in JDBC with 
 HiveServer2.
 We need an ability to specify hadoop conf overrides on a per-session basis or 
 even halfway through the session. See this Hive ticket for some background:
 https://issues.apache.org/jira/browse/HIVE-4644


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-06-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673508#comment-13673508
 ] 

Shreepadma Venugopalan commented on HIVE-4629:
--

[~cwsteinbach]: Can you look at this? Thanks!

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-06-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673599#comment-13673599
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Thanks Ashutosh!

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator for the number of 
 distinct values doesn't use hash functions that are pairwise independent. 
 This is problematic because the input values aren't distributed uniformly. 
 When run on large TPC-H data sets, this leads to a huge discrepancy for 
 primary key columns, which are typically a monotonically increasing 
 sequence.
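
For reference, a minimal sketch of a pairwise-independent hash family of the kind the description asks for: h(x) = (a*x + b) mod p with p prime and a, b drawn uniformly from [0, p). This is the textbook construction, not the code in the attached patch; the class name and the choice of p = 2^31 - 1 are assumptions made for illustration.

```java
import java.util.Random;

// Illustrative sketch (not Hive's code): the classic family
// h(x) = (a*x + b) mod P, pairwise independent over inputs in [0, P).
final class PairwiseHash {
    // Mersenne prime 2^31 - 1; keeps a*x + b within the range of a long.
    static final long P = 2147483647L;

    private final long a; // 0 <= a < P
    private final long b; // 0 <= b < P

    PairwiseHash(long a, long b) {
        if (a < 0 || a >= P || b < 0 || b >= P) {
            throw new IllegalArgumentException("need 0 <= a < P and 0 <= b < P");
        }
        this.a = a;
        this.b = b;
    }

    // Draws a random member of the family.
    static PairwiseHash random(Random rng) {
        return new PairwiseHash(rng.nextInt((int) P), rng.nextInt((int) P));
    }

    // Hash value in [0, P). Inputs are first reduced modulo P, so inputs
    // that are congruent modulo P collide.
    long hash(long x) {
        long r = Math.floorMod(x, P);
        return (a * r + b) % P;
    }

    // Number of trailing zero bits of the hash, the quantity a
    // Flajolet-Martin style bitmap records (capped at 31 for a zero hash).
    int rank(long x) {
        long h = hash(x);
        return h == 0 ? 31 : Long.numberOfTrailingZeros(h);
    }
}
```

Because a and b are random, a monotonically increasing key sequence is spread over the hash range instead of mapping to a similarly regular sequence of hash values.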


[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673616#comment-13673616
 ] 

Shreepadma Venugopalan commented on HIVE-4561:
--

[~ashutoshc]: Sure, I'll take a look at this today.

 Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch


 If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; 
 if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
 hive (default)> create table src_test (price double);
 hive (default)> load data local inpath './test.txt' into table src_test;
 hive (default)> select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default)> analyze table src_test compute statistics for columns price;
 mysql> select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)


[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673887#comment-13673887
 ] 

Shreepadma Venugopalan commented on HIVE-4561:
--

LGTM! +1 (non-binding).

 Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch


 If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; 
 if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
 hive (default)> create table src_test (price double);
 hive (default)> load data local inpath './test.txt' into table src_test;
 hive (default)> select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default)> analyze table src_test compute statistics for columns price;
 mysql> select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)


[jira] [Assigned] (HIVE-4641) Support post execution/fetch hook for HiveServer2

2013-05-31 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4641:


Assignee: Shreepadma Venugopalan

 Support post execution/fetch hook for HiveServer2
 -

 Key: HIVE-4641
 URL: https://issues.apache.org/jira/browse/HIVE-4641
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Support a post-execution/fetch hook that is invoked prior to returning results 
 to the client. This can be used to filter results to enforce a specific 
 security policy before returning the result set to the client.


[jira] [Created] (HIVE-4641) Support post execution/fetch hook for HiveServer2

2013-05-31 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-4641:


 Summary: Support post execution/fetch hook for HiveServer2
 Key: HIVE-4641
 URL: https://issues.apache.org/jira/browse/HIVE-4641
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Shreepadma Venugopalan


Support a post-execution/fetch hook that is invoked prior to returning results to 
the client. This can be used to filter results to enforce a specific security 
policy before returning the result set to the client.


[jira] [Work started] (HIVE-4641) Support post execution/fetch hook for HiveServer2

2013-05-31 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4641 started by Shreepadma Venugopalan.

 Support post execution/fetch hook for HiveServer2
 -

 Key: HIVE-4641
 URL: https://issues.apache.org/jira/browse/HIVE-4641
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Support a post-execution/fetch hook that is invoked prior to returning results 
 to the client. This can be used to filter results to enforce a specific 
 security policy before returning the result set to the client.


[jira] [Work started] (HIVE-4426) Support statistics collection for partitioning key

2013-05-31 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4426 started by Shreepadma Venugopalan.

 Support statistics collection for partitioning key
 --

 Key: HIVE-4426
 URL: https://issues.apache.org/jira/browse/HIVE-4426
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 We should support the ability to collect statistics on the partitioning key 
 column.


[jira] [Updated] (HIVE-4301) Bulk retrieval API for column stats

2013-05-31 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4301:
-

Description: Provide APIs to bulk fetch column stats, i.e., stats for all 
columns in a table and stats for all columns in all partitions in a table. This 
is necessary when fetching per-partition column stats to avoid unnecessary 
network round trips. This is particularly relevant when running a remote 
metastore service.  (was: Provide APIs to bulk fetch column stats i.e., stats 
for all columns in a table and stats for all columns in all partitions in a 
table.)

 Bulk retrieval API for column stats
 ---

 Key: HIVE-4301
 URL: https://issues.apache.org/jira/browse/HIVE-4301
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Provide APIs to bulk fetch column stats, i.e., stats for all columns in a 
 table and stats for all columns in all partitions in a table. This is 
 necessary when fetching per-partition column stats to avoid unnecessary 
 network round trips. This is particularly relevant when running a remote 
 metastore service.


[jira] [Commented] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map

2013-05-30 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13670516#comment-13670516
 ] 

Shreepadma Venugopalan commented on HIVE-4628:
--

Good catch, Tejas. Looks like this is not an issue any more. I've set the 
appropriate status.

 HS2 sessionmanager should synchronize the call to insert/remove session 
 objects from session hash map
 -

 Key: HIVE-4628
 URL: https://issues.apache.org/jira/browse/HIVE-4628
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Critical

 HS2 SessionManager maintains a hashmap of active HS2 sessions. However, 
 inserts into and deletes from this hashmap are not synchronized. As a 
 consequence, a racing thread could overwrite a valid session object in the 
 hashmap and we could end up losing a session!


[jira] [Resolved] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map

2013-05-30 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan resolved HIVE-4628.
--

Resolution: Not A Problem

 HS2 sessionmanager should synchronize the call to insert/remove session 
 objects from session hash map
 -

 Key: HIVE-4628
 URL: https://issues.apache.org/jira/browse/HIVE-4628
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Critical

 HS2 SessionManager maintains a hashmap of active HS2 sessions. However, 
 inserts into and deletes from this hashmap are not synchronized. As a 
 consequence, a racing thread could overwrite a valid session object in the 
 hashmap and we could end up losing a session!


[jira] [Work started] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-05-30 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4629 started by Shreepadma Venugopalan.

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-05-30 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13670583#comment-13670583
 ] 

Shreepadma Venugopalan commented on HIVE-4629:
--

@Carl: The proposed addition to TCLIService.thrift is the following new API and 
structs,

{noformat}
// GetLog()
// Fetch operation log from the server corresponding to
// a particular OperationHandle.

struct TGetLogReq {
  // Operation whose log is requested
  1: required TOperationHandle operationHandle
}

struct TGetLogResp {
  1: required TStatus status
  2: required string log
}

service TCLIService {
...
...
TGetLogResp GetLog(1:TGetLogReq req);
}
{noformat}
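
To show how a client might use such a call to report progress, here is a hedged sketch. `OperationLogSource` is a hypothetical stand-in for the proposed GetLog() call, not generated Thrift code, and it assumes that the call returns the full log accumulated so far and that the client can separately ask whether the operation has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side view of the proposed GetLog() call.
interface OperationLogSource {
    // Assumed to return the full log accumulated so far for the operation.
    String getLog(String operationHandle);

    // Assumed to report whether the operation has finished.
    boolean isFinished(String operationHandle);
}

final class LogPoller {
    private LogPoller() {}

    // Polls until the operation finishes (or the thread is interrupted) and
    // returns each newly seen chunk of log text, in order.
    static List<String> poll(OperationLogSource source, String handle, long sleepMillis) {
        List<String> chunks = new ArrayList<>();
        int seen = 0;
        boolean done;
        do {
            // Check completion before reading, so the last read picks up
            // everything written before the operation finished.
            done = source.isFinished(handle);
            String log = source.getLog(handle);
            if (log.length() > seen) {
                chunks.add(log.substring(seen));
                seen = log.length();
            }
            if (!done) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } while (!done);
        return chunks;
    }
}
```

A real client would print each chunk as it arrives rather than collecting them; the list is used here only to keep the sketch easy to check.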


 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4618) show create table creating unusable DDL when field delimiter is \001

2013-05-30 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671073#comment-13671073
 ] 

Shreepadma Venugopalan commented on HIVE-4618:
--

LGTM. +1 (non-binding).

 show create table creating unusable DDL when field delimiter is \001
 

 Key: HIVE-4618
 URL: https://issues.apache.org/jira/browse/HIVE-4618
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
 Environment: CDH4.2
 Hive 0.10
Reporter: Johndee Burks
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4618.D11007.1.patch


 When a FIELDS TERMINATED BY clause in the create statement uses the delimiter 
 \001, hive turns this into \u0001, which is correct. However, SHOW CREATE TABLE 
 then gives you a DDL that does not work, because the parser changes the \u0001 
 into u0001. 
 Example: 
 hive> create table j1 (a string) row format delimited fields terminated by 
 '\001';
 hive> show create table j1;
 CREATE  TABLE j1(
   a string)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\u0001'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
 LOCATION
   'hdfs://forza-1.cloud.rtp.cloudera.com:8020/user/hive/warehouse/j1'
 TBLPROPERTIES (
   'transient_lastDdlTime'='1369664999')
 hive> desc formatted j1;
 …shortened to save space
 Storage Desc Params:
   field.delim          \u0001
   serialization.format \u0001
 hive> drop table j1;
 hive> CREATE  TABLE j1(
a string)
  ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
  STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  LOCATION
'hdfs://forza-1.cloud.rtp.cloudera.com:8020/user/hive/warehouse/j1'
  TBLPROPERTIES (
'transient_lastDdlTime'='1369664999');
 hive> desc formatted j1;
 …shortened to save space
 Storage Desc Params:
   field.delim          u0001
   serialization.format u0001


[jira] [Created] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map

2013-05-29 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-4628:


 Summary: HS2 sessionmanager should synchronize the call to 
insert/remove session objects from session hash map
 Key: HIVE-4628
 URL: https://issues.apache.org/jira/browse/HIVE-4628
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Critical


HS2 SessionManager maintains a hashmap of active HS2 sessions. However, inserts 
into and deletes from this hashmap are not synchronized. As a consequence, a 
racing thread could overwrite a valid session object in the hashmap and we 
could end up losing a session!
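
As an illustration of the kind of protection being asked for, here is a minimal sketch of a thread-safe session map. `SessionRegistry` is a hypothetical class, not Hive's SessionManager, and using `ConcurrentHashMap` instead of explicit synchronization is one possible design choice, not necessarily what HiveServer2 does.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry of active sessions keyed by a random handle.
final class SessionRegistry<S> {
    private final ConcurrentMap<UUID, S> sessions = new ConcurrentHashMap<>();

    // Registers a session under a fresh handle. putIfAbsent is atomic, so a
    // racing thread can never overwrite an entry that is already present.
    UUID open(S session) {
        UUID handle;
        do {
            handle = UUID.randomUUID();
        } while (sessions.putIfAbsent(handle, session) != null);
        return handle;
    }

    // Returns the session for the handle, or null if it is not registered.
    S get(UUID handle) {
        return sessions.get(handle);
    }

    // Removes the session; returns false if the handle was not registered.
    boolean close(UUID handle) {
        return sessions.remove(handle) != null;
    }

    int size() {
        return sessions.size();
    }
}
```

Wrapping a plain `HashMap` in synchronized methods would give the same guarantee; the concurrent map simply avoids a single lock around every lookup.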


[jira] [Created] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-05-29 Thread Shreepadma Venugopalan (JIRA)

Shreepadma Venugopalan created HIVE-4629:


 Summary: HS2 should support an API to retrieve query logs
 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan


HiveServer2 should support an API to retrieve query logs. This is particularly 
relevant because HiveServer2 supports async execution but doesn't provide a way 
to report progress. Providing an API to retrieve query logs will help report 
progress to the client.


[jira] [Assigned] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-05-29 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4629:


Assignee: Shreepadma Venugopalan

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-05-29 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Issue Type: New Feature  (was: Bug)

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: New Feature
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2013-05-29 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4629:
-

Issue Type: Sub-task  (was: New Feature)
Parent: HIVE-2935

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.


[jira] [Work started] (HIVE-4628) HS2 sessionmanager should synchronize the call to insert/remove session objects from session hash map

2013-05-29 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4628 started by Shreepadma Venugopalan.

 HS2 sessionmanager should synchronize the call to insert/remove session 
 objects from session hash map
 -

 Key: HIVE-4628
 URL: https://issues.apache.org/jira/browse/HIVE-4628
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Critical

 HS2 SessionManager maintains a hashmap of active HS2 sessions. However, 
 inserts into and deletes from this hashmap are not synchronized. As a 
 consequence, a racing thread could overwrite a valid session object in the 
 hashmap and we could end up losing a session!
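
 One way to close the race described above can be sketched as follows. This is a
 minimal illustration, not the HIVE-4628 patch: the class name SessionRegistry,
 its methods, and the use of plain Object for the session are all hypothetical.
 It assumes a ConcurrentHashMap is acceptable in place of explicit synchronized
 blocks around a plain HashMap.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a session manager; not Hive's actual class.
final class SessionRegistry {
    // ConcurrentHashMap makes each individual put/remove/get atomic,
    // so concurrent inserts cannot corrupt the map or silently drop entries.
    private final ConcurrentMap<String, Object> sessions = new ConcurrentHashMap<>();

    // Registers a session under a fresh handle. putIfAbsent never overwrites
    // an existing entry, so a racing thread cannot replace a valid session.
    String open(Object session) {
        String handle;
        do {
            handle = UUID.randomUUID().toString();
        } while (sessions.putIfAbsent(handle, session) != null);
        return handle;
    }

    // Removes the session; returns true only if the handle was present.
    boolean close(String handle) {
        return sessions.remove(handle) != null;
    }

    Object get(String handle) {
        return sessions.get(handle);
    }

    int size() {
        return sessions.size();
    }
}
```

 Wrapping every access to a plain HashMap in synchronized blocks on a single
 lock would also work; the concurrent map simply keeps the locking inside the
 collection.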


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-05-28 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668883#comment-13668883
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Ping :)

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator for estimating the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy between the estimated and actual number of distinct values for 
 primary key columns, which are typically a monotonically increasing sequence.


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-05-03 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648987#comment-13648987
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Can a committer take a look at this?

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator for estimating the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy between the estimated and actual number of distinct values for 
 primary key columns, which are typically a monotonically increasing sequence.


[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Status: Patch Available  (was: Open)

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch





[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Attachment: HIVE-4435.1.patch

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch





[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shreepadma Venugopalan updated HIVE-4435:
-

Description: The current implementation of the Flajolet-Martin estimator for
estimating the number of distinct values doesn't use hash functions that are
pairwise independent. This is problematic because the input values don't
distribute uniformly. When run on large TPC-H data sets, this leads to a huge
discrepancy between the estimated and actual number of distinct values for
primary key columns, which are typically a monotonically increasing sequence.

Column stats: Distinct value estimator should use hash functions that are
pairwise independent
--

Key: HIVE-4435
URL: https://issues.apache.org/jira/browse/HIVE-4435
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Attachments: HIVE-4435.1.patch

The current implementation of the Flajolet-Martin estimator for estimating the
number of distinct values doesn't use hash functions that are pairwise
independent. This is problematic because the input values don't distribute
uniformly. When run on large TPC-H data sets, this leads to a huge
discrepancy between the estimated and actual number of distinct values for
primary key columns, which are typically a monotonically increasing sequence.


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644840#comment-13644840
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

The fix is to use hash functions that are pairwise independent. For more on 
pairwise independence and families of hash functions, see 
http://people.csail.mit.edu/ronitt/COURSE/S12/handouts/lec5.pdf
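
To make the idea concrete, here is a minimal sketch of a pairwise independent
hash family feeding a single-bitmap Flajolet-Martin counter. It is not the
HIVE-4435 patch. The class names PairwiseHash and FmSketch are hypothetical,
the family is the textbook h(x) = (a*x + b) mod p with a and b drawn uniformly
from [0, p) for a prime p, and the choice p = 2^31 - 1 is an assumption made so
that the arithmetic fits in a long.

```java
import java.util.Random;

// Hypothetical illustration of the family h(x) = (a*x + b) mod p.
final class PairwiseHash {
    static final long P = 2147483647L; // 2^31 - 1, a Mersenne prime (assumed choice)
    private final long a;
    private final long b;

    PairwiseHash(long a, long b) {
        if (a < 0 || a >= P || b < 0 || b >= P) {
            throw new IllegalArgumentException("a and b must lie in [0, p)");
        }
        this.a = a;
        this.b = b;
    }

    // Draws a and b uniformly from [0, p); P equals Integer.MAX_VALUE.
    static PairwiseHash random(Random rng) {
        return new PairwiseHash(rng.nextInt(Integer.MAX_VALUE), rng.nextInt(Integer.MAX_VALUE));
    }

    // Reduces x into [0, p) first so that a * x + b cannot overflow a long.
    long apply(long x) {
        long xm = Math.floorMod(x, P);
        return (a * xm + b) % P;
    }
}

// Single-bitmap Flajolet-Martin sketch; production estimators typically
// average over many bitmaps, which is omitted here for brevity.
final class FmSketch {
    private static final double PHI = 0.77351; // Flajolet-Martin correction factor
    private final PairwiseHash hash;
    private int bitmap; // bit i is set once a hashed value with i trailing zeros is seen

    FmSketch(PairwiseHash hash) {
        this.hash = hash;
    }

    void add(long x) {
        long h = hash.apply(x);
        // Hashed values are below 2^31 - 1, so at most 30 trailing zeros; map 0 to 30.
        int r = (h == 0) ? 30 : Long.numberOfTrailingZeros(h);
        bitmap |= 1 << r;
    }

    // Estimate is 2^R / PHI, where R is the index of the lowest unset bit.
    double estimate() {
        int lowestUnset = Integer.numberOfTrailingZeros(~bitmap);
        return Math.pow(2, lowestUnset) / PHI;
    }
}
```

With a single bitmap the estimate is coarse (it is always a power of two
divided by the correction factor), so the sketch only shows where the hash
family plugs in, not the accuracy of a tuned estimator.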

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator for estimating the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy between the estimated and actual number of distinct values for 
 primary key columns, which are typically a monotonically increasing sequence.


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644844#comment-13644844
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Review Board request: https://reviews.apache.org/r/10841/

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator for estimating the 
 number of distinct values doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy between the estimated and actual number of distinct values for 
 primary key columns, which are typically a monotonically increasing sequence.

