[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-03-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Fix Version/s: 0.13.0

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Fix For: 0.13.0

 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6157:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Sergey!

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Open  (was: Patch Available)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Patch Available  (was: Open)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Open  (was: Patch Available)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Attachment: HIVE-6157.03.patch

exact same patch, HiveQA won't run

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Patch Available  (was: Open)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Attachment: HIVE-6157.03.patch
HIVE-6157.nogen.patch

RB feedback, also test fixes (02 fixed those but I couldn't add it to jira). I 
am running the tez test now

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Attachment: HIVE-6157.01.patch
HIVE-6157.nogen.patch

first patch. There's one TODO# left where I think some validation code is dead, 
need to see if any tests fail with it.

Other than that many tests I ran passed, let's see what HiveQA says

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Open  (was: Patch Available)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.nogen.patch, 
 HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Attachment: HIVE-6157.01.patch

HiveQA won't pick the patch; same file

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Patch Available  (was: Open)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, 
 HIVE-6157.nogen.patch, HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Status: Patch Available  (was: Open)

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6157:
---

Attachment: HIVE-6157.prelim.patch

Well, it looks like I cannot defeat datanucleus today... SQL path seems to 
work, although I didn't run all the tests. Let me comment out and check for now.

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table 4000 partitions, 24 columns) the time spent in semantic analyze goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is if in addition to that we need to cache this 
 in the client or store the stats as a single blob in the database to further 
 cut down on the time. However, the way it stands right now column stats seem 
 unusable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)