[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Fix Version/s: 0.13.0

Fetching column stats slower than the 101 during rush hour
----------------------------------------------------------

Key: HIVE-6157
URL: https://issues.apache.org/jira/browse/HIVE-6157
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
Fix For: 0.13.0
Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch

hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table 4000 partitions, 24 columns) the time spent in semantic analyze goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats...

The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is if in addition to that we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable.

--
This message was sent by Atlassian JIRA (v6.2#6252)
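To put the reported numbers in perspective, here is a minimal back-of-the-envelope sketch in Python. It takes the figures from the description (4000 partitions, 24 columns, about 65 seconds) and counts metastore round trips under the "one call per column per partition" pattern the reporter suspects. The `batched_calls` function models a hypothetical bulk call that returns all columns for a batch of partitions; it and its `batch_size` parameter are assumptions for illustration only, not the API of the committed patch. The per-call cost also assumes the calls are serial and account for all of the measured time.

```python
import math

# Figures taken from the report: 1 table, 4000 partitions, 24 columns,
# roughly 65 seconds spent fetching column stats.
PARTITIONS = 4000
COLUMNS = 24
FETCH_SECONDS = 65.0


def per_column_calls(partitions: int, columns: int) -> int:
    """Round trips when every (partition, column) pair needs its own call."""
    return partitions * columns


def batched_calls(partitions: int, batch_size: int) -> int:
    """Round trips if one call returned all columns for a batch of partitions.

    Hypothetical bulk API; batch_size is an assumed tuning knob.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(partitions / batch_size)


calls = per_column_calls(PARTITIONS, COLUMNS)
# Implied average cost per call, assuming the calls are serial and account
# for all of the measured time.
ms_per_call = FETCH_SECONDS / calls * 1000

if __name__ == "__main__":
    print(f"per-column calls: {calls}")
    print(f"implied cost per call: {ms_per_call:.2f} ms")
    print(f"batched calls (assumed batch of 1000): {batched_calls(PARTITIONS, 1000)}")
```

Under these assumptions the setup needs 96,000 round trips at well under a millisecond each, so each call is cheap and the total is dominated by call count. That is consistent with the reporter's view that reducing the number of calls is the first thing to change, with client-side caching or a single-blob layout as possible further steps.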
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Gunther Hagleitner updated HIVE-6157:
-------------------------------------
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Sergey!
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Attachment: HIVE-6157.03.patch

exact same patch, HiveQA won't run
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Attachment: HIVE-6157.03.patch
                HIVE-6157.nogen.patch

RB feedback, also test fixes (02 fixed those but I couldn't add it to jira). I am running the tez test now
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Attachment: HIVE-6157.01.patch
                HIVE-6157.nogen.patch

first patch. There's one TODO# left where I think some validation code is dead, need to see if any tests fail with it. Other than that many tests I ran passed, let's see what HiveQA says
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Attachment: HIVE-6157.01.patch

HiveQA won't pick the patch; same file
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Attachment: HIVE-6157.prelim.patch

Well, it looks like I cannot defeat datanucleus today... SQL path seems to work, although I didn't run all the tests. Let me comment out and check for now.