[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613960#comment-15613960
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/592


> Query against INFORMATION_SCHEMA.TABLES degrades as the number of views 
> increases
> -
>
> Key: DRILL-4826
> URL: https://issues.apache.org/jira/browse/DRILL-4826
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Parth Chandra
> Assignee: Padma Penumarthy
>
> Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.VIEWS slow 
> down as the number of views increases. 
> BI tools like Tableau issue a query like the following at connection time:
> {code}
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from 
> INFORMATION_SCHEMA.`TABLES` WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND 
> TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA' ORDER BY 
> TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
> {code}
> The time to query the information schema tables degrades as the number of 
> views increases. On a test system:
> || Views || Time(secs) ||
> |500 | 6 |
> |1000 | 19 |
> |1500 | 33 |
> This can result in a single connection taking more than a minute to establish.
> The problem occurs because we read the view file for every view, and this 
> appears to take most of the time.
> Querying information_schema.tables does not, in fact, need to open the view 
> files at all; it merely needs a listing of the view files. Eliminating 
> the view file reads will speed up the query tremendously.
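
To illustrate the idea in the description, here is a minimal sketch (not Drill's actual implementation) of deriving view names from a directory listing alone, without opening any view file. The class `ViewListingSketch`, its method names, and the `.view.drill` suffix are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the names and the suffix below are assumptions, not Drill's API.
final class ViewListingSketch {
  static final String VIEW_SUFFIX = ".view.drill";

  // Pure helper: maps a listing of file names to view names without any file reads.
  static List<String> viewNames(List<String> fileNames) {
    final List<String> names = new ArrayList<>();
    for (String fileName : fileNames) {
      if (fileName.endsWith(VIEW_SUFFIX) && fileName.length() > VIEW_SUFFIX.length()) {
        names.add(fileName.substring(0, fileName.length() - VIEW_SUFFIX.length()));
      }
    }
    return names;
  }

  // Lists a directory and derives view names; file contents are never opened.
  static List<String> listViewNames(Path workspaceDir) throws IOException {
    final List<String> fileNames = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(workspaceDir)) {
      for (Path entry : entries) {
        fileNames.add(entry.getFileName().toString());
      }
    }
    return viewNames(fileNames);
  }
}
```

With this shape, the cost of answering a TABLES query grows with the size of the directory listing rather than with the number of view files parsed.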



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606077#comment-15606077
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r84967719
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java ---
@@ -233,6 +236,30 @@ private DrillHiveMetaStoreClient(final HiveConf hiveConf) throws MetaException {
 }
   }
 
+  public static List getTablesByNamesByBulkLoadHelper(
+  final HiveMetaStoreClient mClient, final List tableNames, final String schemaName,
+  final int bulkSize) {
+final int totalTables = tableNames.size();
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for (int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  try {
+eachBulkofTables =
--- End diff --

`eachBulkofTables = getTableObjectsByNameHelper(mClient, schemaName, eachBulkofTableNames);`
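
The batching and retry-once pattern in the diff above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `BulkFetchSketch` and the `BatchFetcher` interface are hypothetical stand-ins for the metastore client call, and the sketch omits the reconnect step that the reviewed code performs before retrying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "fetch in sub-lists, retry each sub-list once".
final class BulkFetchSketch {
  // Stand-in for a remote call that may fail transiently (assumption, not a Hive API).
  interface BatchFetcher<T> {
    List<T> fetch(List<String> names) throws Exception;
  }

  static <T> List<T> fetchInBatches(List<String> names, int bulkSize, BatchFetcher<T> fetcher) {
    if (bulkSize <= 0) {
      throw new IllegalArgumentException("bulkSize must be positive");
    }
    final List<T> results = new ArrayList<>();
    // In each round, ask for a sub-list of at most bulkSize names.
    for (int from = 0; from < names.size(); from += bulkSize) {
      final int to = Math.min(from + bulkSize, names.size());
      final List<String> batch = names.subList(from, to);
      List<T> fetched;
      try {
        fetched = fetcher.fetch(batch);
      } catch (Exception first) {
        try {
          fetched = fetcher.fetch(batch); // single retry
        } catch (Exception second) {
          // Give up on the whole request, mirroring the empty-list return in the diff.
          return Collections.emptyList();
        }
      }
      results.addAll(fetched);
    }
    return results;
  }
}
```

Returning an empty list on a second failure discards batches that were already fetched; whether that is the right trade-off depends on how callers treat partial listings.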




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573421#comment-15573421
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r83324591
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,32 +79,49 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames,
+  final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for (int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  synchronized (mClient) {
+try {
+  eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
+} catch (TException tException) {
+  try {
+mClient.reconnect();
+eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
+  } catch (Exception e) {
+logger.warn("Exception occurred while trying to read tables from {}: {}", schemaName,
+e.getCause());
+return ImmutableList.of();
+  }
+}
+tables.addAll(eachBulkofTables);
+  }
 }
 
-for(final org.apache.hadoop.hive.metastore.api.Table table : tables) {
-  if(table == null) {
+final List> tableNameToTable = Lists.newArrayList();
+for (final org.apache.hadoop.hive.metastore.api.Table table : tables) {
+  if (table == null) {
--- End diff --

Can this table be null?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573420#comment-15573420
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r83324461
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
@@ -290,28 +290,30 @@ public Tables(OptionManager optionManager) {
   return new PojoRecordReader<>(Records.Table.class, records.iterator());
 }
 
-@Override
-public void visitTables(String schemaPath, SchemaPlus schema) {
+@Override public void visitTables(String schemaPath, SchemaPlus schema) {
--- End diff --

Why this change?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573419#comment-15573419
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r83323951
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java ---
@@ -231,4 +231,21 @@ public void dropTable(String tableName) {
 }
 return tables;
   }
-}
\ No newline at end of file
+
+  public List> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
+final List tableNames = Lists.newArrayList(getTableNames());
+final List> tableNamesAndTypes = Lists.newArrayList();
+final List> tables;
+if (bulkLoad) {
+  tables = getTablesByNamesByBulkLoad(tableNames, bulkSize);
--- End diff --

Why do we even have this option to do bulkLoad or not? Why not just do 
bulkLoad always?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508098#comment-15508098
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/592
  
Updated to address review comments


> Query against INFORMATION_SCHEMA.TABLES degrades as the number of views 
> increases
> -
>
> Key: DRILL-4826
> URL: https://issues.apache.org/jira/browse/DRILL-4826
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Parth Chandra
> Assignee: Parth Chandra
>
> Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.VIEWS slow 
> down as the number of views increases. 
> BI tools like Tableau issue a query like the following at connection time:
> {code}
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from 
> INFORMATION_SCHEMA.`TABLES` WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND 
> TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA' ORDER BY 
> TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
> {code}
> The time to query the information schema tables degrades as the number of 
> views increases. On a test system:
> || Views || Time(secs) ||
> |500 | 6 |
> |1000 | 19 |
> |1500 | 33 |
> This can result in a single connection taking more than a minute to establish.
> The problem occurs because we read the view file for every view, and this 
> appears to take most of the time.
> Querying information_schema.tables does not, in fact, need to open the view 
> files at all; it merely needs a listing of the view files. Eliminating 
> the view file reads will speed up the query tremendously.





[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507922#comment-15507922
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79721752
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
--- End diff --

Where?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507918#comment-15507918
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79721387
  
--- Diff: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestJdbcQuery.java ---
@@ -122,6 +122,7 @@ public void testLikeNotLike() throws Exception{
   );
   }
 
+  @Ignore("Returns results in different order depeding on forkCount")
--- End diff --

Or maybe I should just order the results?
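
One way to make such a test independent of result order, assuming the assertion only cares about which rows come back, is to sort both sides before comparing (the other option is to add an ORDER BY to the test query itself). The helper below is a hypothetical sketch and is not part of TestJdbcQuery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: compares result rows while ignoring their order.
final class OrderInsensitiveCheck {
  static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
    // Copy first so the caller's lists are left untouched.
    final List<String> sortedExpected = new ArrayList<>(expected);
    final List<String> sortedActual = new ArrayList<>(actual);
    Collections.sort(sortedExpected);
    Collections.sort(sortedActual);
    // Sorted-list equality also checks duplicate counts, unlike a Set comparison.
    return sortedExpected.equals(sortedActual);
  }
}
```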




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507919#comment-15507919
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79721423
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  synchronized(mClient) {
--- End diff --

See previous comment.




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507911#comment-15507911
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79721140
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
@@ -290,28 +290,30 @@ public Tables(OptionManager optionManager) {
   return new PojoRecordReader<>(Records.Table.class, records.iterator());
 }
 
-@Override
-public void visitTables(String schemaPath, SchemaPlus schema) {
+@Override public void visitTables(String schemaPath, SchemaPlus schema) {
   final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class);
+  final List> tableNamesAndTypes = drillSchema
+  .getTableNamesAndTypes(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST),
+  (int)optionManager.getOption(ExecConstants.BULK_LOAD_TABLE_LIST_BULK_SIZE));
 
-  final List tableNames = Lists.newArrayList(schema.getTableNames());
-  final List> tableNameToTables;
-  if(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST)) {
-tableNameToTables = drillSchema.getTablesByNamesByBulkLoad(tableNames);
-  } else {
-tableNameToTables = drillSchema.getTablesByNames(tableNames);
-  }
-
-  for(Pair tableNameToTable : tableNameToTables) {
-final String tableName = tableNameToTable.getKey();
-final Table table = tableNameToTable.getValue();
+  for (Pair tableNameAndType : tableNamesAndTypes) {
+final String tableName = tableNameAndType.getKey();
+final TableType tableType = tableNameAndType.getValue();
 // Visit the table, and if requested ...
-if(shouldVisitTable(schemaPath, tableName)) {
-  visitTable(schemaPath, tableName, table);
+if (shouldVisitTable(schemaPath, tableName)) {
+  visitTableWithType(schemaPath, tableName, tableType);
 }
   }
 }
 
+public boolean visitTableWithType(String schemaName, String tableName, TableType type) {
+  Preconditions
+  .checkNotNull(type, "Error. Type information for table %s.%s provided is null.", schemaName,
+  tableName);
+  records.add(new Records.Table(IS_CATALOG_NAME, schemaName, tableName, type.toString()));
+  return false;
--- End diff --

To keep it similar to visitTable, which does the same, unnecessarily. 
I suppose I could change it to return void.




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507882#comment-15507882
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79719241
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -738,5 +740,49 @@ public void dropTable(String table) {
 .build(logger);
   }
 }
+
+@Override public List> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
--- End diff --

IntelliJ keeps reformatting this to be on the same line! Will fix.




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507879#comment-15507879
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79719219
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  synchronized(mClient) {
--- End diff --

This is refactored code from the fix for DRILL-4577 
(https://github.com/apache/drill/pull/461); I didn't really change it.
Going through the code, it appears that mClient may be cached and reused, 
so access to it probably should be synchronized.
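
For readers following the thread, the pattern in the diff above (fetch the requested tables in fixed-size batches while holding a lock on a shared client) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in the pull request: `TableClient`, `BulkTableFetchSketch`, and `fetchInBatches` are hypothetical names, and plain strings stand in for the metastore table objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metastore client that may be cached and shared between callers. */
interface TableClient {
  List<String> getTableObjectsByName(String schemaName, List<String> tableNames);
}

final class BulkTableFetchSketch {

  /** Fetches table objects in batches of at most bulkSize names per call. */
  static List<String> fetchInBatches(TableClient client, String schemaName,
                                     List<String> tableNames, int bulkSize) {
    if (bulkSize <= 0) {
      throw new IllegalArgumentException("bulkSize must be positive");
    }
    final int totalTables = tableNames.size();
    final List<String> tables = new ArrayList<>();
    for (int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
      final List<String> batch = tableNames.subList(fromIndex, toIndex);
      // Assumption: the client instance is shared, so each call is made
      // while holding the client's monitor.
      synchronized (client) {
        tables.addAll(client.getTableObjectsByName(schemaName, batch));
      }
    }
    return tables;
  }
}
```

Locking per batch rather than around the whole loop lets other users of the same client interleave their calls between batches; whether that is desirable depends on how the client is actually shared.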


> Query against INFORMATION_SCHEMA.TABLES degrades as the number of views 
> increases
> -
>
> Key: DRILL-4826
> URL: https://issues.apache.org/jira/browse/DRILL-4826
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>
> Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.VIEWS slow 
> down as the number of views increases. 
> BI tools like Tableau issue a query like the following at connection time:
> {code}
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from 
> INFORMATION_SCHEMA.`TABLES` WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND 
> TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA' ORDER BY 
> TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
> {code}
> The time to query the information schema tables degrades as the number of 
> views increases. On a test system:
> || Views || Time(secs) ||
> |500 | 6 |
> |1000 | 19 |
> |1500 | 33 |
> This can result in a single connection taking more than a minute to establish.
> The problem occurs because we read the view file for every view and this 
> appears to take most of the time.
> Querying information_schema.tables does not, in fact, need to open the view 
> file at all; it merely needs to get a listing of the view files. Eliminating 
> the view file read will speed up the query tremendously.
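
The idea in the description (list the view files instead of opening each one) can be illustrated with a small sketch. It makes several assumptions: it uses `java.nio.file` on a local directory rather than the file system abstraction Drill itself uses, it assumes view definitions are stored as files ending in `.view.drill`, and `ViewListingSketch` and `listViewNames` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ViewListingSketch {
  // Assumption: view definition files carry this suffix.
  static final String VIEW_SUFFIX = ".view.drill";

  /** Returns view names derived from file names only; no view file is opened or parsed. */
  static List<String> listViewNames(Path workspaceDir) {
    final List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> stream =
             Files.newDirectoryStream(workspaceDir, "*" + VIEW_SUFFIX)) {
      for (Path p : stream) {
        final String fileName = p.getFileName().toString();
        names.add(fileName.substring(0, fileName.length() - VIEW_SUFFIX.length()));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Collections.sort(names);
    return names;
  }
}
```

The cost of such a listing grows with the number of directory entries rather than with the size or complexity of each view definition, which is the saving the description is after.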



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507515#comment-15507515
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79664175
  
--- Diff: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestJdbcQuery.java ---
@@ -122,6 +122,7 @@ public void testLikeNotLike() throws Exception{
   );
   }
 
+  @Ignore("Returns results in different order depeding on forkCount")
--- End diff --

typo: depending




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507513#comment-15507513
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79688118
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
--- End diff --

Space?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507516#comment-15507516
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79663027
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -108,7 +126,7 @@ public String getTypeName() {
 return tableNameToTable;
   }
 
-  private static class HiveTableWithoutStatisticAndRowType implements Table {
+   private static class HiveTableWithoutStatisticAndRowType implements Table {
--- End diff --

Extra space




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507517#comment-15507517
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79687833
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  synchronized(mClient) {
--- End diff --

why do we synchronize?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507514#comment-15507514
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79689068
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -291,6 +291,9 @@
   String ENABLE_BULK_LOAD_TABLE_LIST_KEY = "exec.enable_bulk_load_table_list";
   BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, false);
 
+  String BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY = "exec.bulk_load_table_list.bulk_size";
--- End diff --

Maybe a comment to describe the option?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507222#comment-15507222
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79667525
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
@@ -290,28 +290,30 @@ public Tables(OptionManager optionManager) {
   return new PojoRecordReader<>(Records.Table.class, records.iterator());
 }
 
-@Override
-public void visitTables(String schemaPath, SchemaPlus schema) {
+@Override public void visitTables(String schemaPath, SchemaPlus schema) {
   final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class);
+  final List> tableNamesAndTypes = drillSchema
+  .getTableNamesAndTypes(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST),
+  (int)optionManager.getOption(ExecConstants.BULK_LOAD_TABLE_LIST_BULK_SIZE));
 
-  final List tableNames = Lists.newArrayList(schema.getTableNames());
-  final List> tableNameToTables;
-  if(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST)) {
-tableNameToTables = drillSchema.getTablesByNamesByBulkLoad(tableNames);
-  } else {
-tableNameToTables = drillSchema.getTablesByNames(tableNames);
-  }
-
-  for(Pair tableNameToTable : tableNameToTables) {
-final String tableName = tableNameToTable.getKey();
-final Table table = tableNameToTable.getValue();
+  for (Pair tableNameAndType : tableNamesAndTypes) {
+final String tableName = tableNameAndType.getKey();
+final TableType tableType = tableNameAndType.getValue();
 // Visit the table, and if requested ...
-if(shouldVisitTable(schemaPath, tableName)) {
-  visitTable(schemaPath, tableName, table);
+if (shouldVisitTable(schemaPath, tableName)) {
+  visitTableWithType(schemaPath, tableName, tableType);
 }
   }
 }
 
+public boolean visitTableWithType(String schemaName, String tableName, TableType type) {
+  Preconditions
+  .checkNotNull(type, "Error. Type information for table %s.%s provided is null.", schemaName,
+  tableName);
+  records.add(new Records.Table(IS_CATALOG_NAME, schemaName, tableName, type.toString()));
+  return false;
--- End diff --

why return `false`?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507218#comment-15507218
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79664962
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  synchronized(mClient) {
+try {
+  eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
--- End diff --

+ Why not use the helper? Exception handling and reconnecting logic is 
different in the helper methods in 
[DrillHiveMetaStoreClient](https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L222).
 
+ Move this logic to a method in that class?
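
To make the suggestion concrete, one common shape for a "retry once after reconnecting" helper is sketched below. This is not the code in DrillHiveMetaStoreClient, whose exception handling may well differ; `RetryOnceSketch`, `callWithReconnect`, and the `reconnect` callback are hypothetical names used only for illustration.

```java
import java.util.function.Supplier;

final class RetryOnceSketch {

  /**
   * Runs the call; if it fails, reconnects and tries exactly once more.
   * A failure of the second attempt propagates to the caller.
   */
  static <T> T callWithReconnect(Supplier<T> call, Runnable reconnect) {
    try {
      return call.get();
    } catch (RuntimeException firstFailure) {
      // Assumption: the failure may be caused by a stale connection,
      // so re-establish it before the single retry.
      reconnect.run();
      return call.get();
    }
  }
}
```

Keeping this logic in one place means every metadata call shares the same retry and reconnect behaviour instead of each caller reimplementing it.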




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507220#comment-15507220
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79666399
  
--- Diff: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestJdbcQuery.java ---
@@ -122,6 +122,7 @@ public void testLikeNotLike() throws Exception{
   );
   }
 
+  @Ignore("Returns results in different order depeding on forkCount")
--- End diff --

Is this a regression due to this patch? Otherwise, open a ticket for this 
issue.




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507221#comment-15507221
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79664917
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -738,5 +740,49 @@ public void dropTable(String table) {
 .build(logger);
   }
 }
+
+@Override public List> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
--- End diff --

Put the annotation on its own line above the method?
The same applies in other places too.




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507217#comment-15507217
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79665593
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
@@ -55,6 +54,7 @@
  * schema, table or field.
  */
 public abstract class InfoSchemaRecordGenerator {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InfoSchemaRecordGenerator.class);
--- End diff --

private




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507219#comment-15507219
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/592#discussion_r79663056
  
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
@@ -78,17 +79,34 @@ public String getTypeName() {
   }
 
   @Override
-  public List> getTablesByNamesByBulkLoad(final List tableNames) {
+  public List> getTablesByNamesByBulkLoad(final List tableNames, final int bulkSize) {
+final int totalTables = tableNames.size();
 final String schemaName = getName();
-final List> tableNameToTable = Lists.newArrayList();
-List tables;
-try {
-  tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
-} catch (TException e) {
-  logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
-  return tableNameToTable;
+final List tables = Lists.newArrayList();
+
+// In each round, Drill asks for a sub-list of all the requested tables
+for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
+  final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
+  final List eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
+  List eachBulkofTables;
+  // Retries once if the first call to fetch the metadata fails
+  synchronized(mClient) {
--- End diff --

why synchronized?




[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507131#comment-15507131
 ] 

ASF GitHub Bot commented on DRILL-4826:
---

GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/592

DRILL-4826: Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

Changed to get information for all views in a single call instead of one by one.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-4826

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/592.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #592


commit 07fbb0ac224e53299217263cf2d0510482a4c9b3
Author: Parth Chandra 
Date:   2016-08-04T06:02:01Z

DRILL-4826: Query against INFORMATION_SCHEMA.TABLES degrades as the number 
of views
increases






[jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

2016-09-14 Thread Joel Bondurant (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491044#comment-15491044
 ] 

Joel Bondurant commented on DRILL-4826:
---

0: jdbc:drill:> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from 
INFORMATION_SCHEMA.`TABLES` WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND 
TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA' ORDER BY 
TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME;
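+----------------+---------------+-------------+-------------+
| TABLE_CATALOG  | TABLE_SCHEMA  | TABLE_NAME  | TABLE_TYPE  |
+----------------+---------------+-------------+-------------+
+----------------+---------------+-------------+-------------+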
No rows selected (534.714 seconds)

