[jira] [Updated] (SPARK-18700) getCached in HiveMetastoreCatalog not thread safe cause driver OOM

2016-12-21 Thread Herman van Hovell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hovell updated SPARK-18700:
--
Fix Version/s: 2.0.3

> getCached in HiveMetastoreCatalog not thread safe cause driver OOM
> --
>
> Key: SPARK-18700
> URL: https://issues.apache.org/jira/browse/SPARK-18700
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0, 2.1.1
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> In our Spark SQL platform, every query shares the same HiveContext but runs 
> in its own thread, and new data is appended to tables as new partitions every 
> 30 minutes. After a new partition is added to table T, we have to call 
> refreshTable to clear T’s entry in cachedDataSourceTables so that the new 
> partition becomes searchable. 
> For a table with many partitions and files (far more than 
> spark.sql.sources.parallelPartitionDiscovery.threshold), the next query of 
> table T starts a job that fetches every FileStatus in the listLeafFiles 
> function. Because of the huge number of files, this job runs for several 
> seconds, and during that window other queries of table T also start new jobs 
> to fetch FileStatus, because getCached is not thread safe. Eventually this 
> causes a driver OOM.
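A minimal Scala sketch of the race described above. The names RacyFileListingCache and the simplified listLeafFiles below are purely illustrative stand-ins, not the actual HiveMetastoreCatalog code: every thread that misses the cache runs its own expensive listing and keeps its own copy of the result in driver memory, which is how concurrent queries of T can pile up FileStatus sets until the driver runs out of memory.

import java.util.concurrent.ConcurrentHashMap

// Illustrative only -- not the actual Spark code. A check-then-compute cache
// with no guard around the expensive listing: every concurrent cache miss
// launches its own listing job and builds its own copy of the file list.
object RacyFileListingCache {
  private val cache = new ConcurrentHashMap[String, Seq[String]]()

  // Stand-in for the multi-second listLeafFiles job over all partitions.
  private def listLeafFiles(table: String): Seq[String] = {
    Thread.sleep(3000)
    (1 to 100000).map(i => s"$table/partition=$i/part-00000")
  }

  def getCached(table: String): Seq[String] = {
    val hit = cache.get(table)           // 1. check
    if (hit != null) hit
    else {
      val files = listLeafFiles(table)   // 2. every thread that missed recomputes
      cache.put(table, files)            // 3. each keeps a full copy on the driver
      files
    }
  }
}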






[jira] [Updated] (SPARK-18700) getCached in HiveMetastoreCatalog not thread safe cause driver OOM

2016-12-19 Thread Li Yuanjian (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Yuanjian updated SPARK-18700:

Affects Version/s: 2.1.1

> getCached in HiveMetastoreCatalog not thread safe cause driver OOM
> --
>
> Key: SPARK-18700
> URL: https://issues.apache.org/jira/browse/SPARK-18700
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0, 2.1.1
>Reporter: Li Yuanjian
>
> In our Spark SQL platform, every query shares the same HiveContext but runs 
> in its own thread, and new data is appended to tables as new partitions every 
> 30 minutes. After a new partition is added to table T, we have to call 
> refreshTable to clear T’s entry in cachedDataSourceTables so that the new 
> partition becomes searchable. 
> For a table with many partitions and files (far more than 
> spark.sql.sources.parallelPartitionDiscovery.threshold), the next query of 
> table T starts a job that fetches every FileStatus in the listLeafFiles 
> function. Because of the huge number of files, this job runs for several 
> seconds, and during that window other queries of table T also start new jobs 
> to fetch FileStatus, because getCached is not thread safe. Eventually this 
> causes a driver OOM.






[jira] [Updated] (SPARK-18700) getCached in HiveMetastoreCatalog not thread safe cause driver OOM

2016-12-03 Thread Li Yuanjian (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Yuanjian updated SPARK-18700:

Description: 
In our Spark SQL platform, every query shares the same HiveContext but runs in 
its own thread, and new data is appended to tables as new partitions every 
30 minutes. After a new partition is added to table T, we have to call 
refreshTable to clear T’s entry in cachedDataSourceTables so that the new 
partition becomes searchable. 
For a table with many partitions and files (far more than 
spark.sql.sources.parallelPartitionDiscovery.threshold), the next query of 
table T starts a job that fetches every FileStatus in the listLeafFiles 
function. Because of the huge number of files, this job runs for several 
seconds, and during that window other queries of table T also start new jobs 
to fetch FileStatus, because getCached is not thread safe. Eventually this 
causes a driver OOM.

  was:
In our Spark SQL platform, every query shares the same HiveContext but runs in 
its own thread, and new data is appended to tables as new partitions every 
30 minutes. After a new partition is added to table T, we have to call 
refreshTable to clear T’s entry in cachedDataSourceTables 
so that the new partition becomes searchable. 
For a table with many partitions and files (far more than 
spark.sql.sources.parallelPartitionDiscovery.threshold), the next query of 
table T starts a job that fetches every FileStatus in the listLeafFiles 
function. Because of the huge number of files, this job runs for several 
seconds, and during that window other queries of table T also start new jobs 
to fetch FileStatus, because getCached is not thread safe. Eventually this 
causes a driver OOM.


> getCached in HiveMetastoreCatalog not thread safe cause driver OOM
> --
>
> Key: SPARK-18700
> URL: https://issues.apache.org/jira/browse/SPARK-18700
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Li Yuanjian
>
> In our Spark SQL platform, every query shares the same HiveContext but runs 
> in its own thread, and new data is appended to tables as new partitions every 
> 30 minutes. After a new partition is added to table T, we have to call 
> refreshTable to clear T’s entry in cachedDataSourceTables so that the new 
> partition becomes searchable. 
> For a table with many partitions and files (far more than 
> spark.sql.sources.parallelPartitionDiscovery.threshold), the next query of 
> table T starts a job that fetches every FileStatus in the listLeafFiles 
> function. Because of the huge number of files, this job runs for several 
> seconds, and during that window other queries of table T also start new jobs 
> to fetch FileStatus, because getCached is not thread safe. Eventually this 
> causes a driver OOM.
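One way to close this kind of race, sketched below under the same illustrative names (this is a sketch, not the actual SPARK-18700 patch): make the lookup-and-load atomic per key, so that concurrent queries which miss the cache wait for a single listing instead of each launching their own job.

import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Sketch of a per-key atomic get-or-load; names are illustrative, not Spark's.
object AtomicFileListingCache {
  private val cache = new ConcurrentHashMap[String, Seq[String]]()

  private def listLeafFiles(table: String): Seq[String] = {
    Thread.sleep(3000)                   // stands in for the expensive listing job
    (1 to 100000).map(i => s"$table/partition=$i/part-00000")
  }

  // computeIfAbsent runs the loader at most once per key at a time, so other
  // threads block briefly instead of duplicating both the work and the memory.
  def getCached(table: String): Seq[String] =
    cache.computeIfAbsent(table, new JFunction[String, Seq[String]] {
      override def apply(t: String): Seq[String] = listLeafFiles(t)
    })
}

With an atomic get-or-load, after refreshTable invalidates table T only the first query re-runs the listing; the rest reuse the single cached result rather than each holding a full FileStatus set on the driver.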


