[jira] [Commented] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table

2015-08-27 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717737#comment-14717737
 ] 

Yin Huai commented on SPARK-10287:
--

We need to put the following in the release notes: "JSON data source will not 
automatically load new files that are created by other applications (i.e., files 
that are not inserted into the dataset through Spark SQL)." [SPARK-10287]
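
For illustration, a minimal sketch of what this note means in practice (the local SparkContext setup and the `path` variable are assumptions for the example, not from the issue): a DataFrame created over a JSON directory only sees the files that existed when it was created, and must be reloaded to pick up files written later by another application.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Illustrative setup; `path` is an assumed directory of JSON files.
val conf = new SparkConf().setMaster("local[*]").setAppName("json-reload-sketch")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val path = "/tmp/json-data"

val df = sqlContext.read.format("json").load(path)
println(df.count())  // counts only the files present when df was created

// ... another application writes new JSON files under `path` ...

// df will not see those files automatically; reload to pick them up.
val reloaded = sqlContext.read.format("json").load(path)
println(reloaded.count())
{code}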

 After processing a query using JSON data, Spark SQL continuously refreshes 
 metadata of the table
 

 Key: SPARK-10287
 URL: https://issues.apache.org/jira/browse/SPARK-10287
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Critical
  Labels: releasenotes
 Fix For: 1.5.1


 I have a partitioned JSON table with 1824 partitions.
 {code}
 val df = sqlContext.read.format("json").load(aPartitionedJsonData)
 val columnStr = df.schema.map(_.name).mkString(",")
 println(s"columns: $columnStr")
 val hash = df
   .selectExpr(s"hash($columnStr) as hashValue")
   .groupBy()
   .sum("hashValue")
   .head()
   .getLong(0)
 {code}
 Looks like, for JSON, we refresh the table's metadata every time buildScan is called. For a 
 partitioned table, buildScan is called once per partition, so this table ends up being 
 refreshed 1824 times.
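
To make the cost concrete, here is a toy sketch of the pattern described above, using made-up class names rather than Spark's actual internals: if the metadata refresh happens inside buildScan, a query over a 1824-partition table pays the (expensive) file-listing cost 1824 times, while refreshing once per query pays it once.

{code}
// Toy model only; these are not Spark's real classes.
trait JsonLikeRelation {
  def refresh(): Unit = println("refreshing file metadata (expensive listing)")
  def buildScan(partition: Int): Unit
}

// Pattern described in the comment: refresh is triggered from buildScan,
// which runs once per partition.
class RefreshInsideBuildScan extends JsonLikeRelation {
  def buildScan(partition: Int): Unit = { refresh(); /* scan the partition */ }
}

// Alternative: refresh once up front, then scan each partition.
class RefreshOncePerQuery extends JsonLikeRelation {
  def buildScan(partition: Int): Unit = { /* scan the partition */ }
}

val partitions = 1824
val perScan = new RefreshInsideBuildScan
(0 until partitions).foreach(perScan.buildScan)  // 1824 refreshes

val once = new RefreshOncePerQuery
once.refresh()                                   // 1 refresh
(0 until partitions).foreach(once.buildScan)
{code}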



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table

2015-08-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715570#comment-14715570
 ] 

Apache Spark commented on SPARK-10287:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/8469

 After processing a query using JSON data, Spark SQL continuously refreshes 
 metadata of the table
 

 Key: SPARK-10287
 URL: https://issues.apache.org/jira/browse/SPARK-10287
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Yin Huai
Priority: Critical



