[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62552/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Merged build finished. Test PASSed.

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62552/consoleFull)** for PR 14207 at commit [`a043ca2`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62552/consoleFull)** for PR 14207 at commit [`a043ca2`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62513/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Merged build finished. Test PASSed.

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62513/consoleFull)** for PR 14207 at commit [`55c2c5e`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Merged build finished. Test PASSed.

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62512/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62512/consoleFull)** for PR 14207 at commit [`c6afbbb`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62513/consoleFull)** for PR 14207 at commit [`55c2c5e`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62512/consoleFull)** for PR 14207 at commit [`c6afbbb`](https://github.com/apache/spark/commit/c

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14207 > when the data/files are changed by external system (e.g., appended by a streaming system), the stored schema can be inconsistent with the actual schema of the data. I think this problem

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207 @gatorsmile Yes. I meant that because you use the stored schema instead of an inferred schema for the table, when the data/files are changed by an external system (e.g., appended by a streaming system), the stored sch

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14207 @viirya Schema inference is time-consuming, especially when the number of files is huge. Thus, we should avoid refreshing it every time. That is one of the major reasons why we have a metadata ca
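
The trade-off raised in these comments (inference is expensive, so the schema is stored, but a stored schema can go stale when files change externally) can be illustrated with a minimal sketch. This is not the PR's implementation: `infer_schema`, `StoredSchemaTable`, and the record layout are hypothetical names introduced only for illustration, and plain Python dictionaries stand in for data files.

```python
from typing import Dict, List, Optional

Schema = Dict[str, str]  # column name -> type name


def infer_schema(files: List[Dict[str, object]]) -> Schema:
    """Hypothetical stand-in for schema inference: scans every record."""
    schema: Schema = {}
    for record in files:
        for column, value in record.items():
            schema[column] = type(value).__name__
    return schema


class StoredSchemaTable:
    """Hypothetical table that stores its inferred schema until refreshed."""

    def __init__(self, files: List[Dict[str, object]]) -> None:
        self.files = files
        self.inference_runs = 0  # counts how often the expensive scan ran
        self._stored_schema: Optional[Schema] = None

    def schema(self) -> Schema:
        # Infer only on first access; later reads reuse the stored schema.
        if self._stored_schema is None:
            self.refresh()
        return self._stored_schema

    def refresh(self) -> None:
        # Explicit refresh: rerun inference and replace the stored schema.
        self.inference_runs += 1
        self._stored_schema = infer_schema(self.files)


files = [{"id": 1, "name": "a"}]
table = StoredSchemaTable(files)
first = table.schema()
second = table.schema()  # served from the stored schema, no new scan

# An external writer appends a record with a new column.
files.append({"id": 2, "name": "b", "score": 0.5})
stale = dict(table.schema())  # still the old schema

table.refresh()
fresh = dict(table.schema())  # now includes the new column
```

In this sketch the second read costs nothing, which is the benefit described above, while the appended `score` column stays invisible until `refresh()` is called, which is the inconsistency the thread is concerned with.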

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207 @gatorsmile When the data/files are written by an external system and Spark is only used to process them in batch, does that mean the schema can be inconsistent? Or should it call refresh every time it

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14207 The table location is not allowed to change, right? With this PR, if changes to the data/files (pointed to by the table location) affect the table schema, they need to ma

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207 Does it mean that if users do not issue a refresh when the table location is changed, the schema will be wrong when Spark restarts?

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-17 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14207 @viirya The problem it tries to resolve is from the comment of @rxin in another PR: https://github.com/apache/spark/pull/14148#issuecomment-232273833

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207 It is not clear what problem this PR tries to solve. It only says that it proposes to save the inferred schema in the external catalog.

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14207 @rxin @cloud-fan @yhuai This PR introduces a new concept `SchemaType` for determining the original source of a schema. When `SchemaType` is `USER`, it means this table belongs to `Group

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62344/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14207 Merged build finished. Test PASSed.

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62344/consoleFull)** for PR 14207 at commit [`3be0dc0`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14207 **[Test build #62344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62344/consoleFull)** for PR 14207 at commit [`3be0dc0`](https://github.com/apache/spark/commit/3