[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197173995 Set `null` instead of throwing an exception. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: Parse modes in JSON data source

2016-03-16 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11756 Parse modes in JSON data source ## What changes were proposed in this pull request? Currently, there is no way to control the behaviour when it fails to parse corrupt records in JSON

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56291636 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -991,6 +999,16 @@ class JsonSuite extends

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56292393 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -991,6 +999,16 @@ class JsonSuite extends

[GitHub] spark pull request: [SPARK-13899][SQL] Produce InternalRow instead...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11717#issuecomment-197173289 Does that mean closing https://github.com/apache/spark/pull/11550 and not supporting custom date format?

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197184143 Yep, would you check this PR, https://github.com/apache/spark/pull/11756?

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56290903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala --- @@ -49,6 +50,16 @@ private[sql] class

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197181772 @rxin Sure!

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197184490 Oh, sorry, I misunderstood. Currently, the JSON data source works in PERMISSIVE mode.

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-197199259 retest this please

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56306144 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -288,6 +288,9 @@ class DataFrameReader private[sql](sqlContext

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197238803 The commit I submitted includes comment changes and avoids adding a `_corrupt_record` field during type inference in `DROPMALFORMED` mode.
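
The two parse modes named in this thread can be illustrated with a small sketch. This is plain Python, not Spark's implementation: `parse_rows` and its exact behaviour are assumptions made for illustration, and only the mode names and the `_corrupt_record` field name come from the comments above. In this simplified model, PERMISSIVE keeps a malformed line by storing its raw text under `_corrupt_record`, while DROPMALFORMED silently skips it.

```python
import json

# Field name taken from the thread; everything else here is hypothetical.
CORRUPT_FIELD = "_corrupt_record"
MODES = ("PERMISSIVE", "DROPMALFORMED")


def parse_rows(lines, mode="PERMISSIVE"):
    """Parse JSON lines under a parse mode (illustrative sketch only)."""
    if mode not in MODES:
        raise ValueError("unknown parse mode: %s" % mode)
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                # Treat anything that is not a JSON object as malformed.
                raise ValueError("not a JSON object")
            rows.append(record)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            if mode == "PERMISSIVE":
                # Keep the row, preserving the raw text for inspection.
                rows.append({CORRUPT_FIELD: line})
            # DROPMALFORMED: skip the line entirely.
    return rows


lines = ['{"a": 1}', '{"a": ', '{"a": 2}']
permissive = parse_rows(lines, "PERMISSIVE")
dropped = parse_rows(lines, "DROPMALFORMED")
```

Under these assumptions, `permissive` has three rows (the middle one carrying only `_corrupt_record`) and `dropped` has two, which is why a `_corrupt_record` column is unnecessary when malformed rows are dropped.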

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56296202 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -963,6 +963,31 @@ class JsonSuite extends

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56299745 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -963,6 +963,31 @@ class JsonSuite extends

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56299195 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -963,6 +963,31 @@ class JsonSuite extends

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56299307 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -963,6 +963,31 @@ class JsonSuite extends

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198201426 I think we should maybe leave the configurations as default if users do not specify the configuration. It might be possible to use `setIfUnset()` instead of `set
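
The difference between the two calls suggested here can be sketched in a few lines. This is a plain-Python model of the semantics only, using a dict in place of a Hadoop `Configuration`; the helper `set_if_unset` and the key `example.compress.type` are hypothetical names chosen for illustration. The idea is that a plain set always overwrites, whereas set-if-unset writes a value only when the key has none, so a user-supplied setting survives.

```python
def set_if_unset(conf, key, value):
    """Store value only when the key has no value yet (illustrative model)."""
    if conf.get(key) is None:
        conf[key] = value
    return conf


# Hypothetical key; a user has already chosen a value.
user_conf = {"example.compress.type": "RECORD"}
set_if_unset(user_conf, "example.compress.type", "BLOCK")

# Nothing was specified, so the fallback value is applied.
empty_conf = set_if_unset({}, "example.compress.type", "BLOCK")
```

Here `user_conf` keeps the user's `RECORD` value, while `empty_conf` receives `BLOCK`; an unconditional set would have replaced the user's choice in the first case.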

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198333219 Thanks.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57656806 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -101,3 +125,14 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57657765 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -64,17 +66,21 @@ class CSVTypeCastSuite

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57656879 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -478,4 +479,34 @@ class CSVSuite extends

[GitHub] spark pull request: [SPARK-14103][SQL] Parse unescaped quotes in C...

2016-04-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12226#issuecomment-206644937 cc @srowen

[GitHub] spark pull request: [SPARK-14103][SQL] Parse unescaped quotes in C...

2016-04-06 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12226 [SPARK-14103][SQL] Parse unescaped quotes in CSV data source. ## What changes were proposed in this pull request? This PR resolves the problem of parsing unescaped quotes in input

[GitHub] spark pull request: [MINOR][SQL] Remove some unused imports in dat...

2016-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12326#issuecomment-209292212 @cloud-fan Sure. Let me open a PR. Thanks.

[GitHub] spark pull request: [SPARK-14596][SQL] Remove not used SqlNewHadoo...

2016-04-13 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12354 [SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports ## What changes were proposed in this pull request? Old `HadoopFsRelation` API includes

[GitHub] spark pull request: [MINOR][SQL] Remove some unused imports in dat...

2016-04-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12326#issuecomment-209195204 @cloud-fan Could you please take a look?

[GitHub] spark pull request: [SPARK-14596][SQL] Remove not used SqlNewHadoo...

2016-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12354#issuecomment-209371637 cc @cloud-fan

[GitHub] spark pull request: [SPARK-14596][SQL] Remove not used SqlNewHadoo...

2016-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12354#issuecomment-209402866 Thanks! I just renamed.

[GitHub] spark pull request: [SPARK-14596][SQL] Remove not used SqlNewHadoo...

2016-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12354#issuecomment-209406971 test this please

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-209408827 @yhuai Could you please review this? I don't want to keep resolving conflicts and I am pretty sure that this is a sensible PR. This PR touches pretty

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-207694129 I have two things to note. - It looks like `buildInternalScan()` is no longer called but still remains. Just in case, I did not remove this and tested

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-207694169 cc @rxin

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12268 [SPARK-14480][SQL] Simplify CSV parsing process with a better performance ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14480

[GitHub] spark pull request: [SPARK-14482][SQL] Change default Parquet code...

2016-04-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12256#discussion_r58997853 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -136,14 +125,7 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-207736726 retest this please

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-207714445 retest this please

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12268#discussion_r59334104 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -17,153 +17,197 @@ package

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12268#discussion_r59333718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -17,153 +17,197 @@ package

[GitHub] spark pull request: [MINOR][SQL][DOCS] Remove some unused imports ...

2016-04-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12326#issuecomment-208791039 I also noticed that `SqlNewHadoopRDD` is not used anymore. Just to double-check, this doesn't necessarily mean it has to be removed?

[GitHub] spark pull request: [MINOR][SQL][DOCS] Remove some unused imports ...

2016-04-12 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/12326 [MINOR][SQL][DOCS] Remove some unused imports in datasources. ## What changes were proposed in this pull request? It looks like several commits for datasources missed removing some unused

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12268#issuecomment-207897824 Could I maybe cc @liancheng and @cloud-fan to review? This resembles JSON data source structure. So, the class structures and input/output in methods

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12268#discussion_r59180459 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVParserSuite.scala --- @@ -1,125 +0,0 @@ -/* - * Licensed

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12268#discussion_r59180462 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVParserSuite.scala --- @@ -1,125 +0,0 @@ -/* - * Licensed

[GitHub] spark pull request: [SPARK-14103][SQL] Parse unescaped quotes in C...

2016-04-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12226#issuecomment-207228945 @rxin Could you maybe take another look or merge this please?

[GitHub] spark pull request: [SPARK-14189][SQL] JSON data sources find comp...

2016-04-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11993#issuecomment-207228869 @rxin Could you maybe take another look or merge this please?

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198326532 @srowen @steveloughran Thank you so much. Could anybody please decide if I should go ahead or not? For me it's a bit confusing. I will follow the decision

[GitHub] spark pull request: [SPARK-3854][BUILD] Scala style: require space...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11637#issuecomment-198144779 Oh, I have been thinking a space between them is correct. Thanks!

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197624257 @cloud-fan Actually, I have a question. So, in JSON data source, I thought JSON data format itself can have a flexible schema so it does not necessarily have

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11724#discussion_r56444849 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -108,14 +109,38 @@ private[csv] object

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197636844 @cloud-fan Sorry, one more question. Would it be great if we maybe make `spark.sql.columnNameOfCorruptRecord` as an option just like the compression option

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11724#discussion_r56785216 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -108,14 +109,38 @@ private[csv] object

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-199116245 retest this please

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-199116238 @cloud-fan Is this a typo maybe :)?

[GitHub] spark pull request: [SPARK-13108][SQL] Support for ascii compatibl...

2016-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11016#issuecomment-199118697 I found a similar issue to this, [SPARK-1849](https://issues.apache.org/jira/browse/SPARK-1849). I think we might have to not support non-ASCII

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197658346 Filed in https://issues.apache.org/jira/browse/SPARK-13953.

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11752#discussion_r56441332 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala --- @@ -209,6 +209,11 @@ private[json] trait

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-03-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-198225157 Let me close this for now because I could not come up with a good idea.

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198364410 @tomwitte Sorry for adding more comments but does that mean the default value in Hadoop 1.x is BLOCK?

[GitHub] spark pull request: [SPARK-13953][SQL] Specifying the field name f...

2016-03-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11881#issuecomment-199680230 cc @cloud-fan @rxin

[GitHub] spark pull request: [SPARK-13953][SQL] Specifying the field name f...

2016-03-22 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11881 [SPARK-13953][SQL] Specifying the field name for corrupted record via option at JSON datasource ## What changes were proposed in this pull request? https://issues.apache.org/jira

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198200770 @rxin According to [Hadoop Definitive Guide 3rd edition](https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781449328917/ch04.html), it looks

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193576069 @falaki Just to let you know, I changed the name `CSVInferSchema` to `InferSchema` mainly for consistent names for CSV and JSON data source but maybe they might have

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193549873 test this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-03-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-193627655 I will take action as soon as I get some feedback on this conflict.

[GitHub] spark pull request: [SPARK-13738][SQL] Cleanup Data Source resolut...

2016-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11572#issuecomment-193709857 While creating a PR I also saw some nits. At 'OrcRelation', 1. [`import com.google.common.base.Objects`](https://github.com/apache/spark/blob

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54987993 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -26,7 +26,7 @@ test_that("Check masked functions", { maskedBySparkR

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-192106315 @rxin If you think we should not list files even once, should we maybe just detect the source only from the given paths, without listing, and then just leave

[GitHub] spark pull request: [SPARK-13442][SQL] Make type inference recogni...

2016-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11315#issuecomment-191996798 cc @rxin

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11550 [SPARK-13667][SQL] Support for specifying custom date format for date and timestamp types at CSV datasource. ## What changes were proposed in this pull request? This PR adds

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193035562 @rxin There should be a conflict with https://github.com/apache/spark/pull/11315 which I think is supposed to be merged (assuming from your comment). I

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193035803 @falaki Would you maybe review this please..?

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-01 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11457 [SPARK-13174][SparkR] Add read.csv and write.csv for SparkR ## What changes were proposed in this pull request? This PR adds `read.csv` and `write.csv` for SparkR. They were added

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54677085 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -24,7 +24,7 @@ test_that("Check masked functions", { func <- lapply(maske

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54677487 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -26,7 +26,7 @@ test_that("Check masked functions", { maskedBySparkR

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191112297 retest this please

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191112794 cc @rxin

[GitHub] spark pull request: [SPARK-13442][SQL] Make type inference recogni...

2016-03-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11315#issuecomment-191113533 @rxin @falaki Would you take a look please?

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191118120 @rxin Sure.

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191148542 @rxin I think I should change the codec names to lowercase [here](https://github.com/HyukjinKwon/spark/blob/SPARK-13543/sql/hive/src/main/scala/org/apache/spark

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-01 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11464 [SPARK-13543][SQL] Support for specifying compression codec for Parquet/ORC via option() ## What changes were proposed in this pull request? This PR adds the support to specify

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191503747 @rxin Sure.

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191491581 Yes please. Let me make a followup.

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191488351 I remember I saw some tests for compression codecs which check the extension when they are compressed. Then, let me correct them (or simply check them

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191497497 @rxin If it looks good, could you merge this first, before we deal with making the compression names consistent?

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191489219 Let's talk more in the new PR. I will try to handle this myself first, as far as I can.

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191480329 Let me list the possible codec options for each data source. For JSON, CSV and TEXT data sources: `none` - no compression, `bzip2`, `snappy

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191482053 Yes. Would users set 'uncompressed', since it was possible to set that via the Spark configuration?

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191485339 As I said, we might then need to consider handling the extensions for JSON, TEXT and CSV, which are `.deflate`. We might need to manually change them to `.zlib

[GitHub] spark pull request: [SPARK-13528][SQL] Make the short names of com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11408#issuecomment-191471710 Thanks. Then `zlib` for ORC, `deflate` for Hadoop and `none` for TEXT, JSON, CSV and ORC?
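
The naming question in this comment can be illustrated with a small sketch. It is not taken from the PR and does not use Spark: `ACCEPTED_SHORT_NAMES` and `resolve_codec` are hypothetical names, and the table assumes that the Hadoop-backed sources (JSON, CSV, TEXT) would take `deflate` while ORC would take `zlib`, with `none` accepted everywhere. The point is only to show case-insensitive lookup of short names with a clear error for unsupported ones.

```python
# Hypothetical table of accepted short names per data source. The entries
# follow the naming floated in the comment above and are illustrative only;
# they are not Spark's actual codec list.
ACCEPTED_SHORT_NAMES = {
    "orc": {"none", "zlib"},
    "json": {"none", "deflate"},
    "csv": {"none", "deflate"},
    "text": {"none", "deflate"},
}


def resolve_codec(source: str, name: str) -> str:
    """Return the normalized short name, or raise ValueError if unsupported."""
    source_key = source.strip().lower()
    codec_key = name.strip().lower()
    if source_key not in ACCEPTED_SHORT_NAMES:
        raise ValueError(f"Unknown data source: {source!r}")
    accepted = ACCEPTED_SHORT_NAMES[source_key]
    if codec_key not in accepted:
        raise ValueError(
            f"Codec {name!r} is not supported for {source_key}; "
            f"expected one of: {', '.join(sorted(accepted))}"
        )
    return codec_key


# Lookups ignore case, so "ZLIB" and "zlib" resolve to the same short name.
print(resolve_codec("ORC", "ZLIB"))
print(resolve_codec("json", "None"))
```

Keeping one table per source makes the differences between sources explicit, which is the inconsistency the thread is trying to settle; a real implementation might instead share a single list and map aliases onto it.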

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54827127 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -26,7 +26,7 @@ test_that("Check masked functions", { maskedBySparkR

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191571895 Hm.. This will not explicitly set `uncompressed` or `none` for JSON, CSV and TEXT. Let me correct them and add some tests.

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191578507 @rxin I think this is ready to be reviewed.

[GitHub] spark pull request: [SPARK-13632] [SQL] Move commands.scala to com...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11482#issuecomment-191586617 I think you may have linked the wrong PR by mistake. I don't see the relation between this PR and https://github.com/apache/spark/pull/11408.

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191599397 retest this please

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54846238 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -26,7 +26,7 @@ test_that("Check masked functions", { maskedBySparkR

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54846091 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -26,7 +26,7 @@ test_that("Check masked functions", { maskedBySparkR

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191644382 test this please

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11464#discussion_r54843218 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -396,6 +402,46 @@ class CSVSuite extends

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11464#discussion_r54843475 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -148,6 +148,19 @@ private[sql] class

[GitHub] spark pull request: [SPARK-13543][SQL] Support for specifying comp...

2016-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11464#issuecomment-191168259 I will handle the naming consistency in a new PR based on https://github.com/apache/spark/pull/11464.
