[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-12-08 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734150#comment-15734150
 ] 

Dongjoon Hyun commented on SPARK-11374:
---

There is now a discussion of this issue on the PR. It seems that we can make 
a decision: YES (Resolved) or NO (Won't Fix).

> skip.header.line.count is ignored in HiveContext
> 
>
> Key: SPARK-11374
> URL: https://issues.apache.org/jira/browse/SPARK-11374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Daniel Haviv
>
> csv table in Hive which is configured to skip the header row using 
> TBLPROPERTIES("skip.header.line.count"="1").
> When querying from Hive the header row is not included in the data, but when 
> running the same query via HiveContext I get the header row.
> "show create table " via the HiveContext confirms that it is aware of the 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-09-11 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482812#comment-15482812
 ] 

Dongjoon Hyun commented on SPARK-11374:
---

Which version of Spark are you using now? To remove **one** header line, the 
`spark-csv` package provides a workaround for Spark 1.6.x and below. In addition, 
Spark 2.0 supports that functionality natively.

https://github.com/databricks/spark-csv

If you want this as a SQL table option, we don't have a workaround.
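
As a rough illustration of the `spark-csv` workaround above, here is a minimal sketch. It assumes Spark 1.6.x started with the `spark-csv` package available; the helper name `read_csv_skipping_header` is hypothetical, and the path in the usage note is the table location from the reproduction posted in this thread. It reads the files directly rather than through the Hive table definition, so the column names come from the header line.

```python
def read_csv_skipping_header(sql_context, path):
    """Load CSV files as a DataFrame, dropping the header line.

    Hypothetical helper. `sql_context` is expected to be a Spark 1.6.x
    SQLContext or HiveContext with the spark-csv package on the classpath.
    """
    return (sql_context.read
            .format("com.databricks.spark.csv")
            .option("header", "true")  # first line becomes the column names
            .load(path))

# In Spark 2.0 the CSV reader is built in, so the equivalent call would be:
#   spark.read.option("header", "true").csv(path)
```

Usage, under the same assumptions: `df = read_csv_skipping_header(sqlContext, "/spark_testing_2")`. This does not make `skip.header.line.count` work for tables queried through SQL.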




[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-09-09 Thread Rahul Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476343#comment-15476343
 ] 

Rahul Jain commented on SPARK-11374:


Hey guys, I am facing the same issue. Is there any workaround for it, or 
some way to skip the first row?




[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420503#comment-15420503
 ] 

Dongjoon Hyun commented on SPARK-11374:
---

Hi [~stephane.maa...@gmail.com],

Thank you for the comments. Yep, I noticed that option too, but it seems 
trickier.

The current approach of the Spark Scala API and my PR is to check whether the 
partition's file start position is zero, so it's not straightforward to apply 
to the footer option.

For this issue, I think the change could be acceptable since the Spark Scala API 
already supports the `header` option.

However, for the `footer` option, I think we need a new JIRA issue to get some 
attention and to build consensus on that option.
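
The start-position check described above can be sketched in plain Python. This is only an illustration of the idea, not the code in the PR: the function `skip_header` and the `(start_offset, lines)` representation of a split are assumptions made for the example.

```python
from itertools import islice

def skip_header(split_start, lines, header_lines=1):
    """Drop the header only from the split that begins at byte 0 of the file."""
    if split_start == 0:
        return islice(lines, header_lines, None)
    return iter(lines)

# One file cut into two splits; only the first split starts at offset 0.
splits = [
    (0, ["column_1,column_2", "a,1.00"]),
    (25, ["b,2.00", "c,3.00"]),
]
rows = [line for start, lines in splits for line in skip_header(start, lines)]
print(rows)  # ['a,1.00', 'b,2.00', 'c,3.00']
```

A footer is harder to handle this way because a reader knows its split's start offset up front, but it can only recognize the trailing lines by knowing where the file ends, and those lines may also span a split boundary.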

Thanks,
Dongjoon.




[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420491#comment-15420491
 ] 

Stephane Maarek commented on SPARK-11374:
-

Hi,

Thanks for the PR. Can you also test the footer option? We might as well
solve both issues.

Thanks
Stéphane







[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-08-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420489#comment-15420489
 ] 

Apache Spark commented on SPARK-11374:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14638




[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-04-13 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238689#comment-15238689
 ] 

Stephane Maarek commented on SPARK-11374:
-

Any updates on this?
Here is a log from Hive:

{code}

CREATE SCHEMA IF NOT EXISTS spark_testing;
DROP TABLE IF EXISTS spark_testing.test_csv_2;
CREATE EXTERNAL TABLE `spark_testing.test_csv_2`(
  column_1 varchar(10),
  column_2 decimal(4,2))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/spark_testing_2'
TBLPROPERTIES('serialization.null.format'='', "skip.header.line.count"="1");
select * from spark_testing.test_csv_2;

hive> select * from spark_testing.test_csv_2;
OK
NULL    3
{code}

And from Spark:

{code}

scala> sqlContext.sql("select * from spark_testing.test_csv_2").show()

+--------+--------+
|column_1|column_2|
+--------+--------+
|       a|    null|
|    null|    3.00|
+--------+--------+

{code}

That's a big problem




[jira] [Commented] (SPARK-11374) skip.header.line.count is ignored in HiveContext

2016-04-06 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229401#comment-15229401
 ] 

Stephane Maarek commented on SPARK-11374:
-

I'd add that other table metadata is also ignored, namely TBLPROPERTIES 
('serialization.null.format'='').
There is also another issue, which may still be related to Spark not reading 
the Hive metadata or not using Hive properly. Create a CSV file with the 
following content (the spaces are intentional):

1, 2,3
4, 5,6

Define the table in Hive like this:
CREATE EXTERNAL TABLE `my_table`(
  `c1` DECIMAL,
  `c2` DECIMAL,
  `c3` DECIMAL) ... etc

In Hive, select * from my_table returns:
1,2,3
4,5,6

But in Spark, using a HiveContext, it returns:
1,null,3
4,null,6
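
The difference above is what one would expect if one side trims a field before parsing it as a decimal and the other does not. The sketch below only illustrates that hypothesis in plain Python. It is neither Hive's nor Spark's actual parsing code, and `parse_strict` and `parse_trimmed` are hypothetical names.

```python
import re
from decimal import Decimal

# Hypothetical strict format: optional sign, digits, optional fraction.
_STRICT_DECIMAL = re.compile(r"[+-]?\d+(\.\d+)?")

def parse_strict(field):
    """Return a Decimal, or None if the field has any stray characters."""
    return Decimal(field) if _STRICT_DECIMAL.fullmatch(field) else None

def parse_trimmed(field):
    """Strip surrounding whitespace first, then parse strictly."""
    return parse_strict(field.strip())

line = "1, 2,3"
strict = [parse_strict(f) for f in line.split(",")]
trimmed = [parse_trimmed(f) for f in line.split(",")]
print(strict)   # [Decimal('1'), None, Decimal('3')]
print(trimmed)  # [Decimal('1'), Decimal('2'), Decimal('3')]
```

If this is indeed the cause, trimming the fields in the source data would be one way to sidestep the mismatch.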
