subject:"\[jira\] \[Commented\] \(SPARK\-20851\) Drop spark table failed if a column name is a numeric string"

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

2017-06-14 Thread Chen Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050028#comment-16050028
 ] 

Chen Gong commented on SPARK-20851:
---

[~benyuel] [~maropu] Thanks for your comments. Root cause for this problem 
haven't been figured out, I will keep researching on this.

Some clue I am thinking right now is it may be due to mysql database where 
stores metadata of spark tables.  

> Drop spark table failed if a column name is a numeric string
> 
>
> Key: SPARK-20851
> URL: https://issues.apache.org/jira/browse/SPARK-20851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: linux redhat
>Reporter: Chen Gong
>
> I tried to read a json file to a spark dataframe
> {noformat}
> df = spark.read.json('path.json')
> df.write.parquet('dataframe', compression='snappy')
> {noformat}
> However, there are some columns' names are numeric strings, such as 
> "989238883". Then I created spark sql table by using this
> {noformat}
> create table if not exists `a` using org.apache.spark.sql.parquet options 
> (path 'dataframe');  // It works well
> {noformat}
> But after created table, any operations, like select, drop table on this 
> table will raise the same exceptions below
> {noformat}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> array>,url:string,width:bigint>>,audit_id:bigint,author_id:bigint,body:string,brand_id:string,created_at:string,custom_ticket_fields:struct<49244727:string,51588527:string,51591767:string,51950848:string,51950868:string,51950888:string,51950928:string,52359587:string,55276747:string,56958227:string,57080067:string,57080667:string,57107727:string,57112447:string,57113207:string,57411128:string,57424648:string,57442588:string,62382188:string,74862088:string,74871788:string>,event_type:string,group_id:bigint,html_body:string,id:bigint,is_public:string,locale_id:string,organization_id:string,plain_body:string,previous_value:string,priority:string,public:boolean,rel:string,removed_tags:array,requester_id:bigint,satisfaction_probability:string,satisfaction_score:string,sla_policy:string,status:string,tags:array,ticket_form_id:string,type:string,via:string,via_reference_id:bigint>>
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:785)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:365)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:361)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
>   at 
>

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

2017-06-06 Thread Ben P (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038445#comment-16038445
 ] 

Ben P commented on SPARK-20851:
---

This issue seems to be handled/prevented in v2.0.1 as 2.0.1 throws below 
exception when spark.sql('create table...')

pyspark.sql.utils.ParseException: "\nmismatched input '123018231' expecting 
'>'(line 1, pos 7)\n\n== SQL 
==\nstruct<123018231:string,123121:bigint>\n---^^^\n"

> Drop spark table failed if a column name is a numeric string
> 
>
> Key: SPARK-20851
> URL: https://issues.apache.org/jira/browse/SPARK-20851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: linux redhat
>Reporter: Chen Gong
>
> I tried to read a json file to a spark dataframe
> {noformat}
> df = spark.read.json('path.json')
> df.write.parquet('dataframe', compression='snappy')
> {noformat}
> However, there are some columns' names are numeric strings, such as 
> "989238883". Then I created spark sql table by using this
> {noformat}
> create table if not exists `a` using org.apache.spark.sql.parquet options 
> (path 'dataframe');  // It works well
> {noformat}
> But after created table, any operations, like select, drop table on this 
> table will raise the same exceptions below
> {noformat}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> array>,url:string,width:bigint>>,audit_id:bigint,author_id:bigint,body:string,brand_id:string,created_at:string,custom_ticket_fields:struct<49244727:string,51588527:string,51591767:string,51950848:string,51950868:string,51950888:string,51950928:string,52359587:string,55276747:string,56958227:string,57080067:string,57080667:string,57107727:string,57112447:string,57113207:string,57411128:string,57424648:string,57442588:string,62382188:string,74862088:string,74871788:string>,event_type:string,group_id:bigint,html_body:string,id:bigint,is_public:string,locale_id:string,organization_id:string,plain_body:string,previous_value:string,priority:string,public:boolean,rel:string,removed_tags:array,requester_id:bigint,satisfaction_probability:string,satisfaction_score:string,sla_policy:string,status:string,tags:array,ticket_form_id:string,type:string,via:string,via_reference_id:bigint>>
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:785)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:365)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:361)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
>   at 
>

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

2017-05-24 Thread Chen Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022658#comment-16022658
 ] 

Chen Gong commented on SPARK-20851:
---

[~maropu] Thanks for you reply. 

This scenario happened in our own cluster, however, when I tried this scenario 
on my own laptop in which spark is running in standalone mode with one node, no 
this issue at all. So I just list down simplest commands I used in our own 
cluster to reproduce the error, as well as the error log.

*Create a sample dataset*
{noformat}
echo '{"col1":"value", "col2":{"123121":10, "123018231":"rejlsoe"}}' > 
file.json// A simple example of data
hadoop fs -put file.json 
{noformat}

*Process dataset in pyspark*
{noformat}
df = spark.read.json()
df.printSchema()

root
 |-- col1: string (nullable = true)
 |-- col2: struct (nullable = true)
 ||-- 123018231: string (nullable = true)
 ||-- 123121: long (nullable = true)

df.createOrReplaceTempView('df')
spark.sql('create table test_tab like df')   // no error at all
{noformat}

Then I could not do any operations on this test_tab
{noformat}
spark.sql('select * from test_tab')
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/shopee/spark/python/pyspark/sql/session.py", line 541, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File 
"/home/shopee/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 
1133, in __call__
  File "/home/shopee/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
  File "/home/shopee/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", 
line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o45.sql.
: org.apache.spark.SparkException: Cannot recognize hive type string: 
struct<123018231:string,123121:bigint>
at 
org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:785)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:365)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:361)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:359)
at 
org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:76)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:109)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
at

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

2017-05-23 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021348#comment-16021348
 ] 

Takeshi Yamamuro commented on SPARK-20851:
--

I think we could use numeric strings as column names and we could drop them;
{code}
scala> sql("CREATE TABLE t(`1` INT)")
scala> sql("INSERT INTO t VALUES(1)")
scala> sql("SELECT * FROM t").show
+---+
|  1|
+---+
|  1|
+---+

scala> sql("DROP TABLE t")
{code}
Could you describe more and show us a simpler query to reproduce this? Thanks.

> Drop spark table failed if a column name is a numeric string
> 
>
> Key: SPARK-20851
> URL: https://issues.apache.org/jira/browse/SPARK-20851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: linux redhat
>Reporter: Chen Gong
>
> I tried to read a json file to a spark dataframe
> {noformat}
> df = spark.read.json('path.json')
> df.write.parquet('dataframe', compression='snappy')
> {noformat}
> However, there are some columns' names are numeric strings, such as 
> "989238883". Then I created spark sql table by using this
> {noformat}
> create table if not exists `a` using org.apache.spark.sql.parquet options 
> (path 'dataframe');  // It works well
> {noformat}
> But after created table, any operations, like select, drop table on this 
> table will raise the same exceptions below
> {noformat}
> org.apache.spark.SparkException: Cannot recognize hive type string: 
> array>,url:string,width:bigint>>,audit_id:bigint,author_id:bigint,body:string,brand_id:string,created_at:string,custom_ticket_fields:struct<49244727:string,51588527:string,51591767:string,51950848:string,51950868:string,51950888:string,51950928:string,52359587:string,55276747:string,56958227:string,57080067:string,57080667:string,57107727:string,57112447:string,57113207:string,57411128:string,57424648:string,57442588:string,62382188:string,74862088:string,74871788:string>,event_type:string,group_id:bigint,html_body:string,id:bigint,is_public:string,locale_id:string,organization_id:string,plain_body:string,previous_value:string,priority:string,public:boolean,rel:string,removed_tags:array,requester_id:bigint,satisfaction_probability:string,satisfaction_score:string,sla_policy:string,status:string,tags:array,ticket_form_id:string,type:string,via:string,via_reference_id:bigint>>
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:785)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:365)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$10.apply(HiveClientImpl.scala:361)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
>   at 
>

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

[jira] [Commented] (SPARK-20851) Drop spark table failed if a column name is a numeric string

4 matches

Site Navigation

Mail list logo

Footer information