[jira] [Created] (SPARK-27406) UnsafeArrayData serialization breaks when two machines have different Oops size

2019-04-07 Thread peng bo (JIRA)
peng bo created SPARK-27406:
---

 Summary: UnsafeArrayData serialization breaks when two machines 
have different Oops size
 Key: SPARK-27406
 URL: https://issues.apache.org/jira/browse/SPARK-27406
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.1
Reporter: peng bo


java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals$$anonfun$endpoints$1.apply(ApproxCountDistinctForIntervals.scala:69)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals$$anonfun$endpoints$1.apply(ApproxCountDistinctForIntervals.scala:69)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.endpoints$lzycompute(ApproxCountDistinctForIntervals.scala:69)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.endpoints(ApproxCountDistinctForIntervals.scala:66)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.org$apache$spark$sql$catalyst$expressions$aggregate$ApproxCountDistinctForIntervals$$hllppArray$lzycompute(ApproxCountDistinctForIntervals.scala:94)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.org$apache$spark$sql$catalyst$expressions$aggregate$ApproxCountDistinctForIntervals$$hllppArray(ApproxCountDistinctForIntervals.scala:93)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.org$apache$spark$sql$catalyst$expressions$aggregate$ApproxCountDistinctForIntervals$$numWordsPerHllpp$lzycompute(ApproxCountDistinctForIntervals.scala:104)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.org$apache$spark$sql$catalyst$expressions$aggregate$ApproxCountDistinctForIntervals$$numWordsPerHllpp(ApproxCountDistinctForIntervals.scala:104)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.totalNumWords$lzycompute(ApproxCountDistinctForIntervals.scala:106)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.totalNumWords(ApproxCountDistinctForIntervals.scala:106)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.createAggregationBuffer(ApproxCountDistinctForIntervals.scala:110)
at org.apache.spark.sql.catalyst.expressions.aggregate.ApproxCountDistinctForIntervals.createAggregationBuffer(ApproxCountDistinctForIntervals.scala:44)
at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.initialize(interfaces.scala:528)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator$$anonfun$initAggregationBuffer$2.apply(ObjectAggregationIterator.scala:120)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator$$anonfun$initAggregationBuffer$2.apply(ObjectAggregationIterator.scala:120)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.initAggregationBuffer(ObjectAggregationIterator.scala:120)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.org$apache$spark$sql$execution$aggregate$ObjectAggregationIterator$$createNewAggregationBuffer(ObjectAggregationIterator.scala:112)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.getAggregationBufferByKey(ObjectAggregationIterator.scala:128)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:150)
at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:114)
at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
at ...
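
For context (a sketch, not part of the report): one plausible mechanism for this class of failure is that the byte[] base offset used by Spark's unsafe row/array layout depends on the JVM's ordinary-object-pointer (Oops) size, so absolute offsets captured on one JVM can be wrong on another. A minimal diagnostic sketch, assuming Spark's org.apache.spark.unsafe.Platform is on the classpath, to compare the layout constants on the two machines:

{code:java}
// Hedged sketch: print the JVM-dependent array base offsets. If these values
// differ between the two machines (e.g. compressed vs. uncompressed oops),
// serialized data that bakes them in cannot be read back portably.
import org.apache.spark.unsafe.Platform

object OopsLayoutCheck {
  def main(args: Array[String]): Unit = {
    println(s"byte[] base offset: ${Platform.BYTE_ARRAY_OFFSET}")
    println(s"long[] base offset: ${Platform.LONG_ARRAY_OFFSET}")
  }
}
{code}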

[jira] [Commented] (SPARK-27289) spark-submit explicit configuration does not take effect but Spark UI shows it's effective

2019-04-07 Thread KaiXu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812103#comment-16812103
 ] 

KaiXu commented on SPARK-27289:
---

Did you check where the intermediate shuffle data was written after changing 
spark.local.dir? BTW, changes in spark-defaults.conf seem to require a restart to 
take effect.
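
A quick check (a sketch, not from this thread) is to compare the driver-side value with what the executors actually use; on a standalone cluster the worker-side SPARK_LOCAL_DIRS or the worker's own spark-defaults.conf can still decide the executors' scratch directories regardless of what --conf passed to the driver:

{code:java}
// Sketch only: prints the value the driver received via --conf / properties file.
// If the shuffle files still land elsewhere, worker-side configuration
// (e.g. SPARK_LOCAL_DIRS) is likely overriding it for the executors.
val driverLocalDir = spark.sparkContext.getConf.get("spark.local.dir", "(not set)")
println(s"driver-side spark.local.dir = $driverLocalDir")
{code}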

> spark-submit explicit configuration does not take effect but Spark UI shows 
> it's effective
> --
>
> Key: SPARK-27289
> URL: https://issues.apache.org/jira/browse/SPARK-27289
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Documentation, Spark Submit, Web UI
>Affects Versions: 2.3.3
>Reporter: KaiXu
>Priority: Minor
> Attachments: Capture.PNG
>
>
> The [doc|https://spark.apache.org/docs/latest/submitting-applications.html] says that 
> "In general, configuration values explicitly set on a {{SparkConf}} take the 
> highest precedence, then flags passed to {{spark-submit}}, then values in the 
> defaults file", but when setting spark.local.dir through --conf with 
> spark-submit, it still uses the value from 
> ${SPARK_HOME}/conf/spark-defaults.conf. What's more, the Spark runtime UI 
> environment page shows the value from --conf, which is really misleading.
> e.g.
> I submitted my application through the command:
> /opt/spark233/bin/spark-submit --properties-file /opt/spark.conf --conf 
> spark.local.dir=/tmp/spark_local -v --class 
> org.apache.spark.examples.mllib.SparseNaiveBayes --master 
> spark://bdw-slave20:7077 
> /opt/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar 
> hdfs://bdw-slave20:8020/Bayes/Input
>  
> the spark.local.dir in ${SPARK_HOME}/conf/spark-defaults.conf is:
> spark.local.dir=/mnt/nvme1/spark_local
> When the application is running, I found the intermediate shuffle data was 
> written to /mnt/nvme1/spark_local, which is the value set in 
> ${SPARK_HOME}/conf/spark-defaults.conf, but the Web UI shows the 
> environment value spark.local.dir=/tmp/spark_local.
> The spark-submit verbose output also shows spark.local.dir=/tmp/spark_local, which is 
> misleading. 
>  
> !image-2019-03-27-10-59-38-377.png!
> spark-submit verbose:
> 
> Spark properties used, including those specified through
>  --conf and those from the properties file /opt/spark.conf:
>  (spark.local.dir,/tmp/spark_local)
>  (spark.default.parallelism,132)
>  (spark.driver.memory,10g)
>  (spark.executor.memory,352g)
> X






[jira] [Updated] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
--
Affects Version/s: 2.3.0
   2.3.1
   2.3.2
   2.3.3
   2.4.0

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +---+---++---
> |col_name|data_type|comment|
> +---+---++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| 
> |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +---+---++---
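
A minimal way to exercise the flag described above (a sketch, assuming a Hive-backed session and the table from the reproducing steps):

{code:java}
// Sketch: with auto-update working, a "Statistics" row (sizeInBytes) should
// appear in DESC EXTENDED after the write; per this report it currently does not.
spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "true")
spark.sql("insert into table1 select 'b', 30")
spark.sql("desc extended table1").filter("col_name = 'Statistics'").show(false)
{code}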






[jira] [Comment Edited] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812095#comment-16812095
 ] 

Sujith Chacko edited comment on SPARK-27403 at 4/8/19 4:28 AM:
---

This has an impact on previous versions as well; I will update the JIRA.


was (Author: s71955):
This has impact in the previous version also, will upate the JIRA

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +---+---++---
> |col_name|data_type|comment|
> +---+---++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| 
> |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +---+---++---






[jira] [Commented] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812095#comment-16812095
 ] 

Sujith Chacko commented on SPARK-27403:
---

This has an impact on previous versions as well; I will update the JIRA.

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +---+---++---
> |col_name|data_type|comment|
> +---+---++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| 
> |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +---+---++---






[jira] [Commented] (SPARK-25348) Data source for binary files

2019-04-07 Thread Weichen Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812090#comment-16812090
 ] 

Weichen Xu commented on SPARK-25348:


I am working on this. :)

> Data source for binary files
> 
>
> Key: SPARK-25348
> URL: https://issues.apache.org/jira/browse/SPARK-25348
> Project: Spark
>  Issue Type: Story
>  Components: ML, SQL
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> It would be useful to have a data source implementation for binary files, 
> which can be used to build features to load images, audio, and videos.
> Microsoft has an implementation at 
> [https://github.com/Azure/mmlspark/tree/master/src/io/binary.] It would be 
> great if we can merge it into Spark main repo.
> cc: [~mhamilton] and [~imatiach]
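
A usage sketch of what such a data source could look like (the format name and columns below are assumptions for illustration, not an existing Spark API):

{code:java}
// Hypothetical reader for a binary-file data source; "binaryFile" and the
// column names (path, modificationTime, length, content) are assumptions here.
val binaries = spark.read
  .format("binaryFile")
  .load("hdfs:///data/images")
binaries.select("path", "length").show(false)
{code}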






[jira] [Updated] (SPARK-23831) Add org.apache.derby to IsolatedClientLoader

2019-04-07 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23831:

Description: 
Add org.apache.derby to IsolatedClientLoader, otherwise it may throw an 
exception:
{noformat}
[info] Cause: java.sql.SQLException: Failed to start database 'metastore_db' 
with class loader 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@2439ab23, see the 
next exception for details.
[info] at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
[info] at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
[info] at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
[info] at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
Source)
[info] at org.apache.derby.impl.jdbc.EmbedConnection.(Unknown Source)
[info] at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
{noformat}
How to reproduce:
{noformat}
build/sbt clean package -Phive -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
spark.sql.hive.metastore.jars=maven -e "create table t1 as select 1 as c"
{noformat}

  was:
Add org.apache.derby to IsolatedClientLoader,otherwise it may throw an 
exception:
{noformat}
[info] Cause: java.sql.SQLException: Failed to start database 'metastore_db' 
with class loader 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@2439ab23, see the 
next exception for details.
[info] at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
[info] at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
[info] at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
[info] at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
Source)
[info] at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
[info] at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
{noformat}
How to reproduce:
{noformat}
sed 's/HiveExternalCatalogSuite/HiveExternalCatalog2Suite/g' 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala
 > 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalog2Suite.scala

build/sbt -Phive "hive/test-only *.HiveExternalCatalogSuite 
*.HiveExternalCatalog2Suite"
{noformat}


> Add org.apache.derby to IsolatedClientLoader
> 
>
> Key: SPARK-23831
> URL: https://issues.apache.org/jira/browse/SPARK-23831
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Add org.apache.derby to IsolatedClientLoader, otherwise it may throw an 
> exception:
> {noformat}
> [info] Cause: java.sql.SQLException: Failed to start database 'metastore_db' 
> with class loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@2439ab23, see 
> the next exception for details.
> [info] at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> [info] at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> [info] at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
> [info] at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
> Source)
> [info] at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> [info] at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
> {noformat}
> How to reproduce:
> {noformat}
> build/sbt clean package -Phive -Phive-thriftserver
> export SPARK_PREPEND_CLASSES=true
> bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
> spark.sql.hive.metastore.jars=maven -e "create table t1 as select 1 as c"
> {noformat}
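
Until the Derby classes are shared by IsolatedClientLoader itself, a hedged workaround sketch is to append org.apache.derby to the existing spark.sql.hive.metastore.sharedPrefixes list so the embedded metastore driver is loaded by the shared class loader (an assumption to verify, not the proposed fix):

{code:java}
// Workaround sketch: share Derby with the isolated Hive client loader via config.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "2.3.4")
  .config("spark.sql.hive.metastore.jars", "maven")
  .config("spark.sql.hive.metastore.sharedPrefixes",
    "com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.derby")
  .getOrCreate()

spark.sql("create table t1 as select 1 as c")
{code}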






[jira] [Created] (SPARK-27405) Restrict the range of generated random timestamps

2019-04-07 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27405:
--

 Summary: Restrict the range of generated random timestamps
 Key: SPARK-27405
 URL: https://issues.apache.org/jira/browse/SPARK-27405
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The timestampLiteralGen of LiteralGenerator can produce instances of 
java.sql.Timestamp that cause Long arithmetic overflow when converting 
milliseconds to microseconds. The conversion is performed because 
Catalyst's Timestamp type stores microseconds since the epoch internally. This 
ticket aims to restrict the range of generated random timestamps to 
[Long.MinValue / 1000, Long.MaxValue / 1000].
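
A sketch of the intended restriction, assuming the ScalaCheck-based LiteralGenerator (names below are illustrative, not the actual patch):

{code:java}
import java.sql.Timestamp
import org.scalacheck.Gen

// Keep the millisecond value inside [Long.MinValue / 1000, Long.MaxValue / 1000]
// so that the later millis -> micros conversion (x * 1000) cannot overflow a Long.
val boundedTimestampGen: Gen[Timestamp] =
  Gen.choose(Long.MinValue / 1000, Long.MaxValue / 1000).map(new Timestamp(_))
{code}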






[jira] [Commented] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811959#comment-16811959
 ] 

Dongjoon Hyun commented on SPARK-27403:
---

Hi, [~S71955]. Could you check Spark 2.3.x behavior and update the affected 
versions? The existing code landed in 2.3.0 via SPARK-21237.

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +---+---++---
> |col_name|data_type|comment|
> +---+---++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| 
> |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +---+---++---






[jira] [Created] (SPARK-27404) Fix build warnings for 3.0: postfixOps edition

2019-04-07 Thread Sean Owen (JIRA)
Sean Owen created SPARK-27404:
-

 Summary: Fix build warnings for 3.0: postfixOps edition
 Key: SPARK-27404
 URL: https://issues.apache.org/jira/browse/SPARK-27404
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL, Structured Streaming, YARN
Affects Versions: 3.0.0
Reporter: Sean Owen
Assignee: Sean Owen


I'd like to fix various build warnings showing in the build right now -- see 
the upcoming PR for details as they are varied and small.

However, while fixing warnings about the use of postfix notation (i.e. "foo bar" 
instead of "foo.bar"), I'd like to just remove the use of postfix notation entirely to 
standardize. It isn't deprecated exactly, but it seems to be frowned upon as 
usually adding more confusion than clarity 
(https://contributors.scala-lang.org/t/lets-drop-postfix-operators/1457) and it has 
to be enabled by importing scala.language.postfixOps to avoid warnings.

I find that scalatest's postfix syntax doesn't cause warnings, and that's 
normal usage for scalatest, so I will leave that. "0 until n" syntax also doesn't 
seem to trigger the warnings. But things like "10 seconds" do, and can be written as 
"10.seconds".

Part of the reason I went ahead with changing that is that we have many instances 
of things like "120000 milliseconds" in the code, which are simpler as 
"2.minutes" anyway.

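For illustration only (not a diff from the upcoming PR), the kind of change involved:

{code:java}
import scala.concurrent.duration._

// Postfix notation triggers a feature warning unless scala.language.postfixOps
// is imported:
//   val timeout = 10 seconds

// Method-call notation needs no extra import and reads the same:
val timeout  = 10.seconds
val longWait = 2.minutes   // clearer than spelling out 120000 milliseconds
{code}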





[jira] [Comment Edited] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811940#comment-16811940
 ] 

Sujith Chacko edited comment on SPARK-27403 at 4/7/19 6:33 PM:
---

If the user sets spark.sql.statistics.size.autoUpdate.enabled to true, then the system 
shall calculate the table size and record it in the metastore.

On a describe command the statistics shall be displayed.

I will analyze further and raise a PR for handling the issue. Please let me 
know if you have any suggestions. Thanks.


was (Author: s71955):
I will analyze further and raise a PR for handling the issue. please let me 
know for any suggestions. thanks

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +---+---++---
> |col_name|data_type|comment|
> +---+---++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| 
> |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +---+---++---






[jira] [Updated] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
--
Description: 
The system shall update the table stats automatically if the user sets 
spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
has no effect whether it is enabled or disabled. This feature 
is similar to Hive's auto-gather feature, where statistics are automatically 
computed by default when the feature is enabled.

Reference:

[https://cwiki.apache.org/confluence/display/Hive/StatsDev]

Reproducing steps:

scala> spark.sql("create table table1 (name string,age int) stored as 
parquet")scala> spark.sql("insert into table1 select 'a',29")
 res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
 
+--+++---
|col_name|data_type|comment|

+--+++---
|name|string|null|
|age|int|null|
| | | |
| # Detailed Table Information| | |
|Database|default| |
|Table|table1| |
|Owner|Administrator| |
|Created Time|Sun Apr 07 23:41:56 IST 2019| |
|Last Access|Thu Jan 01 05:30:00 IST 1970| |
|Created By|Spark 2.4.1| |
|Type|MANAGED| |
|Provider|hive| |
|Table Properties|[transient_lastDdlTime=1554660716]| |
|Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
|Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
|InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
|OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties|[serialization.format=1]| |
|Partition Provider|Catalog| |

+--+++---

  was:
system shall update the table stats automatiaclly if user set 
spark.sql.statistics.size.autoUpdate.enabled as true, currently this property 
is not having any significance even if it is anabled or disabled. This feature 
is similar to Hives auto-gather feature where statistics are automatically 
computed by default if this feature is enabled.

Reference:

[https://cwiki.apache.org/confluence/display/Hive/StatsDev]

Reproducing steps:

scala> spark.sql("create table table1 (name string,age int) stored as 
parquet")scala> spark.sql("insert into table1 select 'a',29")
 res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
 
+-+-++---
|col_name|data_type|comment|

+-+-++---
|name|string|null|
|age|int|null|
| | | |
| # Detailed Table Information| | |
|Database|default| |
|Table|table1| |
|Owner|Administrator| |
|Created Time|Sun Apr 07 23:41:56 IST 2019| |
|Last Access|Thu Jan 01 05:30:00 IST 1970| |
|Created By|Spark 2.4.1| |
|Type|MANAGED| |
|Provider|hive| |
|Table Properties|[transient_lastDdlTime=1554660716]| |
|Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
|Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
|InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
|OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties|[serialization.format=1]| |
|Partition Provider|Catalog| |

+-+-++---


> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> 

[jira] [Updated] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
--
Description: 
The system shall update the table stats automatically if the user sets 
spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
has no effect whether it is enabled or disabled. This feature 
is similar to Hive's auto-gather feature, where statistics are automatically 
computed by default when the feature is enabled.

Reference:

[https://cwiki.apache.org/confluence/display/Hive/StatsDev]

Reproducing steps:

scala> spark.sql("create table table1 (name string,age int) stored as parquet")

scala> spark.sql("insert into table1 select 'a',29")
 res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
 
+---+---++---
|col_name|data_type|comment|

+---+---++---
|name|string|null|
|age|int|null|
| | | |
| # Detailed Table Information| | |
|Database|default| |
|Table|table1| |
|Owner|Administrator| |
|Created Time|Sun Apr 07 23:41:56 IST 2019| |
|Last Access|Thu Jan 01 05:30:00 IST 1970| |
|Created By|Spark 2.4.1| |
|Type|MANAGED| |
|Provider|hive| |
|Table Properties|[transient_lastDdlTime=1554660716]| |
|Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
|Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
|InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
|OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties|[serialization.format=1]| |
|Partition Provider|Catalog| |

+---+---++---

  was:
The system shall update the table stats automatically if the user sets 
spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
has no effect whether it is enabled or disabled. This feature 
is similar to Hive's auto-gather feature, where statistics are automatically 
computed by default when the feature is enabled.

Reference:

[https://cwiki.apache.org/confluence/display/Hive/StatsDev]

Reproducing steps:

scala> spark.sql("create table table1 (name string,age int) stored as 
parquet")scala> spark.sql("insert into table1 select 'a',29")
 res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
 
+--+++---
|col_name|data_type|comment|

+--+++---
|name|string|null|
|age|int|null|
| | | |
| # Detailed Table Information| | |
|Database|default| |
|Table|table1| |
|Owner|Administrator| |
|Created Time|Sun Apr 07 23:41:56 IST 2019| |
|Last Access|Thu Jan 01 05:30:00 IST 1970| |
|Created By|Spark 2.4.1| |
|Type|MANAGED| |
|Provider|hive| |
|Table Properties|[transient_lastDdlTime=1554660716]| |
|Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
|Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
|InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
|OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties|[serialization.format=1]| |
|Partition Provider|Catalog| |

+--+++---


> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> 

[jira] [Commented] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811940#comment-16811940
 ] 

Sujith Chacko commented on SPARK-27403:
---

I will analyze further and raise a PR for handling the issue. Please let me 
know if you have any suggestions. Thanks.

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +-+-++---
> |col_name|data_type|comment|
> +-+-++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| 
> |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +-+-++---






[jira] [Updated] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
--
Description: 
The system shall update the table stats automatically if the user sets 
spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
has no effect whether it is enabled or disabled. This feature 
is similar to Hive's auto-gather feature, where statistics are automatically 
computed by default when the feature is enabled.

Reference:

[https://cwiki.apache.org/confluence/display/Hive/StatsDev]

Reproducing steps:

scala> spark.sql("create table table1 (name string,age int) stored as 
parquet")scala> spark.sql("insert into table1 select 'a',29")
 res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
 
+-+-++---
|col_name|data_type|comment|

+-+-++---
|name|string|null|
|age|int|null|
| | | |
| # Detailed Table Information| | |
|Database|default| |
|Table|table1| |
|Owner|Administrator| |
|Created Time|Sun Apr 07 23:41:56 IST 2019| |
|Last Access|Thu Jan 01 05:30:00 IST 1970| |
|Created By|Spark 2.4.1| |
|Type|MANAGED| |
|Provider|hive| |
|Table Properties|[transient_lastDdlTime=1554660716]| |
|Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
|Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
|InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
|OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties|[serialization.format=1]| |
|Partition Provider|Catalog| |

+-+-++---

  was:
scala> spark.sql("insert into table1 select 'a',29")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
++--+---+
|col_name |data_type |comment|
++--+---+
|name |string |null |
|age |int |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |table1 | |
|Owner |Administrator | |
|Created Time |Sun Apr 07 23:41:56 IST 2019 | |
|Last Access |Thu Jan 01 05:30:00 IST 1970 | |
|Created By |Spark 2.4.1 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1554660716] | |
|Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
|Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | |
|InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
++--+---+


> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> The system shall update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect whether it is enabled or disabled. This 
> feature is similar to Hive's auto-gather feature, where statistics are 
> automatically computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +-+-++---
> |col_name|data_type|comment|
> +-+-++---
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> 

[jira] [Updated] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
--
Environment: (was: system shall able to update the table stats 
automatically if user sets  spark.sql.statistics.size.autoUpdate.enabled as 
true  , Currently this property doesnt have any significance even if it is 
enabled  or disabled.

Please follow the below steps to reproduce the issue

scala> spark.sql("create table x222 (name string,age int) stored as parquet")

19/04/07 23:41:56 WARN HiveMetaStore: Location: 
file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 specified for 
non-external table:table1
-chgrp: 'HTIPL-23270\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into table1 select 'a',29")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
++--+---+
|col_name |data_type |comment|
++--+---+
|name |string |null |
|age |int |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |table1 | |
|Owner |Administrator | |
|Created Time |Sun Apr 07 23:41:56 IST 2019 | |
|Last Access |Thu Jan 01 05:30:00 IST 1970 | |
|Created By |Spark 2.4.1 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1554660716] | |
|Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
|Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | |
|InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
++--+---+)

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Sujith Chacko
>Priority: Major
>
> scala> spark.sql("insert into table1 select 'a',29")
> res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
> ++--+---+
> |col_name |data_type |comment|
> ++--+---+
> |name |string |null |
> |age |int |null |
> | | | |
> |# Detailed Table Information| | |
> |Database |default | |
> |Table |table1 | |
> |Owner |Administrator | |
> |Created Time |Sun Apr 07 23:41:56 IST 2019 | |
> |Last Access |Thu Jan 01 05:30:00 IST 1970 | |
> |Created By |Spark 2.4.1 | |
> |Type |MANAGED | |
> |Provider |hive | |
> |Table Properties |[transient_lastDdlTime=1554660716] | |
> |Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
> |Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | 
> |
> |InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | 
> |
> |OutputFormat 
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
> |Storage Properties |[serialization.format=1] | |
> |Partition Provider |Catalog | |
> ++--+---+






[jira] [Updated] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
--
Description: 
scala> spark.sql("insert into table1 select 'a',29")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
++--+---+
|col_name |data_type |comment|
++--+---+
|name |string |null |
|age |int |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |table1 | |
|Owner |Administrator | |
|Created Time |Sun Apr 07 23:41:56 IST 2019 | |
|Last Access |Thu Jan 01 05:30:00 IST 1970 | |
|Created By |Spark 2.4.1 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1554660716] | |
|Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
|Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | |
|InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
++--+---+

> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> 
>
> Key: SPARK-27403
> URL: https://issues.apache.org/jira/browse/SPARK-27403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
> Environment: system shall able to update the table stats 
> automatically if user sets  spark.sql.statistics.size.autoUpdate.enabled as 
> true  , Currently this property doesnt have any significance even if it is 
> enabled  or disabled.
> Please follow the below steps to reproduce the issue
> scala> spark.sql("create table x222 (name string,age int) stored as parquet")
> 19/04/07 23:41:56 WARN HiveMetaStore: Location: 
> file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 specified for 
> non-external table:table1
> -chgrp: 'HTIPL-23270\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("insert into table1 select 'a',29")
> res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
> ++--+---+
> |col_name |data_type |comment|
> ++--+---+
> |name |string |null |
> |age |int |null |
> | | | |
> |# Detailed Table Information| | |
> |Database |default | |
> |Table |table1 | |
> |Owner |Administrator | |
> |Created Time |Sun Apr 07 23:41:56 IST 2019 | |
> |Last Access |Thu Jan 01 05:30:00 IST 1970 | |
> |Created By |Spark 2.4.1 | |
> |Type |MANAGED | |
> |Provider |hive | |
> |Table Properties |[transient_lastDdlTime=1554660716] | |
> |Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
> |Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | 
> |
> |InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | 
> |
> |OutputFormat 
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
> |Storage Properties |[serialization.format=1] | |
> |Partition Provider |Catalog | |
> ++--+---+
>Reporter: Sujith Chacko
>Priority: Major
>
> scala> spark.sql("insert into table1 select 'a',29")
> res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
> ++--+---+
> |col_name |data_type |comment|
> ++--+---+
> |name |string |null |
> |age |int |null |
> | | | |
> |# Detailed Table Information| | |
> |Database |default | |
> |Table |table1 | |
> |Owner |Administrator | |
> |Created Time |Sun Apr 07 23:41:56 IST 2019 | |
> |Last Access |Thu Jan 01 05:30:00 IST 1970 | |
> |Created By |Spark 2.4.1 | |
> |Type |MANAGED | |
> |Provider |hive | |
> |Table Properties |[transient_lastDdlTime=1554660716] | |
> |Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
> |Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe 

[jira] [Created] (SPARK-27403) Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set to true

2019-04-07 Thread Sujith Chacko (JIRA)
Sujith Chacko created SPARK-27403:
-

 Summary: Failed to update the table size automatically even though 
spark.sql.statistics.size.autoUpdate.enabled is set to true
 Key: SPARK-27403
 URL: https://issues.apache.org/jira/browse/SPARK-27403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.1
 Environment: The system shall be able to update the table stats automatically 
if the user sets spark.sql.statistics.size.autoUpdate.enabled to true. Currently 
this property doesn't have any significance whether it is enabled or disabled.

Please follow the below steps to reproduce the issue

scala> spark.sql("create table x222 (name string,age int) stored as parquet")

19/04/07 23:41:56 WARN HiveMetaStore: Location: 
file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 specified for 
non-external table:table1
-chgrp: 'HTIPL-23270\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into table1 select 'a',29")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
++--+---+
|col_name |data_type |comment|
++--+---+
|name |string |null |
|age |int |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |table1 | |
|Owner |Administrator | |
|Created Time |Sun Apr 07 23:41:56 IST 2019 | |
|Last Access |Thu Jan 01 05:30:00 IST 1970 | |
|Created By |Spark 2.4.1 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1554660716] | |
|Location |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 | |
|Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | |
|InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
++--+---+
Reporter: Sujith Chacko









[jira] [Commented] (SPARK-26970) Can't load PipelineModel that was created in Scala with Python due to missing Interaction transformer

2019-04-07 Thread Jen Darrouzet (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811932#comment-16811932
 ] 

Jen Darrouzet commented on SPARK-26970:
---

I am a newbie but would very much like to use the interaction transformer in 
pyspark and am available to help QA test it, when/if it becomes available.

> Can't load PipelineModel that was created in Scala with Python due to missing 
> Interaction transformer
> -
>
> Key: SPARK-26970
> URL: https://issues.apache.org/jira/browse/SPARK-26970
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Andrew Crosby
>Priority: Minor
>
> The Interaction transformer 
> [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Interaction.scala|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Interaction.scala]
>  is missing from the set of pyspark feature transformers 
> [https://github.com/apache/spark/blob/master/python/pyspark/ml/feature.py|https://github.com/apache/spark/blob/master/python/pyspark/ml/feature.py]
>  
> This means that it is impossible to create a model that includes an 
> Interaction transformer with pyspark. It also means that attempting to load a 
> PipelineModel created in Scala that includes an Interaction transformer with 
> pyspark fails with the following error:
> {code:java}
> AttributeError: module 'pyspark.ml.feature' has no attribute 'Interaction'
> {code}
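
For context (a sketch, not from the ticket): a Scala pipeline along these lines saves fine, but the resulting PipelineModel cannot currently be loaded from pyspark because pyspark.ml.feature has no Interaction wrapper:

{code:java}
// Sketch of a Scala-side pipeline using the Interaction transformer; loading its
// saved PipelineModel from pyspark is what fails with the AttributeError above.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{Interaction, VectorAssembler}

val interaction = new Interaction()
  .setInputCols(Array("a", "b"))
  .setOutputCol("a_x_b")
val assembler = new VectorAssembler()
  .setInputCols(Array("a_x_b"))
  .setOutputCol("features")
val pipeline = new Pipeline().setStages(Array(interaction, assembler))
{code}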






[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-04-07 Thread Michael Chirico (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811930#comment-16811930
 ] 

Michael Chirico commented on SPARK-26910:
-

awesome! thanks and congrats :)

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1
>
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?






[jira] [Updated] (SPARK-27388) expression encoder for avro like objects

2019-04-07 Thread Taoufik DACHRAOUI (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taoufik DACHRAOUI updated SPARK-27388:
--
Description: 
*What changes were proposed in this pull request?*

This PR adds expression encoders for beans, java.util.List, java.util.Map and 
Java enums.

Beans are objects defined by properties. A property is defined by a setter and a 
getter function, where the getter return type is equal to the setter's unique 
parameter type and the getter and setter have the same name; if the getter name 
is prefixed by "get", then the setter name must be prefixed by "set". See the 
tests for bean examples.

Avro objects are beans, and thus we can create an expression encoder for an Avro 
object as follows:
{code:java}
implicit val exprEncoder = ExpressionEncoder[Foo]()
{code}
All Avro types, including fixed types and excluding complex union types, are 
supported by this addition.

Avro fixed types are beans with exactly one property: bytes.
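
To make the convention concrete, here is a hypothetical fixed-type bean shaped the way the encoder expects (illustrative only; real Avro fixed classes are code-generated and extend SpecificFixed):
{code:scala}
// Same-name getter/setter pair; the getter return type equals the setter's parameter type.
class Md5 {
  private var bytes$: Array[Byte] = _
  def bytes(): Array[Byte] = bytes$
  def bytes(value: Array[Byte]): Unit = { bytes$ = value }
}
{code}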

Complex Avro unions are currently not supported, because a complex union is 
declared as Object and there cannot be an expression encoder for the Object type 
(a custom serializer such as kryo needs to be used instead).
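
A minimal sketch of that kryo fallback using the existing Encoders API (the RecordWithUnion class is hypothetical):
{code:scala}
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical Avro-like record whose union field is typed as Object (AnyRef).
class RecordWithUnion extends Serializable {
  var value: AnyRef = _
}

// No expression encoder can exist for Object, so fall back to a kryo-based binary encoder.
implicit val unionEncoder: Encoder[RecordWithUnion] = Encoders.kryo[RecordWithUnion]
{code}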

*How was this patch tested?*

Currently, only one encodeDecodeTest was added to ExpressionEncoderSuite; the test 
uses an Avro object with map, array and fixed fields, as in test 3 (below).

I used the modified spark-sql package in a local project to test it.

The tests are as follows (where Barcode is a large Avro object):

1. test with simple beans
{code:java}
class Bar {
  private var bar$: String = _
  def bar(value: String): Unit = {
    bar$ = value
  }
  def bar(): String = bar$
  override def toString() = { s"Bar($bar)" }
}

class Foo extends Bar {
  var a: Int = _
  var b: Bar = _

  def getA() = a

  def setA(x: Int) {
    a = x
  }
  def getB(): Bar = b

  def setB(x: Bar) {
    b = x
  }
  override def toString() = { s"Foo($a,$b, ${bar()})" }
}

{code}
{code:java}
implicit val encoderFoo = ExpressionEncoder[Foo]
val bar = new Bar
bar.bar("ok")
val a = new Foo
a.setA(55)
a.setB(bar)
a.bar("BAR")
val ds = List(a).toDS()
println(ds.collect().toList)

val df = List(a).toDF()

val r = df.collect().foreach(println)

println(df.schema)

Result => 

List(Foo(55,Bar(ok), BAR))
[[ok],55,BAR]
StructType(StructField(B,StructType(StructField(bar,StringType,true)),true), 
StructField(A,IntegerType,false), StructField(bar,StringType,true))

{code}
2. test with a large Avro schema (Barcode) with nested Avro objects, Java enums 
and arrays:
{code:java}
implicit val barcodeEncoder = ExpressionEncoder[Barcode]()

val ds: Dataset[Barcode] = 0.until(1).map(i => {
  val crpAddDesc = new java.util.ArrayList[CrpAddDesc]()
  crpAddDesc.add(CrpAddDesc.newBuilder()
    .setCrpAddEngDesc(s"Crp description1 $i")
    .build())
  crpAddDesc.add(CrpAddDesc.newBuilder()
    .setCrpAddEngDesc(s"Crp description2 $i")
    .build())
  val crpAttrs = new java.util.ArrayList[CrpAttributes]()
  crpAttrs.add(CrpAttributes.newBuilder()
    .setCrpCode(s"crp attr1 $i")
    .setCrpAddDesc(crpAddDesc)
    .build())
  crpAttrs.add(CrpAttributes.newBuilder()
    .setCrpCode(s"crp attr2 $i")
    .build())
  val barcode = Barcode.newBuilder()
    .setBarcode(s"Bar$i")
    .setPrdTaxVal(Money.newBuilder()
      .setUnscaledAmount(i.toLong)
      .setScale(0)
      .setCurrency(Currency.EUR)
      .setCurrencyAlphaCode("EUR")
      .build())
    .setCrpAttributes(crpAttrs)
    .build()
  barcode
})
  .toDS()

val x = ds.map(a => {
  (
    a.getBarcode,
    a.getPrdTaxVal.getCurrency,
    a.getCrpAttributes.get(0).getCrpCode,
    a.getCrpAttributes.get(1).getCrpCode)
})

println(x.collect().toList.drop(100).head)

result => (Bar100,EUR,crp attr1 100,crp attr2 100)
{code}
3. test with Avro schema having map, array and fixed types
{code:java}
implicit val logEncoder = ExpressionEncoder[Log]()
val ds: Dataset[Log] = List(Log.newBuilder()
  .setIps(List("127.0.0.1", "127.0.0.0").asJava)
  .setAdditional(Map(
"foo" -> new java.lang.Integer(1),
"bar" -> new java.lang.Integer(2)).asJava)
  .setTimestamp("12345678")
  .setMessage("test map")
  .setMagic(new Magic("magic".getBytes))
  .build())
  .toDS()

println(ds.collect().toList)

result: List({"ips": ["127.0.0.1", "127.0.0.0"], "timestamp": "12345678", 
"message": "test map", "magic": [109, 97, 103, 105, 99], "additional": {"foo": 
1, "bar": 2}})

{code}
Log Avro schema:
{code:java}
{"namespace": "example.avro", "type": "record", "name": "Log",
 "fields": [ {"name": "ips", "type": {"type": "array", "items": 
"string"}},
 {"name": "timestamp", "type": "string"},
   

[jira] [Updated] (SPARK-27402) Support HiveExternalCatalog backward compatibility test

2019-04-07 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27402:

Description: When we upgrade the built-in Hive to 2.3.4, the default 
spark.sql.hive.metastore.version should be 2.3.4. This will not be compatible 
with spark-2.3.3-bin-hadoop2.7.tgz and spark-2.4.1-bin-hadoop2.7.tgz.
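
For reference, a minimal sketch (not part of this issue) of how a user or test could pin the metastore client version when pointing a newer build at a warehouse created by one of those releases:
{code:scala}
import org.apache.spark.sql.SparkSession

// Pin the Hive metastore client to the version that created the warehouse and
// let Spark resolve matching client jars from Maven.
val spark = SparkSession.builder()
  .appName("metastore-compat-check")
  .config("spark.sql.hive.metastore.version", "1.2.1")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW TABLES").show()
{code}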

> Support HiveExternalCatalog backward compatibility test
> ---
>
> Key: SPARK-27402
> URL: https://issues.apache.org/jira/browse/SPARK-27402
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> When we upgrade the built-in Hive to 2.3.4, the default 
> spark.sql.hive.metastore.version should be 2.3.4. This will not be compatible 
> with spark-2.3.3-bin-hadoop2.7.tgz and spark-2.4.1-bin-hadoop2.7.tgz.






[jira] [Created] (SPARK-27402) Support HiveExternalCatalog backward compatibility test

2019-04-07 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-27402:
---

 Summary: Support HiveExternalCatalog backward compatibility test
 Key: SPARK-27402
 URL: https://issues.apache.org/jira/browse/SPARK-27402
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang









[jira] [Resolved] (SPARK-27398) Get rid of sun.nio.cs.StreamDecoder in CreateJacksonParser

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27398.
---
Resolution: Not A Problem

> Get rid of sun.nio.cs.StreamDecoder in CreateJacksonParser
> --
>
> Key: SPARK-27398
> URL: https://issues.apache.org/jira/browse/SPARK-27398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Priority: Trivial
>
> The CreateJacksonParser.getStreamDecoder method creates an instance of 
> ReadableByteChannel and returns the result as a sun.nio.cs.StreamDecoder. 
> This is unnecessary and overcomplicates the method. This code can be replaced 
> by:
> {code:scala}
> val bais = new ByteArrayInputStream(in, 0, length)
> new InputStreamReader(bais, enc)
> {code}
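
For reference, a self-contained sketch of the proposed simplification (the readerFor name and signature are illustrative, not the actual method):
{code:scala}
import java.io.{ByteArrayInputStream, InputStreamReader, Reader}

// Build a Reader directly over the byte slice instead of going through a
// ReadableByteChannel and sun.nio.cs.StreamDecoder.
def readerFor(in: Array[Byte], length: Int, enc: String): Reader = {
  val bais = new ByteArrayInputStream(in, 0, length)
  new InputStreamReader(bais, enc)
}
{code}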






[jira] [Commented] (SPARK-27352) Apply for translation of the Chinese version, I hope to get authorization!

2019-04-07 Thread Teng Peng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811835#comment-16811835
 ] 

Teng Peng commented on SPARK-27352:
---

I would say go ahead and send a PR to add the link to the doc. This might be a 
good place for the link [https://spark.apache.org/docs/latest/index.html]

> Apply for translation of the Chinese version, I hope to get authorization! 
> ---
>
> Key: SPARK-27352
> URL: https://issues.apache.org/jira/browse/SPARK-27352
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Yuan Yifan
>Priority: Minor
>
> Hello everyone, we are [ApacheCN|https://www.apachecn.org/], an open-source 
> community in China, focusing on Big Data and AI.
> Recently, we have been making progress on translating Spark documents.
>  - [Source Of Document|https://github.com/apachecn/spark-doc-zh]
>  - [Document Preview|http://spark.apachecn.org/]
> There are several reasons:
>  *1. The English level of many Chinese users is not very good.*
>  *2. Network problems, you know (China's magic network)!*
>  *3. Online blogs are very messy.*
> We are very willing to do some Chinese localization for your project. If 
> possible, please give us some authorization.
> Yifan Yuan from Apache CN
> You may contact me by mail [tsingjyuj...@163.com|mailto:tsingjyuj...@163.com] 
> for more details






[jira] [Commented] (SPARK-27352) Apply for translation of the Chinese version, I hope to get authorization!

2019-04-07 Thread Yuan Yifan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811815#comment-16811815
 ] 

Yuan Yifan commented on SPARK-27352:


[~Teng Peng]

Thank you for your attention to this wish. Actually, translating the documents 
into Chinese does not go against the license, and we don't have to get an 
"authorization".

But we're looking forward to deeper cooperation between Apache and us, such as 
putting a "ZH Doc" link on http://spark.apache.org/; it would bring our 
documentation to more Chinese users and make it easier for them to use Spark.

Thank you again for your attention to this; please feel free to mail/reply to 
me if there are any questions about it.

> Apply for translation of the Chinese version, I hope to get authorization! 
> ---
>
> Key: SPARK-27352
> URL: https://issues.apache.org/jira/browse/SPARK-27352
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Yuan Yifan
>Priority: Minor
>
> Hello everyone, we are [ApacheCN|https://www.apachecn.org/], an open-source 
> community in China, focusing on Big Data and AI.
> Recently, we have been making progress on translating Spark documents.
>  - [Source Of Document|https://github.com/apachecn/spark-doc-zh]
>  - [Document Preview|http://spark.apachecn.org/]
> There are several reasons:
>  *1. The English level of many Chinese users is not very good.*
>  *2. Network problems, you know (China's magic network)!*
>  *3. Online blogs are very messy.*
> We are very willing to do some Chinese localization for your project. If 
> possible, please give us some authorization.
> Yifan Yuan from Apache CN
> You may contact me by mail [tsingjyuj...@163.com|mailto:tsingjyuj...@163.com] 
> for more details






[jira] [Commented] (SPARK-27278) Optimize GetMapValue when the map is a foldable and the key is not

2019-04-07 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811808#comment-16811808
 ] 

Marco Gaido commented on SPARK-27278:
-

[~huonw] I think the point is: in the existing case which is optimized, the 
{{CreateMap}} operation is removed and replaced with the {{CASE ... WHEN}} 
syntax. This is fine because the code generated by the {{CreateMap}} operation 
is linear in size with the number of elements, exactly as for the {{CASE ... 
WHEN}} approach, so the overall generated code size is similar. When the 
{{CreateMap}} operation is replaced with a {{Literal}}, however, that code is 
not there, and we would replace the existing for loop with the list of if 
statements, so the generated code size becomes significantly bigger. Even though 
trivial tests show that the second case is faster, generating more code can lead 
to huge perf issues in more complex scenarios (the worst cases are probably when 
it causes the method size to grow bigger than 4k so no JIT is performed anymore, 
or even causes WholeStageCodegen to fail to be used): so its effect on 
performance would be highly query-dependent and hard to predict.
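
A quick way to inspect the generated-code difference being discussed, assuming a spark-shell session (the queries mirror the examples in the description):
{code:scala}
import org.apache.spark.sql.execution.debug._
import org.apache.spark.sql.functions._
import spark.implicits._

// Non-foldable map: rewritten into CASE WHEN, so the generated code stays a short chain of ifs.
spark.range(1000).select(map(lit(1), lit(1), lit(2), $"id")($"id") as "x").debugCodegen()

// Foldable map: constant-folded into a Literal, so the lookup stays a runtime traversal of the map.
spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))($"id") as "x").debugCodegen()
{code}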

> Optimize GetMapValue when the map is a foldable and the key is not
> --
>
> Key: SPARK-27278
> URL: https://issues.apache.org/jira/browse/SPARK-27278
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0
>Reporter: Huon Wilson
>Priority: Minor
>
> With a map that isn't constant-foldable, spark will optimise an access to a 
> series of {{CASE WHEN ... THEN ... WHEN ... THEN ... END}}, for instance
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), 'id)('id) as 
> "x").explain
> == Physical Plan ==
> *(1) Project [CASE WHEN (cast(id#180L as int) = 1) THEN 1 WHEN (cast(id#180L 
> as int) = 2) THEN id#180L END AS x#182L]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> This results in an efficient series of ifs and elses, in the code generation:
> {code:java}
> /* 037 */   boolean project_isNull_3 = false;
> /* 038 */   int project_value_3 = -1;
> /* 039 */   if (!false) {
> /* 040 */ project_value_3 = (int) project_expr_0_0;
> /* 041 */   }
> /* 042 */
> /* 043 */   boolean project_value_2 = false;
> /* 044 */   project_value_2 = project_value_3 == 1;
> /* 045 */   if (!false && project_value_2) {
> /* 046 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 047 */ project_project_value_1_0 = 1L;
> /* 048 */ continue;
> /* 049 */   }
> /* 050 */
> /* 051 */   boolean project_isNull_8 = false;
> /* 052 */   int project_value_8 = -1;
> /* 053 */   if (!false) {
> /* 054 */ project_value_8 = (int) project_expr_0_0;
> /* 055 */   }
> /* 056 */
> /* 057 */   boolean project_value_7 = false;
> /* 058 */   project_value_7 = project_value_8 == 2;
> /* 059 */   if (!false && project_value_7) {
> /* 060 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 061 */ project_project_value_1_0 = project_expr_0_0;
> /* 062 */ continue;
> /* 063 */   }
> {code}
> If the map can be constant folded, the constant folding happens first, and 
> the {{SimplifyExtractValueOps}} optimisation doesn't trigger, resulting doing 
> a map traversal and more dynamic checks:
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))('id) as 
> "x").explain
> == Physical Plan ==
> *(1) Project [keys: [1,2], values: [1,2][cast(id#195L as int)] AS x#197]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> The {{keys: ..., values: ...}} is from the {{ArrayBasedMapData}} type, which 
> is what is stored in the {{Literal}} form of the {{map(...)}} expression in 
> that select. The code generated is less efficient, since it has to do a 
> manual dynamic traversal of the map's array of keys, with type casts etc.:
> {code:java}
> /* 099 */   int project_index_0 = 0;
> /* 100 */   boolean project_found_0 = false;
> /* 101 */   while (project_index_0 < project_length_0 && 
> !project_found_0) {
> /* 102 */ final int project_key_0 = 
> project_keys_0.getInt(project_index_0);
> /* 103 */ if (project_key_0 == project_value_2) {
> /* 104 */   project_found_0 = true;
> /* 105 */ } else {
> /* 106 */   project_index_0++;
> /* 107 */ }
> /* 108 */   }
> /* 109 */
> /* 110 */   if (!project_found_0) {
> /* 111 */ project_isNull_0 = true;
> /* 112 */   } else {
> /* 113 */ project_value_0 = 
> project_values_0.getInt(project_index_0);
> /* 114 */   }
> {code}
> It 

[jira] [Issue Comment Deleted] (SPARK-14220) Build and test Spark against Scala 2.12

2019-04-07 Thread antonkulaga (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

antonkulaga updated SPARK-14220:

Comment: was deleted

(was: I suggest to use Spark 2.4.1 as there Scala 2.12 is not longer 
experimental)

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Affects Versions: 2.1.0
>Reporter: Josh Rosen
>Assignee: Sean Owen
>Priority: Blocker
>  Labels: release-notes
> Fix For: 2.4.0
>
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.






[jira] [Commented] (SPARK-14220) Build and test Spark against Scala 2.12

2019-04-07 Thread antonkulaga (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811726#comment-16811726
 ] 

antonkulaga commented on SPARK-14220:
-

I suggest using Spark 2.4.1, as Scala 2.12 is no longer experimental there.

> Build and test Spark against Scala 2.12
> ---
>
> Key: SPARK-14220
> URL: https://issues.apache.org/jira/browse/SPARK-14220
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra
>Affects Versions: 2.1.0
>Reporter: Josh Rosen
>Assignee: Sean Owen
>Priority: Blocker
>  Labels: release-notes
> Fix For: 2.4.0
>
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.12 milestone.






[jira] [Resolved] (SPARK-27399) Spark streaming of kafka 0.10 contains some scattered config

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27399.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24267
[https://github.com/apache/spark/pull/24267]

> Spark streaming of kafka 0.10 contains some scattered config
> 
>
> Key: SPARK-27399
> URL: https://issues.apache.org/jira/browse/SPARK-27399
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Minor
> Fix For: 3.0.0
>
>
> I found a lot of scattered config in Kafka streaming.
> I think we should arrange these configs in a unified place.
> There is also some hardcoded config like 
> {code:java}
> spark.network.timeout{code}
> that needs to change.
>  
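
As a sketch of what consolidating these options could look like (the object and entry names are made up; only the config key itself is an existing Kafka streaming option, and ConfigBuilder is Spark's internal config DSL):
{code:scala}
package org.apache.spark.streaming.kafka010

import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical central place for the scattered Kafka streaming options, so code and
// tests reference a typed ConfigEntry instead of a hardcoded string key.
object KafkaStreamingConfig {
  val CONSUMER_CACHE_MAX_CAPACITY =
    ConfigBuilder("spark.streaming.kafka.consumer.cache.maxCapacity")
      .intConf
      .createWithDefault(64)
}
{code}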






[jira] [Resolved] (SPARK-27383) Avoid using hard-coded jar names in Hive tests

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27383.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24294
[https://github.com/apache/spark/pull/24294]

> Avoid using hard-coded jar names in Hive tests
> --
>
> Key: SPARK-27383
> URL: https://issues.apache.org/jira/browse/SPARK-27383
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Avoid using hard-coded jar names({{hive-contrib-0.13.1.jar}} and 
> {{hive-hcatalog-core-0.13.1.jar}}) in Hive tests. This makes it easy to 
> change when upgrading the built-in Hive to 2.3.4.
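
As an illustration of the kind of change this enables (the value and variable names below are made up, not the actual patch):
{code:scala}
// Derive the test jar names from one central version constant instead of
// repeating "0.13.1" in every Hive test.
val builtinHiveVersion = "2.3.4"   // hypothetical central constant
val hiveContribJar = s"hive-contrib-$builtinHiveVersion.jar"
val hiveHcatalogCoreJar = s"hive-hcatalog-core-$builtinHiveVersion.jar"
{code}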






[jira] [Assigned] (SPARK-27399) Spark streaming of kafka 0.10 contains some scattered config

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-27399:
-

Assignee: jiaan.geng

> Spark streaming of kafka 0.10 contains some scattered config
> 
>
> Key: SPARK-27399
> URL: https://issues.apache.org/jira/browse/SPARK-27399
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Minor
>
> I found a lot of scattered config in Kafka streaming.
> I think we should arrange these configs in a unified place.
> There is also some hardcoded config like 
> {code:java}
> spark.network.timeout{code}
> that needs to change.
>  






[jira] [Assigned] (SPARK-27383) Avoid using hard-coded jar names in Hive tests

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-27383:
-

Assignee: Yuming Wang
Priority: Minor  (was: Major)

> Avoid using hard-coded jar names in Hive tests
> --
>
> Key: SPARK-27383
> URL: https://issues.apache.org/jira/browse/SPARK-27383
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Minor
> Fix For: 3.0.0
>
>
> Avoid using hard-coded jar names({{hive-contrib-0.13.1.jar}} and 
> {{hive-hcatalog-core-0.13.1.jar}}) in Hive tests. This makes it easy to 
> change when upgrading the built-in Hive to 2.3.4.






[jira] [Commented] (SPARK-27278) Optimize GetMapValue when the map is a foldable and the key is not

2019-04-07 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811720#comment-16811720
 ] 

Dongjoon Hyun commented on SPARK-27278:
---

The reverting PR should not reuse this JIRA because the purpose is different. 
This JIRA is dedicated to [~mgaido]'s improvement approach and his PR code. I 
prefer Marco's way and believe that you do, too.

> Optimize GetMapValue when the map is a foldable and the key is not
> --
>
> Key: SPARK-27278
> URL: https://issues.apache.org/jira/browse/SPARK-27278
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0
>Reporter: Huon Wilson
>Priority: Minor
>
> With a map that isn't constant-foldable, spark will optimise an access to a 
> series of {{CASE WHEN ... THEN ... WHEN ... THEN ... END}}, for instance
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), 'id)('id) as 
> "x").explain
> == Physical Plan ==
> *(1) Project [CASE WHEN (cast(id#180L as int) = 1) THEN 1 WHEN (cast(id#180L 
> as int) = 2) THEN id#180L END AS x#182L]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> This results in an efficient series of ifs and elses, in the code generation:
> {code:java}
> /* 037 */   boolean project_isNull_3 = false;
> /* 038 */   int project_value_3 = -1;
> /* 039 */   if (!false) {
> /* 040 */ project_value_3 = (int) project_expr_0_0;
> /* 041 */   }
> /* 042 */
> /* 043 */   boolean project_value_2 = false;
> /* 044 */   project_value_2 = project_value_3 == 1;
> /* 045 */   if (!false && project_value_2) {
> /* 046 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 047 */ project_project_value_1_0 = 1L;
> /* 048 */ continue;
> /* 049 */   }
> /* 050 */
> /* 051 */   boolean project_isNull_8 = false;
> /* 052 */   int project_value_8 = -1;
> /* 053 */   if (!false) {
> /* 054 */ project_value_8 = (int) project_expr_0_0;
> /* 055 */   }
> /* 056 */
> /* 057 */   boolean project_value_7 = false;
> /* 058 */   project_value_7 = project_value_8 == 2;
> /* 059 */   if (!false && project_value_7) {
> /* 060 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 061 */ project_project_value_1_0 = project_expr_0_0;
> /* 062 */ continue;
> /* 063 */   }
> {code}
> If the map can be constant folded, the constant folding happens first, and 
> the {{SimplifyExtractValueOps}} optimisation doesn't trigger, resulting doing 
> a map traversal and more dynamic checks:
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))('id) as 
> "x").explain
> == Physical Plan ==
> *(1) Project [keys: [1,2], values: [1,2][cast(id#195L as int)] AS x#197]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> The {{keys: ..., values: ...}} is from the {{ArrayBasedMapData}} type, which 
> is what is stored in the {{Literal}} form of the {{map(...)}} expression in 
> that select. The code generated is less efficient, since it has to do a 
> manual dynamic traversal of the map's array of keys, with type casts etc.:
> {code:java}
> /* 099 */   int project_index_0 = 0;
> /* 100 */   boolean project_found_0 = false;
> /* 101 */   while (project_index_0 < project_length_0 && 
> !project_found_0) {
> /* 102 */ final int project_key_0 = 
> project_keys_0.getInt(project_index_0);
> /* 103 */ if (project_key_0 == project_value_2) {
> /* 104 */   project_found_0 = true;
> /* 105 */ } else {
> /* 106 */   project_index_0++;
> /* 107 */ }
> /* 108 */   }
> /* 109 */
> /* 110 */   if (!project_found_0) {
> /* 111 */ project_isNull_0 = true;
> /* 112 */   } else {
> /* 113 */ project_value_0 = 
> project_values_0.getInt(project_index_0);
> /* 114 */   }
> {code}
> It looks like the problem is in {{SimplifyExtractValueOps}}, which doesn't 
> handle {{GetMapValue(Literal(...), key)}}, only the {{CreateMap}} form:
> {code:scala}
>   case GetMapValue(CreateMap(elems), key) => CaseKeyWhen(key, elems)
> {code}






[jira] [Commented] (SPARK-27278) Optimize GetMapValue when the map is a foldable and the key is not

2019-04-07 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811717#comment-16811717
 ] 

Dongjoon Hyun commented on SPARK-27278:
---

[~huonw], you can make a PR (for reverting the old one) if you are 
uncomfortable. Your PR will be reviewed in the same review process, weighing the 
pros and cons. Note that missing something is also possible with the current 
behavior, too.

> Optimize GetMapValue when the map is a foldable and the key is not
> --
>
> Key: SPARK-27278
> URL: https://issues.apache.org/jira/browse/SPARK-27278
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0
>Reporter: Huon Wilson
>Priority: Minor
>
> With a map that isn't constant-foldable, spark will optimise an access to a 
> series of {{CASE WHEN ... THEN ... WHEN ... THEN ... END}}, for instance
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), 'id)('id) as 
> "x").explain
> == Physical Plan ==
> *(1) Project [CASE WHEN (cast(id#180L as int) = 1) THEN 1 WHEN (cast(id#180L 
> as int) = 2) THEN id#180L END AS x#182L]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> This results in an efficient series of ifs and elses, in the code generation:
> {code:java}
> /* 037 */   boolean project_isNull_3 = false;
> /* 038 */   int project_value_3 = -1;
> /* 039 */   if (!false) {
> /* 040 */ project_value_3 = (int) project_expr_0_0;
> /* 041 */   }
> /* 042 */
> /* 043 */   boolean project_value_2 = false;
> /* 044 */   project_value_2 = project_value_3 == 1;
> /* 045 */   if (!false && project_value_2) {
> /* 046 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 047 */ project_project_value_1_0 = 1L;
> /* 048 */ continue;
> /* 049 */   }
> /* 050 */
> /* 051 */   boolean project_isNull_8 = false;
> /* 052 */   int project_value_8 = -1;
> /* 053 */   if (!false) {
> /* 054 */ project_value_8 = (int) project_expr_0_0;
> /* 055 */   }
> /* 056 */
> /* 057 */   boolean project_value_7 = false;
> /* 058 */   project_value_7 = project_value_8 == 2;
> /* 059 */   if (!false && project_value_7) {
> /* 060 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 061 */ project_project_value_1_0 = project_expr_0_0;
> /* 062 */ continue;
> /* 063 */   }
> {code}
> If the map can be constant folded, the constant folding happens first, and 
> the {{SimplifyExtractValueOps}} optimisation doesn't trigger, resulting doing 
> a map traversal and more dynamic checks:
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))('id) as 
> "x").explain
> == Physical Plan ==
> *(1) Project [keys: [1,2], values: [1,2][cast(id#195L as int)] AS x#197]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> The {{keys: ..., values: ...}} is from the {{ArrayBasedMapData}} type, which 
> is what is stored in the {{Literal}} form of the {{map(...)}} expression in 
> that select. The code generated is less efficient, since it has to do a 
> manual dynamic traversal of the map's array of keys, with type casts etc.:
> {code:java}
> /* 099 */   int project_index_0 = 0;
> /* 100 */   boolean project_found_0 = false;
> /* 101 */   while (project_index_0 < project_length_0 && 
> !project_found_0) {
> /* 102 */ final int project_key_0 = 
> project_keys_0.getInt(project_index_0);
> /* 103 */ if (project_key_0 == project_value_2) {
> /* 104 */   project_found_0 = true;
> /* 105 */ } else {
> /* 106 */   project_index_0++;
> /* 107 */ }
> /* 108 */   }
> /* 109 */
> /* 110 */   if (!project_found_0) {
> /* 111 */ project_isNull_0 = true;
> /* 112 */   } else {
> /* 113 */ project_value_0 = 
> project_values_0.getInt(project_index_0);
> /* 114 */   }
> {code}
> It looks like the problem is in {{SimplifyExtractValueOps}}, which doesn't 
> handle {{GetMapValue(Literal(...), key)}}, only the {{CreateMap}} form:
> {code:scala}
>   case GetMapValue(CreateMap(elems), key) => CaseKeyWhen(key, elems)
> {code}






[jira] [Commented] (SPARK-27352) Apply for translation of the Chinese version, I hope to get authorization!

2019-04-07 Thread Teng Peng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811765#comment-16811765
 ] 

Teng Peng commented on SPARK-27352:
---

Correct me if I am wrong, but I do not think any authorization is required for 
translation into other languages.

> Apply for translation of the Chinese version, I hope to get authorization! 
> ---
>
> Key: SPARK-27352
> URL: https://issues.apache.org/jira/browse/SPARK-27352
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Yuan Yifan
>Priority: Minor
>
> Hello everyone, we are [ApacheCN|https://www.apachecn.org/], an open-source 
> community in China, focusing on Big Data and AI.
> Recently, we have been making progress on translating Spark documents.
>  - [Source Of Document|https://github.com/apachecn/spark-doc-zh]
>  - [Document Preview|http://spark.apachecn.org/]
> There are several reasons:
>  *1. The English level of many Chinese users is not very good.*
>  *2. Network problems, you know (China's magic network)!*
>  *3. Online blogs are very messy.*
> We are very willing to do some Chinese localization for your project. If 
> possible, please give us some authorization.
> Yifan Yuan from Apache CN
> You may contact me by mail [tsingjyuj...@163.com|mailto:tsingjyuj...@163.com] 
> for more details






[jira] [Assigned] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26992:
-

Assignee: dzcxzl

> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
> Attachments: error_session.png, error_stage.png
>
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark thrift server saves this value in the LocalProperty of threadlocal 
> type, but does not clean up after running, causing other sessions to run in 
> the previously set pool name.
>  
> For example
> The second session does not manually set the pool name. The default pool name 
> should be used, but the pool name of the previous user's settings is used. 
> This is incorrect.
> !error_session.png!
>  
> !error_stage.png!
>  
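
An illustrative sketch of the leak pattern and the missing cleanup (not the actual STS code; runStatement and the pool name are hypothetical):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("pool-demo").getOrCreate()
val sc = spark.sparkContext

def runStatement(sessionPoolName: Option[String])(body: => Unit): Unit = {
  // The pool is a thread-local property; if it is never cleared, the next session
  // handled by this thread silently reuses it, which is the bug described above.
  sessionPoolName.foreach(sc.setLocalProperty("spark.scheduler.pool", _))
  try body
  finally sc.setLocalProperty("spark.scheduler.pool", null)   // the missing cleanup
}

runStatement(Some("accounting")) {
  spark.range(10).count()   // runs in the "accounting" fair-scheduler pool
}
runStatement(None) {
  spark.range(10).count()   // runs in the default pool, as expected
}
{code}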






[jira] [Resolved] (SPARK-26992) Fix STS scheduler pool correct delivery

2019-04-07 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26992.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23895
[https://github.com/apache/spark/pull/23895]

> Fix STS scheduler pool correct delivery
> ---
>
> Key: SPARK-26992
> URL: https://issues.apache.org/jira/browse/SPARK-26992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: error_session.png, error_stage.png
>
>
> The user sets the value of spark.sql.thriftserver.scheduler.pool.
>  Spark thrift server saves this value in the LocalProperty of threadlocal 
> type, but does not clean up after running, causing other sessions to run in 
> the previously set pool name.
>  
> For example
> The second session does not manually set the pool name. The default pool name 
> should be used, but the pool name of the previous user's settings is used. 
> This is incorrect.
> !error_session.png!
>  
> !error_stage.png!
>  


