[jira] [Commented] (SPARK-26330) Duplicate query execution events generated for SQL commands

2019-03-11 Thread Sandeep Katta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790227#comment-16790227
 ] 

Sandeep Katta commented on SPARK-26330:
---

Looks the same as [SPARK-27114|https://issues.apache.org/jira/browse/SPARK-27114].

> Duplicate query execution events generated for SQL commands
> ---
>
> Key: SPARK-26330
> URL: https://issues.apache.org/jira/browse/SPARK-26330
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Consider the following code:
> {code:java}
> spark.sql("create table foo (bar int)").show()
> {code}
> The command is executed eagerly (i.e. before {{show()}} is called) and 
> generates a query execution event. But when you call {{show()}}, a duplicate 
> event is generated, even though Spark does not execute anything at that point.
> This can be a little more misleading when you do something like a CTAS, since 
> the duplicate events may cause listeners to think there were multiple inserts 
> when that's not true.
> A fuller example that shows this (you can see from the output that both 
> inputs to the listener are the same):
> {code:java}
> import org.apache.spark.sql.execution.QueryExecution
> import org.apache.spark.sql.util.QueryExecutionListener
> val lsnr = new QueryExecutionListener() {
>   override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
>     println(s"on success: $funcName -> ${qe.analyzed}")
>   }
>   override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
>     println(s"on failure: $funcName -> ${qe.analyzed}")
>   }
> }
> spark.sessionState.listenerManager.register(lsnr)
> spark.sql("drop table if exists test")
> val df = spark.sql("create table test(i int)")
> df.show()
> {code}
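A minimal sketch (not part of the ticket) of how a listener could surface the duplication described above. The listener name and the use of the analyzed plan's string form as a dedup key are assumptions made only for illustration:

{code:java}
import scala.collection.mutable
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Remember the analyzed plans already reported and flag a repeat (a crude identity,
// chosen only to make the duplicate event visible).
val seenPlans = mutable.HashSet[String]()
val dedupListener = new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    if (!seenPlans.add(qe.analyzed.toString)) {
      println(s"duplicate onSuccess for the same analyzed plan (funcName=$funcName)")
    }
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}
spark.sessionState.listenerManager.register(dedupListener)
{code}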






[jira] [Resolved] (SPARK-27011) reset command fails after cache table

2019-03-11 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27011.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23918
[https://github.com/apache/spark/pull/23918]

> reset command fails after cache table
> -
>
> Key: SPARK-27011
> URL: https://issues.apache.org/jira/browse/SPARK-27011
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.3, 2.4.0, 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
> Fix For: 3.0.0
>
>
>  
> h3. Commands to reproduce 
> spark-sql> create table abcde ( a int);
> spark-sql> reset; // works successfully
> spark-sql> cache table abcde;
> spark-sql> reset; // fails with an exception
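A hedged sketch of the same reproduction through the Scala API, assuming a spark-shell session with Hive support (the spark-sql CLI transcript above is the original report):

{code:java}
// Sketch only: the failure is expected once RESET runs after CACHE TABLE.
spark.sql("create table abcde (a int)")
spark.sql("reset")               // works
spark.sql("cache table abcde")
spark.sql("reset")               // expected to fail: TreeNodeException: makeCopy, tree: ResetCommand$
{code}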
> h3. Below is the stack
> {{org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, 
> tree:}}
> {{ResetCommand$}}{{at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:379)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:216)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:211)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:259)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3$adapted(CacheManager.scala:236)}}
> {{ at scala.collection.Iterator.find(Iterator.scala:993)}}
> {{ at scala.collection.Iterator.find$(Iterator.scala:990)}}
> {{ at scala.collection.AbstractIterator.find(Iterator.scala:1429)}}
> {{ at scala.collection.IterableLike.find(IterableLike.scala:81)}}
> {{ at scala.collection.IterableLike.find$(IterableLike.scala:80)}}
> {{ at scala.collection.AbstractIterable.find(Iterable.scala:56)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$2(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.readLock(CacheManager.scala:59)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.lookupCachedData(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:250)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:241)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:258)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.useCachedData(CacheManager.scala:241)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:68)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:65)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:72)}}
> {{ at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:72)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:71)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:139)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:316)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$writePlans(QueryExecution.scala:139)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:146)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:82)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)}}
> {{ at 

[jira] [Assigned] (SPARK-27011) reset command fails after cache table

2019-03-11 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-27011:
---

Assignee: Ajith S

> reset command fails after cache table
> -
>
> Key: SPARK-27011
> URL: https://issues.apache.org/jira/browse/SPARK-27011
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.3, 2.4.0, 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
>
>  
> h3. Commands to reproduce 
> spark-sql> create table abcde ( a int);
> spark-sql> reset; // works successfully
> spark-sql> cache table abcde;
> spark-sql> reset; // fails with an exception
> h3. Below is the stack
> {{org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, 
> tree:}}
> {{ResetCommand$}}{{at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:379)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:216)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:211)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:259)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3$adapted(CacheManager.scala:236)}}
> {{ at scala.collection.Iterator.find(Iterator.scala:993)}}
> {{ at scala.collection.Iterator.find$(Iterator.scala:990)}}
> {{ at scala.collection.AbstractIterator.find(Iterator.scala:1429)}}
> {{ at scala.collection.IterableLike.find(IterableLike.scala:81)}}
> {{ at scala.collection.IterableLike.find$(IterableLike.scala:80)}}
> {{ at scala.collection.AbstractIterable.find(Iterable.scala:56)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$2(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.readLock(CacheManager.scala:59)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.lookupCachedData(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:250)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:241)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:258)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.useCachedData(CacheManager.scala:241)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:68)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:65)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:72)}}
> {{ at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:72)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:71)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:139)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:316)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$writePlans(QueryExecution.scala:139)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:146)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:82)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)}}
> {{ at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3346)}}
> {{ at org.apache.spark.sql.Dataset.(Dataset.scala:203)}}
> {{ at 

[jira] [Commented] (SPARK-27011) reset command fails after cache table

2019-03-11 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790220#comment-16790220
 ] 

Ajith S commented on SPARK-27011:
-

[~cloud_fan] As [https://github.com/apache/spark/pull/23918] has been merged, can 
we close this?

> reset command fails after cache table
> -
>
> Key: SPARK-27011
> URL: https://issues.apache.org/jira/browse/SPARK-27011
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.3, 2.4.0, 3.0.0
>Reporter: Ajith S
>Priority: Minor
>
>  
> h3. Commands to reproduce 
> spark-sql> create table abcde ( a int);
> spark-sql> reset; // works successfully
> spark-sql> cache table abcde;
> spark-sql> reset; // fails with an exception
> h3. Below is the stack
> {{org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, 
> tree:}}
> {{ResetCommand$}}{{at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:379)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:216)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:211)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:259)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$3$adapted(CacheManager.scala:236)}}
> {{ at scala.collection.Iterator.find(Iterator.scala:993)}}
> {{ at scala.collection.Iterator.find$(Iterator.scala:990)}}
> {{ at scala.collection.AbstractIterator.find(Iterator.scala:1429)}}
> {{ at scala.collection.IterableLike.find(IterableLike.scala:81)}}
> {{ at scala.collection.IterableLike.find$(IterableLike.scala:80)}}
> {{ at scala.collection.AbstractIterable.find(Iterable.scala:56)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$lookupCachedData$2(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.readLock(CacheManager.scala:59)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.lookupCachedData(CacheManager.scala:236)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:250)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$1.applyOrElse(CacheManager.scala:241)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:258)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)}}
> {{ at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)}}
> {{ at 
> org.apache.spark.sql.execution.CacheManager.useCachedData(CacheManager.scala:241)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:68)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:65)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:72)}}
> {{ at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:72)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:71)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:139)}}
> {{ at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:316)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$writePlans(QueryExecution.scala:139)}}
> {{ at 
> org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:146)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:82)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:147)}}
> {{ at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)}}
> {{ at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3346)}}
> {{ at 

[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-11 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790208#comment-16790208
 ] 

Ajith S commented on SPARK-26961:
-

[~srowen] Yes, I share the opinion that this should be fixed via 
registerAsParallelCapable. I will raise a PR for this.

[~xsapphire] I think these class loaders are child class loaders of 
LaunchAppClassLoader, which already has the classes for the jars on the class 
path, so the overhead may not be of a high magnitude.
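For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch (stand-in objects, not Spark classes) of the ABBA lock inversion shown in the jstack output below; registering the loaders as parallel capable removes one edge of the cycle because class loading then synchronizes on a per-class-name lock instead of on the loader instance:

{code:java}
// Thread A takes lockConf then wants lockLoader; thread B takes lockLoader then wants lockConf.
object LockInversionDemo {
  private val lockConf   = new Object // stands in for org.apache.hadoop.conf.Configuration
  private val lockLoader = new Object // stands in for org.apache.spark.util.MutableURLClassLoader

  private def spawn(body: => Unit): Thread =
    new Thread(new Runnable { override def run(): Unit = body })

  def main(args: Array[String]): Unit = {
    val a = spawn { lockConf.synchronized { Thread.sleep(100); lockLoader.synchronized {} } }
    val b = spawn { lockLoader.synchronized { Thread.sleep(100); lockConf.synchronized {} } }
    a.start(); b.start()
    a.join(); b.join() // never returns: each thread waits on the monitor the other holds
  }
}
{code}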

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it taking 
> days to run, and we could only kill it when this happened.
> An investigation showed that none of the worker containers could connect to the 
> driver after start and that the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.(URL.java:599)
>  at java.net.URL.(URL.java:490)
>  at java.net.URL.(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> 

[jira] [Resolved] (SPARK-27117) current_date/current_timestamp should not refer to columns with ansi parser mode

2019-03-11 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27117.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24039
[https://github.com/apache/spark/pull/24039]

> current_date/current_timestamp should not refer to columns with ansi parser 
> mode
> 
>
> Key: SPARK-27117
> URL: https://issues.apache.org/jira/browse/SPARK-27117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-27114) SQL Tab shows duplicate executions for some commands

2019-03-11 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790136#comment-16790136
 ] 

Ajith S commented on SPARK-27114:
-

[~srowen] As *LocalRelation* is eagerly evaluated, the second evaluation can be 
skipped; I suppose it is not actually executed the second time, as it would throw 
an exception (in this case, the table already exists). Currently it uses two 
execution IDs and fires a duplicate *SparkListenerSQLExecutionStart* event. This 
causes the app store to record a duplicate event, and hence it shows up in the UI 
twice.
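A small sketch that makes the duplication observable by counting SparkListenerSQLExecutionStart events for a single command; the counter and the wiring are illustrative only (a spark-shell session is assumed), not a fix:

{code:java}
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart

// A duplicate SQL-tab entry corresponds to two start events (two execution IDs)
// for the one command run below.
val startEvents = new AtomicInteger(0)
spark.sparkContext.addSparkListener(new SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLExecutionStart =>
      println(s"SQLExecutionStart #${startEvents.incrementAndGet()} executionId=${e.executionId}")
    case _ =>
  }
})
spark.sql("create table abc (a int)")
{code}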

> SQL Tab shows duplicate executions for some commands
> 
>
> Key: SPARK-27114
> URL: https://issues.apache.org/jira/browse/SPARK-27114
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
> Attachments: Screenshot from 2019-03-09 14-04-07.png
>
>
> run simple sql  command
> {{create table abc ( a int );}}
> Open SQL tab in SparkUI, we can see duplicate entries for the execution. 
> Tested behaviour in thriftserver and sparksql
> *check attachment*
> The Problem seems be due to eager execution of commands @ 
> org.apache.spark.sql.Dataset#logicalPlan
> After analysis for spark-sql, the call stacks for duplicate execution id 
> seems to be
> {code:java}
> $anonfun$withNewExecutionId$1:78, SQLExecution$ 
> (org.apache.spark.sql.execution)
> apply:-1, 2057192703 
> (org.apache.spark.sql.execution.SQLExecution$$$Lambda$1036)
> withSQLConfPropagated:147, SQLExecution$ (org.apache.spark.sql.execution)
> withNewExecutionId:74, SQLExecution$ (org.apache.spark.sql.execution)
> withAction:3346, Dataset (org.apache.spark.sql)
> :203, Dataset (org.apache.spark.sql)
> ofRows:88, Dataset$ (org.apache.spark.sql)
> sql:656, SparkSession (org.apache.spark.sql)
> sql:685, SQLContext (org.apache.spark.sql)
> run:63, SparkSQLDriver (org.apache.spark.sql.hive.thriftserver)
> processCmd:372, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver)
> processLine:376, CliDriver (org.apache.hadoop.hive.cli)
> main:275, SparkSQLCLIDriver$ (org.apache.spark.sql.hive.thriftserver)
> main:-1, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver)
> invoke0:-1, NativeMethodAccessorImpl (sun.reflect)
> invoke:62, NativeMethodAccessorImpl (sun.reflect)
> invoke:43, DelegatingMethodAccessorImpl (sun.reflect)
> invoke:498, Method (java.lang.reflect)
> start:52, JavaMainApplication (org.apache.spark.deploy)
> org$apache$spark$deploy$SparkSubmit$$runMain:855, SparkSubmit 
> (org.apache.spark.deploy)
> doRunMain$1:162, SparkSubmit (org.apache.spark.deploy)
> submit:185, SparkSubmit (org.apache.spark.deploy)
> doSubmit:87, SparkSubmit (org.apache.spark.deploy)
> doSubmit:934, SparkSubmit$$anon$2 (org.apache.spark.deploy)
> main:943, SparkSubmit$ (org.apache.spark.deploy)
> main:-1, SparkSubmit (org.apache.spark.deploy){code}
> {code:java}
> $anonfun$withNewExecutionId$1:78, SQLExecution$ 
> (org.apache.spark.sql.execution)
> apply:-1, 2057192703 
> (org.apache.spark.sql.execution.SQLExecution$$$Lambda$1036)
> withSQLConfPropagated:147, SQLExecution$ (org.apache.spark.sql.execution)
> withNewExecutionId:74, SQLExecution$ (org.apache.spark.sql.execution)
> run:65, SparkSQLDriver (org.apache.spark.sql.hive.thriftserver)
> processCmd:372, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver)
> processLine:376, CliDriver (org.apache.hadoop.hive.cli)
> main:275, SparkSQLCLIDriver$ (org.apache.spark.sql.hive.thriftserver)
> main:-1, SparkSQLCLIDriver (org.apache.spark.sql.hive.thriftserver)
> invoke0:-1, NativeMethodAccessorImpl (sun.reflect)
> invoke:62, NativeMethodAccessorImpl (sun.reflect)
> invoke:43, DelegatingMethodAccessorImpl (sun.reflect)
> invoke:498, Method (java.lang.reflect)
> start:52, JavaMainApplication (org.apache.spark.deploy)
> org$apache$spark$deploy$SparkSubmit$$runMain:855, SparkSubmit 
> (org.apache.spark.deploy)
> doRunMain$1:162, SparkSubmit (org.apache.spark.deploy)
> submit:185, SparkSubmit (org.apache.spark.deploy)
> doSubmit:87, SparkSubmit (org.apache.spark.deploy)
> doSubmit:934, SparkSubmit$$anon$2 (org.apache.spark.deploy)
> main:943, SparkSubmit$ (org.apache.spark.deploy)
> main:-1, SparkSubmit (org.apache.spark.deploy){code}
>  






[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Xianyin Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790119#comment-16790119
 ] 

Xianyin Xin commented on SPARK-21067:
-

Yep, [~Moriarty279], nice analysis.

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once after starting the Thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following JIRA 
> (https://issues.apache.org/jira/browse/SPARK-11021), which states that 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}".
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our production 
> environment, but since 2.0+ we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> The Thrift server usually fails at INSERT...
> We tried the same procedure in a Spark context using spark.sql() and didn't 
> encounter the same issue.
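For comparison, a hedged sketch of the same statement sequence driven through spark.sql() in a Spark session with Hive support, the path that, per the report above, does not hit the failure:

{code:java}
// Sketch only: run the reproduction SQL via the Dataset API instead of the Thrift Server.
Seq(
  "DROP SCHEMA IF EXISTS dricard CASCADE",
  "CREATE SCHEMA dricard",
  "CREATE TABLE dricard.test (col1 int)",
  "INSERT INTO TABLE dricard.test SELECT 1",
  "SELECT * FROM dricard.test",
  "DROP TABLE dricard.test",
  "CREATE TABLE dricard.test AS SELECT 1 AS `col1`",
  "SELECT * FROM dricard.test"
).foreach(stmt => spark.sql(stmt).show())
{code}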
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at 

[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790083#comment-16790083
 ] 

Hyukjin Kwon commented on SPARK-27124:
--

{{SchemaConverters}} isn't an API, as you said. I think it's a bit odd that we 
document that this specific internal class can be used via Python.

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. They are reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.
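A hedged sketch of the Scala-side usage being referred to; it assumes the spark-avro module is on the classpath, and the local value names are illustrative:

{code:java}
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types._

// Avro -> Catalyst
val avroSchema = new Schema.Parser().parse(
  """{"type":"record","name":"r","fields":[{"name":"id","type":"long"}]}""")
val sqlType: DataType = SchemaConverters.toSqlType(avroSchema).dataType

// Catalyst -> Avro
val backToAvro: Schema = SchemaConverters.toAvroType(StructType(Seq(StructField("id", LongType))))
{code}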






[jira] [Resolved] (SPARK-27109) Refactoring of TimestampFormatter and DateFormatter

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27109.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24030
[https://github.com/apache/spark/pull/24030]

> Refactoring of TimestampFormatter and DateFormatter
> ---
>
> Key: SPARK-27109
> URL: https://issues.apache.org/jira/browse/SPARK-27109
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> * Date/TimestampFormatter converts the parsed input to an Instant before converting 
> it to days/micros. This conversion is unnecessary because the seconds and the 
> fraction of a second can be extracted (calculated) from the ZonedDateTime directly
>  * Avoid the additional extraction of TemporalQueries.localTime from the 
> temporalAccessor
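A hedged sketch of the first bullet's idea in plain java.time terms: once parsing yields a ZonedDateTime, the epoch seconds and the fraction of a second can be read from it directly, without materializing an Instant first (illustration only, not the Spark formatter code):

{code:java}
import java.time.{ZoneId, ZonedDateTime}
import java.time.temporal.ChronoField

val zdt = ZonedDateTime.of(2019, 3, 11, 10, 15, 30, 123456000, ZoneId.of("UTC"))
val epochSeconds   = zdt.toEpochSecond                      // whole seconds since 1970-01-01T00:00:00Z
val microsOfSecond = zdt.get(ChronoField.MICRO_OF_SECOND)   // fractional part, read directly
val micros = epochSeconds * 1000000L + microsOfSecond       // days could be derived similarly
{code}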






[jira] [Assigned] (SPARK-27109) Refactoring of TimestampFormatter and DateFormatter

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-27109:
-

Assignee: Maxim Gekk

> Refactoring of TimestampFormatter and DateFormatter
> ---
>
> Key: SPARK-27109
> URL: https://issues.apache.org/jira/browse/SPARK-27109
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> * Date/TimestampFormatter converts the parsed input to an Instant before converting 
> it to days/micros. This conversion is unnecessary because the seconds and the 
> fraction of a second can be extracted (calculated) from the ZonedDateTime directly
>  * Avoid the additional extraction of TemporalQueries.localTime from the 
> temporalAccessor






[jira] [Updated] (SPARK-26921) Fix CRAN hack as soon as Arrow is available on CRAN

2019-03-11 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-26921:
-
Description: 
Arrow optimization was added but Arrow is not available on CRAN.

So, some hacks had to be added on the SparkR side to avoid the CRAN check. For 
example, see 

https://github.com/apache/spark/search?q=requireNamespace1_q=requireNamespace1

These should be removed so that the CRAN check in SparkR can be run properly.

See also ARROW-3204


  was:
Arrow optimization was added but Arrow is not available on CRAN.

So, some hacks had to be added on the SparkR side to avoid the CRAN check. For 
example, see 

https://github.com/apache/spark/search?q=requireNamespace1_q=requireNamespace1

These should be removed so that the CRAN check in SparkR can be run properly.


> Fix CRAN hack as soon as Arrow is available on CRAN
> ---
>
> Key: SPARK-26921
> URL: https://issues.apache.org/jira/browse/SPARK-26921
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Arrow optimization was added but Arrow is not available on CRAN.
> So, some hacks had to be added on the SparkR side to avoid the CRAN check. For 
> example, see 
> https://github.com/apache/spark/search?q=requireNamespace1_q=requireNamespace1
> These should be removed so that the CRAN check in SparkR can be run properly.
> See also ARROW-3204






[jira] [Resolved] (SPARK-26923) Refactor ArrowRRunner and RRunner to share the same base

2019-03-11 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-26923.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23977
[https://github.com/apache/spark/pull/23977]

> Refactor ArrowRRunner and RRunner to share the same base
> 
>
> Key: SPARK-26923
> URL: https://issues.apache.org/jira/browse/SPARK-26923
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> ArrowRRunner and RRunner already have duplicated code. We should refactor and 
> deduplicate them. Also, ArrowRRunner happens to contain some rather hacky code 
> (see 
> https://github.com/apache/spark/pull/23787/files#diff-a0b6a11cc2e2299455c795fe3c96b823R61
> ).
> We might even be able to deduplicate some code with the PythonRunners.






[jira] [Assigned] (SPARK-26923) Refactor ArrowRRunner and RRunner to share the same base

2019-03-11 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-26923:


Assignee: Hyukjin Kwon

> Refactor ArrowRRunner and RRunner to share the same base
> 
>
> Key: SPARK-26923
> URL: https://issues.apache.org/jira/browse/SPARK-26923
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> ArrowRRunner and RRunner already have duplicated code. We should refactor and 
> deduplicate them. Also, ArrowRRunner happens to contain some rather hacky code 
> (see 
> https://github.com/apache/spark/pull/23787/files#diff-a0b6a11cc2e2299455c795fe3c96b823R61
> ).
> We might even be able to deduplicate some code with the PythonRunners.






[jira] [Assigned] (SPARK-26594) DataSourceOptions.asMap should return CaseInsensitiveMap

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26594:


Assignee: (was: Apache Spark)

> DataSourceOptions.asMap should return CaseInsensitiveMap
> 
>
> Key: SPARK-26594
> URL: https://issues.apache.org/jira/browse/SPARK-26594
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> I'm pretty surprised that the following code fails.
> {code}
> import scala.collection.JavaConverters._
> import org.apache.spark.sql.sources.v2.DataSourceOptions
> val map = new DataSourceOptions(Map("fooBar" -> "x").asJava).asMap
> assert(map.get("fooBar") == "x")
> {code}
> It's better to make DataSourceOptions.asMap return CaseInsensitiveMap.
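A hedged sketch expanding on the snippet above, under the assumption that asMap exposes the internal lower-cased keys while DataSourceOptions' own get() lower-cases the lookup key:

{code:java}
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.v2.DataSourceOptions

val opts = new DataSourceOptions(Map("fooBar" -> "x").asJava)
assert(opts.get("fooBar").get() == "x")   // case-insensitive lookup through DataSourceOptions itself
assert(opts.asMap.get("foobar") == "x")   // asMap is a plain java.util.Map with lower-cased keys (assumed)
assert(opts.asMap.get("fooBar") == null)  // which is why the assertion in the snippet above fails
{code}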






[jira] [Assigned] (SPARK-26594) DataSourceOptions.asMap should return CaseInsensitiveMap

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26594:


Assignee: Apache Spark

> DataSourceOptions.asMap should return CaseInsensitiveMap
> 
>
> Key: SPARK-26594
> URL: https://issues.apache.org/jira/browse/SPARK-26594
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>Priority: Major
>
> I'm pretty surprised that the following code fails.
> {code}
> import scala.collection.JavaConverters._
> import org.apache.spark.sql.sources.v2.DataSourceOptions
> val map = new DataSourceOptions(Map("fooBar" -> "x").asJava).asMap
> assert(map.get("fooBar") == "x")
> {code}
> It's better to make DataSourceOptions.asMap return CaseInsensitiveMap.






[jira] [Assigned] (SPARK-27123) Improve CollapseProject to handle projects cross limit/repartition/sample

2019-03-11 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-27123:
-

Assignee: Dongjoon Hyun

> Improve CollapseProject to handle projects cross limit/repartition/sample
> -
>
> Key: SPARK-27123
> URL: https://issues.apache.org/jira/browse/SPARK-27123
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> The `CollapseProject` optimizer rule simplifies the plan by merging adjacent 
> projects and performing alias substitution.
> {code:java}
> scala> sql("SELECT b c FROM (SELECT a b FROM t)").explain
> == Physical Plan ==
> *(1) Project [a#5 AS c#1]
> +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
> {code}
> We can do that for more complex cases like the following.
> *BEFORE*
> {code:java}
> scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM 
> t)").explain
> == Physical Plan ==
> *(2) Project [b#0 AS c#1]
> +- Exchange RoundRobinPartitioning(1)
>+- *(1) Project [a#5 AS b#0]
>   +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
> {code}
> *AFTER*
> {code:java}
> scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM 
> t)").explain
> == Physical Plan ==
> Exchange RoundRobinPartitioning(1)
> +- *(1) Project [a#11 AS c#7]
>+- Scan hive default.t [a#11], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#11]
> {code}






[jira] [Assigned] (SPARK-26958) Add NestedSchemaPruningBenchmark

2019-03-11 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-26958:
-

Assignee: Dongjoon Hyun

> Add NestedSchemaPruningBenchmark
> 
>
> Key: SPARK-26958
> URL: https://issues.apache.org/jira/browse/SPARK-26958
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This adds `NestedSchemaPruningBenchmark` to verify the ongoing PR's performance 
> benefits clearly and to prevent future performance degradation.






[jira] [Updated] (SPARK-27123) Improve CollapseProject to handle projects cross limit/repartition/sample

2019-03-11 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27123:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-25603

> Improve CollapseProject to handle projects cross limit/repartition/sample
> -
>
> Key: SPARK-27123
> URL: https://issues.apache.org/jira/browse/SPARK-27123
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> The `CollapseProject` optimizer rule simplifies the plan by merging adjacent 
> projects and performing alias substitution.
> {code:java}
> scala> sql("SELECT b c FROM (SELECT a b FROM t)").explain
> == Physical Plan ==
> *(1) Project [a#5 AS c#1]
> +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
> {code}
> We can do that for more complex cases like the following.
> *BEFORE*
> {code:java}
> scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM 
> t)").explain
> == Physical Plan ==
> *(2) Project [b#0 AS c#1]
> +- Exchange RoundRobinPartitioning(1)
>+- *(1) Project [a#5 AS b#0]
>   +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
> {code}
> *AFTER*
> {code:java}
> scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM 
> t)").explain
> == Physical Plan ==
> Exchange RoundRobinPartitioning(1)
> +- *(1) Project [a#11 AS c#7]
>+- Scan hive default.t [a#11], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#11]
> {code}






[jira] [Commented] (SPARK-26822) Upgrade the deprecated module 'optparse'

2019-03-11 Thread Thincrs (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790018#comment-16790018
 ] 

Thincrs commented on SPARK-26822:
-

A user of thincrs has selected this issue. Deadline: Mon, Mar 18, 2019 10:19 PM

> Upgrade the deprecated module 'optparse'
> 
>
> Key: SPARK-26822
> URL: https://issues.apache.org/jira/browse/SPARK-26822
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Neo Chien
>Assignee: Neo Chien
>Priority: Minor
>  Labels: pull-request-available, test
> Fix For: 3.0.0
>
>
> Follow the [official 
> document|https://docs.python.org/2/library/argparse.html#upgrading-optparse-code]
>  to upgrade the deprecated module 'optparse' to 'argparse'.






[jira] [Resolved] (SPARK-27073) Fix a race condition when handling of IdleStateEvent

2019-03-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27073.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23989
[https://github.com/apache/spark/pull/23989]

> Fix a race condition when handling of IdleStateEvent
> 
>
> Key: SPARK-27073
> URL: https://issues.apache.org/jira/browse/SPARK-27073
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0, 2.4.0
>Reporter: dzcxzl
>Priority: Minor
> Fix For: 3.0.0
>
>
> When TransportChannelHandler processes an IdleStateEvent, it first calculates 
> whether the last request has timed out.
> At that moment, TransportClient.sendRpc initiates a new request.
> TransportChannelHandler then sees responseHandler.numOutstandingRequests() > 0, 
> causing a connection that is still in normal use to be closed.
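A minimal, self-contained sketch of this check-then-act race (stand-in names, not Spark's network code):

{code:java}
import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

// The idle decision is computed from a stale "last request" timestamp, a new request
// slips in between the computation and the decision, and an active connection is closed.
class IdleCheckSketch(timeoutMs: Long) {
  private val lastRequestTimeMs   = new AtomicLong(System.currentTimeMillis())
  private val outstandingRequests = new AtomicInteger(0)

  def sendRpc(): Unit = {                 // caller thread
    outstandingRequests.incrementAndGet()
    lastRequestTimeMs.set(System.currentTimeMillis())
  }

  def onIdleEvent(): Unit = {             // timer thread
    val timedOut = System.currentTimeMillis() - lastRequestTimeMs.get() > timeoutMs
    // <-- a concurrent sendRpc() can land exactly here
    if (timedOut && outstandingRequests.get() > 0) {
      println("closing a connection that just became active again")
    }
  }
}
{code}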






[jira] [Resolved] (SPARK-27004) Code for https uri authentication in Spark Submit needs to be removed

2019-03-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27004.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24033
[https://github.com/apache/spark/pull/24033]

> Code for https uri authentication in Spark Submit needs to be removed
> -
>
> Key: SPARK-27004
> URL: https://issues.apache.org/jira/browse/SPARK-27004
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> According to the comments 
> [here|https://github.com/apache/spark/pull/23546#issuecomment-463340476] and 
> [here|https://github.com/apache/spark/pull/23546#issuecomment-463366075], the 
> old code in Spark Submit used for URI verification needs to be removed or 
> refactored; otherwise it will cause failures with secure HTTP URIs.






[jira] [Assigned] (SPARK-27004) Code for https uri authentication in Spark Submit needs to be removed

2019-03-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-27004:
--

Assignee: Marcelo Vanzin

> Code for https uri authentication in Spark Submit needs to be removed
> -
>
> Key: SPARK-27004
> URL: https://issues.apache.org/jira/browse/SPARK-27004
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Assignee: Marcelo Vanzin
>Priority: Minor
>
> According to the comments 
> [here|https://github.com/apache/spark/pull/23546#issuecomment-463340476] and 
> [here|https://github.com/apache/spark/pull/23546#issuecomment-463366075], the 
> old code in Spark Submit used for URI verification needs to be removed or 
> refactored; otherwise it will cause failures with secure HTTP URIs.






[jira] [Commented] (SPARK-23160) Add more window sql tests

2019-03-11 Thread Dylan Guedes (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789877#comment-16789877
 ] 

Dylan Guedes commented on SPARK-23160:
--

Hi,

I would like to work on this one, but to be fair I didn't get the meaning of 
"tests in other major databases". [~jiangxb1987] do you remember what scenarios 
you had in mind?

> Add more window sql tests
> -
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.
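As an illustration of the kind of coverage being asked for, a hedged sketch of one window query (the inline data and column names are made up) whose result could be cross-checked against other databases:

{code:java}
// Sketch only: a running sum per partition, the sort of query window.sql exercises.
spark.sql(
  """SELECT val, cate,
    |       SUM(val) OVER (PARTITION BY cate ORDER BY val
    |                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
    |FROM VALUES (1, 'a'), (2, 'a'), (3, 'b') AS t(val, cate)""".stripMargin).show()
{code}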






[jira] [Assigned] (SPARK-27129) Add JSON event serialization methods in JSONUtils for blacklisted events

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27129:


Assignee: (was: Apache Spark)

> Add JSON event serialization methods in JSONUtils for blacklisted events
> 
>
> Key: SPARK-27129
> URL: https://issues.apache.org/jira/browse/SPARK-27129
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Minor
>
> Add JSON event serialization methods in JSONUtils for blacklisted events






[jira] [Resolved] (SPARK-27129) Add JSON event serialization methods in JSONUtils for blacklisted events

2019-03-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27129.

Resolution: Invalid

Unless you can show why this (or in this case the lack of this) is a problem, 
there is nothing to do here.

> Add JSON event serialization methods in JSONUtils for blacklisted events
> 
>
> Key: SPARK-27129
> URL: https://issues.apache.org/jira/browse/SPARK-27129
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Minor
>
> Add JSON event serialization methods in JSONUtils for blacklisted events






[jira] [Assigned] (SPARK-27129) Add JSON event serialization methods in JSONUtils for blacklisted events

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27129:


Assignee: Apache Spark

> Add JSON event serialization methods in JSONUtils for blacklisted events
> 
>
> Key: SPARK-27129
> URL: https://issues.apache.org/jira/browse/SPARK-27129
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Assignee: Apache Spark
>Priority: Minor
>
> Add JSON event serialization methods in JSONUtils for blacklisted events






[jira] [Created] (SPARK-27129) Add JSON event serialization methods in JSONUtils for blacklisted events

2019-03-11 Thread Shahid K I (JIRA)
Shahid K I created SPARK-27129:
--

 Summary: Add JSON event serialization methods in JSONUtils for 
blacklisted events
 Key: SPARK-27129
 URL: https://issues.apache.org/jira/browse/SPARK-27129
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Shahid K I


Add JSON event serialization methods in JSONUtils for blacklisted events






[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-11 Thread Mi Zi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789829#comment-16789829
 ] 

Mi Zi commented on SPARK-26961:
---

I think the problem could be fixed from a more general perspective if there were a 
way to read a Configuration without updating its content. That should make it 
possible to access the Configuration object without triggering the class loader. 
I'm not familiar with either Hadoop or Spark, so I don't know whether that's a 
practical solution.

I also don't know how far registerAsParallelCapable would affect the whole system, 
since it stores one lock per loaded class; I'm not sure how large that overhead 
would be.
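For illustration, a minimal Scala sketch of that "read without updating" idea, assuming a 
hypothetical helper outside Hadoop/Spark (ConfSnapshot is not an existing API): snapshot the 
Configuration once into an immutable map, so later lookups never need the Configuration monitor.

{code:java}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration

object ConfSnapshot {
  // Copy every key/value pair out of the mutable, synchronized Configuration once.
  // Later reads hit this immutable map and never touch the Configuration monitor,
  // so they cannot participate in the lock cycle shown in the jstack above.
  // (Note: this skips variable substitution, so it is only a sketch.)
  def snapshot(conf: Configuration): Map[String, String] =
    conf.iterator().asScala.map(e => e.getKey -> e.getValue).toMap
}

// Hypothetical usage:
// val props = ConfSnapshot.snapshot(hadoopConf)
// val fsImpl = props.get("fs.hdfs.impl")
{code}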

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed all worker containers could not connect to the driver 
> after start, and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.(URL.java:599)
>  at java.net.URL.(URL.java:490)
>  at java.net.URL.(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> 

[jira] [Commented] (SPARK-27093) Honor ParseMode in AvroFileFormat

2019-03-11 Thread Tim Cerexhe (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789816#comment-16789816
 ] 

Tim Cerexhe commented on SPARK-27093:
-

We have three distinct failure modes that we need Spark not to throw on:
 * corrupt Avro file due to bad signal data, e.g. the file is truncated / throws 
{{EOFException}} before parsing completes. This can be reproduced by truncating 
a valid Avro file.
 * corrupt Avro file due to an invalid schema header, e.g. corrupted over the 
network. I believe this can be reproduced by opening the file in a hex editor 
and writing zeros over some of the JSON header keys ;)
 * incompatible Avro schema (e.g. 
{{org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot convert Avro to 
catalyst because schema at path rec.xxx is not compatible (avroType = 
\{"type":"array","items":"float"}, sqlType = StructType( ...}})

{{ignoreCorruptFiles}} successfully squashes the first error, but not the second 
or third, schema-related errors (see the sketch below).

I'll see if I can get approval to release our test files; otherwise I'll generate 
new ones for you.
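To make the ask concrete, a short Scala sketch of what exists today versus what this ticket 
proposes; the Avro {{mode}} option in the last line is the proposal, not an existing option, 
and the paths are made up.

{code:java}
// Works today: the JSON reader honors a ParseMode.
val json = spark.read.option("mode", "DROPMALFORMED").json("/data/events.json")

// Works today, but only masks truncated files (failure mode 1 above), not the schema errors:
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

// What this ticket asks for (hypothetical, does not exist yet):
// val avro = spark.read.format("avro").option("mode", "DROPMALFORMED").load("/data/events.avro")
{code}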

> Honor ParseMode in AvroFileFormat
> -
>
> Key: SPARK-27093
> URL: https://issues.apache.org/jira/browse/SPARK-27093
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.4.0
>Reporter: Tim Cerexhe
>Priority: Major
>
> The Avro reader is missing the ability to handle malformed or truncated files 
> like the JSON reader. Currently it throws exceptions when it encounters any 
> bad or truncated record in an Avro file, causing the entire Spark job to fail 
> from a single dodgy file. 
> Ideally the AvroFileFormat would accept a Permissive or DropMalformed 
> ParseMode like Spark's JSON format. This would enable the the Avro reader to 
> drop bad records and continue processing the good records rather than abort 
> the entire job. 
> Obviously the default could remain as FailFastMode, which is the current 
> effective behavior, so this wouldn’t break any existing users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27061) Expose Driver UI port on driver service to access UI using service

2019-03-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-27061:
--

Assignee: Chandu Kavar

> Expose Driver UI port on driver service to access UI using service
> --
>
> Key: SPARK-27061
> URL: https://issues.apache.org/jira/browse/SPARK-27061
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Chandu Kavar
>Assignee: Chandu Kavar
>Priority: Minor
>  Labels: Kubernetes
> Fix For: 3.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we can access the driver UI using 
> {{kubectl port-forward  4040:4040}}
> mentioned in 
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#accessing-driver-ui]
> We have users who submit Spark jobs to Kubernetes but don't have access 
> to the cluster, so they can't use the kubectl port-forward command.
> If we expose port 4040 on the driver service, we can easily relay the UI 
> through the driver service and an Nginx reverse proxy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27061) Expose Driver UI port on driver service to access UI using service

2019-03-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27061.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23990
[https://github.com/apache/spark/pull/23990]

> Expose Driver UI port on driver service to access UI using service
> --
>
> Key: SPARK-27061
> URL: https://issues.apache.org/jira/browse/SPARK-27061
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Chandu Kavar
>Priority: Minor
>  Labels: Kubernetes
> Fix For: 3.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we can access the driver UI using 
> {{kubectl port-forward  4040:4040}}
> mentioned in 
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#accessing-driver-ui]
> We have users who submit Spark jobs to Kubernetes but don't have access 
> to the cluster, so they can't use the kubectl port-forward command.
> If we expose port 4040 on the driver service, we can easily relay the UI 
> through the driver service and an Nginx reverse proxy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26998) spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor processes in Standalone mode

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789786#comment-16789786
 ] 

Gabor Somogyi commented on SPARK-26998:
---

Same understanding; I've chosen the file approach.

> spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor 
> processes in Standalone mode
> ---
>
> Key: SPARK-26998
> URL: https://issues.apache.org/jira/browse/SPARK-26998
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Security, Spark Core
>Affects Versions: 2.3.3, 2.4.0
>Reporter: t oo
>Priority: Major
>  Labels: SECURITY, Security, secur, security, security-issue
>
> Run spark standalone mode, then start a spark-submit requiring at least 1 
> executor. Do a 'ps -ef' on linux (ie putty terminal) and you will be able to 
> see  spark.ssl.keyStorePassword value in plaintext!
>  
> spark.ssl.keyStorePassword and  spark.ssl.keyPassword don't need to be passed 
> to  CoarseGrainedExecutorBackend. Only  spark.ssl.trustStorePassword is used.
>  
> Can be resolved if below PR is merged:
> [[Github] Pull Request #21514 
> (tooptoop4)|https://github.com/apache/spark/pull/21514]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188

2019-03-11 Thread Bruce Robbins (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins resolved SPARK-26990.
---
Resolution: Fixed

> Difference in handling of mixed-case partition column names after SPARK-26188
> -
>
> Key: SPARK-26990
> URL: https://issues.apache.org/jira/browse/SPARK-26990
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Bruce Robbins
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> I noticed that the [PR for 
> SPARK-26188|https://github.com/apache/spark/pull/23165] changed how 
> mixed-cased partition columns are handled when the user provides a schema.
> Say I have this file structure (note that each instance of `pS` is mixed 
> case):
> {noformat}
> bash-3.2$ find partitioned5 -type d
> partitioned5
> partitioned5/pi=2
> partitioned5/pi=2/pS=foo
> partitioned5/pi=2/pS=bar
> partitioned5/pi=1
> partitioned5/pi=1/pS=foo
> partitioned5/pi=1/pS=bar
> bash-3.2$
> {noformat}
> If I load the file with a user-provided schema in 2.4 (before the PR was 
> committed) or 2.3, I see:
> {noformat}
> scala> val df = spark.read.schema("intField int, pi int, ps 
> string").parquet("partitioned5")
> df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
> scala> df.printSchema
> root
>  |-- intField: integer (nullable = true)
>  |-- pi: integer (nullable = true)
>  |-- ps: string (nullable = true)
> scala>
> {noformat}
> However, using 2.4 after the PR was committed, I see:
> {noformat}
> scala> val df = spark.read.schema("intField int, pi int, ps 
> string").parquet("partitioned5")
> df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
> scala> df.printSchema
> root
>  |-- intField: integer (nullable = true)
>  |-- pi: integer (nullable = true)
>  |-- pS: string (nullable = true)
> scala>
> {noformat}
> Spark is picking up the mixed-case column name {{pS}} from the directory 
> name, not the lower-case {{ps}} from my specified schema.
> In all tests, {{spark.sql.caseSensitive}} is set to the default (false).
> Not sure if this is a bug, but it is a difference.
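As a hedged illustration (column and path names taken from the report above; the printed 
output is what the report describes, not something re-verified here), this is how the 
difference shows up and one possible workaround:

{code:java}
// Check which behaviour is in effect; the report uses the default.
spark.conf.get("spark.sql.caseSensitive")   // "false" unless overridden

val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
df.schema.fieldNames                        // reportedly Array(intField, pi, pS) on 2.4 after the PR

// Possible workaround sketch: force the user-specified casing back onto the partition column.
val normalized = df.withColumnRenamed("pS", "ps")
{code}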



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188

2019-03-11 Thread Bruce Robbins (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-26990:
--
Fix Version/s: 2.4.1

> Difference in handling of mixed-case partition column names after SPARK-26188
> -
>
> Key: SPARK-26990
> URL: https://issues.apache.org/jira/browse/SPARK-26990
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Bruce Robbins
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> I noticed that the [PR for 
> SPARK-26188|https://github.com/apache/spark/pull/23165] changed how 
> mixed-cased partition columns are handled when the user provides a schema.
> Say I have this file structure (note that each instance of `pS` is mixed 
> case):
> {noformat}
> bash-3.2$ find partitioned5 -type d
> partitioned5
> partitioned5/pi=2
> partitioned5/pi=2/pS=foo
> partitioned5/pi=2/pS=bar
> partitioned5/pi=1
> partitioned5/pi=1/pS=foo
> partitioned5/pi=1/pS=bar
> bash-3.2$
> {noformat}
> If I load the file with a user-provided schema in 2.4 (before the PR was 
> committed) or 2.3, I see:
> {noformat}
> scala> val df = spark.read.schema("intField int, pi int, ps 
> string").parquet("partitioned5")
> df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
> scala> df.printSchema
> root
>  |-- intField: integer (nullable = true)
>  |-- pi: integer (nullable = true)
>  |-- ps: string (nullable = true)
> scala>
> {noformat}
> However, using 2.4 after the PR was committed, I see:
> {noformat}
> scala> val df = spark.read.schema("intField int, pi int, ps 
> string").parquet("partitioned5")
> df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
> scala> df.printSchema
> root
>  |-- intField: integer (nullable = true)
>  |-- pi: integer (nullable = true)
>  |-- pS: string (nullable = true)
> scala>
> {noformat}
> Spark is picking up the mixed-case column name {{pS}} from the directory 
> name, not the lower-case {{ps}} from my specified schema.
> In all tests, {{spark.sql.caseSensitive}} is set to the default (false).
> Not sure if this is a bug, but it is a difference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26998) spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor processes in Standalone mode

2019-03-11 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789766#comment-16789766
 ] 

Marcelo Vanzin commented on SPARK-26998:


There are 3 ways to solve this: pipe, file, or env variable. Pick one.
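Purely as illustration of the file and env-variable options on the reading side, a Scala 
sketch; the helper object and the environment-variable name are made up, not existing Spark 
code or configuration:

{code:java}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical helpers: resolve the keystore password without putting it on the command line.
object SecretResolver {
  // File variant: the launcher passes a path instead of the secret itself.
  def fromFile(path: String): String =
    new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8).trim

  // Env-variable variant: the launcher exports the secret into the child's environment.
  def fromEnv(name: String): Option[String] = sys.env.get(name)
}

// e.g. SecretResolver.fromEnv("SPARK_SSL_KEYSTORE_PASSWORD")  // hypothetical variable name
{code}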

> spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor 
> processes in Standalone mode
> ---
>
> Key: SPARK-26998
> URL: https://issues.apache.org/jira/browse/SPARK-26998
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Security, Spark Core
>Affects Versions: 2.3.3, 2.4.0
>Reporter: t oo
>Priority: Major
>  Labels: SECURITY, Security, secur, security, security-issue
>
> Run spark standalone mode, then start a spark-submit requiring at least 1 
> executor. Do a 'ps -ef' on linux (ie putty terminal) and you will be able to 
> see  spark.ssl.keyStorePassword value in plaintext!
>  
> spark.ssl.keyStorePassword and  spark.ssl.keyPassword don't need to be passed 
> to  CoarseGrainedExecutorBackend. Only  spark.ssl.trustStorePassword is used.
>  
> Can be resolved if below PR is merged:
> [[Github] Pull Request #21514 
> (tooptoop4)|https://github.com/apache/spark/pull/21514]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27128) Migrate JSON to File Data Source V2

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27128:


Assignee: (was: Apache Spark)

> Migrate JSON to File Data Source V2
> ---
>
> Key: SPARK-27128
> URL: https://issues.apache.org/jira/browse/SPARK-27128
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27119) Do not infer schema when reading Hive serde table with native data source

2019-03-11 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-27119.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

> Do not infer schema when reading Hive serde table with native data source
> -
>
> Key: SPARK-27119
> URL: https://issues.apache.org/jira/browse/SPARK-27119
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27128) Migrate JSON to File Data Source V2

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27128:


Assignee: Apache Spark

> Migrate JSON to File Data Source V2
> ---
>
> Key: SPARK-27128
> URL: https://issues.apache.org/jira/browse/SPARK-27128
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27128) Migrate JSON to File Data Source V2

2019-03-11 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-27128:
--

 Summary: Migrate JSON to File Data Source V2
 Key: SPARK-27128
 URL: https://issues.apache.org/jira/browse/SPARK-27128
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26839) on JDK11, IsolatedClientLoader must be able to load java.sql classes

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26839:


Assignee: Apache Spark

> on JDK11, IsolatedClientLoader must be able to load java.sql classes
> 
>
> Key: SPARK-26839
> URL: https://issues.apache.org/jira/browse/SPARK-26839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Assignee: Apache Spark
>Priority: Major
>
> This might be very specific to my fork & a kind of weird system setup I'm 
> working on, I haven't completely confirmed yet, but I wanted to report it 
> anyway in case anybody else sees this.
> When I try to do anything which touches the metastore on java11, I 
> immediately get errors from IsolatedClientLoader that it can't load anything 
> in java.sql.  eg.
> {noformat}
> scala> spark.sql("show tables").show()
> java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: 
> java/sql/SQLTransientException when creating Hive client using classpath: 
> file:/home/systest/jdk-11.0.2/, ...
> ...
> Caused by: java.lang.ClassNotFoundException: java.sql.SQLTransientException
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:230)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:219)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> {noformat}
> After a bit of debugging, I also discovered that the {{rootClassLoader}} is 
> {{null}} in {{IsolatedClientLoader}}.  I think this would work if either 
> {{rootClassLoader}} could load those classes, or if {{isShared()}} was 
> changed to allow any class starting with "java."  (I'm not sure why it only 
> allows "java.lang" and "java.net" currently.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26839) on JDK11, IsolatedClientLoader must be able to load java.sql classes

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26839:


Assignee: (was: Apache Spark)

> on JDK11, IsolatedClientLoader must be able to load java.sql classes
> 
>
> Key: SPARK-26839
> URL: https://issues.apache.org/jira/browse/SPARK-26839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> This might be very specific to my fork & a kind of weird system setup I'm 
> working on, I haven't completely confirmed yet, but I wanted to report it 
> anyway in case anybody else sees this.
> When I try to do anything which touches the metastore on java11, I 
> immediately get errors from IsolatedClientLoader that it can't load anything 
> in java.sql.  eg.
> {noformat}
> scala> spark.sql("show tables").show()
> java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: 
> java/sql/SQLTransientException when creating Hive client using classpath: 
> file:/home/systest/jdk-11.0.2/, ...
> ...
> Caused by: java.lang.ClassNotFoundException: java.sql.SQLTransientException
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:230)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:219)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> {noformat}
> After a bit of debugging, I also discovered that the {{rootClassLoader}} is 
> {{null}} in {{IsolatedClientLoader}}.  I think this would work if either 
> {{rootClassLoader}} could load those classes, or if {{isShared()}} was 
> changed to allow any class starting with "java."  (I'm not sure why it only 
> allows "java.lang" and "java.net" currently.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26860) Improve RangeBetween docs in Pyspark, SparkR

2019-03-11 Thread Jagadesh Kiran N (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789703#comment-16789703
 ] 

Jagadesh Kiran N commented on SPARK-26860:
--

Thanks [~srowen] for committing it.

> Improve RangeBetween docs in Pyspark, SparkR
> 
>
> Key: SPARK-26860
> URL: https://issues.apache.org/jira/browse/SPARK-26860
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 2.4.0
>Reporter: Shelby Vanhooser
>Assignee: Jagadesh Kiran N
>Priority: Minor
> Fix For: 3.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The docs describing 
> [RangeBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rangeBetween]
>  for PySpark appear to be duplicates of 
> [RowsBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rowsBetween]
>  even though these are functionally different windows. rowsBetween references 
> preceding and succeeding rows, but rangeBetween is based on the values in 
> those rows.
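For readers hitting this from search, a small Scala sketch of the distinction the docs should 
make; {{df}} and the {{price}} column are hypothetical:

{code:java}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// rowsBetween counts physical rows around the current row...
val byRows  = Window.orderBy("price").rowsBetween(Window.currentRow, 1)
// ...while rangeBetween selects rows whose ordering value lies within the offsets.
val byRange = Window.orderBy("price").rangeBetween(Window.currentRow, 1)

// df.withColumn("nextRowSum",   sum("price").over(byRows))
// df.withColumn("withinOneSum", sum("price").over(byRange))
{code}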



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26839) on JDK11, IsolatedClientLoader must be able to load java.sql classes

2019-03-11 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789706#comment-16789706
 ] 

Sean Owen commented on SPARK-26839:
---

I'll open my WIP PR on this.

I am not sure datanucleus is the issue after all. What is an issue for sure is 
IsolatedClientLoader. In "builtin" mode, it tries to get all the JARs from the 
current ClassLoader. These aren't available anymore in Java 9+. I tried just 
avoiding making a new URLClassLoader with these JARs in this case, but got some 
more CNFEs. I may need some extra eyes on this. PR coming.
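For reference, the relaxation suggested in the description below boils down to roughly this 
predicate (a sketch only, not the actual IsolatedClientLoader code, which shares more than 
JDK classes):

{code:java}
// Treat every "java." class as shared with the parent class loader,
// instead of only "java.lang." and "java.net." as today.
def isSharedJavaClass(name: String): Boolean = name.startsWith("java.")
{code}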

> on JDK11, IsolatedClientLoader must be able to load java.sql classes
> 
>
> Key: SPARK-26839
> URL: https://issues.apache.org/jira/browse/SPARK-26839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> This might be very specific to my fork & a kind of weird system setup I'm 
> working on, I haven't completely confirmed yet, but I wanted to report it 
> anyway in case anybody else sees this.
> When I try to do anything which touches the metastore on java11, I 
> immediately get errors from IsolatedClientLoader that it can't load anything 
> in java.sql.  eg.
> {noformat}
> scala> spark.sql("show tables").show()
> java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: 
> java/sql/SQLTransientException when creating Hive client using classpath: 
> file:/home/systest/jdk-11.0.2/, ...
> ...
> Caused by: java.lang.ClassNotFoundException: java.sql.SQLTransientException
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:230)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:219)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> {noformat}
> After a bit of debugging, I also discovered that the {{rootClassLoader}} is 
> {{null}} in {{IsolatedClientLoader}}.  I think this would work if either 
> {{rootClassLoader}} could load those classes, or if {{isShared()}} was 
> changed to allow any class starting with "java."  (I'm not sure why it only 
> allows "java.lang" and "java.net" currently.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC

2019-03-11 Thread Dhruve Ashar (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhruve Ashar updated SPARK-27107:
-
Description: 
The issue occurs while trying to read ORC data and setting the SearchArgument.
{code:java}
 Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. 
Available: 0, required: 9
Serialization trace:
literalList 
(org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl)
leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl)
at com.esotericsoftware.kryo.io.Output.require(Output.java:163)
at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614)
at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
at 
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
at 
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
at 
org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96)
at 
org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57)
at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159)
at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156)
at scala.Option.foreach(Option.scala:257)
at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156)
at 
org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297)
at 
org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295)
at 
org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315)
at 
org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at 

[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC

2019-03-11 Thread Dhruve Ashar (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789668#comment-16789668
 ] 

Dhruve Ashar commented on SPARK-27107:
--

{quote}Could you provide a reproducible test case here?
{quote}
This is something we can consistently reproduce every single time, but I do not 
have an example with company details stripped off that I can share at this 
point. Is there anything specific that you are looking for? I can try to come up 
with a reproducible case, but it is difficult because it depends on the user 
query and the parameters being passed to filter the data.
{quote}BTW, the Hive default is 4K instead of 4M, isn't it?
{quote}
The Hive default is 4K and not 4M (that was a typo). Thanks for correcting that.
{quote}Technically, Hive implementation also fails when it exceeds the 
limitation because it's a non-configurable parameter issue.
{quote}
Yes. The Hive implementation should fail when it exceeds the 10M limit for a 
SArg, and the PR that I have against the ORC implementation tries to make this 
configurable so that Spark can control the buffer size if we hit a buffer 
overflow error.
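To make the buffer limit concrete, a standalone Kryo sketch (not the ORC code path itself; 
sizes follow the 4K default and 10M limit discussed above):

{code:java}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output

val kryo = new Kryo()

// A fixed 4 KB buffer throws "KryoException: Buffer overflow" once the
// serialized object (here standing in for a large SearchArgument) outgrows it.
val bounded = new Output(4 * 1024)            // bufferSize == maxBufferSize

// Letting the maximum grow (here to 10 MB; -1 would mean unlimited) is the idea
// behind making the ORC-side buffer size configurable.
val resizable = new Output(4 * 1024, 10 * 1024 * 1024)

val bigList = new java.util.ArrayList[java.lang.Long]()
(1 to 100000).foreach(i => bigList.add(java.lang.Long.valueOf(i.toLong)))
kryo.writeClassAndObject(resizable, bigList)  // fits, because the buffer can resize
resizable.close()
{code}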

> Spark SQL Job failing because of Kryo buffer overflow with ORC
> --
>
> Key: SPARK-27107
> URL: https://issues.apache.org/jira/browse/SPARK-27107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Dhruve Ashar
>Priority: Major
>
> The issue occurs while trying to read ORC data and setting the SearchArgument.
> {code:java}
>  Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. 
> Available: 0, required: 9
> Serialization trace:
> literalList 
> (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl)
> leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl)
>   at com.esotericsoftware.kryo.io.Output.require(Output.java:163)
>   at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614)
>   at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
>   at 
> org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96)
>   at 
> org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315)
>   at 
> org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
>   at 
> org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
>   at 
> 

[jira] [Comment Edited] (SPARK-26998) spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor processes in Standalone mode

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789653#comment-16789653
 ] 

Gabor Somogyi edited comment on SPARK-26998 at 3/11/19 2:47 PM:


Since the first part of the PR is solved (http URLs in secure mode), I'm 
continuing with the second issue.
In my view the problem can be mitigated by asking users to provide configuration 
parameters in a configuration file (several commercial products do this):
* either spark-defaults.conf
* or --properties-file

That way the command line will show either nothing (spark-defaults.conf is 
picked up by default) or something like "... --properties-file 
my-secret-spark-properties.conf ...".
As a side note, this workaround is already available; I would just like to warn 
users about such situations.

The other approach I've considered (and abandoned) is to open a pipe and send 
the password through that channel, but since this approach doesn't really 
conform with Spark's configuration system it would imply heavy changes, and I 
don't see the return on investment.

[~vanzin] what do you think, since you have quite a bit of experience with 
security?



was (Author: gsomogyi):
Since the first part of the PR solved (http URLs in case of secure mode) 
continuing with the second issue.
In my view the problem can be mitigated to ask users to provide configuration 
parameters either in configuration file (several commercial products does this)
* Either spark-defaults.conf
* or --properties-file

That way the command line options will show either nothing (spark-defaults.conf 
picked up by default) or something like "... --properties-file 
my-secret-spark-properties.conf ...".
As a side note this workaround is available at the moment but I would like to 
warn users for such situations.

The other approach what I've considered (and abandoned) is to open a pipe and 
send the password through this channel but since this approach is not really 
conform with Spark's configuration system
it would imply heavy changes and don't see the return of investment.

[~vanzin] what do you think since you have quite a bit experience with security?


> spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor 
> processes in Standalone mode
> ---
>
> Key: SPARK-26998
> URL: https://issues.apache.org/jira/browse/SPARK-26998
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Security, Spark Core
>Affects Versions: 2.3.3, 2.4.0
>Reporter: t oo
>Priority: Major
>  Labels: SECURITY, Security, secur, security, security-issue
>
> Run spark standalone mode, then start a spark-submit requiring at least 1 
> executor. Do a 'ps -ef' on linux (ie putty terminal) and you will be able to 
> see  spark.ssl.keyStorePassword value in plaintext!
>  
> spark.ssl.keyStorePassword and  spark.ssl.keyPassword don't need to be passed 
> to  CoarseGrainedExecutorBackend. Only  spark.ssl.trustStorePassword is used.
>  
> Can be resolved if below PR is merged:
> [[Github] Pull Request #21514 
> (tooptoop4)|https://github.com/apache/spark/pull/21514]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26998) spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor processes in Standalone mode

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789653#comment-16789653
 ] 

Gabor Somogyi commented on SPARK-26998:
---

Since the first part of the PR is solved (http URLs in secure mode), I'm 
continuing with the second issue.
In my view the problem can be mitigated by asking users to provide configuration 
parameters in a configuration file (several commercial products do this):
* either spark-defaults.conf
* or --properties-file

That way the command line will show either nothing (spark-defaults.conf is 
picked up by default) or something like "... --properties-file 
my-secret-spark-properties.conf ...".
As a side note, this workaround is already available; I would just like to warn 
users about such situations.

The other approach I've considered (and abandoned) is to open a pipe and send 
the password through that channel, but since this approach doesn't really 
conform with Spark's configuration system it would imply heavy changes, and I 
don't see the return on investment.

[~vanzin] what do you think, since you have quite a bit of experience with 
security?
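To make the workaround concrete, a hedged example of the {{--properties-file}} variant; the 
file name matches the example above and the password values are obviously placeholders:

{noformat}
# my-secret-spark-properties.conf (make it readable only by the submitting user)
spark.ssl.keyStorePassword  example-keystore-secret
spark.ssl.keyPassword       example-key-secret

$ ./bin/spark-submit --properties-file my-secret-spark-properties.conf ...
{noformat}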


> spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor 
> processes in Standalone mode
> ---
>
> Key: SPARK-26998
> URL: https://issues.apache.org/jira/browse/SPARK-26998
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Security, Spark Core
>Affects Versions: 2.3.3, 2.4.0
>Reporter: t oo
>Priority: Major
>  Labels: SECURITY, Security, secur, security, security-issue
>
> Run spark standalone mode, then start a spark-submit requiring at least 1 
> executor. Do a 'ps -ef' on linux (ie putty terminal) and you will be able to 
> see  spark.ssl.keyStorePassword value in plaintext!
>  
> spark.ssl.keyStorePassword and  spark.ssl.keyPassword don't need to be passed 
> to  CoarseGrainedExecutorBackend. Only  spark.ssl.trustStorePassword is used.
>  
> Can be resolved if below PR is merged:
> [[Github] Pull Request #21514 
> (tooptoop4)|https://github.com/apache/spark/pull/21514]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Dominic Ricard (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Affects Version/s: 2.4.0

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Mori[A]rty (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789589#comment-16789589
 ] 

Mori[A]rty edited comment on SPARK-21067 at 3/11/19 2:04 PM:
-

Spark 2.4.0 still has the same problem, and I want to elaborate a bit more 
about how this bug happens.
  
 To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. That causes SparkThriftServer to execute 
SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

The first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using the 
ugi from HiveSessionImplwithUGI#sessionUgi, and it is closed when the session 
closes.

Since HiveClientImpl is shared across the whole SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, subsequent 
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here is a simpler way to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl SELECT 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl SELECT 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0){code}
By the way, [~xinxianyin]'s patch works for me and I think it is the simplest 
to solve this issue.


was (Author: moriarty279):
Spark 2.4.0 still has the same problem. And I want to elaborate a bit more 
about how this bug happened.
  
 To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. It causes SparkThriftServer to execute 

SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

When the first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs will be closed when 
session closes.

Since HiveClientImpl is shared among SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, subsequent  
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here are the most simple steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl SELECT 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl SELECT 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Mori[A]rty (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789589#comment-16789589
 ] 

Mori[A]rty commented on SPARK-21067:


Spark 2.4.0 still has the same problem, and I want to elaborate a bit more 
about how this bug happens.
 
To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. That causes SparkThriftServer to execute 
SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

The first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using the 
ugi from HiveSessionImplwithUGI#sessionUgi, and it is closed when the session 
closes.

Since HiveClientImpl is shared across the whole SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, subsequent 
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here are the simplest steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl select 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl select 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}
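The failure pattern described above can be illustrated in isolation (generic Hadoop FileSystem 
code, not the actual Hive/Spark classes involved):

{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// FileSystem.get returns an instance from FileSystem's internal cache, so the same
// handle can be shared by components that believe they own it exclusively.
val conf = new Configuration()
val shared = FileSystem.get(conf)

// "Session 1" finishes and closes what it thinks is its private FileSystem.
shared.close()

// "Session 2" reuses the cached, now-closed handle; on HDFS this throws
// java.io.IOException: Filesystem closed.
// shared.exists(new Path("/user/hive/warehouse/tbl"))
{code}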

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Mori[A]rty (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789589#comment-16789589
 ] 

Mori[A]rty edited comment on SPARK-21067 at 3/11/19 1:57 PM:
-

Spark 2.4.0 still has the same problem, and I want to elaborate a bit more 
on how this bug happens.
  
To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. This causes SparkThriftServer to execute 
SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

The first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using the ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is closed when the 
session closes.

Since HiveClientImpl is shared across the whole SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, any 
subsequent usage of HiveClientImpl#state#hdfsEncryptionShim throws 
"java.io.IOException: Filesystem closed".

Here are the simplest steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl SELECT 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl SELECT 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}


was (Author: moriarty279):
Spark 2.4.0 still has the same problem. And I want to elaborate a bit more 
about how this bug happened.
 
To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. It causes SparkThriftServer to execute 

SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

When the first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs will be closed when 
session closes.

Since HiveClientImpl is shared among SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again. Subsequent  
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here are the most simple steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl select 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl select 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. 

[jira] [Updated] (SPARK-26860) Improve RangeBetween docs in Pyspark, SparkR

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-26860:
--
 Labels:   (was: docs easyfix python)
   Priority: Minor  (was: Major)
Component/s: Documentation
 Issue Type: Improvement  (was: Bug)
Summary: Improve RangeBetween docs in Pyspark, SparkR  (was: 
RangeBetween docs appear to be wrong )

> Improve RangeBetween docs in Pyspark, SparkR
> 
>
> Key: SPARK-26860
> URL: https://issues.apache.org/jira/browse/SPARK-26860
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 2.4.0
>Reporter: Shelby Vanhooser
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The docs describing 
> [RangeBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rangeBetween]
>  for PySpark appear to be duplicates of 
> [RowsBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rowsBetween]
>  even though these are functionally different windows. rowsBetween 
> references preceding and succeeding rows, but rangeBetween is based on the 
> values in those rows.  
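
To make the difference concrete, here is a rough sketch with the Java Dataset API 
(local master and toy values are only placeholders): rowsBetween counts physical 
rows around the current row, while rangeBetween compares values of the ordering 
column.
{code:java}
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

public class FrameSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("frame-sketch").getOrCreate();

    // Toy data: note the duplicate value 2 and the gap before 5.
    Dataset<Row> df = spark.sql("SELECT * FROM VALUES (1), (2), (2), (5) AS t(v)");

    // ROWS frame: the current row plus the next physical row, whatever its value.
    WindowSpec rows = Window.orderBy("v").rowsBetween(Window.currentRow(), 1L);
    // RANGE frame: every row whose value of v lies within [v, v + 1].
    WindowSpec range = Window.orderBy("v").rangeBetween(Window.currentRow(), 1L);

    df.select(col("v"),
          sum("v").over(rows).alias("sum_rows"),
          sum("v").over(range).alias("sum_range"))
      .show();

    spark.stop();
  }
}
{code}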



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-11 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789598#comment-16789598
 ] 

Sean Owen commented on SPARK-26961:
---

I think registerAsParallelCapable() sounds like it could resolve the actual 
issue in practice here. Let's do that, because these ClassLoader subclasses are 
called concurrently in some cases. We probably can't fix Hadoop here.

Looks like there are 4-5 implementations in Spark. I think we can address all 
of them. It would have to go in a companion object to these classes, in Scala. 
I don't see a downside here other than that the locking is potentially more 
expensive as it's finer-grained?
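
For reference, the registration idiom looks roughly like the following plain Java 
sketch (class names here are made up; for the Scala classloaders the call would 
need to live in a companion object, as noted above):
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of a ClassLoader subclass that registers itself as parallel capable,
// so that loadClass locks per class name instead of on the loader instance.
public class ParallelCapableLoaderSketch extends URLClassLoader {

  static {
    // Must be invoked by the subclass itself; URLClassLoader being parallel
    // capable does not carry over to subclasses automatically.
    ClassLoader.registerAsParallelCapable();
  }

  public ParallelCapableLoaderSketch(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  // Widen addURL to public, similar to what Spark's MutableURLClassLoader exposes.
  @Override
  public void addURL(URL url) {
    super.addURL(url);
  }
}
{code}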

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that no worker container could connect to the driver after 
> start, and the driver was hanging. Using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.(URL.java:599)
>  at java.net.URL.(URL.java:490)
>  at java.net.URL.(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> 

[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789591#comment-16789591
 ] 

Gabor Somogyi commented on SPARK-27124:
---

I would mention "SchemaConverters can be used to make schema conversion" and 
"in python Py4J can be used to reach it" or something like this.
Agree that all the technical details should not be added since it can change 
easily.
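
A rough sketch of the Scala-side conversion under discussion, called here from Java 
(SchemaConverters.toSqlType is taken from the spark-avro module and is not a stable 
public API yet, so its exact shape may change):
{code:java}
import org.apache.avro.Schema;
import org.apache.spark.sql.avro.SchemaConverters;
import org.apache.spark.sql.types.DataType;

public class AvroSchemaSketch {
  public static void main(String[] args) {
    // A small Avro record schema, parsed from its JSON definition.
    Schema avroSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // Convert to the corresponding Spark SQL type; SchemaType also carries nullability.
    DataType sqlType = SchemaConverters.toSqlType(avroSchema).dataType();
    System.out.println(sqlType.catalogString());  // struct<id:bigint,name:string>
  }
}
{code}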

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala side 
> but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26860) Improve RangeBetween docs in Pyspark, SparkR

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26860.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23946
[https://github.com/apache/spark/pull/23946]

> Improve RangeBetween docs in Pyspark, SparkR
> 
>
> Key: SPARK-26860
> URL: https://issues.apache.org/jira/browse/SPARK-26860
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 2.4.0
>Reporter: Shelby Vanhooser
>Assignee: Jagadesh Kiran N
>Priority: Minor
> Fix For: 3.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The docs describing 
> [RangeBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rangeBetween]
>  for PySpark appear to be duplicates of 
> [RowsBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rowsBetween]
>  even though these are functionally different windows. rowsBetween 
> references preceding and succeeding rows, but rangeBetween is based on the 
> values in those rows.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26860) Improve RangeBetween docs in Pyspark, SparkR

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26860:
-

Assignee: Jagadesh Kiran N

> Improve RangeBetween docs in Pyspark, SparkR
> 
>
> Key: SPARK-26860
> URL: https://issues.apache.org/jira/browse/SPARK-26860
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 2.4.0
>Reporter: Shelby Vanhooser
>Assignee: Jagadesh Kiran N
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The docs describing 
> [RangeBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rangeBetween]
>  for PySpark appear to be duplicates of 
> [RowsBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rowsBetween]
>  even though these are functionally different windows. rowsBetween 
> references preceding and succeeding rows, but rangeBetween is based on the 
> values in those rows.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27116) Environment tab must sort Hadoop Configuration by default

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-27116:
-

Assignee: Ajith S

> Environment tab must sort Hadoop Configuration by default
> -
>
> Key: SPARK-27116
> URL: https://issues.apache.org/jira/browse/SPARK-27116
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
>
> The Environment tab in the Spark UI does not have the Hadoop Configuration sorted. 
> All other tables on the same page, like Spark Configurations, System Configuration, 
> etc., are sorted by key by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-26860) RangeBetween docs appear to be wrong

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-26860:
---

> RangeBetween docs appear to be wrong 
> -
>
> Key: SPARK-26860
> URL: https://issues.apache.org/jira/browse/SPARK-26860
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Shelby Vanhooser
>Priority: Major
>  Labels: docs, easyfix, python
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The docs describing 
> [RangeBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rangeBetween]
>  for PySpark appear to be duplicates of 
> [RowsBetween|http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/window.html#Window.rowsBetween]
>  even though these are functionally different windows. rowsBetween 
> references preceding and succeeding rows, but rangeBetween is based on the 
> values in those rows.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27116) Environment tab must sort Hadoop Configuration by default

2019-03-11 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-27116.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24038
[https://github.com/apache/spark/pull/24038]

> Environment tab must sort Hadoop Configuration by default
> -
>
> Key: SPARK-27116
> URL: https://issues.apache.org/jira/browse/SPARK-27116
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
> Fix For: 3.0.0
>
>
> The Environment tab in the Spark UI does not have the Hadoop Configuration sorted. 
> All other tables on the same page, like Spark Configurations, System Configuration, 
> etc., are sorted by key by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26152) Flaky test: BroadcastSuite

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26152:


Assignee: (was: Apache Spark)

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-26152
> URL: https://issues.apache.org/jira/browse/SPARK-26152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Critical
> Attachments: Screenshot from 2019-03-11 17-03-40.png
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627
>  (2018-11-16)
> {code}
> BroadcastSuite:
> - Using TorrentBroadcast locally
> - Accessing TorrentBroadcast variables from multiple threads
> - Accessing TorrentBroadcast variables in a local cluster (encryption = off)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@59428a1 rejected from 
> java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = 
> 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63)
>   at 
> scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>   at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106)
>   at 
> scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@40a5bf17 rejected from 
> java.util.concurrent.ThreadPoolExecutor@5a73967[Shutting down, pool size = 1, 
> active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> 

[jira] [Comment Edited] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789562#comment-16789562
 ] 

Hyukjin Kwon edited comment on SPARK-27124 at 3/11/19 1:32 PM:
---

I think we can mention that generally somewhere, say, that the Py4J approach can be 
used to access internal code in Spark. To be honest, this approach requires 
understanding Py4J, so we shouldn't focus on how to use it; technically, the Py4J 
documentation should describe how to use it.


was (Author: hyukjin.kwon):
I think we can mention that generally somewhere, say, Py4J approach can be used 
to access to internal codes in Spark. To be honest, this approach requires to 
understand Py4J. So, we shouldn't focus on how to use it tho.

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala side 
> but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789562#comment-16789562
 ] 

Hyukjin Kwon commented on SPARK-27124:
--

I think we can mention that generally somewhere, say, that the Py4J approach can be 
used to access internal code in Spark. To be honest, this approach requires 
understanding Py4J, so we shouldn't focus on how to use it.

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala side 
> but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26152) Flaky test: BroadcastSuite

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26152:


Assignee: Apache Spark

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-26152
> URL: https://issues.apache.org/jira/browse/SPARK-26152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Critical
> Attachments: Screenshot from 2019-03-11 17-03-40.png
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627
>  (2018-11-16)
> {code}
> BroadcastSuite:
> - Using TorrentBroadcast locally
> - Accessing TorrentBroadcast variables from multiple threads
> - Accessing TorrentBroadcast variables in a local cluster (encryption = off)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@59428a1 rejected from 
> java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = 
> 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63)
>   at 
> scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>   at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106)
>   at 
> scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@40a5bf17 rejected from 
> java.util.concurrent.ThreadPoolExecutor@5a73967[Shutting down, pool size = 1, 
> active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> 

[jira] [Assigned] (SPARK-26820) Issue Error/Warning when Hint is not applicable

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26820:


Assignee: (was: Apache Spark)

> Issue Error/Warning when Hint is not applicable
> ---
>
> Key: SPARK-26820
> URL: https://issues.apache.org/jira/browse/SPARK-26820
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> We should issue an error or a warning when the HINT is not applicable. This 
> should be configurable. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26820) Issue Error/Warning when Hint is not applicable

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26820:


Assignee: Apache Spark

> Issue Error/Warning when Hint is not applicable
> ---
>
> Key: SPARK-26820
> URL: https://issues.apache.org/jira/browse/SPARK-26820
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> We should issue an error or a warning when the HINT is not applicable. This 
> should be configurable. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789494#comment-16789494
 ] 

Gabor Somogyi commented on SPARK-27124:
---

Yeah, the Python implementation would use the same thing you've shown, and this 
would be a helper to make life easier.
At the moment SchemaConverters is not mentioned in the doc at all. Do you think it 
is worth adding documentation to explain this area a bit more?
(I know it's not an API, but since it's hard to come up with a proper schema, 
especially for beginners, people are somehow forced to use it.)


> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala side 
> but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26152) Flaky test: BroadcastSuite

2019-03-11 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-26152:

Attachment: Screenshot from 2019-03-11 17-03-40.png

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-26152
> URL: https://issues.apache.org/jira/browse/SPARK-26152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Critical
> Attachments: Screenshot from 2019-03-11 17-03-40.png
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627
>  (2018-11-16)
> {code}
> BroadcastSuite:
> - Using TorrentBroadcast locally
> - Accessing TorrentBroadcast variables from multiple threads
> - Accessing TorrentBroadcast variables in a local cluster (encryption = off)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@59428a1 rejected from 
> java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = 
> 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63)
>   at 
> scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>   at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106)
>   at 
> scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@40a5bf17 rejected from 
> java.util.concurrent.ThreadPoolExecutor@5a73967[Shutting down, pool size = 1, 
> active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> 

[jira] [Assigned] (SPARK-27127) Deduplicate codes reading from/writing to unsafe object

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27127:


Assignee: (was: Apache Spark)

> Deduplicate codes reading from/writing to unsafe object
> ---
>
> Key: SPARK-27127
> URL: https://issues.apache.org/jira/browse/SPARK-27127
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Minor
>
> IntelliJ found lots of code duplication among some Unsafe classes; the 
> duplication comes from reading from/writing to unsafe objects. This issue 
> tracks the effort to deduplicate the code blocks around reading from/writing to 
> unsafe objects among the various classes.
> This would not only reduce code, but also make the way unsafe objects are 
> handled consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27127) Deduplicate codes reading from/writing to unsafe object

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27127:


Assignee: Apache Spark

> Deduplicate codes reading from/writing to unsafe object
> ---
>
> Key: SPARK-27127
> URL: https://issues.apache.org/jira/browse/SPARK-27127
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Minor
>
> IntelliJ found lots of code duplication among some Unsafe classes; the 
> duplication comes from reading from/writing to unsafe objects. This issue 
> tracks the effort to deduplicate the code blocks around reading from/writing to 
> unsafe objects among the various classes.
> This would not only reduce code, but also make the way unsafe objects are 
> handled consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27127) Deduplicate codes reading from/writing to unsafe object

2019-03-11 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-27127:


 Summary: Deduplicate codes reading from/writing to unsafe object
 Key: SPARK-27127
 URL: https://issues.apache.org/jira/browse/SPARK-27127
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Jungtaek Lim


IntelliJ found lots of code duplication among some Unsafe classes; the 
duplication comes from reading from/writing to unsafe objects. This issue 
tracks the effort to deduplicate the code blocks around reading from/writing to 
unsafe objects among the various classes.

This would not only reduce code, but also make the way unsafe objects are 
handled consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26152) Flaky test: BroadcastSuite

2019-03-11 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789462#comment-16789462
 ] 

Ajith S edited comment on SPARK-26152 at 3/11/19 12:07 PM:
---

I encountered this issue on the latest master branch and see that it is a race 
between the *org.apache.spark.deploy.DeployMessages.WorkDirCleanup* event and 
*org.apache.spark.deploy.worker.Worker#onStop*. It is possible that while 
the WorkDirCleanup event is being processed, 
*org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor* has already been shut 
down, hence any subsequent submission to the ThreadPoolExecutor will result in 
*java.util.concurrent.RejectedExecutionException*.

Attaching the debug snapshot of the same. I would like to work on this; please 
suggest.
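
A minimal sketch of the race (class and variable names are placeholders, not the 
actual Worker code), including one way a fix could tolerate it:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownRaceSketch {
  public static void main(String[] args) {
    // Stand-in for Worker#cleanupThreadExecutor.
    ExecutorService cleanupThreadExecutor = Executors.newSingleThreadExecutor();

    // Worker#onStop wins the race and shuts the executor down first ...
    cleanupThreadExecutor.shutdownNow();

    // ... so the WorkDirCleanup handler's later submission is rejected.
    try {
      cleanupThreadExecutor.submit(() -> System.out.println("cleaning work dir"));
    } catch (RejectedExecutionException e) {
      // One possible fix: treat this as a benign shutdown race (log and ignore),
      // or check isShutdown() before submitting.
      System.out.println("executor already stopped: " + e);
    }
  }
}
{code}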


was (Author: ajithshetty):
I encountered this issue and see that its the race between 
*org.apache.spark.deploy.DeployMessages.WorkDirCleanup* event and  
*org.apache.spark.deploy.worker.Worker#onStop*. Here its possible that while 
the WorkDirCleanup event is being processed, 
*org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor* was shutdown. 
hence any submission after ThreadPoolExecutor will result in 
*java.util.concurrent.RejectedExecutionException*

Attaching the debug snapshot of same. I would like to work on this. Please 
suggest

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-26152
> URL: https://issues.apache.org/jira/browse/SPARK-26152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627
>  (2018-11-16)
> {code}
> BroadcastSuite:
> - Using TorrentBroadcast locally
> - Accessing TorrentBroadcast variables from multiple threads
> - Accessing TorrentBroadcast variables in a local cluster (encryption = off)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@59428a1 rejected from 
> java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = 
> 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63)
>   at 
> scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>   at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106)
>   at 
> scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> 

[jira] [Comment Edited] (SPARK-26152) Flaky test: BroadcastSuite

2019-03-11 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789462#comment-16789462
 ] 

Ajith S edited comment on SPARK-26152 at 3/11/19 12:07 PM:
---

I encountered this issue and see that it is a race between the 
*org.apache.spark.deploy.DeployMessages.WorkDirCleanup* event and 
*org.apache.spark.deploy.worker.Worker#onStop*. It is possible that while 
the WorkDirCleanup event is being processed, 
*org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor* has already been shut 
down, hence any subsequent submission to the ThreadPoolExecutor will result in 
*java.util.concurrent.RejectedExecutionException*.

Attaching the debug snapshot of the same. I would like to work on this; please 
suggest.


was (Author: ajithshetty):
I encountered this issue and see that its the race between 
``org.apache.spark.deploy.DeployMessages.WorkDirCleanup`` event and onStop call 
of org.apache.spark.deploy.worker.Worker#onStop. Here its possible that while 
the WorkDirCleanup event is being processed, 
org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor was shutdown

Attaching the debug snapshot of same. I would like to work on this. Please 
suggest

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-26152
> URL: https://issues.apache.org/jira/browse/SPARK-26152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627
>  (2018-11-16)
> {code}
> BroadcastSuite:
> - Using TorrentBroadcast locally
> - Accessing TorrentBroadcast variables from multiple threads
> - Accessing TorrentBroadcast variables in a local cluster (encryption = off)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@59428a1 rejected from 
> java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = 
> 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63)
>   at 
> scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>   at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106)
>   at 
> scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at 

[jira] [Commented] (SPARK-26152) Flaky test: BroadcastSuite

2019-03-11 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789462#comment-16789462
 ] 

Ajith S commented on SPARK-26152:
-

I encountered this issue and see that it is a race between the 
``org.apache.spark.deploy.DeployMessages.WorkDirCleanup`` event and the onStop call 
of org.apache.spark.deploy.worker.Worker#onStop. It is possible that while 
the WorkDirCleanup event is being processed, 
org.apache.spark.deploy.worker.Worker#cleanupThreadExecutor has already been shut down.

Attaching the debug snapshot of the same. I would like to work on this; please 
suggest.

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-26152
> URL: https://issues.apache.org/jira/browse/SPARK-26152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/5627
>  (2018-11-16)
> {code}
> BroadcastSuite:
> - Using TorrentBroadcast locally
> - Accessing TorrentBroadcast variables from multiple threads
> - Accessing TorrentBroadcast variables in a local cluster (encryption = off)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@59428a1 rejected from 
> java.util.concurrent.ThreadPoolExecutor@4096a677[Shutting down, pool size = 
> 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>   at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:134)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:63)
>   at 
> scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:78)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>   at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:106)
>   at 
> scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>   at scala.concurrent.Promise.complete(Promise.scala:49)
>   at scala.concurrent.Promise.complete$(Promise.scala:48)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
>   at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> java.util.concurrent.RejectedExecutionException: Task 
> scala.concurrent.impl.CallbackRunnable@40a5bf17 rejected from 
> java.util.concurrent.ThreadPoolExecutor@5a73967[Shutting down, pool size = 1, 
> active threads = 1, queued tasks = 0, completed tasks = 0]
>   at 
> 

[jira] [Assigned] (SPARK-27126) Consolidate Scala and Java type deserializerFor

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27126:


Assignee: (was: Apache Spark)

> Consolidate Scala and Java type deserializerFor
> ---
>
> Key: SPARK-27126
> URL: https://issues.apache.org/jira/browse/SPARK-27126
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> There are some duplicates between {{ScalaReflection}} and 
> {{JavaTypeInference}}. Ideally we should consolidate them.
> This proposes to consolidate {{deserializerFor}} between {{ScalaReflection}} 
> and {{JavaTypeInference}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27126) Consolidate Scala and Java type deserializerFor

2019-03-11 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27126:
---

 Summary: Consolidate Scala and Java type deserializerFor
 Key: SPARK-27126
 URL: https://issues.apache.org/jira/browse/SPARK-27126
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Liang-Chi Hsieh


There are some duplicates between {{ScalaReflection}} and 
{{JavaTypeInference}}. Ideally we should consolidate them.

This proposes to consolidate {{deserializerFor}} between {{ScalaReflection}} 
and {{JavaTypeInference}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27126) Consolidate Scala and Java type deserializerFor

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27126:


Assignee: Apache Spark

> Consolidate Scala and Java type deserializerFor
> ---
>
> Key: SPARK-27126
> URL: https://issues.apache.org/jira/browse/SPARK-27126
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> There are some duplicates between {{ScalaReflection}} and 
> {{JavaTypeInference}}. Ideally we should consolidate them.
> This proposes to consolidate {{deserializerFor}} between {{ScalaReflection}} 
> and {{JavaTypeInference}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789388#comment-16789388
 ] 

Hyukjin Kwon commented on SPARK-27124:
--

The way of reaching it is the same as in the Python implementation. Py4J allows 
full JVM access. Of course, it's hacky - I wasn't trying to say this is an 
official way of using it.

{code}
>>> spark._jvm.org.apache.spark.sql.avro.SchemaConverters.toSqlType(spark._jvm.org.apache.avro.Schema.Parser().parse("""{"type":
>>>  "int", "name": "fieldA"}""")).toString()
u'SchemaType(IntegerType,false)'
{code}

Usually the signatures are matched between the Scala and Python sides. I suspect 
you'd expose a function that takes a JSON-formatted Avro schema on the PySpark 
side, right?


> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-26985) Test "access only some column of the all of columns " fails on big endian

2019-03-11 Thread Anuja Jakhade (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuja Jakhade reopened SPARK-26985:
---

I did try with OpenJDK; however, the same behavior is observed. 

The test fails with the same error. 

> Test "access only some column of the all of columns " fails on big endian
> -
>
> Key: SPARK-26985
> URL: https://issues.apache.org/jira/browse/SPARK-26985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Linux Ubuntu 16.04 
> openjdk version "1.8.0_202"
> OpenJDK Runtime Environment (build 1.8.0_202-b08)
> Eclipse OpenJ9 VM (build openj9-0.12.1, JRE 1.8.0 64-Bit Compressed 
> References 20190205_218 (JIT enabled, AOT enabled)
> OpenJ9 - 90dd8cb40
> OMR - d2f4534b
> JCL - d002501a90 based on jdk8u202-b08)
>  
>Reporter: Anuja Jakhade
>Priority: Major
>  Labels: BigEndian
> Attachments: DataFrameTungstenSuite.txt, 
> InMemoryColumnarQuerySuite.txt, access only some column of the all of 
> columns.txt
>
>
> While running tests on Apache Spark v2.3.2 with AdoptJDK on big endian, I am 
> observing test failures for 2 Suites of Project SQL.
>  1. InMemoryColumnarQuerySuite
>  2. DataFrameTungstenSuite
>  In both cases, the test "access only some column of the all of columns" fails 
> due to a mismatch in the final assert.
> Observed that the data obtained after df.cache() is causing the error. Please 
> find the log with the details attached. 
> cache() works perfectly fine if double and float values are not in the picture.
> Inside test !!- access only some column of the all of columns *** FAILED 
> ***
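
As a reference, a minimal sketch of the pattern the failing test exercises 
(column names and values below are my own, not the suite's): cache a DataFrame 
containing double and float columns, then read back only a subset of the columns.

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("cache-subset").getOrCreate()
import spark.implicits._

// DataFrame with int, double and float columns.
val df = Seq((1, 1.5d, 2.5f), (2, 3.5d, 4.5f)).toDF("i", "d", "f")
df.cache()

// Accessing only some of the cached columns is where the mismatch is reported
// on the big-endian setup described above.
df.select("d", "f").show()
{code}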



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26940) Observed greater deviation on big endian platform for SingletonReplSuite test case

2019-03-11 Thread Anuja Jakhade (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789393#comment-16789393
 ] 

Anuja Jakhade commented on SPARK-26940:
---

I did try with OpenJDK; however, the same behavior is observed. 

The test fails with the same error. 

> Observed greater deviation on big endian platform for SingletonReplSuite test 
> case
> --
>
> Key: SPARK-26940
> URL: https://issues.apache.org/jira/browse/SPARK-26940
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.2
> Environment: Ubuntu 16.04 LTS
> openjdk version "1.8.0_202"
> OpenJDK Runtime Environment (build 1.8.0_202-b08)
> Eclipse OpenJ9 VM (build openj9-0.12.1, JRE 1.8.0 Linux (JIT enabled, AOT 
> enabled)
> OpenJ9 - 90dd8cb40
> OMR - d2f4534b
> JCL - d002501a90 based on jdk8u202-b08)
>Reporter: Anuja Jakhade
>Priority: Minor
>  Labels: BigEndian
> Attachments: failure_log.txt
>
>
> I have built Apache Spark v2.3.2 on Big Endian platform with AdoptJDK OpenJ9 
> 1.8.0_202.
> My build is successful. However, while running the Scala tests of the "*Spark 
> Project REPL*" module, I am facing failures in SingletonReplSuite, with the 
> error log attached.
> The deviation observed on big endian is greater than the acceptable deviation 
> of 0.2.
> How appropriate would it be to increase the deviation defined in 
> SingletonReplSuite.scala?
> Can this be fixed? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-26940) Observed greater deviation on big endian platform for SingletonReplSuite test case

2019-03-11 Thread Anuja Jakhade (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuja Jakhade reopened SPARK-26940:
---

I did try with OpenJDK; however, the same behavior is observed. 

The test fails with the same error. 

> Observed greater deviation on big endian platform for SingletonReplSuite test 
> case
> --
>
> Key: SPARK-26940
> URL: https://issues.apache.org/jira/browse/SPARK-26940
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.2
> Environment: Ubuntu 16.04 LTS
> openjdk version "1.8.0_202"
> OpenJDK Runtime Environment (build 1.8.0_202-b08)
> Eclipse OpenJ9 VM (build openj9-0.12.1, JRE 1.8.0 Linux (JIT enabled, AOT 
> enabled)
> OpenJ9 - 90dd8cb40
> OMR - d2f4534b
> JCL - d002501a90 based on jdk8u202-b08)
>Reporter: Anuja Jakhade
>Priority: Minor
>  Labels: BigEndian
> Attachments: failure_log.txt
>
>
> I have built Apache Spark v2.3.2 on Big Endian platform with AdoptJDK OpenJ9 
> 1.8.0_202.
> My build is successful. However, while running the Scala tests of the "*Spark 
> Project REPL*" module, I am facing failures in SingletonReplSuite, with the 
> error log attached.
> The deviation observed on big endian is greater than the acceptable deviation 
> of 0.2.
> How appropriate would it be to increase the deviation defined in 
> SingletonReplSuite.scala?
> Can this be fixed? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26985) Test "access only some column of the all of columns " fails on big endian

2019-03-11 Thread Anuja Jakhade (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789392#comment-16789392
 ] 

Anuja Jakhade commented on SPARK-26985:
---

I did try with OpenJDK; however, the same behavior is observed. 

The test fails with the same error. 

> Test "access only some column of the all of columns " fails on big endian
> -
>
> Key: SPARK-26985
> URL: https://issues.apache.org/jira/browse/SPARK-26985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Linux Ubuntu 16.04 
> openjdk version "1.8.0_202"
> OpenJDK Runtime Environment (build 1.8.0_202-b08)
> Eclipse OpenJ9 VM (build openj9-0.12.1, JRE 1.8.0 64-Bit Compressed 
> References 20190205_218 (JIT enabled, AOT enabled)
> OpenJ9 - 90dd8cb40
> OMR - d2f4534b
> JCL - d002501a90 based on jdk8u202-b08)
>  
>Reporter: Anuja Jakhade
>Priority: Major
>  Labels: BigEndian
> Attachments: DataFrameTungstenSuite.txt, 
> InMemoryColumnarQuerySuite.txt, access only some column of the all of 
> columns.txt
>
>
> While running tests on Apache Spark v2.3.2 with AdoptJDK on big endian, I am 
> observing test failures for 2 Suites of Project SQL.
>  1. InMemoryColumnarQuerySuite
>  2. DataFrameTungstenSuite
>  In both cases, the test "access only some column of the all of columns" fails 
> due to a mismatch in the final assert.
> Observed that the data obtained after df.cache() is causing the error. Please 
> find the log with the details attached. 
> cache() works perfectly fine if double and float values are not in the picture.
> Inside test !!- access only some column of the all of columns *** FAILED 
> ***



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789391#comment-16789391
 ] 

Hyukjin Kwon commented on SPARK-27124:
--

I think we have workarounds, and it looks difficult to match the signatures. 
Usually the PySpark, R, and Scala sides have matching APIs. I am not sure this 
is important enough to outweigh all those concerns.

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789388#comment-16789388
 ] 

Hyukjin Kwon edited comment on SPARK-27124 at 3/11/19 10:39 AM:


The way of reaching it is the same as in the Python implementation (the POC you 
worked on). Py4J allows full JVM access. Of course, it's hacky - I wasn't 
trying to say this is an official way of using it.

{code}
>>> spark._jvm.org.apache.spark.sql.avro.SchemaConverters.toSqlType(spark._jvm.org.apache.avro.Schema.Parser().parse("""{"type":
>>>  "int", "name": "fieldA"}""")).toString()
u'SchemaType(IntegerType,false)'
{code}

Usually the signatures are matched between the Scala and Python sides. I suspect 
you'd expose a function that takes a JSON-formatted Avro schema on the PySpark 
side, right?



was (Author: hyukjin.kwon):
The way of reaching it is the same as in the Python implementation. Py4J allows 
full JVM access. Of course, it's hacky - I wasn't trying to say this is an 
official way of using it.

{code}
>>> spark._jvm.org.apache.spark.sql.avro.SchemaConverters.toSqlType(spark._jvm.org.apache.avro.Schema.Parser().parse("""{"type":
>>>  "int", "name": "fieldA"}""")).toString()
u'SchemaType(IntegerType,false)'
{code}

Usually the signatures are matched between the Scala and Python sides. I suspect 
you'd expose a function that takes a JSON-formatted Avro schema on the PySpark 
side, right?


> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789366#comment-16789366
 ] 

Hyukjin Kwon commented on SPARK-27124:
--

To me, I am not sure. How would you map an Avro schema in PySpark? This is 
reachable via Py4J, FWIW. Also, I'm personally skeptical about exposing these as 
APIs in general if there aren't strong use cases. 

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27125) Add test suite for sql execution page

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27125:


Assignee: Apache Spark

> Add test suite for sql execution page
> -
>
> Key: SPARK-27125
> URL: https://issues.apache.org/jira/browse/SPARK-27125
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Assignee: Apache Spark
>Priority: Minor
>
> Add test suite for sql execution page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789376#comment-16789376
 ] 

Gabor Somogyi commented on SPARK-27124:
---

{quote}This is reachable via Py4J FWIW{quote}

How exactly? Maybe the doc just has to be extended to highlight this functionality.

From a use-case perspective, I've seen many users struggle with Avro and report 
problems, but most of the time the cause was a wrong schema.
I would like to mitigate this somehow.

> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27125) Add test suite for sql execution page

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27125:


Assignee: (was: Apache Spark)

> Add test suite for sql execution page
> -
>
> Key: SPARK-27125
> URL: https://issues.apache.org/jira/browse/SPARK-27125
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Minor
>
> Add test suite for sql execution page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26879:


Assignee: (was: Apache Spark)

> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> {code:title=spark-shell|borderStyle=solid}
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala>  spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
> 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> {code}
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.
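
In the meantime, a workaround sketch (the target column names are my own choice): 
rename the generator output explicitly so downstream code does not depend on the 
inconsistent defaults.

{code}
// Assumes a spark-shell session (a SparkSession named `spark`).
val stacked = spark.sql("SELECT stack(2, 1, 2, 3)").toDF("a", "b")
val inlined = spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").toDF("a", "b")

stacked.show()
inlined.show()
{code}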



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-03-11 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26879:


Assignee: Apache Spark

> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Assignee: Apache Spark
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> {code:title=spark-shell|borderStyle=solid}
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala>  spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
> 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> {code}
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27125) Add test suite for sql execution page

2019-03-11 Thread Shahid K I (JIRA)
Shahid K I created SPARK-27125:
--

 Summary: Add test suite for sql execution page
 Key: SPARK-27125
 URL: https://issues.apache.org/jira/browse/SPARK-27125
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.0.0
Reporter: Shahid K I


Add test suite for sql execution page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27125) Add test suite for sql execution page

2019-03-11 Thread Shahid K I (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789367#comment-16789367
 ] 

Shahid K I commented on SPARK-27125:


I will raise a PR

> Add test suite for sql execution page
> -
>
> Key: SPARK-27125
> URL: https://issues.apache.org/jira/browse/SPARK-27125
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Minor
>
> Add test suite for sql execution page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789356#comment-16789356
 ] 

Gabor Somogyi commented on SPARK-27124:
---

I've already put together a POC, but I'm interested in what you think, 
[~hyukjin.kwon] [~Gengliang.Wang] [~cloud_fan]?


> Expose org.apache.spark.sql.avro.SchemaConverters as developer API
> --
>
> Key: SPARK-27124
> URL: https://issues.apache.org/jira/browse/SPARK-27124
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
> convert schemas between Spark SQL and Avro. This is reachable from the Scala 
> side but not from PySpark. I suggest adding this as a developer API to ease 
> development for PySpark users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27124) Expose org.apache.spark.sql.avro.SchemaConverters as developer API

2019-03-11 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-27124:
-

 Summary: Expose org.apache.spark.sql.avro.SchemaConverters as 
developer API
 Key: SPARK-27124
 URL: https://issues.apache.org/jira/browse/SPARK-27124
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 3.0.0
Reporter: Gabor Somogyi


org.apache.spark.sql.avro.SchemaConverters provides extremely useful APIs to 
convert schemas between Spark SQL and Avro. This is reachable from the Scala 
side but not from PySpark. I suggest adding this as a developer API to ease 
development for PySpark users.
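
For reference, a minimal sketch of the Scala-side usage described above, assuming 
the spark-avro module is on the classpath (the visibility of SchemaConverters has 
varied between Spark versions, so treat this purely as an illustration):

{code}
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters

// Parse an Avro schema from its JSON form and convert it to a Spark SQL type.
val avroSchema = new Schema.Parser().parse("""{"type": "int", "name": "fieldA"}""")
val sqlType = SchemaConverters.toSqlType(avroSchema)
println(sqlType)  // SchemaType(IntegerType,false)
{code}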



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20075) Support classifier, packaging in Maven coordinates

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789341#comment-16789341
 ] 

Gabor Somogyi commented on SPARK-20075:
---

[~kabhwan] Cool. I think we can both give it a try and share our experience, and 
maybe together we can sort it out. We'll see...

> Support classifier, packaging in Maven coordinates
> --
>
> Key: SPARK-20075
> URL: https://issues.apache.org/jira/browse/SPARK-20075
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: Sean Owen
>Priority: Minor
>
> Currently, it's possible to add dependencies to an app using its Maven 
> coordinates on the command line: {{group:artifact:version}}. However, really 
> Maven coordinates are 5-dimensional: 
> {{group:artifact:packaging:classifier:version}}. In some rare but real cases 
> it's important to be able to specify the classifier. And while we're at it 
> why not try to support packaging?
> I have a WIP PR that I'll post soon.
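
To make the two coordinate shapes concrete (the artifact names below are 
hypothetical), this is the difference the description points at; only the 
three-part form is accepted on the command line today, while the five-part form 
is what this issue asks to support:

{code}
// group:artifact:version - the form accepted on the command line today.
val supportedToday = "com.example:my-lib:1.0.0"

// group:artifact:packaging:classifier:version - the full five-dimensional form
// this issue proposes to accept, e.g. a "tests" classifier jar.
val proposed = "com.example:my-lib:jar:tests:1.0.0"
{code}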



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20075) Support classifier, packaging in Maven coordinates

2019-03-11 Thread Jungtaek Lim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789338#comment-16789338
 ] 

Jungtaek Lim commented on SPARK-20075:
--

[~gsomogyi]

Yeah, I have experience with Aether and it worked like a charm, but according 
to the long discussion in Sean's PR, it looks like this is not easy with Ivy. I'll 
also give it a try, but it may not be easy for me either, since some semantics 
differ between Maven and Ivy.

> Support classifier, packaging in Maven coordinates
> --
>
> Key: SPARK-20075
> URL: https://issues.apache.org/jira/browse/SPARK-20075
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: Sean Owen
>Priority: Minor
>
> Currently, it's possible to add dependencies to an app using its Maven 
> coordinates on the command line: {{group:artifact:version}}. However, really 
> Maven coordinates are 5-dimensional: 
> {{group:artifact:packaging:classifier:version}}. In some rare but real cases 
> it's important to be able to specify the classifier. And while we're at it 
> why not try to support packaging?
> I have a WIP PR that I'll post soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20075) Support classifier, packaging in Maven coordinates

2019-03-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789329#comment-16789329
 ] 

Gabor Somogyi commented on SPARK-20075:
---

[~kabhwan] To continue the discussion started on SPARK-27044: as I was 
proceeding with the implementation, I faced issues similar to Sean's.
I'm not saying I'm giving up, just highlighting that it's not a trivial issue, so 
feel free to create a PR if you've solved it. Anyone who can go through and 
remove all the obstacles well deserves a commit.
In the meantime I'm trying out several things...


> Support classifier, packaging in Maven coordinates
> --
>
> Key: SPARK-20075
> URL: https://issues.apache.org/jira/browse/SPARK-20075
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: Sean Owen
>Priority: Minor
>
> Currently, it's possible to add dependencies to an app using its Maven 
> coordinates on the command line: {{group:artifact:version}}. However, really 
> Maven coordinates are 5-dimensional: 
> {{group:artifact:packaging:classifier:version}}. In some rare but real cases 
> it's important to be able to specify the classifier. And while we're at it 
> why not try to support packaging?
> I have a WIP PR that I'll post soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27096) Reconcile the join type support between data frame and sql interface

2019-03-11 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27096.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23982
[https://github.com/apache/spark/pull/23982]

> Reconcile the join type support between data frame and sql interface
> 
>
> Key: SPARK-27096
> URL: https://issues.apache.org/jira/browse/SPARK-27096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently in the grammar file, we have the joinType rule defined as following 
> :
> {code:java}
> joinType
>  : INNER?
>  
>  
>  | LEFT SEMI
>  | LEFT? ANTI
>  ;
> {code}
> The keyword LEFT is optional for ANTI join even though it is not optional for 
> SEMI join. When using the DataFrame interface, the join type "anti" is not 
> allowed; the allowed types are "left_anti" or 
> "leftanti" for anti joins. We should also make LEFT optional for SEMI join 
> and allow the "semi" and "anti" join types from the DataFrame API.
>  
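
To make the DataFrame-side behavior concrete, a small sketch (the tables and 
column names are placeholders of my own):

{code}
import spark.implicits._  // assumes a spark-shell session

val left  = Seq((1, "a"), (2, "b")).toDF("id", "v")
val right = Seq((1, "x")).toDF("id", "w")

// Accepted in 2.4: only the longer spellings of the semi/anti join types.
left.join(right, Seq("id"), "left_semi").show()
left.join(right, Seq("id"), "left_anti").show()

// Accepted after this change (3.0): the shorter "semi" and "anti" spellings.
left.join(right, Seq("id"), "semi").show()
left.join(right, Seq("id"), "anti").show()
{code}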



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27096) Reconcile the join type support between data frame and sql interface

2019-03-11 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-27096:
---

Assignee: Dilip Biswal

> Reconcile the join type support between data frame and sql interface
> 
>
> Key: SPARK-27096
> URL: https://issues.apache.org/jira/browse/SPARK-27096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Minor
>
> Currently in the grammar file, we have the joinType rule defined as following 
> :
> {code:java}
> joinType
>  : INNER?
>  
>  
>  | LEFT SEMI
>  | LEFT? ANTI
>  ;
> {code}
> The keyword LEFT is optional for ANTI join even though it is not optional for 
> SEMI join. When using the DataFrame interface, the join type "anti" is not 
> allowed; the allowed types are "left_anti" or 
> "leftanti" for anti joins. We should also make LEFT optional for SEMI join 
> and allow the "semi" and "anti" join types from the DataFrame API.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-03-11 Thread Hu Ziqian (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789189#comment-16789189
 ] 

Hu Ziqian commented on SPARK-25299:
---

Hi [~yifeih], the Google doc you posted on 25/Feb/19 mainly talks about the 
new shuffle API, and the milestone is about implementing the existing shuffle 
with the new API. 

Do we have any further decision about which architecture will be used in the new 
shuffle service? I found there are 5 options in the [architecture discussion 
document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
 and have we already chosen one of them as the candidate?

Thank you

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


