date:20191010

[GitHub] [spark] rafaelkyrdan edited a comment on issue #19424: [SPARK-22197][SQL] push down operators to data source before planning

2019-10-10 Thread GitBox

rafaelkyrdan edited a comment on issue #19424: [SPARK-22197][SQL] push down 
operators to data source before planning
URL: https://github.com/apache/spark/pull/19424#issuecomment-540059983
 
 
   Could you confirm that `limit` now optimized and pushed to PushedFilters? In 
the physical plan I see that it is not. I'm using spark 2.4.4


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] 
Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#discussion_r69937
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
 ##
 @@ -65,6 +65,23 @@ trait DataSourceScanExec extends LeafExecNode {
   s"$nodeNamePrefix$nodeName${truncatedString(output, "[", ",", "]", 
maxFields)}$metadataStr")
   }
 
+  override def verboseStringWithOperatorId(): String = {
+ val metadataStr = metadata.toSeq.sorted.map {
 
 Review comment:
   nit: we can do `filter` fist then `map`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk opened a new pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk opened a new pull request #26079: [SPARK-29369][SQL] Support string 
intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079
 
 
   ### What changes were proposed in this pull request?
   In the PR, I propose to move interval parsing to 
`CalendarInterval.fromCaseInsensitiveString()` which throws an 
`IllegalArgumentException` for invalid strings, and reuse it from 
`CalendarInterval.fromString()`. The former one handles 
`IllegalArgumentException` only and returns `NULL` for invalid interval 
strings. This will allow to support interval strings without the `interval` 
prefix in casting strings to intervals and in interval type constructor.
   For example:
   ```sql
   spark-sql> select cast('1 year 10 days' as interval);
   interval 1 years 1 weeks 3 days
   spark-sql> SELECT INTERVAL '1 YEAR 10 DAYS';
   interval 1 years 1 weeks 3 days
   ```
   
   ### Why are the changes needed?
   To maintain feature parity with PostgreSQL which supports interval strings 
without prefix:
   ```sql
   # select interval '2 months 1 microsecond';
   interval
   
2 mons 00:00:00.01
   ```
   
   ### Does this PR introduce any user-facing change?
   Yes, previously parsing interval strings without `interval` gives `NULL`:
   ```sql
   spark-sql> select interval '2 months 1 microsecond';
   NULL
   ```
   After:
   ```sql
   spark-sql> select interval '2 months 1 microsecond';
   interval 2 months 1 microseconds
   ```
   
   ### How was this patch tested?
   - Added new tests to `CalendarIntervalSuite.java`
   - A test for casting strings to intervals in `CastSuite`
   - Test for interval type constructor from strings in `ExpressionParserSuite`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] 
Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#discussion_r71980
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/results/explain.sql.out
 ##
 @@ -58,6 +58,11 @@ struct
 
 (1) Scan parquet default.explain_temp1 
 Output: [key#x, val#x]
+Batched: true
+Format: Parquet
 
 Review comment:
   This is already in the node name `Scan parquet default.explain_temp1`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] 
Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#discussion_r72628
 
 

 ##
 File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala
 ##
 @@ -100,7 +100,8 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite 
{
 "subquery/in-subquery/in-group-by.sql",
 "subquery/in-subquery/simple-in.sql",
 "subquery/in-subquery/in-order-by.sql",
-"subquery/in-subquery/in-set-operations.sql"
+"subquery/in-subquery/in-set-operations.sql",
+"explain.sql"
 
 Review comment:
   why?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on issue #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on issue #26079: [SPARK-29369][SQL] Support string intervals 
without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#issuecomment-540441798
 
 
   @cloud-fan @dongjoon-hyun @zsxwing May I ask you to take a look at this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26042: [SPARK-29092][SQL] 
Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#discussion_r72836
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/results/explain.sql.out
 ##
 @@ -654,6 +740,11 @@ struct
 
 (1) Scan parquet default.explain_temp1 
 Output: [key#x, val#x]
+Batched: true
+Format: Parquet
+Location [not included in comparison]sql/core/spark-warehouse/[not included in 
comparison]
+PushedFilters: [IsNotNull(key), GreaterThan(key,10)]
+ReadSchema: struct
 
 Review comment:
   Can we have a test case to show the DPP filter?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] TomokoKomiyama commented on a change in pull request #26065: [SPARK-29404][DOCS] Add an explanation about the executor color changed in WebUI documentation

2019-10-10 Thread GitBox

TomokoKomiyama commented on a change in pull request #26065: 
[SPARK-29404][DOCS] Add an explanation about the executor color changed in 
WebUI documentation
URL: https://github.com/apache/spark/pull/26065#discussion_r74304
 
 

 ##
 File path: docs/web-ui.md
 ##
 @@ -51,12 +51,17 @@ The information that is displayed in this section is
   
 
 
-* Details of jobs grouped by status: Displays detailed information of the jobs 
including Job ID, description (with a link to detailed job page), submitted 
time, duration, stages summary and tasks progress bar
+When you click on a executor, you can make the color darker to point it at.
 
 Review comment:
   Users may be confused because of the meaning of executor bar color changes 
with clicking is not obvious. 
   I think this explanation works to show that the color changes is not bug, 
and to make user know the function which is just point up the bar. 
   
   Do you have any better ideas to help users understand this non-obvious 
behavior?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-10 Thread GitBox

s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add 
Spark Graph API
URL: https://github.com/apache/spark/pull/24851#discussion_r74185
 
 

 ##
 File path: 
graph/api/src/main/scala/org/apache/spark/graph/api/PropertyGraphWriter.scala
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.graph.api
+
+import java.util.Locale
+
+import org.apache.spark.sql.SaveMode
+
+abstract class PropertyGraphWriter(val graph: PropertyGraph) {
+
+  protected var saveMode: SaveMode = SaveMode.ErrorIfExists
+  protected var format: String =
+graph.cypherSession.sparkSession.sessionState.conf.defaultDataSourceName
+
+  /**
+   * Specifies the behavior when the graph already exists. Options include:
+   * 
+   * `SaveMode.Overwrite`: overwrite the existing data.
+   * `SaveMode.Ignore`: ignore the operation (i.e. no-op).
+   * `SaveMode.ErrorIfExists`: throw an exception at runtime.
+   * 
+   * 
+   * When writing the default option is `ErrorIfExists`.
+   *
+   * @since 3.0.0
+   */
+  def mode(mode: SaveMode): PropertyGraphWriter = {
+mode match {
+  case SaveMode.Append =>
+throw new IllegalArgumentException(s"Unsupported save mode: $mode. " +
+  "Accepted save modes are 'overwrite', 'ignore', 'error', 
'errorifexists'.")
+  case _ =>
+this.saveMode = mode
+}
+this
+  }
+
+  /**
+   * Specifies the behavior when the graph already exists. Options include:
+   * 
+   * `overwrite`: overwrite the existing graph.
+   * `ignore`: ignore the operation (i.e. no-op).
+   * `error` or `errorifexists`: default option, throw an exception at 
runtime.
+   * 
+   *
+   * @since 3.0.0
+   */
+  def mode(saveMode: String): PropertyGraphWriter = {
+saveMode.toLowerCase(Locale.ROOT) match {
+  case "overwrite" => mode(SaveMode.Overwrite)
+  case "ignore" => mode(SaveMode.Ignore)
+  case "error" | "errorifexists" => mode(SaveMode.ErrorIfExists)
+  case "default" => this
 
 Review comment:
   Makes sense. It was a copy paste error from the `DataFrameWriter`: 
https://github.com/apache/spark/blob/c8159c7941f51caf3d420c8ad8599b869fc56a08/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L91
   
   Maybe somebody should fix that too ;)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] peter-toth commented on issue #26044: [SPARK-29375][SQL] Exchange reuse across all subquery levels

2019-10-10 Thread GitBox

peter-toth commented on issue #26044: [SPARK-29375][SQL] Exchange reuse across 
all subquery levels
URL: https://github.com/apache/spark/pull/26044#issuecomment-540445410
 
 
   cc @gatorsmile @maryannxue @mgaido91 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] s1ck commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-10 Thread GitBox

s1ck commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
URL: https://github.com/apache/spark/pull/24851#issuecomment-540456978
 
 
   @dongjoon-hyun Thanks for the additional review. Your comments are addressed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mob-ai commented on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component

2019-10-10 Thread GitBox

mob-ai commented on issue #25909: [SPARK-29224]Implement Factorization Machines 
as a ml-pipeline component
URL: https://github.com/apache/spark/pull/25909#issuecomment-540466862
 
 
   > @mob-ai Thanks for this work!
   > But before you continue, I guess you can refer to previous dicsussion 
[SPARK-7008](https://issues.apache.org/jira/browse/SPARK-7008). I think you 
should provide some information like convergence curves on common datasets, and 
proof that mini-batch SGD is a good choice as an efficient solver.
   > As to the PR itself, besides owen's comments, I think there should be 
`FMClassifier` & `FMRegressor` respectively.
   
   Thank you for your suggests. I will test FM use public datasets.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mob-ai edited a comment on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component

2019-10-10 Thread GitBox

mob-ai edited a comment on issue #25909: [SPARK-29224]Implement Factorization 
Machines as a ml-pipeline component
URL: https://github.com/apache/spark/pull/25909#issuecomment-540466862
 
 
   > @mob-ai Thanks for this work!
   > But before you continue, I guess you can refer to previous dicsussion 
[SPARK-7008](https://issues.apache.org/jira/browse/SPARK-7008). I think you 
should provide some information like convergence curves on common datasets, and 
proof that mini-batch SGD is a good choice as an efficient solver.
   > As to the PR itself, besides owen's comments, I think there should be 
`FMClassifier` & `FMRegressor` respectively.
   
   Thank you for your suggestion. I will test FM use public datasets.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mob-ai edited a comment on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component

2019-10-10 Thread GitBox

mob-ai edited a comment on issue #25909: [SPARK-29224]Implement Factorization 
Machines as a ml-pipeline component
URL: https://github.com/apache/spark/pull/25909#issuecomment-540466862
 
 
   > @mob-ai Thanks for this work!
   > But before you continue, I guess you can refer to previous dicsussion 
[SPARK-7008](https://issues.apache.org/jira/browse/SPARK-7008). I think you 
should provide some information like convergence curves on common datasets, and 
proof that mini-batch SGD is a good choice as an efficient solver.
   > As to the PR itself, besides owen's comments, I think there should be 
`FMClassifier` & `FMRegressor` respectively.
   
   Thank you for your suggestion. I will test FM on public datasets.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] IgorBerman closed pull request #20640: [SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend

2019-10-10 Thread GitBox

IgorBerman closed pull request #20640: [SPARK-19755][Mesos] Blacklist is always 
active for MesosCoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/20640
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] IgorBerman commented on issue #20640: [SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend

2019-10-10 Thread GitBox

IgorBerman commented on issue #20640: [SPARK-19755][Mesos] Blacklist is always 
active for MesosCoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/20640#issuecomment-540468693
 
 
   I'm closing this PR as it is long open one and there is no agreement that 
it's valuable.
   anyone who wants to pick it, are welcome to create some.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression

2019-10-10 Thread GitBox

beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] 
Support ANSI SQL: result offset clause in query expression
URL: https://github.com/apache/spark/pull/25416#discussion_r333402192
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/offset.scala
 ##
 @@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.serializer.Serializer
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
SinglePartition}
+
+
+/**
+ * Skip the first `offset` elements and collect them to a single partition.
+ * This operator will be used when a logical `Offset` operation is the final 
operator in an
+ * logical plan, which happens when the user is collecting results back to the 
driver.
+ */
+case class CollectOffsetExec(offset: Int, child: SparkPlan) extends 
UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputPartitioning: Partitioning = SinglePartition
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  override def executeCollect(): Array[InternalRow] = 
child.executeCollect.drop(offset)
+
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
+
+  protected override def doExecute(): RDD[InternalRow] = {
+sparkContext.parallelize(executeCollect(), 1)
+  }
+
+}
+
+/**
+ * Skip the first `offset` elements and collect them to a single partition.
+ */
+case class OffsetExec(offset: Int, child: SparkPlan) extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  protected override def doExecute(): RDD[InternalRow] = {
+val rdd = child.execute()
+val arr = rdd.take(offset)
+rdd.filter(!arr.contains(_))
 
 Review comment:
   @cloud-fan There are exists a problem the index of partition and the order 
of data are inconsistent.
   I have a new implement but not works file as I can't assurance the order of 
output produced by child plans.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] peter-toth edited a comment on issue #26028: [SPARK-29359][SQL][TESTS] Better exception handling in (SQL|ThriftServer)QueryTestSuite

2019-10-10 Thread GitBox

peter-toth edited a comment on issue #26028: [SPARK-29359][SQL][TESTS] Better 
exception handling in (SQL|ThriftServer)QueryTestSuite
URL: https://github.com/apache/spark/pull/26028#issuecomment-540414457
 
 
   @dongjoon-hyun @wangyum do you think this PR is ok now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression

2019-10-10 Thread GitBox

beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] 
Support ANSI SQL: result offset clause in query expression
URL: https://github.com/apache/spark/pull/25416#discussion_r333402192
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/offset.scala
 ##
 @@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.serializer.Serializer
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
SinglePartition}
+
+
+/**
+ * Skip the first `offset` elements and collect them to a single partition.
+ * This operator will be used when a logical `Offset` operation is the final 
operator in an
+ * logical plan, which happens when the user is collecting results back to the 
driver.
+ */
+case class CollectOffsetExec(offset: Int, child: SparkPlan) extends 
UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputPartitioning: Partitioning = SinglePartition
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  override def executeCollect(): Array[InternalRow] = 
child.executeCollect.drop(offset)
+
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
+
+  protected override def doExecute(): RDD[InternalRow] = {
+sparkContext.parallelize(executeCollect(), 1)
+  }
+
+}
+
+/**
+ * Skip the first `offset` elements and collect them to a single partition.
+ */
+case class OffsetExec(offset: Int, child: SparkPlan) extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  protected override def doExecute(): RDD[InternalRow] = {
+val rdd = child.execute()
+val arr = rdd.take(offset)
+rdd.filter(!arr.contains(_))
 
 Review comment:
   @cloud-fan There are exists a problem the index of partition and the order 
of data are inconsistent.
   I have a new implement but not works file as I can't assurance the order of 
output produced by child plans.
   ```
 protected override def doExecute(): RDD[InternalRow] = {
   val rdd = child.execute()
   val partIdxToCountItr = rdd.mapPartitionsWithIndex{(partIdx, iter) => {
 val partIdxToRowCount = scala.collection.mutable.Map[Int,Int]()
 var rowCount = 0
 while(iter.hasNext){
   rowCount += 1
   iter.next()
 }
 partIdxToRowCount.put(partIdx, rowCount)
 partIdxToRowCount.iterator
   }}.collect().iterator
   var remainder = offset
   val partIdxToSkipCount = scala.collection.mutable.Map[Int,Int]()
   while (partIdxToCountItr.hasNext && remainder > 0) {
 val kv = partIdxToCountItr.next()
val partIdx = kv._1
 val count = kv._2
 if (count > remainder) {
   partIdxToSkipCount(partIdx) = remainder
   remainder = 0
 } else {
   partIdxToSkipCount(partIdx) = count
   remainder -= count
 }
   }
   val broadcastPartIdxToSkipCount = 
sparkContext.broadcast(partIdxToSkipCount)
   rdd.mapPartitionsWithIndex{(partIdx, iter) => {
 val skipCount = broadcastPartIdxToSkipCount.value.getOrElse(partIdx, 0)
 iter.drop(skipCount)
   }}
 }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression

2019-10-10 Thread GitBox

beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] 
Support ANSI SQL: result offset clause in query expression
URL: https://github.com/apache/spark/pull/25416#discussion_r333403518
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/offset.scala
 ##
 @@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.serializer.Serializer
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
SinglePartition}
+
+
+/**
+ * Skip the first `offset` elements and collect them to a single partition.
+ * This operator will be used when a logical `Offset` operation is the final 
operator in an
+ * logical plan, which happens when the user is collecting results back to the 
driver.
+ */
+case class CollectOffsetExec(offset: Int, child: SparkPlan) extends 
UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputPartitioning: Partitioning = SinglePartition
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  override def executeCollect(): Array[InternalRow] = 
child.executeCollect.drop(offset)
+
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
+
+  protected override def doExecute(): RDD[InternalRow] = {
+sparkContext.parallelize(executeCollect(), 1)
+  }
+
+}
+
+/**
+ * Skip the first `offset` elements and collect them to a single partition.
+ */
+case class OffsetExec(offset: Int, child: SparkPlan) extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  protected override def doExecute(): RDD[InternalRow] = {
+val rdd = child.execute()
+val arr = rdd.take(offset)
+rdd.filter(!arr.contains(_))
 
 Review comment:
   So, I want know how to assurance the order?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] juliuszsompolski commented on a change in pull request #26014: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

2019-10-10 Thread GitBox

juliuszsompolski commented on a change in pull request #26014: 
[SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request
URL: https://github.com/apache/spark/pull/26014#discussion_r333412752
 
 

 ##
 File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
 ##
 @@ -684,6 +685,92 @@ class HiveThriftBinaryServerSuite extends 
HiveThriftJdbcTest {
   
assert(e.getMessage.contains("org.apache.spark.sql.catalyst.parser.ParseException"))
 }
   }
+
+  test("ThriftCLIService FetchResults FETCH_FIRST, FETCH_NEXT, FETCH_PRIOR") {
+def checkResult(rows: RowSet, start: Long, end: Long): Unit = {
+  assert(rows.getStartOffset() == start)
+  assert(rows.numRows() == end - start)
+  rows.iterator.asScala.zip((start until end).iterator).foreach { case 
(row, v) =>
+assert(row(0).asInstanceOf[Long] === v)
+  }
+}
+
+withCLIServiceClient { client =>
+  val user = System.getProperty("user.name")
+  val sessionHandle = client.openSession(user, "")
+
+  val confOverlay = new java.util.HashMap[java.lang.String, 
java.lang.String]
+  val operationHandle = client.executeStatement(
+sessionHandle,
+"SELECT * FROM range(10)",
+confOverlay) // 10 rows result with sequence 0, 1, 2, ..., 9
+  var rows: RowSet = null
+
+  // Fetch 5 rows with FETCH_NEXT
+  rows = client.fetchResults(
+operationHandle, FetchOrientation.FETCH_NEXT, 5, 
FetchType.QUERY_OUTPUT)
+  checkResult(rows, 0, 5) // fetched [0, 5)
+
+  // Fetch another 2 rows with FETCH_NEXT
+  rows = client.fetchResults(
+operationHandle, FetchOrientation.FETCH_NEXT, 2, 
FetchType.QUERY_OUTPUT)
+  checkResult(rows, 5, 7) // fetched [5, 7)
+
+  // FETCH_PRIOR 3 rows
 
 Review comment:
   @wangyum this is expected.
   `FETCH_PRIOR` of the Thriftserver is not the same as FETCH PRIOR in the 
cursor of the client.
   Fetch in Thriftserver operates in batches of rows, and the cursor in the 
client caches these batches and returns results row by row. Let's say it's 
batching by maxRows=100, and we returned row 99. The client has rows [0, 100) 
from the first batch and is at row 99. The next FETCH NEXT on the cursor will 
have to call FETCH_NEXT to the Thriftserver to get a batch of rows [100, 200) 
and return row 100 to the client. Another FETCH NEXT will return row 101 from 
the batch without having to call FETCH_NEXT on the Thriftserver. Another FETCH 
NEXT on the cursor will return row 102. Then FETCH PRIOR will return row 101 
again. Then FETCH PRIOR will return row 100. Only then, another FETCH PRIOR 
should return row 99, but the cursor doesn't have its current batch. Then it 
has to call FETCH_PRIOR on Thriftserver to get rows [0, 99) again.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AngersZhuuuu commented on issue #25984: [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2

2019-10-10 Thread GitBox

AngersZh commented on issue #25984: [SPARK-29308][BUILD] Update deps in 
dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
URL: https://github.com/apache/spark/pull/25984#issuecomment-540485456
 
 
   > Unfortunately, Jenkins is down. Let's wait until Jenkins is back.
   
   
   > Unfortunately, Jenkins is down. Let's wait until Jenkins is back.
   
   Bad news, wait for Jenkins back.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AngersZhuuuu edited a comment on issue #25984: [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2

2019-10-10 Thread GitBox

AngersZh edited a comment on issue #25984: [SPARK-29308][BUILD] Update deps 
in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
URL: https://github.com/apache/spark/pull/25984#issuecomment-540485456
 
 
   > Unfortunately, Jenkins is down. Let's wait until Jenkins is back.
   
   Bad news, wait for Jenkins back.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mob-ai edited a comment on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component

2019-10-10 Thread GitBox

mob-ai edited a comment on issue #25909: [SPARK-29224]Implement Factorization 
Machines as a ml-pipeline component
URL: https://github.com/apache/spark/pull/25909#issuecomment-540466862
 
 
   > @mob-ai Thanks for this work!
   > But before you continue, I guess you can refer to previous dicsussion 
[SPARK-7008](https://issues.apache.org/jira/browse/SPARK-7008). I think you 
should provide some information like convergence curves on common datasets, and 
proof that mini-batch SGD is a good choice as an efficient solver.
   > As to the PR itself, besides owen's comments, I think there should be 
`FMClassifier` & `FMRegressor` respectively.
   
   Thank you for your suggestion. I will test FM on public datasets.
   Actually, this PR also implements AdamW solver to optimize FM. AdamW is more 
efficient than normal SGD, so default solver is AdamW. I will compare normal 
SGD and AdamW on public datasets.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn opened a new pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-10-10 Thread GitBox

yaooqinn opened a new pull request #26080: [SPARK-29425][SQL] The ownership of 
a database should be respected
URL: https://github.com/apache/spark/pull/26080
 
 
   
   
   ### What changes were proposed in this pull request?
   
   
   Keep the owner of a database when executing alter database commands
   
   ### Why are the changes needed?
   
   
   Spark will inadvertently delete the owner of a database for executing 
databases ddls
   
   ### Does this PR introduce any user-facing change?
   
   
   NO
   
   ### How was this patch tested?
   
   
   add and modify uts


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn removed a comment on issue #26068: [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership

2019-10-10 Thread GitBox

yaooqinn removed a comment on issue #26068: [SPARK-29405][SQL] Alter table / 
Insert statements should not change a table's ownership
URL: https://github.com/apache/spark/pull/26068#issuecomment-540413720
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on issue #26068: [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership

2019-10-10 Thread GitBox

yaooqinn commented on issue #26068: [SPARK-29405][SQL] Alter table / Insert 
statements should not change a table's ownership
URL: https://github.com/apache/spark/pull/26068#issuecomment-540521129
 
 
   cc @wangyum 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333465668
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   Is it the only place we parse interval string? I thought we parse it with 
antlr parser.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333469680
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   antlr parser does this as well but it parses sql elements like
   ```sql
   spark-sql> select interval 10 days 1 second;
   interval 1 weeks 3 days 1 seconds
   ```
   here is only the place where we parse string values:
   ```sql
   spark-sql> select interval 'interval 10 days 1 second';
   interval 1 weeks 3 days 1 seconds
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333469680
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   antlr parser does this as well but [it parses sql 
elements](https://github.com/apache/spark/blob/f2ead4d0b50715e3dec79ce762c31f145d46a5b5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L1907-L1961)
 like
   ```sql
   spark-sql> select interval 10 days 1 second;
   interval 1 weeks 3 days 1 seconds
   ```
   here is only the place where we parse string values:
   ```sql
   spark-sql> select interval 'interval 10 days 1 second';
   interval 1 weeks 3 days 1 seconds
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333472752
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   This looks duplicated. Shall we add a `parseInterval` method to the 
`ParserInterface` interface and call the parser here? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] LucaCanali opened a new pull request #26081: [SPARK-29032][FOLLOWUP][DOC] Document PrometheusServlet in the monitoring documentation

2019-10-10 Thread GitBox

LucaCanali opened a new pull request #26081: [SPARK-29032][FOLLOWUP][DOC] 
Document PrometheusServlet in the monitoring documentation
URL: https://github.com/apache/spark/pull/26081
 
 
   This adds an entry about PrometheusServlet to the documentation, following 
SPARK-29032
   
   ### Why are the changes needed?
   
   The monitoring documentation lists all the available metrics sinks, this 
should be added to the list for completeness.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333480766
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   Maybe something has been duplicated, and can be reused but this is heavy 
refactoring for this PR. 
   
   For instance, `AstBuilder.visitInterval` gets already split interval units 
but `CalendarInterval.fromString()` uses regular expression to parse & split: 
https://github.com/apache/spark/blob/b10344956d672fca495d0684b351ab5cf9f210d8/common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java#L50-L53
   
   If you don't mind, I would try to do that in a separate PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333483069
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   This PR introduced code duplication 
https://github.com/apache/spark/pull/8034 for your code 
https://github.com/apache/spark/pull/7355 5 years ago.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333484672
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   And your regexp is not tolerant to the order of interval units, see:
   ```sql
   spark-sql> select interval 'interval 1 microsecond 2 months';
   NULL
   spark-sql> select interval 1 microsecond 2 months;
   interval 2 months 1 microseconds
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333485581
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   Let's keep them separate so far. And I will try to write flexible and common 
code for parsing string interval that could handle other features found in 
https://github.com/apache/spark/pull/26055


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333485581
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   Let's keep them separate so far. And I will try to write flexible and common 
code in the near future for parsing string intervals that could handle other 
features found in https://github.com/apache/spark/pull/26055


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333503281
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
 
 Review comment:
   SGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333504359
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
 
 Review comment:
   what does this condition mean?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333513598
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,50 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
 
 Review comment:
   I added comments about this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333514414
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,53 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+// Checks the given interval string does not start with the `interval` 
prefix
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
+  // Prepend `interval` if it does not present because
+  // the regular expression strictly require it.
 
 Review comment:
   I have not figure out how to modify the regular expression to make the 
`interval` prefix optional.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333514414
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,53 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+// Checks the given interval string does not start with the `interval` 
prefix
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
+  // Prepend `interval` if it does not present because
+  // the regular expression strictly require it.
 
 Review comment:
   I have not figured out how to modify the regular expression to make the 
`interval` prefix optional.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333516377
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,53 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+// Checks the given interval string does not start with the `interval` 
prefix
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
+  // Prepend `interval` if it does not present because
+  // the regular expression strictly require it.
 
 Review comment:
   Probably, this needs this feature 
https://www.regular-expressions.info/branchreset.html which Java's regexps 
doesn't have.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #26014: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

2019-10-10 Thread GitBox

wangyum commented on a change in pull request #26014: [SPARK-29349][SQL] 
Support FETCH_PRIOR in Thriftserver fetch request
URL: https://github.com/apache/spark/pull/26014#discussion_r333516458
 
 

 ##
 File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
 ##
 @@ -684,6 +685,92 @@ class HiveThriftBinaryServerSuite extends 
HiveThriftJdbcTest {
   
assert(e.getMessage.contains("org.apache.spark.sql.catalyst.parser.ParseException"))
 }
   }
+
+  test("ThriftCLIService FetchResults FETCH_FIRST, FETCH_NEXT, FETCH_PRIOR") {
+def checkResult(rows: RowSet, start: Long, end: Long): Unit = {
+  assert(rows.getStartOffset() == start)
+  assert(rows.numRows() == end - start)
+  rows.iterator.asScala.zip((start until end).iterator).foreach { case 
(row, v) =>
+assert(row(0).asInstanceOf[Long] === v)
+  }
+}
+
+withCLIServiceClient { client =>
+  val user = System.getProperty("user.name")
+  val sessionHandle = client.openSession(user, "")
+
+  val confOverlay = new java.util.HashMap[java.lang.String, 
java.lang.String]
+  val operationHandle = client.executeStatement(
+sessionHandle,
+"SELECT * FROM range(10)",
+confOverlay) // 10 rows result with sequence 0, 1, 2, ..., 9
+  var rows: RowSet = null
+
+  // Fetch 5 rows with FETCH_NEXT
+  rows = client.fetchResults(
+operationHandle, FetchOrientation.FETCH_NEXT, 5, 
FetchType.QUERY_OUTPUT)
+  checkResult(rows, 0, 5) // fetched [0, 5)
+
+  // Fetch another 2 rows with FETCH_NEXT
+  rows = client.fetchResults(
+operationHandle, FetchOrientation.FETCH_NEXT, 2, 
FetchType.QUERY_OUTPUT)
+  checkResult(rows, 5, 7) // fetched [5, 7)
+
+  // FETCH_PRIOR 3 rows
 
 Review comment:
   Thank you @juliuszsompolski?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #26014: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request

2019-10-10 Thread GitBox

wangyum commented on a change in pull request #26014: [SPARK-29349][SQL] 
Support FETCH_PRIOR in Thriftserver fetch request
URL: https://github.com/apache/spark/pull/26014#discussion_r333516458
 
 

 ##
 File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
 ##
 @@ -684,6 +685,92 @@ class HiveThriftBinaryServerSuite extends 
HiveThriftJdbcTest {
   
assert(e.getMessage.contains("org.apache.spark.sql.catalyst.parser.ParseException"))
 }
   }
+
+  test("ThriftCLIService FetchResults FETCH_FIRST, FETCH_NEXT, FETCH_PRIOR") {
+def checkResult(rows: RowSet, start: Long, end: Long): Unit = {
+  assert(rows.getStartOffset() == start)
+  assert(rows.numRows() == end - start)
+  rows.iterator.asScala.zip((start until end).iterator).foreach { case 
(row, v) =>
+assert(row(0).asInstanceOf[Long] === v)
+  }
+}
+
+withCLIServiceClient { client =>
+  val user = System.getProperty("user.name")
+  val sessionHandle = client.openSession(user, "")
+
+  val confOverlay = new java.util.HashMap[java.lang.String, 
java.lang.String]
+  val operationHandle = client.executeStatement(
+sessionHandle,
+"SELECT * FROM range(10)",
+confOverlay) // 10 rows result with sequence 0, 1, 2, ..., 9
+  var rows: RowSet = null
+
+  // Fetch 5 rows with FETCH_NEXT
+  rows = client.fetchResults(
+operationHandle, FetchOrientation.FETCH_NEXT, 5, 
FetchType.QUERY_OUTPUT)
+  checkResult(rows, 0, 5) // fetched [0, 5)
+
+  // Fetch another 2 rows with FETCH_NEXT
+  rows = client.fetchResults(
+operationHandle, FetchOrientation.FETCH_NEXT, 2, 
FetchType.QUERY_OUTPUT)
+  checkResult(rows, 5, 7) // fetched [5, 7)
+
+  // FETCH_PRIOR 3 rows
 
 Review comment:
   Thank you @juliuszsompolski


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on issue #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on issue #26078: [SPARK-29151][CORE] Support fractional 
resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#issuecomment-540591922
 
 
   ok to test


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] jkremser commented on issue #26075: [WIP][K8S] Spark operator

2019-10-10 Thread GitBox

jkremser commented on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540605666
 
 
   Thanks for looking into this. Ok, I will open the discussion on the dev 
list. I wanted to have the PR open before the summit and put the link on the 
slides.
   
   And yes, it's basically dump of that repo (radanalyticsio/spark-operator) 
with couple of modifications. I for instance aligned the versions of the 
libraries to be the same as those that are used by Spark.
   
   > Is this the following? Please give a clear explanation if this is your own 
contribution.
   https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
   
   No, it's different operator written in Java and it supports three different 
custom resources, `SparkCluster` (spark running in standalone mode in the 
pods), `SparkApplication` (this has the same API as the operator from GCP, so 
it's using spark-on-k8s scheduling) and `SparkHistoryServer`. So there is some 
overlap, but it's a different thing. This operator is also available on 
[operatorhub.io](operatorhub.io) 
([link](https://operatorhub.io/operator/radanalytics-spark))


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] jkremser edited a comment on issue #26075: [WIP][K8S] Spark operator

2019-10-10 Thread GitBox

jkremser edited a comment on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540605666
 
 
   Thanks for looking into this. Ok, I will open the discussion on the dev 
list. I wanted to have the PR open before the summit and put the link on the 
slides.
   
   And yes, it's basically dump of that repo (radanalyticsio/spark-operator) 
with couple of modifications. I for instance aligned the versions of the 
libraries to be the same as those that are used by Spark.
   
   > Is this the following? Please give a clear explanation if this is your own 
contribution.
   https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
   
   No, it's different operator written in Java (GCP's one is in Go) and it 
supports three different custom resources, `SparkCluster` (spark running in 
standalone mode in the pods), `SparkApplication` (this has the same API as the 
operator from GCP, so it's using spark-on-k8s scheduling) and 
`SparkHistoryServer`. So there is some overlap, but it's a different thing. 
This operator is also available on [operatorhub.io](operatorhub.io) 
([link](https://operatorhub.io/operator/radanalytics-spark))


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333562923
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,53 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+// Checks the given interval string does not start with the `interval` 
prefix
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
+  // Prepend `interval` if it does not present because
+  // the regular expression strictly require it.
 
 Review comment:
   How about something like
   ```
   String intervalStr = trimmed.toLowerCase();
   if (intervalStr.startsWith("interval")) {
 intervalStr = intervalStr.drop(8)
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

cloud-fan commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333562923
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,53 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+// Checks the given interval string does not start with the `interval` 
prefix
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
+  // Prepend `interval` if it does not present because
+  // the regular expression strictly require it.
 
 Review comment:
   How about something like
   ```
   String intervalStr = trimmed.toLowerCase();
   if (intervalStr.startsWith("interval")) {
 intervalStr = intervalStr.drop(8)
   }
   // parse the interval string assuming there is no leading "interval"
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] Support string intervals without the `interval` prefix

2019-10-10 Thread GitBox

MaxGekk commented on a change in pull request #26079: [SPARK-29369][SQL] 
Support string intervals without the `interval` prefix
URL: https://github.com/apache/spark/pull/26079#discussion_r333579599
 
 

 ##
 File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
 ##
 @@ -73,45 +72,53 @@ private static long toLong(String s) {
* This method is case-insensitive.
*/
   public static CalendarInterval fromString(String s) {
-if (s == null) {
-  return null;
-}
-s = s.trim();
-Matcher m = p.matcher(s);
-if (!m.matches() || s.compareToIgnoreCase("interval") == 0) {
+try {
+  return fromCaseInsensitiveString(s);
+} catch (IllegalArgumentException e) {
   return null;
-} else {
-  long months = toLong(m.group(1)) * 12 + toLong(m.group(2));
-  long microseconds = toLong(m.group(3)) * MICROS_PER_WEEK;
-  microseconds += toLong(m.group(4)) * MICROS_PER_DAY;
-  microseconds += toLong(m.group(5)) * MICROS_PER_HOUR;
-  microseconds += toLong(m.group(6)) * MICROS_PER_MINUTE;
-  microseconds += toLong(m.group(7)) * MICROS_PER_SECOND;
-  microseconds += toLong(m.group(8)) * MICROS_PER_MILLI;
-  microseconds += toLong(m.group(9));
-  return new CalendarInterval((int) months, microseconds);
 }
   }
 
   /**
-   * Convert a string to CalendarInterval. Unlike fromString, this method can 
handle
+   * Convert a string to CalendarInterval. This method can handle
* strings without the `interval` prefix and throws IllegalArgumentException
* when the input string is not a valid interval.
*
* @throws IllegalArgumentException if the string is not a valid internal.
*/
   public static CalendarInterval fromCaseInsensitiveString(String s) {
-if (s == null || s.trim().isEmpty()) {
-  throw new IllegalArgumentException("Interval cannot be null or blank.");
+if (s == null) {
+  throw new IllegalArgumentException("Interval cannot be null");
 }
-String sInLowerCase = s.trim().toLowerCase(Locale.ROOT);
-String interval =
-  sInLowerCase.startsWith("interval ") ? sInLowerCase : "interval " + 
sInLowerCase;
-CalendarInterval cal = fromString(interval);
-if (cal == null) {
+String trimmed = s.trim();
+if (trimmed.isEmpty()) {
+  throw new IllegalArgumentException("Interval cannot be blank");
+}
+String prefix = "interval";
+String intervalStr = trimmed;
+// Checks the given interval string does not start with the `interval` 
prefix
+if (!intervalStr.regionMatches(true, 0, prefix, 0, prefix.length())) {
+  // Prepend `interval` if it does not present because
+  // the regular expression strictly require it.
 
 Review comment:
   > String intervalStr = trimmed.toLowerCase();
   
   Your code is more expensive because you lower case whole input string.
   
   > // parse the interval string assuming there is no leading "interval"
   
   Here there is a problem with current regexp when you delete the anchor 
`"interval"`. Without this anchor, it cannot match to valid inputs:
   ```scala
   scala> import java.util.regex._
   import java.util.regex._
   
   scala> def unitRegex(unit: String) = "(?:\\s+(-?\\d+)\\s+" + unit + "s?)?"
   unitRegex: (unit: String)String
   
   scala> val p = Pattern.compile(unitRegex("year") + unitRegex("month") +
| unitRegex("week") + unitRegex("day") + unitRegex("hour") + 
unitRegex("minute") +
| unitRegex("second") + unitRegex("millisecond") + 
unitRegex("microsecond"),
| Pattern.CASE_INSENSITIVE)
   p: java.util.regex.Pattern = 
(?:\s+(-?\d+)\s+years?)?(?:\s+(-?\d+)\s+months?)?(?:\s+(-?\d+)\s+weeks?)?(?:\s+(-?\d+)\s+days?)?(?:\s+(-?\d+)\s+hours?)?(?:\s+(-?\d+)\s+minutes?)?(?:\s+(-?\d+)\s+seconds?)?(?:\s+(-?\d+)\s+milliseconds?)?(?:\s+(-?\d+)\s+microseconds?)?
   
   scala> val m = p.matcher("1 month 1 second")
   m: java.util.regex.Matcher = 
java.util.regex.Matcher[pattern=(?:\s+(-?\d+)\s+years?)?(?:\s+(-?\d+)\s+months?)?(?:\s+(-?\d+)\s+weeks?)?(?:\s+(-?\d+)\s+days?)?(?:\s+(-?\d+)\s+hours?)?(?:\s+(-?\d+)\s+minutes?)?(?:\s+(-?\d+)\s+seconds?)?(?:\s+(-?\d+)\s+milliseconds?)?(?:\s+(-?\d+)\s+microseconds?)?
 region=0,16 lastmatch=]
   
   scala> m.matches()
   res7: Boolean = false
   ```
   If we added it back:
   ```scala
   scala> val p = Pattern.compile("interval" + unitRegex("year") + 
unitRegex("month") +
| unitRegex("week") + unitRegex("day") + unitRegex("hour") + 
unitRegex("minute") +
| unitRegex("second") + unitRegex("millisecond") + 
unitRegex("microsecond"),
| Pattern.CASE_INSENSITIVE)
   p: java.util.regex.Pattern = 
interval(?:\s+(-?\d+)\s+years?)?(?:\s+(-?\d+)\s+months?)?(?:\s+(-?\d+)\s+weeks?)?(?:\s+(-?\d+)\s+days?)?(?:\s+(-?\d+)\s+hours?)?(?:\s+(-?\d+)\s+minutes?)?(?:\s+(-?\d+)\s+seconds?)?(?:\s+(-?\d+)\s+milliseconds?)?(?:\s+(-?\d+)\s+microseconds?)?
   
   scala> val m = p.matcher("interval 1 m

[GitHub] [spark] viirya commented on issue #25979: [SPARK-29295][SQL] Insert overwrite to Hive external table partition should delete old data

2019-10-10 Thread GitBox

viirya commented on issue #25979: [SPARK-29295][SQL] Insert overwrite to Hive 
external table partition should delete old data
URL: https://github.com/apache/spark/pull/25979#issuecomment-540638149
 
 
   @cloud-fan @felixcheung Do you have more comments on this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333574367
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
   extends ResourceInformation(name, addresses.toArray) with ResourceAllocator {
 
   override protected def resourceName = this.name
   override protected def resourceAddresses = this.addresses
+  override protected def resourcesPerAddress = numParts
 
 Review comment:
   I think we should rename. resourcesPerAddress could be confusing since an 
address is a single resource.  
   I'm thinking we just call this tasksPerAddress and perhaps update this to be 
consistent everywhere - ie rename numParts as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333564756
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -1982,9 +1982,13 @@ Apart from these, the following properties are also 
available, and may be useful
   spark.task.resource.{resourceName}.amount
   1
   
-Amount of a particular resource type to allocate for each task. If this is 
specified
-you must also provide the executor config 
spark.executor.resource.{resourceName}.amount
-and any corresponding discovery configs so that your executors are created 
with that resource type.
+Amount of a particular resource type to allocate for each task, note that 
this can be a double.
+If this is specified you must also provide the executor config 
+spark.executor.resource.{resourceName}.amount and any 
corresponding discovery configs 
+so that your executors are created with that resource type. In addition to 
whole amounts, 
+a fractional amount (for example, 0.25, or 1/4th of a resource) may be 
specified. 
+Fractional amounts must be less than or equal to 0.5, or in other words, 
the minimum amount of
 
 Review comment:
   we should also add that any fractional value specified gets rounded to have 
an even number of tasks per resource.  ie 0. end up with 4 tasks per 
resource 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333582986
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala
 ##
 @@ -30,27 +30,35 @@ trait ResourceAllocator {
 
   protected def resourceName: String
   protected def resourceAddresses: Seq[String]
+  protected def resourcesPerAddress: Int
 
   /**
-   * Map from an address to its availability, the value `true` means the 
address is available,
-   * while value `false` means the address is assigned.
+   * Map from an address to its availability, a value > 0 means the address is 
available,
+   * while value of 0 means the address is fully assigned.
+   *
+   * For task resources ([[org.apache.spark.scheduler.ExecutorResourceInfo]]), 
this value
+   * can be fractional.
 
 Review comment:
   I think we should clarify, here its actually not fractional its a multiple. 
I think if its called tasksPerAddress that will help 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333581119
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/SparkContext.scala
 ##
 @@ -2775,7 +2775,10 @@ object SparkContext extends Logging {
 s" = ${taskReq.amount}")
 }
 // Compare and update the max slots each executor can provide.
-val resourceNumSlots = execAmount / taskReq.amount
+// If the configured amount per task was < 1.0, a task is subdividing
+// executor resources. If the amount per task was > 1.0, the task wants
+// multiple executor resources.
+val resourceNumSlots = Math.floor(execAmount * 
taskReq.numParts/taskReq.amount).toInt
 
 Review comment:
   add spaces around the "/"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333539938
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -1982,9 +1982,13 @@ Apart from these, the following properties are also 
available, and may be useful
   spark.task.resource.{resourceName}.amount
   1
   
-Amount of a particular resource type to allocate for each task. If this is 
specified
-you must also provide the executor config 
spark.executor.resource.{resourceName}.amount
-and any corresponding discovery configs so that your executors are created 
with that resource type.
+Amount of a particular resource type to allocate for each task, note that 
this can be a double.
+If this is specified you must also provide the executor config 
+spark.executor.resource.{resourceName}.amount and any 
corresponding discovery configs 
+so that your executors are created with that resource type. In addition to 
whole amounts, 
+a fractional amount (for example, 0.25, or 1/4th of a resource) may be 
specified. 
 
 Review comment:
   change "or 1/4th" to something like "which means 1/4th".  They way I read 
the or makes me question is I can pass in "1/4" to the config, just want it to 
be clear.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333571682
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
 
 Review comment:
   fix formatting, should be 4 spaces idented from left


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333586973
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ##
 @@ -215,7 +217,14 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
   totalCoreCount.addAndGet(cores)
   totalRegisteredExecutors.addAndGet(1)
   val resourcesInfo = resources.map{ case (k, v) =>
-(v.name, new ExecutorResourceInfo(v.name, v.addresses))}
+(v.name,
+ new ExecutorResourceInfo(v.name, v.addresses,
+   taskResources
 
 Review comment:
   instead of doing this filter each time how about we convert taskResources 
into a Map (and rename it) one time at the top when its declared and then just 
lookup in the map here. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333538698
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
 ##
 @@ -47,7 +48,9 @@ private[spark] case class ResourceRequest(
 discoveryScript: Option[String],
 vendor: Option[String])
 
-private[spark] case class ResourceRequirement(resourceName: String, amount: 
Int)
+private[spark] case class ResourceRequirement(resourceName: String, amount: 
Int,
 
 Review comment:
   lets add description, especially explaining numParts and amount. Mention 
fractional use case where amount is user specified get changed to 1 and 
numParts is the number per resource equivalent.
   
   Also need to fix formatting.  Please use formatting like ResourceRequest 
above


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] nonsleepr commented on a change in pull request #26000: [SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service

2019-10-10 Thread GitBox

nonsleepr commented on a change in pull request #26000: 
[SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service
URL: https://github.com/apache/spark/pull/26000#discussion_r333594674
 
 

 ##
 File path: docs/running-on-yarn.md
 ##
 @@ -492,6 +492,13 @@ To use a custom metrics.properties for the application 
master and executors, upd
   If it is not set then the YARN application ID is used.
   
 
+
+  spark.yarn.shuffle.service.name
+  spark_shuffle
+  
+Name of the external shuffle service.
 
 Review comment:
   That's what "external" word is for, isn't it?
   Should I reword it as follows?
   > Name of the external shuffle service used by executors.
   *The external service is configured and started by YARN (see [Configuring 
the External Shuffle Service](#configuring-the-external-shuffle-service) for 
details). Apache Spark distribution uses name `spark_shuffle`, but other 
distributions (e.g. HDP) might use other names.*


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on issue #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on issue #25943: [WIP][SPARK-29261][SQL][CORE] Support 
recover live entities from KVStore for (SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#issuecomment-540647353
 
 
   @squito 
   
   > Sorry I am getting caught up on a lot of stuff here -- how is this related 
to #25577 ?
   
   This PR just extracts recovering live entities functionality from #25577.
   
   > Just for storing live entities, this approach seems more promising to me 
-- using Java serialization is really bad for compatibility, any changes to 
classes will break deserialization. If we want the snapshot / checkpoint files 
to be able to entirely replace the event log files, we'll need the 
compatibility.
   
   Hmm...This PR doesn't store live entities, instead, it recovers (or say, 
restore) live entities from a KVStore. For snapshot / checkpoint files, I think 
@HeartSaVioR has already posted a good solution in #25811, which has good 
compatibility for  serialization/de-serialization.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333596371
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/storage/StorageLevel.scala
 ##
 @@ -163,6 +163,20 @@ object StorageLevel {
   val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
   val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
 
+  /**
+   * :: DeveloperApi ::
+   * Return the StorageLevel object with the specified description.
+   */
+  @DeveloperApi
+  def fromDescription(desc: String): StorageLevel = {
 
 Review comment:
   Good suggestion, I'll add a unit test for it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333596849
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/status/api/v1/api.scala
 ##
 @@ -181,6 +182,7 @@ class RDDStorageInfo private[spark](
 val partitions: Option[Seq[RDDPartitionInfo]])
 
 class RDDDataDistribution private[spark](
+val executorId: String,
 
 Review comment:
   Hi @squito, thanks for providing the api versioning requirements. That's 
really helpful.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-10 Thread GitBox

dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-540649306
 
 
   Thank you all, @yuecong , @viirya , @srowen , @HyukjinKwon .
   Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333598202
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
 ##
 @@ -103,6 +104,81 @@ private[spark] class AppStatusListener(
 }
   }
 
+  // visible for tests
+  private[spark] def recoverLiveEntities(): Unit = {
+if (!live) {
+  kvstore.view(classOf[JobDataWrapper])
+.asScala.filter(_.info.status == JobExecutionStatus.RUNNING)
+.map(_.toLiveJob).foreach(job => liveJobs.put(job.jobId, job))
+
+  kvstore.view(classOf[StageDataWrapper]).asScala
+.filter { stageData =>
+  stageData.info.status == v1.StageStatus.PENDING ||
+stageData.info.status == v1.StageStatus.ACTIVE
+}
+.map { stageData =>
+  val stageId = stageData.info.stageId
+  val jobs = liveJobs.values.filter(_.stageIds.contains(stageId)).toSeq
+  stageData.toLiveStage(jobs)
+}.foreach { stage =>
+val stageId = stage.info.stageId
+val stageAttempt = stage.info.attemptNumber()
+liveStages.put((stageId, stageAttempt), stage)
 
 Review comment:
   Yeah. But for now, I'd prefer to use Tuple2 (stageId, stageAttempt) as 
`AppStatusListener` already do. And we could leave such optimization in a 
separate PR later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun closed pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-10 Thread GitBox

dongjoon-hyun closed pull request #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #26081: [SPARK-29032][FOLLOWUP][DOC] Document PrometheusServlet in the monitoring documentation

2019-10-10 Thread GitBox

dongjoon-hyun commented on issue #26081: [SPARK-29032][FOLLOWUP][DOC] Document 
PrometheusServlet in the monitoring documentation
URL: https://github.com/apache/spark/pull/26081#issuecomment-540653620
 
 
   +1, LGTM. Thanks for the interest. Merged to master.
   
   Actually, I'm working on the full document, too. Please wait for one day~ :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun closed pull request #26081: [SPARK-29032][FOLLOWUP][DOCS] Add PrometheusServlet in the monitoring documentation

2019-10-10 Thread GitBox

dongjoon-hyun closed pull request #26081: [SPARK-29032][FOLLOWUP][DOCS] Add 
PrometheusServlet in the monitoring documentation
URL: https://github.com/apache/spark/pull/26081
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333610531
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
 ##
 @@ -103,6 +104,81 @@ private[spark] class AppStatusListener(
 }
   }
 
+  // visible for tests
+  private[spark] def recoverLiveEntities(): Unit = {
+if (!live) {
+  kvstore.view(classOf[JobDataWrapper])
+.asScala.filter(_.info.status == JobExecutionStatus.RUNNING)
+.map(_.toLiveJob).foreach(job => liveJobs.put(job.jobId, job))
+
+  kvstore.view(classOf[StageDataWrapper]).asScala
+.filter { stageData =>
+  stageData.info.status == v1.StageStatus.PENDING ||
+stageData.info.status == v1.StageStatus.ACTIVE
+}
+.map { stageData =>
+  val stageId = stageData.info.stageId
+  val jobs = liveJobs.values.filter(_.stageIds.contains(stageId)).toSeq
+  stageData.toLiveStage(jobs)
+}.foreach { stage =>
+val stageId = stage.info.stageId
+val stageAttempt = stage.info.attemptNumber()
+liveStages.put((stageId, stageAttempt), stage)
+
+kvstore.view(classOf[ExecutorStageSummaryWrapper])
+  .index("stage")
+  .first(Array(stageId, stageAttempt))
+  .last(Array(stageId, stageAttempt))
+  .asScala
+  .map(_.toLiveExecutorStageSummary)
+  .foreach { esummary =>
+stage.executorSummaries.put(esummary.executorId, esummary)
+if (esummary.isBlacklisted) {
+  stage.blackListedExecutors += esummary.executorId
+  liveExecutors(esummary.executorId).isBlacklisted = true
+  liveExecutors(esummary.executorId).blacklistedInStages += stageId
+}
+  }
+
+
+kvstore.view(classOf[TaskDataWrapper])
+  .parent(Array(stageId, stageAttempt))
+  .index(TaskIndexNames.STATUS)
+  .first(TaskState.RUNNING.toString)
 
 Review comment:
   Ignore `LAUNCHING` is safe, because `status` in `TaskDataWrapper` is 
actually from `LiveTask.TaskInfo.status`:
   
   
https://github.com/apache/spark/blob/e1ea806b3075d279b5f08a29fe4c1ad6d3c4191a/core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala#L96-L112
   
   And there's no `LAUNCHING`.
   
   Another available running status is `GET RESULT`. But this also seems 
impossible. `TaskInfo` in `LiveTask` is only updated when 
`SparkListenerTaskStart` and `SparkListenerTaskEnd` events comes. And these two 
events don't change task's status to `GET RESULT`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333611704
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
 ##
 @@ -103,6 +104,81 @@ private[spark] class AppStatusListener(
 }
   }
 
+  // visible for tests
+  private[spark] def recoverLiveEntities(): Unit = {
+if (!live) {
+  kvstore.view(classOf[JobDataWrapper])
+.asScala.filter(_.info.status == JobExecutionStatus.RUNNING)
 
 Review comment:
   Yeah. I didn't see any place we set Job status to `UNKNOWN`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on a change in pull request #26065: [SPARK-29404][DOCS] Add an explanation about the executor color changed in WebUI documentation

2019-10-10 Thread GitBox

srowen commented on a change in pull request #26065: [SPARK-29404][DOCS] Add an 
explanation about the executor color changed in WebUI documentation
URL: https://github.com/apache/spark/pull/26065#discussion_r333611603
 
 

 ##
 File path: docs/web-ui.md
 ##
 @@ -51,12 +51,17 @@ The information that is displayed in this section is
   
 
 
-* Details of jobs grouped by status: Displays detailed information of the jobs 
including Job ID, description (with a link to detailed job page), submitted 
time, duration, stages summary and tasks progress bar
+When you click on a executor, you can make the color darker to point it at.
 
 Review comment:
   No, I just don't think it's meaningful enough to try to explain, and the new 
text above is incorrect too.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rafaelkyrdan commented on issue #19424: [SPARK-22197][SQL] push down operators to data source before planning

2019-10-10 Thread GitBox

rafaelkyrdan commented on issue #19424: [SPARK-22197][SQL] push down operators 
to data source before planning
URL: https://github.com/apache/spark/pull/19424#issuecomment-540666492
 
 
   @cloud-fan 
   I'm wondering whether it is not a priority or it wasn't easy to implement?
   If I want to implement it what class/interface should I extend? 
   Is it possible to do it in a custom `JDBC` source?
   Do you have any suggestions?
   The use case looks like this: there are 50M records in the DB I wanna select 
only 1M why I should read all of them?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] planga82 opened a new pull request #26082: [SPARK-29431][WebUI] Improve Web UI / Sql tab visualization with cached dataframes.

2019-10-10 Thread GitBox

planga82 opened a new pull request #26082: [SPARK-29431][WebUI] Improve Web UI 
/ Sql tab visualization with cached dataframes.
URL: https://github.com/apache/spark/pull/26082
 
 
   ### What changes were proposed in this pull request?
   With this pull request I want to improve the Web UI / SQL tab visualization. 
The principal problem that I find is when you have a cache in your plan, the 
SQL visualization don’t show any information about the part of the plan that 
has been cached.
   
   Before the change
   
![image](https://user-images.githubusercontent.com/12819544/66587418-aa7f6280-eb8a-11e9-80cf-bf10d6c0abea.png)
   After the change 
   
![image](https://user-images.githubusercontent.com/12819544/66587526-ddc1f180-eb8a-11e9-92de-c3b3f5657b66.png)
   
   ### Why are the changes needed?
   When we have a SQL plan with cached dataframes we lose the graphical 
information of this dataframe in the sql tab
   
   ### Does this PR introduce any user-facing change?
   Yes, in the sql tab
   
   ### How was this patch tested?
   Unit testing and manual tests throught spark shell
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #26028: [SPARK-29359][SQL][TESTS] Better exception handling in (SQL|ThriftServer)QueryTestSuite

2019-10-10 Thread GitBox

dongjoon-hyun commented on issue #26028: [SPARK-29359][SQL][TESTS] Better 
exception handling in (SQL|ThriftServer)QueryTestSuite
URL: https://github.com/apache/spark/pull/26028#issuecomment-540667842
 
 
   Hi, @wangyum . You can merge this after manual testing. 
   Or, we can wait until our Jenkins is back again.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

Ngone51 commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333634637
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/status/storeTypes.scala
 ##
 @@ -76,6 +109,29 @@ private[spark] class JobDataWrapper(
 
   @JsonIgnore @KVIndex("completionTime")
   private def completionTime: Long = 
info.completionTime.map(_.getTime).getOrElse(-1L)
+
+  def toLiveJob: LiveJob = {
 
 Review comment:
   While converting `LiveJob` to `JobDataWrapper`,  only 
`completedIndices.size`/`completedStages.size` are written into KVStore. So, 
it's impossible to recover the detail `completedIndices`/`completedStages`. So, 
as a compromises, we only recover the number in this case. 
   
   And this brings a problem that the recovered live job wouldn't be 100% the 
same as the previous one. But, this can be acceptable when you look at 
their(`completedIndices`, `completedStages`) usages in `AppStatusListener`, as 
it does not mistake the final result(that is number).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-10 Thread GitBox

dongjoon-hyun commented on a change in pull request #24851: 
[SPARK-27303][GRAPH] Add Spark Graph API
URL: https://github.com/apache/spark/pull/24851#discussion_r42983
 
 

 ##
 File path: 
graph/api/src/main/scala/org/apache/spark/graph/api/PropertyGraphWriter.scala
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.graph.api
+
+import java.util.Locale
+
+import org.apache.spark.sql.SaveMode
+
+abstract class PropertyGraphWriter(val graph: PropertyGraph) {
+
+  protected var saveMode: SaveMode = SaveMode.ErrorIfExists
+  protected var format: String =
+graph.cypherSession.sparkSession.sessionState.conf.defaultDataSourceName
+
+  /**
+   * Specifies the behavior when the graph already exists. Options include:
+   * 
+   * `SaveMode.Overwrite`: overwrite the existing data.
+   * `SaveMode.Ignore`: ignore the operation (i.e. no-op).
+   * `SaveMode.ErrorIfExists`: throw an exception at runtime.
+   * 
+   * 
+   * When writing the default option is `ErrorIfExists`.
+   *
+   * @since 3.0.0
+   */
+  def mode(mode: SaveMode): PropertyGraphWriter = {
+mode match {
+  case SaveMode.Append =>
+throw new IllegalArgumentException(s"Unsupported save mode: $mode. " +
+  "Accepted save modes are 'overwrite', 'ignore', 'error', 
'errorifexists'.")
+  case _ =>
+this.saveMode = mode
+}
+this
+  }
+
+  /**
+   * Specifies the behavior when the graph already exists. Options include:
+   * 
+   * `overwrite`: overwrite the existing graph.
+   * `ignore`: ignore the operation (i.e. no-op).
+   * `error` or `errorifexists`: default option, throw an exception at 
runtime.
+   * 
+   *
+   * @since 3.0.0
+   */
+  def mode(saveMode: String): PropertyGraphWriter = {
+saveMode.toLowerCase(Locale.ROOT) match {
+  case "overwrite" => mode(SaveMode.Overwrite)
+  case "ignore" => mode(SaveMode.Ignore)
+  case "error" | "errorifexists" => mode(SaveMode.ErrorIfExists)
+  case "default" => this
 
 Review comment:
   Ur, this can be wrong in case of `mode("overwrite").mode("default")`. Please 
merge `default` to the line 67. Please add a test case for this, too.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-10 Thread GitBox

dongjoon-hyun commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph 
API
URL: https://github.com/apache/spark/pull/24851#issuecomment-540683239
 
 
   Thank you for the swift updates. I'll review this afternoon again.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs opened a new pull request #26083: [SPARK-29417][CORE] Resource Scheduling - add TaskContext.resource java api

2019-10-10 Thread GitBox

tgravescs opened a new pull request #26083: [SPARK-29417][CORE] Resource 
Scheduling - add TaskContext.resource java api
URL: https://github.com/apache/spark/pull/26083
 
 
   
   
   ### What changes were proposed in this pull request?
   We added a TaskContext.resources() api, but I realized this is returning a 
scala Map which is not ideal for access from Java.  Here I add a resourcesJMap 
function which returns a java.util.Map to make it easily accessible from Java.
   
   ### Why are the changes needed?
   Java API access
   
   ### Does this PR introduce any user-facing change?
   
   Yes, new TaskContext function to access from Java
   
   ### How was this patch tested?
   
   new unit test


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on issue #26083: [SPARK-29417][CORE] Resource Scheduling - add TaskContext.resource java api

2019-10-10 Thread GitBox

tgravescs commented on issue #26083: [SPARK-29417][CORE] Resource Scheduling - 
add TaskContext.resource java api
URL: https://github.com/apache/spark/pull/26083#issuecomment-540684613
 
 
   @mengxr @jiangxb1987 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rafaelkyrdan commented on issue #19424: [SPARK-22197][SQL] push down operators to data source before planning

2019-10-10 Thread GitBox

rafaelkyrdan commented on issue #19424: [SPARK-22197][SQL] push down operators 
to data source before planning
URL: https://github.com/apache/spark/pull/19424#issuecomment-540687736
 
 
   I think I found what I have to implement:
   `
   trait TableScan {
 def buildScan(): RDD[Row]
   }
   `
   am I right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] liucht-inspur opened a new pull request #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page

2019-10-10 Thread GitBox

liucht-inspur opened a new pull request #25994: [SPARK-29323][WEBUI] Add 
tooltip for The Executors Tab's column names in the Spark history server Page
URL: https://github.com/apache/spark/pull/25994
 
 
   ### What changes were proposed in this pull request?
   
   This PR is Adding tooltip for The Executors Tab's column names include RDD 
Blocks, Disk Used,Cores, Activity Tasks, Failed Tasks , Complete Tasks, Total 
Tasks in the history server Page.
   https://issues.apache.org/jira/browse/SPARK-29323
   
   
![image](https://user-images.githubusercontent.com/28332082/66017759-b6c24a80-e50e-11e9-807b-5b076f701d2f.png)
   
   I have modify the following code in executorspage-template.html
   Before:
   RDD Blocks
   Disk Used
   Cores
   Active Tasks
   Failed Tasks
   Complete Tasks
   Total Tasks
   
   
![image](https://user-images.githubusercontent.com/28332082/66018111-4ddbd200-e510-11e9-9cfc-19f3eae81e76.png)
   
   After:
   
   RDD 
Blocks
   Disk 
Used
   Cores
   Active Tasks
   Failed Tasks
   Complete Tasks
   Total Tasks
  
   
   
![image](https://user-images.githubusercontent.com/28332082/66018157-79f75300-e510-11e9-96ba-6230aa0940c7.png)
   
   
   ### Why are the changes needed?
   
   the spark Executors of history Tab page, the Summary part shows the line in 
the list of title, but format is irregular.
   Some column names have tooltip, such as Storage Memory, Task Time(GC Time), 
Input, Shuffle Read,
   Shuffle Write and Blacklisted, but there are still some list names that have 
not tooltip. They are RDD Blocks, Disk Used,Cores, Activity Tasks, Failed Tasks 
, Complete Tasks and Total Tasks. oddly, Executors section below,All the column 
names Contains the column names above have tooltip .
   It's important for open source projects to have consistent style and 
user-friendly UI, and I'm working on keeping it consistent And more 
user-friendly.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manual tests for Chrome, Firefox and Safari
   
   
   Authored-by: liucht-inspur 
   Signed-off-by: liucht-inspur 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen closed pull request #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page

2019-10-10 Thread GitBox

srowen closed pull request #25994: [SPARK-29323][WEBUI] Add tooltip for The 
Executors Tab's column names in the Spark history server Page
URL: https://github.com/apache/spark/pull/25994
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page

2019-10-10 Thread GitBox

srowen commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The 
Executors Tab's column names in the Spark history server Page
URL: https://github.com/apache/spark/pull/25994#issuecomment-540700641
 
 
   We'll have to wait for Jenkins to come back, to evaluate this with tests, 
but I'm sure it will be OK.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on issue #26071: [SPARK-29412][SQL] refine the document of v2 session catalog config

2019-10-10 Thread GitBox

rdblue commented on issue #26071: [SPARK-29412][SQL] refine the document of v2 
session catalog config
URL: https://github.com/apache/spark/pull/26071#issuecomment-540704376
 
 
   +1
   
   Unless there are other comments, I'll merge this later today.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] brkyvz commented on issue #26071: [SPARK-29412][SQL] refine the document of v2 session catalog config

2019-10-10 Thread GitBox

brkyvz commented on issue #26071: [SPARK-29412][SQL] refine the document of v2 
session catalog config
URL: https://github.com/apache/spark/pull/26071#issuecomment-540707191
 
 
   LGTM as well


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-10 Thread GitBox

dilipbiswal commented on a change in pull request #26042: [SPARK-29092][SQL] 
Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#discussion_r333670345
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/results/explain.sql.out
 ##
 @@ -58,6 +58,11 @@ struct
 
 (1) Scan parquet default.explain_temp1 
 Output: [key#x, val#x]
+Batched: true
+Format: Parquet
 
 Review comment:
   @cloud-fan I can filter it. But wanted to double check first. The one we 
print in the node name is `relation.toString` and the one printed here is 
`relation.format.toString`. Are they going to be same always ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-10 Thread GitBox

dilipbiswal commented on a change in pull request #26042: [SPARK-29092][SQL] 
Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#discussion_r333671047
 
 

 ##
 File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala
 ##
 @@ -100,7 +100,8 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite 
{
 "subquery/in-subquery/in-group-by.sql",
 "subquery/in-subquery/simple-in.sql",
 "subquery/in-subquery/in-order-by.sql",
-"subquery/in-subquery/in-set-operations.sql"
+"subquery/in-subquery/in-set-operations.sql",
+"explain.sql"
 
 Review comment:
   @cloud-fan Let me enable it and see. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on a change in pull request #25561: [SPARK-28810][DOC][SQL] Document SHOW TABLES in SQL Reference.

2019-10-10 Thread GitBox

srowen commented on a change in pull request #25561: [SPARK-28810][DOC][SQL] 
Document SHOW TABLES in SQL Reference.
URL: https://github.com/apache/spark/pull/25561#discussion_r333672276
 
 

 ##
 File path: docs/sql-ref-syntax-aux-show-tables.md
 ##
 @@ -18,5 +18,90 @@ license: |
   See the License for the specific language governing permissions and
   limitations under the License.
 ---
+### Description
 
-**This page is under construction**
+The `SHOW TABLES` statement returns all the tables for an optionally specified 
database.
+Additionally, the output of this statement may be filtered by an optional 
matching
+pattern. If no database is specified then the tables are returned from the 
+current database.
+
+### Syntax
+{% highlight sql %}
+SHOW TABLES [{FROM|IN} database_name] [LIKE 'regex_pattern']
+{% endhighlight %}
+
+### Parameters
+
+  {FROM|IN} database_name
+  
+ Specifies the database name from which tables are listed.
+  
+  LIKE regex_pattern
+  
+ Specifies the regular expression pattern that is used to filter out 
unwanted tables. 
+  
+   Except for * and | character, the pattern works like a 
regex.
 
 Review comment:
   OK, I might put back the back-ticks or quotes around special chars here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] ifilonenko commented on issue #25785: [SPARK-27812][CORE] Explicit System.exit after job's main

2019-10-10 Thread GitBox

ifilonenko commented on issue #25785: [SPARK-27812][CORE] Explicit System.exit 
after job's main
URL: https://github.com/apache/spark/pull/25785#issuecomment-540721815
 
 
   hey @igorcalabria, do you have a link to your PR in `kubernetes-client`? 
This problem affects our spark-on-k8s jobs and I would like to have a timeline 
on the patch or offer help where I can. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #26059: [SPARK-29398][core] Support dedicated thread pools for RPC endpoints.

2019-10-10 Thread GitBox

squito commented on a change in pull request #26059: [SPARK-29398][core] 
Support dedicated thread pools for RPC endpoints.
URL: https://github.com/apache/spark/pull/26059#discussion_r333683292
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/rpc/RpcEndpoint.scala
 ##
 @@ -146,3 +146,13 @@ private[spark] trait RpcEndpoint {
  * [[ThreadSafeRpcEndpoint]] for different messages.
  */
 private[spark] trait ThreadSafeRpcEndpoint extends RpcEndpoint
+
+/**
+ * An endpoint that uses a dedicated thread pool for delivering messages.
+ */
+private[spark] trait IsolatedRpcEndpoint extends RpcEndpoint {
+
+  /** How many threads to use for delivering messages. By default, use a 
single thread. */
+  def threadCount(): Int = 1
 
 Review comment:
   I'm trying to wrap my head around what happens if you create an 
`IsolatedRpcEndpoint` with threadCount() > 1, given the code in `Inbox` which 
checks for inheritance from `ThreadSafeRpcEndpoint`: 
https://github.com/apache/spark/blob/2b3c3793c97dc0e48f1e098c6f1038f51607b716/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala#L123
   
   I guess if you expect one endpoint to be served by multiple threads, it 
makes sense you'd want `Inbox.enableConcurrent = false` and you'd have to make 
your endpoint safe for that -- but worth a comment here at least.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] abellina commented on issue #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

abellina commented on issue #26078: [SPARK-29151][CORE] Support fractional 
resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#issuecomment-540728997
 
 
   Thanks @tgravescs will address. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26000: [SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26000: 
[SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service
URL: https://github.com/apache/spark/pull/26000#discussion_r333691945
 
 

 ##
 File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
 ##
 @@ -136,7 +136,11 @@
   private DB db;
 
   public YarnShuffleService() {
-super("spark_shuffle");
+this("spark_shuffle");
+  }
+
+  protected YarnShuffleService(String serviceName) {
 
 Review comment:
   So the name by itself isn't going to be enough.  If you really want it 
configurable we are going to have to have the port configurable. For instance 
the config name for the port spark.shuffle.service.port needs to be able to be 
something like spark.shuffle.service.{serviceName}.port.  Otherwise all the 
spark shuffle servers will try to get the same port and fail.  The only other 
option will be to use 0 for ephemeral but 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26000: [SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26000: 
[SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service
URL: https://github.com/apache/spark/pull/26000#discussion_r333656098
 
 

 ##
 File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
 ##
 @@ -136,7 +136,11 @@
   private DB db;
 
   public YarnShuffleService() {
-super("spark_shuffle");
+this("spark_shuffle");
+  }
+
+  protected YarnShuffleService(String serviceName) {
+super(serviceName);
 logger.info("Initializing YARN shuffle service for Spark");
 
 Review comment:
   lets change the log statement to have the servicename in it


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26000: [SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26000: 
[SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service
URL: https://github.com/apache/spark/pull/26000#discussion_r333690639
 
 

 ##
 File path: docs/running-on-yarn.md
 ##
 @@ -492,6 +492,13 @@ To use a custom metrics.properties for the application 
master and executors, upd
   If it is not set then the YARN application ID is used.
   
 
+
+  spark.yarn.shuffle.service.name
+  spark_shuffle
+  
+Name of the external shuffle service.
 
 Review comment:
   many newbie's aren't familiar with what external shuffle service is or even 
yarn so its best to be clear. How about:
   
   The name of the external shuffle service.
   The external shuffle service itself is configured and started by YARN (see 
Configuring the External Shuffle Service for details). The name specified here 
must match the name YARN used.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] abellina commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

abellina commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333704288
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
   extends ResourceInformation(name, addresses.toArray) with ResourceAllocator {
 
   override protected def resourceName = this.name
   override protected def resourceAddresses = this.addresses
+  override protected def resourcesPerAddress = numParts
 
 Review comment:
   One issue here is that this would be part of the `ResourceAllocator` 
interface, of which there are two `WorkerResourceInfo`, which sets the 
`resourcesPerAddress` to 1, and `ExecutorResourceInfo`, which is the variable 
one. 
   
   Could we all it something like `slotsPerAddress`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] abellina commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

abellina commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333704705
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala
 ##
 @@ -30,27 +30,35 @@ trait ResourceAllocator {
 
   protected def resourceName: String
   protected def resourceAddresses: Seq[String]
+  protected def resourcesPerAddress: Int
 
   /**
-   * Map from an address to its availability, the value `true` means the 
address is available,
-   * while value `false` means the address is assigned.
+   * Map from an address to its availability, a value > 0 means the address is 
available,
+   * while value of 0 means the address is fully assigned.
+   *
+   * For task resources ([[org.apache.spark.scheduler.ExecutorResourceInfo]]), 
this value
+   * can be fractional.
 
 Review comment:
   good catch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #26013: [SPARK-29347][SQL] Add JSON serialization for external Rows

2019-10-10 Thread GitBox

viirya commented on a change in pull request #26013: [SPARK-29347][SQL] Add 
JSON serialization for external Rows
URL: https://github.com/apache/spark/pull/26013#discussion_r333711709
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
 ##
 @@ -501,4 +513,88 @@ trait Row extends Serializable {
   private def getAnyValAs[T <: AnyVal](i: Int): T =
 if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is 
null")
 else getAs[T](i)
+
+  /** The compact JSON representation of this row. */
+  def json: String = compact(jsonValue)
+
+  /** The pretty (i.e. indented) JSON representation of this row. */
+  def prettyJson: String = pretty(render(jsonValue))
+
+  /**
+   * JSON representation of the row.
+   *
+   * Note that this only supports the data types that are also supported by
+   * [[org.apache.spark.sql.catalyst.encoders.RowEncoder]].
+   *
+   * @return the JSON representation of the row.
+   */
+  private[sql] def jsonValue: JValue = {
+require(schema != null, "JSON serialization requires a non-null schema.")
+
+lazy val timeZone = TimeZone.getTimeZone(SQLConf.get.sessionLocalTimeZone)
+lazy val dateFormatter = DateFormatter.apply(timeZone.toZoneId)
+lazy val timestampFormatter = TimestampFormatter.apply(timeZone.toZoneId)
+
+// Convert an iterator of values to a json array
+def iteratorToJsonArray(iterator: Iterator[_], elementType: DataType): 
JArray = {
+  JArray(iterator.map(toJson(_, elementType)).toList)
+}
+
+// Convert a value to json.
+def toJson(value: Any, dataType: DataType): JValue = (value, dataType) 
match {
+  case (null, _) => JNull
+  case (b: Boolean, _) => JBool(b)
+  case (b: Byte, _) => JLong(b)
+  case (s: Short, _) => JLong(s)
+  case (i: Int, _) => JLong(i)
+  case (l: Long, _) => JLong(l)
+  case (f: Float, _) => JDouble(f)
+  case (d: Double, _) => JDouble(d)
+  case (d: BigDecimal, _) => JDecimal(d)
+  case (d: java.math.BigDecimal, _) => JDecimal(d)
+  case (d: Decimal, _) => JDecimal(d.toBigDecimal)
+  case (s: String, _) => JString(s)
+  case (b: Array[Byte], BinaryType) =>
+JString(Base64.getEncoder.encodeToString(b))
+  case (d: LocalDate, _) =>
+JString(dateFormatter.format(DateTimeUtils.localDateToDays(d)))
+  case (d: Date, _) =>
+JString(dateFormatter.format(DateTimeUtils.fromJavaDate(d)))
+  case (i: Instant, _) =>
+JString(timestampFormatter.format(DateTimeUtils.instantToMicros(i)))
+  case (t: Timestamp, _) =>
+JString(timestampFormatter.format(DateTimeUtils.fromJavaTimestamp(t)))
+  case (a: Array[_], ArrayType(elementType, _)) =>
+iteratorToJsonArray(a.iterator, elementType)
+  case (s: Seq[_], ArrayType(elementType, _)) =>
+iteratorToJsonArray(s.iterator, elementType)
+  case (m: Map[String @unchecked, _], MapType(StringType, valueType, _)) =>
 
 Review comment:
   Can other primitive types like Int be good for this format too?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox

tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333715025
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
   extends ResourceInformation(name, addresses.toArray) with ResourceAllocator {
 
   override protected def resourceName = this.name
   override protected def resourceAddresses = this.addresses
+  override protected def resourcesPerAddress = numParts
 
 Review comment:
   sure slotsPerAddress sounds good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on issue #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener

2019-10-10 Thread GitBox

squito commented on issue #25943: [WIP][SPARK-29261][SQL][CORE] Support recover 
live entities from KVStore for (SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#issuecomment-540769871
 
 
   > Hmm...This PR doesn't store live entities, instead, it recovers (or say, 
restore) live entities from a KVStore
   
   I see, this is probably the key part I was missing.  So you're saying the 
KVStore already has enough info, this PR just repopulates the in-memory version 
of all the LiveEntities?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 189 matches

Mail list logo