[GitHub] spark pull request #18590: [SPARK-21365][PYTHON] Deduplicate logics parsing ...

2017-07-10 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18590#discussion_r126598613
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2026,16 +2026,26 @@ def __init__(self, func, returnType, name=None):
 "{0}".format(type(func)))
 
 self.func = func
-self.returnType = (
-returnType if isinstance(returnType, DataType)
-else _parse_datatype_string(returnType))
+self._returnType = returnType
 # Stores UserDefinedPythonFunctions jobj, once initialized
+self._returnType_placeholder = None
 self._judf_placeholder = None
 self._name = name or (
 func.__name__ if hasattr(func, '__name__')
 else func.__class__.__name__)
 
 @property
+def returnType(self):
--- End diff --

We have pretty similar logic below; would it make sense to think about 
whether there is a nicer, more general way to handle these 
delayed-initialization classes?
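One general way to factor out this kind of delayed initialization (a hypothetical sketch, not what the PR actually implements) is a small caching-property descriptor, essentially what `functools.cached_property` later standardized:

```python
class lazy_property:
    """Non-data descriptor: computes a value on first access and caches it
    on the instance, replacing repeated `if placeholder is None` checks."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Cache on the instance; later lookups find the attribute in
        # obj.__dict__ and bypass this descriptor entirely.
        obj.__dict__[self.name] = value
        return value


class UserDefinedFunction:
    """Toy stand-in for the pyspark class under review."""

    def __init__(self, return_type):
        self._return_type = return_type
        self.parse_count = 0  # track how often "parsing" actually runs

    @lazy_property
    def returnType(self):
        self.parse_count += 1  # pretend this is the expensive parse
        return ("parsed", self._return_type)
```

With this, both `returnType` and `_judf` could share one mechanism instead of two hand-rolled `_*_placeholder` fields.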


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18543: [SPARK-21319][SQL] Fix memory leak in UnsafeExter...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18543#discussion_r126600430
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java
 ---
@@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1, 
Object baseObj2, long baseOff
   // TODO: Why are the sizes -1?
   row1.pointTo(baseObj1, baseOff1, -1);
   row2.pointTo(baseObj2, baseOff2, -1);
-  return ordering.compare(row1, row2);
+  int comparison = ordering.compare(row1, row2);
+  row1.pointTo(null, 0L, -1);
+  row2.pointTo(null, 0L, -1);
--- End diff --

Can we avoid doing this per comparison? Is there any place where we can do 
the cleanup once at the end?
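The concern can be modeled in miniature (hypothetical names; the real code is Java in `UnsafeExternalRowSorter`): a reusable comparator keeps mutable row wrappers pointed at the last-compared buffers, so either every `compare` call or a single end-of-sort cleanup must null those references out before the buffers can be reclaimed.

```python
class UnsafeRowStub:
    """Stand-in for UnsafeRow: a reusable wrapper pointed at a raw buffer."""
    def __init__(self):
        self.base = None

    def point_to(self, base):
        self.base = base


class RowComparator:
    def __init__(self):
        self.row1 = UnsafeRowStub()
        self.row2 = UnsafeRowStub()

    def compare(self, a, b):
        # The wrappers keep references to a and b after this call returns.
        self.row1.point_to(a)
        self.row2.point_to(b)
        return (a.key > b.key) - (a.key < b.key)

    def cleanup(self):
        # Run once when sorting finishes, instead of after every compare,
        # so the last-seen buffers stop being reachable via the comparator.
        self.row1.point_to(None)
        self.row2.point_to(None)


class Record:
    def __init__(self, key):
        self.key = key
```

The trade-off under discussion is exactly `cleanup()`-once versus nulling inside `compare()` on every call.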





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18597
  
**[Test build #79505 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79505/testReport)**
 for PR 18597 at commit 
[`31fa09b`](https://github.com/apache/spark/commit/31fa09bd31b44a4513484de4b9de83c6ce7b9323).





[GitHub] spark pull request #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEF...

2017-07-10 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18228#discussion_r126599834
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1199,6 +1199,82 @@ case class Substring(str: Expression, pos: 
Expression, len: Expression)
 }
 
 /**
+ * Returns the rightmost n characters from the string.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, len) - Returns the rightmost `len` characters from 
the string `str`.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('Spark SQL', 3);
+   SQL
+  """)
+case class Right(str: Expression, len: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant 
{
--- End diff --

For example:
select right("sparksql", -2);
For this case the expected result is "", but if we implement `Right` as
`Substring(str, UnaryMinus(len))`, the result will be `parksql`.
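The corner case can be checked with a small Python model (hypothetical helper names; the real expressions are Scala): SQL `SUBSTRING` treats a negative start position as counting from the end of the string, so rewriting `Right(str, len)` as `Substring(str, -len)` goes wrong when `len` is negative.

```python
def sql_substring(s, pos):
    """1-based SQL SUBSTRING from `pos` to the end of the string;
    a negative pos counts from the end, and pos 0 behaves like 1."""
    if pos > 0:
        return s[pos - 1:]
    if pos < 0:
        return s[pos:]
    return s

def right_expected(s, length):
    # Expected RIGHT semantics: a non-positive length yields "".
    return "" if length <= 0 else s[-length:]

def right_via_substring(s, length):
    # The rewrite under discussion: Right(str, len) => Substring(str, -len).
    return sql_substring(s, -length)
```

For `length = -2` the rewrite turns into `SUBSTRING(s, 2)`, which returns everything from the second character on, hence `parksql` instead of the empty string.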





[GitHub] spark issue #18580: [SPARK-21354] [SQL] INPUT FILE related functions do not ...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18580
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79496/
Test PASSed.





[GitHub] spark issue #18580: [SPARK-21354] [SQL] INPUT FILE related functions do not ...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18580
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18580: [SPARK-21354] [SQL] INPUT FILE related functions do not ...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18580
  
**[Test build #79496 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79496/testReport)**
 for PR 18580 at commit 
[`6b48a9e`](https://github.com/apache/spark/commit/6b48a9e52ded62715b32aef4ee31b121d3e7aee9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18580: [SPARK-21354] [SQL] INPUT FILE related functions ...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18580#discussion_r126599149
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -74,6 +74,15 @@ trait CheckAnalysis extends PredicateHelper {
 }
   }
 
+  private def getNumInputFileBlockSources(operator: LogicalPlan): Int = {
+operator match {
+  case _: LeafNode => 1
--- End diff --

shall we only consider file data source leaf node?
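The check under review amounts to counting qualifying leaf nodes in a plan tree; a minimal model (hypothetical classes, not Catalyst's API) where, per the review suggestion, only designated file-source leaves count:

```python
class Plan:
    """Toy logical-plan node: a leaf has no children."""
    def __init__(self, name, children=(), is_file_source=False):
        self.name = name
        self.children = list(children)
        self.is_file_source = is_file_source

def num_input_file_block_sources(plan):
    """Count leaf nodes that are file data sources, rather than
    treating every leaf node as a source."""
    if not plan.children:
        return 1 if plan.is_file_source else 0
    return sum(num_input_file_block_sources(c) for c in plan.children)
```

With the `case _: LeafNode => 1` version from the diff, an in-memory relation would also count as a source, which is what the comment questions.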





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18597
  
Sounds good.





[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18023
  
**[Test build #79504 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79504/testReport)**
 for PR 18023 at commit 
[`d613ff9`](https://github.com/apache/spark/commit/d613ff90596652d7e22859c0684dbfe3602344e0).





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18405
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18405
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79494/
Test PASSed.





[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18023
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79503/
Test FAILed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18405
  
**[Test build #79494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79494/testReport)**
 for PR 18405 at commit 
[`32bc6fd`](https://github.com/apache/spark/commit/32bc6fd4ec3ec1e388faa17624553a685a974b7f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18023
  
**[Test build #79503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79503/testReport)**
 for PR 18023 at commit 
[`56e2b83`](https://github.com/apache/spark/commit/56e2b83670b209c68c2a6ced0934d60e3f6973af).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ArrowSerializer(FramedSerializer):`





[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18023
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18023
  
**[Test build #79503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79503/testReport)**
 for PR 18023 at commit 
[`56e2b83`](https://github.com/apache/spark/commit/56e2b83670b209c68c2a6ced0934d60e3f6973af).





[GitHub] spark pull request #18590: [SPARK-21365][PYTHON] Deduplicate logics parsing ...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18590#discussion_r126597140
  
--- Diff: python/pyspark/sql/types.py ---
@@ -806,43 +786,43 @@ def _parse_datatype_string(s):
 >>> _parse_datatype_string("blabla") # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
-ValueError:...
+ParseException:...
 >>> _parse_datatype_string("a: int,") # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
-ValueError:...
+ParseException:...
 >>> _parse_datatype_string("array<int") # doctest: +IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
-ValueError:...
+ParseException:...
 >>> _parse_datatype_string("map<int, boolean>>") # doctest: +IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
-ValueError:...
+ParseException:...
 """
-s = s.strip()
-if s.startswith("array<"):
-if s[-1] != ">":
-raise ValueError("'>' should be the last char, but got: %s" % 
s)
-return ArrayType(_parse_datatype_string(s[6:-1]))
-elif s.startswith("map<"):
-if s[-1] != ">":
-raise ValueError("'>' should be the last char, but got: %s" % 
s)
-parts = _ignore_brackets_split(s[4:-1], ",")
-if len(parts) != 2:
-raise ValueError("The map type string format is: 'map<key_type,value_type>', " +
-                 "but got: %s" % s)
-kt = _parse_datatype_string(parts[0])
-vt = _parse_datatype_string(parts[1])
-return MapType(kt, vt)
-elif s.startswith("struct<"):
-if s[-1] != ">":
-raise ValueError("'>' should be the last char, but got: %s" % 
s)
-return _parse_struct_fields_string(s[7:-1])
-elif ":" in s:
-return _parse_struct_fields_string(s)
-else:
-return _parse_basic_datatype_string(s)
+sc = SparkContext._active_spark_context
+
+def from_ddl_schema(type_str):
+return _parse_datatype_json_string(
+
sc._jvm.org.apache.spark.sql.types.StructType.fromDDL(type_str).json())
+
+def from_ddl_datatype(type_str):
+return _parse_datatype_json_string(
+
sc._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.parseDataType(type_str).json())
+
+try:
+# DDL format, "fieldname datatype, fieldname datatype".
+return from_ddl_schema(s)
+except Exception as e:
+try:
+# For backwards compatibility, "integer", "struct" and etc.
+return from_ddl_datatype(s)
+except:
+try:
+# For backwards compatibility, "fieldname: datatype, 
fieldname: datatype" case.
--- End diff --

won't `fieldname: datatype, fieldname: datatype` be parsed as DDL schema?
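The nested try/except blocks in the diff implement a fallback chain: try each parser in order and surface a failure only when every one fails. A flattened sketch of the same pattern (a generic helper, not pyspark's actual code):

```python
def parse_with_fallbacks(s, parsers):
    """Try each parser in order; if all of them fail, re-raise the error
    from the first (primary) parser, which usually has the best message."""
    first_error = None
    for parse in parsers:
        try:
            return parse(s)
        except Exception as e:
            if first_error is None:
                first_error = e
    raise first_error
```

Ordering matters here: a string accepted by an earlier parser never reaches the later ones, which is the crux of the question about whether the `fieldname: datatype` branch is reachable.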





[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-07-10 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18305
  
I get it in source code. I'm indifferent in tests. If it's easy sure.

How does it help uncover bugs though?
On Tue, 11 Jul 2017 at 04:42, Yanbo Liang  wrote:

> *@yanboliang* commented on this pull request.
> --
>
> In
> 
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
> :
>
> > +)
> +  }
> +
> +
> +  /** Get summary statistics for some data and create a new 
LogisticAggregator. */
> +  private def getNewAggregator(
> +  instances: Array[Instance],
> +  coefficients: Vector,
> +  fitIntercept: Boolean,
> +  isMultinomial: Boolean): LogisticAggregator = {
> +val (featuresSummarizer, ySummarizer) =
> +  
DifferentiableLossAggregatorSuite.getClassificationSummarizers(instances)
> +val numClasses = ySummarizer.histogram.length
> +val featuresStd = featuresSummarizer.variance.toArray.map(math.sqrt)
> +val bcFeaturesStd = spark.sparkContext.broadcast(featuresStd)
> +val bcCoefficients = spark.sparkContext.broadcast(coefficients)
>
> I think we always try to destroy broadcast variable explicitly both in
> source code and test cases, like here. Of course, these broadcast
> variables can be destroyed after spark session is torn down.
> The reason of why we do this in source code is users application may be
> long-time running, so it will accumulate lots of these variables, waste
> lots of resource and slower your application.
> The reason of why we do this in test case is we should keep same code
> route as in source code. Since we have encountered similar bugs which was
> not covered by test cases.
> But in this case, I think it's safe to not destroy these variables. I just
> suggested to follow MLlib's convention. Thanks.
>






[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/18597
  
How about surrounding it with the SQLConf `spark.sql.session.timeZone` set 
to `America/Los_Angeles`?
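The idea of pinning a timezone so a time-dependent test becomes deterministic can be illustrated with the stdlib alone (Unix-only `time.tzset`; this sketch is independent of Spark's `spark.sql.session.timeZone` conf):

```python
import contextlib
import os
import time

@contextlib.contextmanager
def fixed_timezone(tz):
    """Temporarily pin the process timezone so time-dependent assertions
    do not depend on the machine that happens to run the tests."""
    old = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()
    try:
        yield
    finally:
        if old is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = old
        time.tzset()
```

Spark's SQLConf plays the analogous role for SQL timestamp conversions: set it around the doctest, then restore the previous value.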





[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17633
  
LGTM, pending jenkins





[GitHub] spark issue #18598: [SPARK-19285] [SQL] Implement UDF0

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18598
  
**[Test build #79502 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79502/testReport)**
 for PR 18598 at commit 
[`b3f1bec`](https://github.com/apache/spark/commit/b3f1becbbef2d1e6579e94e683b9d9610fb70546).





[GitHub] spark issue #18499: [SPARK-21176][WEB UI] Use a single ProxyServlet to proxy...

2017-07-10 Thread aosagie
Github user aosagie commented on the issue:

https://github.com/apache/spark/pull/18499
  
@cloud-fan Hi, I just pushed a change to up the selector threads from 1 to 
8. Can I get a retest please?

@jiangxb1987 Sorry to bother, but is there anyone available to give 
guidance on what I need to do to push this PR forward? This bug renders the 
master completely unresponsive for us in production.





[GitHub] spark pull request #18598: [SPARK-19285] [SQL] Implement UDF0

2017-07-10 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/18598

[SPARK-19285] [SQL] Implement UDF0

### What changes were proposed in this pull request?
This PR is to implement `UDF0`. `UDF0` is needed when users need to implement 
a Java UDF with no arguments. 

### How was this patch tested?
Added a test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark udf0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18598.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18598


commit b7c600015178823bec13958f5afe1c7971c17158
Author: gatorsmile 
Date:   2017-07-11T04:54:15Z

fix.







[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18582
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79497/
Test PASSed.





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18582
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18582
  
**[Test build #79497 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79497/testReport)**
 for PR 18582 at commit 
[`bd467b6`](https://github.com/apache/spark/commit/bd467b6b1b754987f56c592437a24e3f54b58490).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18597
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18597
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79500/
Test PASSed.





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18597
  
**[Test build #79500 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79500/testReport)**
 for PR 18597 at commit 
[`5c5f405`](https://github.com/apache/spark/commit/5c5f405af76df26d8387455867a270843ef216e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17633
  
**[Test build #79501 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79501/testReport)**
 for PR 17633 at commit 
[`af3065a`](https://github.com/apache/spark/commit/af3065a0da64251350eca241b4eb3c3c9447f114).





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126593630
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), Some(value)) => Some(accum :+ value)
+  case _ => None
+}
+  }
+}
+
+object ExtractableValues {
+  private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+case value: Byte => value.toString
+case value: Short => value.toString
+case value: Int => value.toString
+case value: Long => value.toString
+case value: UTF8String => quoteStringLiteral(value.toString)
+  }
+
+  def unapply(values: Set[Any]): Option[Seq[String]] = {
+values.toSeq.foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), value) if valueToLiteralString.isDefinedAt(value) =>
+Some(accum :+ valueToLiteralString(value))
+  case _ => None
+}
+  }
+}
+
+def convertInToOr(a: Attribute, values: Seq[String]): String = {
+  values.map(value => s"${a.name} = $value").mkString("(", " or ", ")")
+}
+
+lazy val convert: PartialFunction[Expression, String] = {
+  case In(a: Attribute, ExtractableLiterals(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case InSet(a: Attribute, ExtractableValues(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+s"${a.name} ${op.symbol} $value"
--- End diff --

nvm, realized that `and`, `or` have higher precedence over binary 
operators, so it should be fine
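The rewrite under discussion can be sketched outside of Spark. The following is a minimal, stand-alone Python sketch (hypothetical helper name, not Spark's actual implementation) of the `In` → `OR` flattening quoted above; the explicit parentheses added by `mkString("(", " or ", ")")` are what make the operator-precedence question a non-issue:

```python
# Hypothetical stand-alone sketch of the In -> OR rewrite in the diff
# above (not Spark code): Hive's partition-filter pushdown has no IN,
# so In(attr, [v1, v2, ...]) is flattened to "(attr = v1 or attr = v2 ...)".
# The surrounding parentheses keep the expression precedence-safe when
# it is later joined with "and"/"or" into a larger filter string.
def convert_in_to_or(attr_name, values):
    return "(" + " or ".join(f"{attr_name} = {v}" for v in values) + ")"

print(convert_in_to_or("part_col", ["1", "2", "3"]))
# prints: (part_col = 1 or part_col = 2 or part_col = 3)
```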





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126593137
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
--- End diff --

`foldLeft` may not be friendly to some Spark developers, but it's not a big deal.





[GitHub] spark pull request #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEF...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18228#discussion_r126592966
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1199,6 +1199,82 @@ case class Substring(str: Expression, pos: Expression, len: Expression)
 }
 
 /**
+ * Returns the rightmost n characters from the string.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, len) - Returns the rightmost `len` characters from the string `str`.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('Spark SQL', 3);
+   SQL
+  """)
+case class Right(str: Expression, len: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant 
{
--- End diff --

`Substring` supports negative position, we can implement `Right` as 
`Substring(str, UnaryMinus(len))`.
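The suggestion relies on substring accepting a negative start position that counts back from the end of the string. Plain Python slicing (an illustration only, not Spark's `Substring` semantics) shows the idea:

```python
# Illustration in plain Python (not Spark) of implementing RIGHT(str, len)
# via a negative start position: s[-n:] starts n characters from the end.
def right(s, n):
    if n <= 0:
        return ""
    return s[-n:]  # if n >= len(s), this is simply the whole string

print(right("Spark SQL", 3))  # prints: SQL
```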





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126592666
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), Some(value)) => Some(accum :+ value)
+  case _ => None
+}
+  }
+}
+
+object ExtractableValues {
+  private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+case value: Byte => value.toString
+case value: Short => value.toString
+case value: Int => value.toString
+case value: Long => value.toString
+case value: UTF8String => quoteStringLiteral(value.toString)
+  }
+
+  def unapply(values: Set[Any]): Option[Seq[String]] = {
+values.toSeq.foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), value) if valueToLiteralString.isDefinedAt(value) =>
+Some(accum :+ valueToLiteralString(value))
+  case _ => None
+}
+  }
+}
+
+def convertInToOr(a: Attribute, values: Seq[String]): String = {
+  values.map(value => s"${a.name} = $value").mkString("(", " or ", ")")
+}
+
+lazy val convert: PartialFunction[Expression, String] = {
+  case In(a: Attribute, ExtractableLiterals(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case InSet(a: Attribute, ExtractableValues(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+s"${a.name} ${op.symbol} $value"
+  case op @ BinaryComparison(ExtractableLiteral(value), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-}.mkString(" and ")
+s"$value ${op.symbol} ${a.name}"
+  case op @ And(expr1, expr2)
+  if convert.isDefinedAt(expr1) || convert.isDefinedAt(expr2) =>
+(convert.lift(expr1) ++ convert.lift(expr2)).mkString("(", " and ", ")")
+  case op @ Or(expr1, expr2)
+  if convert.isDefinedAt(expr1) && convert.isDefinedAt(expr2) =>
+(convert.lift(expr1) ++ convert.lift(expr2)).mkString("(", " or ", ")")
--- End diff --

Ok.





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126592639
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
--- End diff --

Is there something wrong with the way it is now?





[GitHub] spark issue #18590: [SPARK-21365][PYTHON] Deduplicate logics parsing DDL typ...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18590
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79492/
Test PASSed.





[GitHub] spark issue #18590: [SPARK-21365][PYTHON] Deduplicate logics parsing DDL typ...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18590
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126592491
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), Some(value)) => Some(accum :+ value)
+  case _ => None
+}
+  }
+}
+
+object ExtractableValues {
+  private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+case value: Byte => value.toString
+case value: Short => value.toString
+case value: Int => value.toString
+case value: Long => value.toString
+case value: UTF8String => quoteStringLiteral(value.toString)
+  }
+
+  def unapply(values: Set[Any]): Option[Seq[String]] = {
+values.toSeq.foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), value) if valueToLiteralString.isDefinedAt(value) =>
+Some(accum :+ valueToLiteralString(value))
+  case _ => None
+}
+  }
+}
+
+def convertInToOr(a: Attribute, values: Seq[String]): String = {
+  values.map(value => s"${a.name} = $value").mkString("(", " or ", ")")
+}
+
+lazy val convert: PartialFunction[Expression, String] = {
+  case In(a: Attribute, ExtractableLiterals(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case InSet(a: Attribute, ExtractableValues(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+s"${a.name} ${op.symbol} $value"
--- End diff --

Is there a problem with leaving them out?





[GitHub] spark issue #18590: [SPARK-21365][PYTHON] Deduplicate logics parsing DDL typ...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18590
  
**[Test build #79492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79492/testReport)**
 for PR 18590 at commit 
[`9d857e6`](https://github.com/apache/spark/commit/9d857e6db4bdcc0a5d6034d5d6261e4a30664960).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18574: [SPARK-21350] [SQL] Fix the error message when th...

2017-07-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18574#discussion_r126591307
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -123,16 +128,20 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
   val anyCast = s".asInstanceOf[UDF$i[$anyTypeArgs, Any]]"
   val anyParams = (1 to i).map(_ => "_: Any").mkString(", ")
   println(s"""
- |/**
- | * Register a user-defined function with ${i} arguments.
- | * @since 1.3.0
- | */
- |def register(name: String, f: UDF$i[$extTypeArgs, _], returnType: DataType): Unit = {
- |  val func = f$anyCast.call($anyParams)
- |  functionRegistry.createOrReplaceTempFunction(
- |name,
- |(e: Seq[Expression]) => ScalaUDF(func, returnType, e))
- |}""".stripMargin)
+|/**
+| * Register a user-defined function with ${i} arguments.
+| * @since 1.3.0
+| */
+|def register(name: String, f: UDF$i[$extTypeArgs, _], returnType: DataType): Unit = {
--- End diff --

ok. We can add it. Let me submit a quick fix for this.





[GitHub] spark issue #18580: [SPARK-21354] [SQL] INPUT FILE related functions do not ...

2017-07-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18580
  
cc @cloud-fan 





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18597
  
**[Test build #79500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79500/testReport)**
 for PR 18597 at commit 
[`5c5f405`](https://github.com/apache/spark/commit/5c5f405af76df26d8387455867a270843ef216e2).





[GitHub] spark issue #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent d...

2017-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18597
  
cc @ueshin, could you take a look when you have some time?





[GitHub] spark pull request #18597: [SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-depe...

2017-07-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/18597

[SPARK-20456][PYTHON][FOLLOWUP] Fix timezone-dependent doctests in 
unix_timestamp and from_unixtime

## What changes were proposed in this pull request?

This PR proposes to simply ignore the results in examples that are 
timezone-dependent in `unix_timestamp` and `from_unixtime`.

```
Failed example:
time_df.select(unix_timestamp('dt', 'yyyy-MM-dd').alias('unix_time')).collect()
Expected:
[Row(unix_time=1428476400)]
Got:
[Row(unix_time=1428418800)]
```

```
Failed example:
time_df.select(from_unixtime('unix_time').alias('ts')).collect()
Expected:
[Row(ts=u'2015-04-08 00:00:00')]
Got:
[Row(ts=u'2015-04-08 16:00:00')]
```

## How was this patch tested?

Manually tested and `./run-tests --modules pyspark-sql`.
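The failures above are purely a matter of the test machine's timezone: the same wall-clock date maps to different epoch seconds in different zones, so no single doctest output can be correct everywhere. A small Python sketch (POSIX-only, since it uses `time.tzset`) reproduces the two values from the failure output:

```python
# Sketch of why the doctest output is timezone-dependent: midnight of
# 2015-04-08 corresponds to different epoch seconds in different zones.
import os
import time

os.environ["TZ"] = "America/Los_Angeles"
time.tzset()  # POSIX only; not available on Windows
pst = int(time.mktime(time.strptime("2015-04-08", "%Y-%m-%d")))

os.environ["TZ"] = "Asia/Seoul"
time.tzset()
kst = int(time.mktime(time.strptime("2015-04-08", "%Y-%m-%d")))

print(pst, kst)  # prints: 1428476400 1428418800
```

The 57600-second gap is the 16-hour offset between PDT (UTC-7) and KST (UTC+9), matching the "Expected" and "Got" rows in the failure output.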


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-20456

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18597.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18597


commit 5c5f405af76df26d8387455867a270843ef216e2
Author: hyukjinkwon 
Date:   2017-07-11T03:56:02Z

Fix timezone-dependent doctests in unix_timestamp and from_unixtime







[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17633
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17633
  
**[Test build #79493 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79493/testReport)**
 for PR 17633 at commit 
[`a087a0f`](https://github.com/apache/spark/commit/a087a0f44372a226c64514d8a7163801b8eb54ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79493/
Test PASSed.





[GitHub] spark issue #18589: [SPARK-16872][ML] Add Gaussian NB

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18589
  
**[Test build #79499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79499/testReport)**
 for PR 18589 at commit 
[`b7b9206`](https://github.com/apache/spark/commit/b7b920627a24480daa5bcb2d952905700a5852bc).





[GitHub] spark issue #18591: [SPARK-21366][SQL][TEST] Add sql test for window functio...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18591
  
**[Test build #79498 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79498/testReport)**
 for PR 18591 at commit 
[`a8b0065`](https://github.com/apache/spark/commit/a8b0065da755c293a897636bbf32826f1d91e153).





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18582
  
**[Test build #79497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79497/testReport)**
 for PR 18582 at commit 
[`bd467b6`](https://github.com/apache/spark/commit/bd467b6b1b754987f56c592437a24e3f54b58490).





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-07-10 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r126589978
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala ---
@@ -41,11 +41,15 @@ private[libsvm] class LibSVMOptions(@transient private 
val parameters: CaseInsen
 case o => throw new IllegalArgumentException(s"Invalid value `$o` for 
parameter " +
   s"`$VECTOR_TYPE`. Expected types are `sparse` and `dense`.")
   }
+
+  val lineSeparator: Option[String] = parameters.get(LINE_SEPARATOR)
--- End diff --

ok, thanks!





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r126589618
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala ---
@@ -41,11 +41,15 @@ private[libsvm] class LibSVMOptions(@transient private 
val parameters: CaseInsen
 case o => throw new IllegalArgumentException(s"Invalid value `$o` for 
parameter " +
   s"`$VECTOR_TYPE`. Expected types are `sparse` and `dense`.")
   }
+
+  val lineSeparator: Option[String] = parameters.get(LINE_SEPARATOR)
--- End diff --

`compression` is also there for many datasources. Probably, let me try to 
open up a discussion about tying up those later.





[GitHub] spark pull request #18541: [SPARK-21315][SQL]Skip some spill files when gene...

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18541





[GitHub] spark issue #18541: [SPARK-21315][SQL]Skip some spill files when generateIte...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18541
  
thanks, merging to master!





[GitHub] spark issue #18459: [SPARK-13534][PYSPARK] Using Apache Arrow to increase pe...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18459
  
great work!





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126588869
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: 
IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: 
Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => 
Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+
exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), Some(value)) => Some(accum :+ value)
+  case _ => None
+}
+  }
+}
+
+object ExtractableValues {
+  private lazy val valueToLiteralString: PartialFunction[Any, String] 
= {
+case value: Byte => value.toString
+case value: Short => value.toString
+case value: Int => value.toString
+case value: Long => value.toString
+case value: UTF8String => quoteStringLiteral(value.toString)
+  }
+
+  def unapply(values: Set[Any]): Option[Seq[String]] = {
+values.toSeq.foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), value) if 
valueToLiteralString.isDefinedAt(value) =>
+Some(accum :+ valueToLiteralString(value))
+  case _ => None
+}
+  }
+}
+
+def convertInToOr(a: Attribute, values: Seq[String]): String = {
+  values.map(value => s"${a.name} = $value").mkString("(", " or ", ")")
+}
+
+lazy val convert: PartialFunction[Expression, String] = {
+  case In(a: Attribute, ExtractableLiterals(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case InSet(a: Attribute, ExtractableValues(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+s"${a.name} ${op.symbol} $value"
--- End diff --

shall we add `()` for binary comparisons?
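As a standalone illustration of the `convertInToOr` helper from the diff (a 
minimal sketch; the object name and the `main` harness are mine, not part of 
the patch), an IN predicate over a partition column becomes an "or"-chained, 
parenthesized Hive filter string:

```scala
// Minimal sketch of convertInToOr from the diff: each candidate value
// becomes "col = value", joined with " or " and wrapped in parentheses
// so the fragment composes safely with surrounding and/or clauses.
object ConvertInToOrDemo {
  def convertInToOr(name: String, values: Seq[String]): String =
    values.map(value => s"$name = $value").mkString("(", " or ", ")")

  def main(args: Array[String]): Unit = {
    // e.g. In(part, Seq(1, 2, 3)):
    println(convertInToOr("part", Seq("1", "2", "3")))
    // prints (part = 1 or part = 2 or part = 3)
  }
}
```

The parentheses produced by `mkString` are what keeps the converted IN clause 
unambiguous; the review question above is whether plain binary comparisons 
deserve the same wrapping.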





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126588744
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: 
IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: 
Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => 
Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+
exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), Some(value)) => Some(accum :+ value)
+  case _ => None
+}
+  }
+}
+
+object ExtractableValues {
+  private lazy val valueToLiteralString: PartialFunction[Any, String] 
= {
+case value: Byte => value.toString
+case value: Short => value.toString
+case value: Int => value.toString
+case value: Long => value.toString
+case value: UTF8String => quoteStringLiteral(value.toString)
+  }
+
+  def unapply(values: Set[Any]): Option[Seq[String]] = {
+values.toSeq.foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), value) if 
valueToLiteralString.isDefinedAt(value) =>
+Some(accum :+ valueToLiteralString(value))
+  case _ => None
+}
+  }
+}
+
+def convertInToOr(a: Attribute, values: Seq[String]): String = {
+  values.map(value => s"${a.name} = $value").mkString("(", " or ", ")")
+}
+
+lazy val convert: PartialFunction[Expression, String] = {
+  case In(a: Attribute, ExtractableLiterals(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case InSet(a: Attribute, ExtractableValues(values))
+  if !varcharKeys.contains(a.name) && values.nonEmpty =>
+convertInToOr(a, values)
+  case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+s"${a.name} ${op.symbol} $value"
+  case op @ BinaryComparison(ExtractableLiteral(value), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-}.mkString(" and ")
+s"$value ${op.symbol} ${a.name}"
+  case op @ And(expr1, expr2)
+  if convert.isDefinedAt(expr1) || convert.isDefinedAt(expr2) =>
+(convert.lift(expr1) ++ convert.lift(expr2)).mkString("(", " and 
", ")")
+  case op @ Or(expr1, expr2)
+  if convert.isDefinedAt(expr1) && convert.isDefinedAt(expr2) =>
+(convert.lift(expr1) ++ convert.lift(expr2)).mkString("(", " or ", 
")")
--- End diff --

nit: s"(convert(expr1) or convert(expr2))"





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18582
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79495/
Test FAILed.





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18582
  
**[Test build #79495 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79495/testReport)**
 for PR 18582 at commit 
[`71c4250`](https://github.com/apache/spark/commit/71c42501d20e44d056e423588911aa87821c18f5).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, 
HasHandleInvalid,`
  * `class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, 
HasHandleInvalid,`
  * `class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol, 
HasHandleInvalid,`





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18582
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18589: [SPARK-16872][ML] Add Gaussian NB

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18589
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79491/
Test FAILed.





[GitHub] spark issue #18589: [SPARK-16872][ML] Add Gaussian NB

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18589
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126588444
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: 
IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: 
Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => 
Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+
exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+  case (Some(accum), Some(value)) => Some(accum :+ value)
+  case _ => None
+}
+  }
+}
+
+object ExtractableValues {
+  private lazy val valueToLiteralString: PartialFunction[Any, String] 
= {
+case value: Byte => value.toString
+case value: Short => value.toString
+case value: Int => value.toString
+case value: Long => value.toString
+case value: UTF8String => quoteStringLiteral(value.toString)
+  }
+
+  def unapply(values: Set[Any]): Option[Seq[String]] = {
+values.toSeq.foldLeft(Option(Seq.empty[String])) {
--- End diff --

ditto





[GitHub] spark issue #18589: [SPARK-16872][ML] Add Gaussian NB

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18589
  
**[Test build #79491 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79491/testReport)**
 for PR 18589 at commit 
[`d71084a`](https://github.com/apache/spark/commit/d71084abb2cdc4adafa5fc9ac7d72e4215f78540).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126588383
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: 
IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: 
Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+object ExtractableLiteral {
+  def unapply(expr: Expression): Option[String] = expr match {
+case Literal(value, _: IntegralType) => Some(value.toString)
+case Literal(value, _: StringType) => 
Some(quoteStringLiteral(value.toString))
+case _ => None
+  }
+}
+
+object ExtractableLiterals {
+  def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+
exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
--- End diff --

I'd like it to be more java style:
```
val extracted = exprs.map(ExtractableLiteral.unapply)
if (extracted.exists(_.isEmpty)) {
  None
} else {
  Some(extracted.map(_.get))
}
```
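Both formulations compute the same "Seq[Option[String]] => Option[Seq[String]]" 
sequencing (fail the whole extraction if any element is None). A self-contained 
sketch of the two variants, with illustrative names of my own:

```scala
// Sketch comparing the foldLeft from the patch with the explicit
// exists/map variant suggested in the review. Both return None as soon
// as any input Option is empty, otherwise the unwrapped sequence.
object SequenceOptionsDemo {
  def viaFold(opts: Seq[Option[String]]): Option[Seq[String]] =
    opts.foldLeft(Option(Seq.empty[String])) {
      case (Some(acc), Some(v)) => Some(acc :+ v)
      case _ => None
    }

  def viaExists(opts: Seq[Option[String]]): Option[Seq[String]] =
    if (opts.exists(_.isEmpty)) None else Some(opts.map(_.get))

  def main(args: Array[String]): Unit = {
    val good = Seq(Some("a"), Some("b"))
    val bad = Seq(Some("a"), None)
    assert(viaFold(good) == viaExists(good))
    assert(viaFold(bad) == viaExists(bad))
  }
}
```

The fold keeps everything in one expression; the exists/map variant trades a 
second pass over the sequence for more explicit control flow, which is the 
readability point being made above.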





[GitHub] spark issue #18591: [SPARK-21366][SQL][TEST] Add sql test for window functio...

2017-07-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18591
  
Okay, let me remove them.





[GitHub] spark pull request #18593: [SPARK-21369][Core]Don't use Scala Tuple2 in comm...

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18593





[GitHub] spark issue #18593: [SPARK-21369][Core]Don't use Scala Tuple2 in common/netw...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18593
  
LGTM, merging to master/2.2!





[GitHub] spark issue #18596: [SPARK-21371] dev/make-distribution.sh scripts use of $@...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18596
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18596: [SPARK-21371] dev/make-distribution.sh scripts us...

2017-07-10 Thread liu-zhaokun
GitHub user liu-zhaokun opened a pull request:

https://github.com/apache/spark/pull/18596

[SPARK-21371] dev/make-distribution.sh scripts use of $@ without ""


[https://issues.apache.org/jira/browse/SPARK-21371](https://issues.apache.org/jira/browse/SPARK-21371)
dev/make-distribution.sh uses $@ without " ", which changes how the arguments 
are split: if a parameter contains a space, it is treated as two parameters. 
Meanwhile, other modules in Spark use $@ with " ", which is correct. I think 
dev/make-distribution.sh should be consistent with the others, because it is 
safer.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liu-zhaokun/spark new711

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18596


commit 5316be09052316707172d5e730b5e2d3cbdb2527
Author: liuzhaokun 
Date:   2017-07-11T03:24:51Z

[SPARK-21371] dev/make-distribution.sh scripts use of $@ without ""







[GitHub] spark pull request #18574: [SPARK-21350] [SQL] Fix the error message when th...

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18574





[GitHub] spark issue #18574: [SPARK-21350] [SQL] Fix the error message when the numbe...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18574
  
LGTM, merging to master!





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18300





[GitHub] spark pull request #18574: [SPARK-21350] [SQL] Fix the error message when th...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18574#discussion_r126587004
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -123,16 +128,20 @@ class UDFRegistration private[sql] (functionRegistry: 
FunctionRegistry) extends
   val anyCast = s".asInstanceOf[UDF$i[$anyTypeArgs, Any]]"
   val anyParams = (1 to i).map(_ => "_: Any").mkString(", ")
   println(s"""
- |/**
- | * Register a user-defined function with ${i} arguments.
- | * @since 1.3.0
- | */
- |def register(name: String, f: UDF$i[$extTypeArgs, _], 
returnType: DataType): Unit = {
- |  val func = f$anyCast.call($anyParams)
- |  functionRegistry.createOrReplaceTempFunction(
- |name,
- |(e: Seq[Expression]) => ScalaUDF(func, returnType, e))
- |}""".stripMargin)
+|/**
+| * Register a user-defined function with ${i} arguments.
+| * @since 1.3.0
+| */
+|def register(name: String, f: UDF$i[$extTypeArgs, _], returnType: 
DataType): Unit = {
--- End diff --

do you know why we don't have `UDF0`?





[GitHub] spark issue #18591: [SPARK-21366][SQL][TEST] Add sql test for window functio...

2017-07-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18591
  
At least these three test cases in `WindowQuerySuite` are not related to Hive 
compatibility.





[GitHub] spark issue #18580: [SPARK-21354] [SQL] INPUT FILE related functions do not ...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18580
  
**[Test build #79496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79496/testReport)**
 for PR 18580 at commit 
[`6b48a9e`](https://github.com/apache/spark/commit/6b48a9e52ded62715b32aef4ee31b121d3e7aee9).





[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-07-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18300
  
Thanks! Merging to master.





[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79485/
Test PASSed.





[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18594
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18594
  
**[Test build #79485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79485/testReport)**
 for PR 18594 at commit 
[`76de32a`](https://github.com/apache/spark/commit/76de32a22cda3edab5f6e7baa12af80112715051).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-07-10 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r126585940
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala ---
@@ -41,11 +41,15 @@ private[libsvm] class LibSVMOptions(@transient private 
val parameters: CaseInsen
 case o => throw new IllegalArgumentException(s"Invalid value `$o` for 
parameter " +
   s"`$VECTOR_TYPE`. Expected types are `sparse` and `dense`.")
   }
+
+  val lineSeparator: Option[String] = parameters.get(LINE_SEPARATOR)
--- End diff --

Also, if we support only one or two characters, I feel we'd better explicitly 
throw an exception for longer separators here.
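A minimal sketch of the validation being suggested (hypothetical: the object 
name, method, and error message are mine, not the actual LibSVMOptions code):

```scala
// Hypothetical validation for the review suggestion above: accept a
// line separator of one or two characters, fail fast on anything else.
object LineSeparatorCheck {
  def validate(sep: Option[String]): Option[String] = sep.map { s =>
    require(s.nonEmpty && s.length <= 2,
      s"Invalid value `$s` for the line separator; " +
        "only one or two characters are supported.")
    s
  }
}
```

Failing in the options parser keeps the error close to the user's input, 
rather than surfacing later as a confusing parse result.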





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-07-10 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r126585768
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala ---
@@ -41,11 +41,15 @@ private[libsvm] class LibSVMOptions(@transient private 
val parameters: CaseInsen
 case o => throw new IllegalArgumentException(s"Invalid value `$o` for 
parameter " +
   s"`$VECTOR_TYPE`. Expected types are `sparse` and `dense`.")
   }
+
+  val lineSeparator: Option[String] = parameters.get(LINE_SEPARATOR)
--- End diff --

Could we put this option in a single place for these formats? I feel 
putting this option in each format looks a little messy...





[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18582
  
**[Test build #79495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79495/testReport)**
 for PR 18582 at commit 
[`71c4250`](https://github.com/apache/spark/commit/71c42501d20e44d056e423588911aa87821c18f5).





[GitHub] spark issue #18595: [SPARK-21370][SS] Create distinction between read-only a...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18595
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18595: [SPARK-21370][SS] Create distinction between read-only a...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18595
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79487/
Test PASSed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18405
  
**[Test build #79494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79494/testReport)**
 for PR 18405 at commit 
[`32bc6fd`](https://github.com/apache/spark/commit/32bc6fd4ec3ec1e388faa17624553a685a974b7f).





[GitHub] spark issue #18595: [SPARK-21370][SS] Create distinction between read-only a...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18595
  
**[Test build #79487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79487/testReport)**
 for PR 18595 at commit 
[`95e04fa`](https://github.com/apache/spark/commit/95e04fa04c44c6ed13f91b8e06bf39ffe83719e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18405: [SPARK-21194][SQL] Fail the putNullmethod when co...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18405#discussion_r126584992
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
 ---
@@ -758,6 +758,35 @@ class ColumnarBatchSuite extends SparkFunSuite {
 }}
   }
 
+  test("Putting null should fail when null is forbidden in array.") {
+(MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
+  val column = ColumnVector.allocate(10, new ArrayType(IntegerType, 
false), memMode)
+  val data = column.arrayData();
+  data.putInt(0, 0)
+  data.putInt(1, 1)
+  assert(data.getInt(0) === 0)
+  assert(data.getInt(1) === 1)
+  val ex = intercept[RuntimeException] {
+data.putNull(2)
--- End diff --

Sure, did it.





[GitHub] spark issue #18593: [SPARK-21369][Core]Don't use Scala Tuple2 in common/netw...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18593
  
LGTM





[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-10 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18305#discussion_r126582816
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.optim.aggregator
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.linalg.{BLAS, Matrices, Vector, Vectors}
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+
+class LogisticAggregatorSuite extends SparkFunSuite with 
MLlibTestSparkContext {
+
+  import DifferentiableLossAggregatorSuite.getClassificationSummarizers
+
+  @transient var instances: Array[Instance] = _
+  @transient var instancesConstantFeature: Array[Instance] = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+instances = Array(
+  Instance(0.0, 0.1, Vectors.dense(1.0, 2.0)),
+  Instance(1.0, 0.5, Vectors.dense(1.5, 1.0)),
+  Instance(2.0, 0.3, Vectors.dense(4.0, 0.5))
+)
+instancesConstantFeature = Array(
+  Instance(0.0, 0.1, Vectors.dense(1.0, 2.0)),
+  Instance(1.0, 0.5, Vectors.dense(1.0, 1.0)),
+  Instance(2.0, 0.3, Vectors.dense(1.0, 0.5))
+)
+  }
+
+
+  /** Get summary statistics for some data and create a new 
LogisticAggregator. */
+  private def getNewAggregator(
+  instances: Array[Instance],
+  coefficients: Vector,
+  fitIntercept: Boolean,
+  isMultinomial: Boolean): LogisticAggregator = {
+val (featuresSummarizer, ySummarizer) =
+  
DifferentiableLossAggregatorSuite.getClassificationSummarizers(instances)
+val numClasses = ySummarizer.histogram.length
+val featuresStd = featuresSummarizer.variance.toArray.map(math.sqrt)
+val bcFeaturesStd = spark.sparkContext.broadcast(featuresStd)
+val bcCoefficients = spark.sparkContext.broadcast(coefficients)
--- End diff --

I think we always try to destroy broadcast variables explicitly, both in 
source code and in test cases, like 
[here](https://github.com/apache/spark/pull/18152). Of course, these broadcast 
variables can also be destroyed after the Spark session is torn down.
The reason we do this in source code is that a user's application may be 
long-running, so it would accumulate lots of these variables, wasting 
resources and slowing the application down.
The reason we do this in test cases is that tests should exercise the same 
code path as the source code; we have encountered similar bugs that were not 
covered by test cases.
But in this case, I think it's safe not to destroy these variables. I just 
suggested following MLlib's convention. Thanks.
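
The explicit-teardown convention described above can be sketched in miniature. This is plain Python with a tiny stand-in `Broadcast` class, purely illustrative; in real Spark the broadcast is created via `SparkContext.broadcast` and released with `Broadcast.destroy`, which frees the blocks on all executors:

```python
class Broadcast:
    """Tiny stand-in for Spark's Broadcast, just to illustrate the idiom."""
    def __init__(self, value):
        self._value = value
        self.destroyed = False

    @property
    def value(self):
        if self.destroyed:
            raise RuntimeError("broadcast already destroyed")
        return self._value

    def destroy(self):
        # In Spark this releases the broadcast blocks on all executors.
        self.destroyed = True


def with_broadcast(value, body):
    """Run `body` with a broadcast of `value`, always destroying it after,
    mirroring the convention of tearing broadcasts down explicitly rather
    than waiting for the session to stop."""
    bc = Broadcast(value)
    try:
        return body(bc)
    finally:
        bc.destroy()
```

Wrapping the use site in `try`/`finally` (or a helper like `with_broadcast`) keeps the teardown on every code path, which is exactly why long-running applications don't accumulate stale broadcast blocks.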





[GitHub] spark issue #18574: [SPARK-21350] [SQL] Fix the error message when the numbe...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18574
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18574: [SPARK-21350] [SQL] Fix the error message when the numbe...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18574
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79486/
Test PASSed.





[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-10 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
ping @sethah  @MLnick 





[GitHub] spark issue #18574: [SPARK-21350] [SQL] Fix the error message when the numbe...

2017-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18574
  
**[Test build #79486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79486/testReport)**
 for PR 18574 at commit 
[`5448be9`](https://github.com/apache/spark/commit/5448be96728ae1043d124433fc3521f538b6ca7a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18487
  
Will this be covered by https://github.com/apache/spark/pull/18388? And 
another concern is how we expect users to tune this config: can users 
just tune `spark.reducer.maxReqsInFlight` instead?





[GitHub] spark issue #18591: [SPARK-21366][SQL][TEST] Add sql test for window functio...

2017-07-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18591
  
also cc @cloud-fan 





[GitHub] spark issue #18591: [SPARK-21366][SQL][TEST] Add sql test for window functio...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18591
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79484/
Test PASSed.





[GitHub] spark pull request #18595: [SPARK-21370][SS] Create distinction between read...

2017-07-10 Thread brkyvz
Github user brkyvz closed the pull request at:

https://github.com/apache/spark/pull/18595





[GitHub] spark issue #18591: [SPARK-21366][SQL][TEST] Add sql test for window functio...

2017-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18591
  
Merged build finished. Test PASSed.




