[GitHub] spark issue #22194: [SPARK-23932][SQL][FOLLOW-UP] Fix an example of zip_with...

2018-08-22 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/22194
  
@ueshin LGTM


---




[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with

2018-08-15 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/22031#discussion_r210452329
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -442,3 +442,91 @@ case class ArrayAggregate(
 
   override def prettyName: String = "aggregate"
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
+       array(('a', 1), ('b', 2), ('c', 3))
+      > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y);
+       array(4, 6)
+      > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
+       array('ad', 'be', 'cf')
+  """,
+  since = "2.4.0")
+case class ArraysZipWith(
+    left: Expression,
+    right: Expression,
+    function: Expression)
+  extends HigherOrderFunction with CodegenFallback with ExpectsInputTypes {
+
+  override def inputs: Seq[Expression] = List(left, right)
+
+  override def functions: Seq[Expression] = List(function)
+
+  def expectingFunctionType: AbstractDataType = AnyDataType
+  @transient lazy val functionForEval: Expression = functionsForEval.head
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType, expectingFunctionType)
+
+  override def nullable: Boolean = inputs.exists(_.nullable)
+
+  override def dataType: ArrayType = ArrayType(function.dataType, function.nullable)
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArraysZipWith = {
+    val (leftElementType, leftContainsNull) = left.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    val (rightElementType, rightContainsNull) = right.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    copy(function = f(function,
+      (leftElementType, leftContainsNull) :: (rightElementType, rightContainsNull) :: Nil))
--- End diff --

@mn-mikke @ueshin "both arrays must be the same length" was how zip_with in Presto used to work; they have since moved to appending nulls and processing the arrays regardless of length.
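
To make that null-padding behaviour concrete, a small illustration (it mirrors the coalesce example from the PR description; assumes this patch is applied and a `spark-shell` session):

```scala
// The shorter array is padded with nulls before the lambda runs, so the
// result always has the longer array's length.
spark.sql(
  "SELECT zip_with(array('a'), array('d', NULL, 'f'), (x, y) -> coalesce(x, y))"
).show(false)
// Expected single-row result: [a, null, f] -- x is null beyond the first
// element, so coalesce(x, y) falls back to y there.
```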


---




[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with

2018-08-14 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/22031
  
Hi @ueshin I will update the PR tomorrow


---




[GitHub] spark pull request #22031: [TODO][SPARK-23932][SQL] Higher order function zi...

2018-08-07 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/22031

[TODO][SPARK-23932][SQL] Higher order function zip_with

## What changes were proposed in this pull request?
Merges the two given arrays, element-wise, into a single array using 
function. If one array is shorter, nulls are appended at the end to match the 
length of the longer array, before applying function:
```
SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x)); -- [ROW('a', 1), ROW('b', 3), ROW('c', 5)]
SELECT zip_with(ARRAY[1, 2], ARRAY[3, 4], (x, y) -> x + y); -- [4, 6]
SELECT zip_with(ARRAY['a', 'b', 'c'], ARRAY['d', 'e', 'f'], (x, y) -> concat(x, y)); -- ['ad', 'be', 'cf']
SELECT zip_with(ARRAY['a'], ARRAY['d', null, 'f'], (x, y) -> coalesce(x, y)); -- ['a', null, 'f']
```
## How was this patch tested?
Added tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-23932

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22031


commit 03d19cee425be90a61b60163ff9d6740716d45a6
Author: Sandeep Singh 
Date:   2018-08-03T04:15:00Z

.

commit 6f91777de93121d668ff11e7701f449bb4c96337
Author: Sandeep Singh 
Date:   2018-08-04T22:00:38Z

fix description




---




[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2017-05-17 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@HyukjinKwon didn't have bandwidth; will try to finish this weekend


---



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-17 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@HyukjinKwon was busy, will restart this week.


---



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-09 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@sethah I will revive this pr thanks 👍 


---



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-12-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@MLnick I will create an umbrella JIRA and start adding JIRAs for the things I'm aware of, and you can start prioritising 👍 Sounds like a plan?


---



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-12-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@sethah @yanboliang I've started with migrating `IDF`; can you review the WIP and check whether I'm going in the right direction?
https://github.com/techaddict/spark/pull/2/files
There is some code duplication where we could make the mllib code actually depend on the ml one.
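
As a rough illustration of that direction (hypothetical names, not code from the WIP): the deprecated mllib wrapper would only convert vector types and forward to a single shared implementation on the ml side:

```scala
import org.apache.spark.ml.linalg.{Vector => NewVector}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

// Hypothetical sketch: keep the math in one place (the ml side) and let the
// old mllib model delegate to it instead of duplicating the logic.
trait IDFTransform {
  def transformVector(v: NewVector): NewVector
}

class LegacyIDFModel(impl: IDFTransform) {
  // Only converts between the old and new vector types and forwards the call.
  def transform(v: OldVector): OldVector =
    OldVectors.fromML(impl.transformVector(v.asML))
}
```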


---



[GitHub] spark pull request #16101: [WIP] Migrate IDF to not use mllib

2016-12-01 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/16101

[WIP] Migrate IDF to not use mllib



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark migrate-idf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16101


commit fa220a6b21dd36591a44dfb7d32494fee7c60b08
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-12-01T14:59:36Z

add transform

commit cb869eb71392f47b9e63af3ba6aeaa031523baaf
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-12-01T15:02:30Z

make IDFModel work

commit d1bb36d3c93e99214aeaec34bffdd63c82724f89
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-12-01T15:02:59Z

since tag

commit 89546ec4e5248d71db39b519cf6a6d072b767bd1
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-12-01T15:15:37Z

works

commit 72f8c7d59da2224bd71b0d56e1f2c388e277f9df
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-12-01T15:22:27Z

works

commit 5cb2c3e4df4807941647e72cec1f41ce4f02018b
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-12-01T15:32:00Z

migrate everything to ml




---



[GitHub] spark pull request #16101: [WIP] Migrate IDF to not use mllib

2016-12-01 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/16101


---



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-12-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley @holdenk @viirya PR updated


---



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-30 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley @holdenk will update the PR with changes today.


---



[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...

2016-11-24 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15817
  
ping @davies @jkbradley 


---



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-17 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@sethah I agree, 2nd approach is much more reasonable.


---



[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...

2016-11-14 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15817
  
@jkbradley done 👍 


---



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...

2016-11-11 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@holdenk updated the description.


---



[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...

2016-11-11 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/15817#discussion_r87621123
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -1163,9 +1184,11 @@ class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadab
 
     >>> df = spark.createDataFrame([(0.1,), (0.4,), (1.2,), (1.5,)], ["values"])
     >>> qds = QuantileDiscretizer(numBuckets=2,
-    ...     inputCol="values", outputCol="buckets", relativeError=0.01)
+    ...     inputCol="values", outputCol="buckets", relativeError=0.01, handleInvalid="error")
     >>> qds.getRelativeError()
     0.01
+    >>> qds.getHandleInvalid()
--- End diff --

good idea! adding


---



[GitHub] spark issue #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for ...

2016-11-11 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15817
  
@MLnick thanks for the review, addressed your comments.


---



[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...

2016-11-11 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/15817#discussion_r87593705
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -1194,21 +1217,30 @@ class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadab
                           "Must be in the range [0, 1].",
                           typeConverter=TypeConverters.toFloat)
 
+    handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " +
+                          "Options are skip (filter out rows with invalid values), " +
+                          "error (throw an error), or keep (keep invalid values in a special " +
+                          "additional bucket).",
+                          typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001):
+    def __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
+                 handleInvalid="error"):
         """
-        __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001)
+        __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001, \
+                 handleInvalid="error")
         """
         super(QuantileDiscretizer, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.QuantileDiscretizer", self.uid)
-        self._setDefault(numBuckets=2, relativeError=0.001)
+        self._setDefault(numBuckets=2, relativeError=0.001, handleInvalid="error")
         kwargs = self.__init__._input_kwargs
         self.setParams(**kwargs)
 
     @keyword_only
     @since("2.0.0")
-    def setParams(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001):
+    def setParams(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
+                  handleInvalid="error"):
--- End diff --

fixed


---



[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...

2016-11-11 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/15817#discussion_r87593693
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -158,19 +158,26 @@ class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, Jav
                           "splits specified will be treated as errors.",
                           typeConverter=TypeConverters.toListFloat)
 
+    handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " +
+                          "Options are skip (filter out rows with invalid values), " +
+                          "error (throw an error), or keep (keep invalid values in a special " +
+                          "additional bucket).",
+                          typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, splits=None, inputCol=None, outputCol=None):
+    def __init__(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error"):
         """
-        __init__(self, splits=None, inputCol=None, outputCol=None)
+        __init__(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error")
         """
         super(Bucketizer, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.Bucketizer", self.uid)
+        self._setDefault(handleInvalid="error")
         kwargs = self.__init__._input_kwargs
         self.setParams(**kwargs)
 
     @keyword_only
     @since("1.4.0")
-    def setParams(self, splits=None, inputCol=None, outputCol=None):
+    def setParams(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error"):
--- End diff --

fixed


---



[GitHub] spark pull request #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark...

2016-11-11 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/15843#discussion_r87550799
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -33,6 +33,10 @@ def __init__(self, java_obj=None):
 super(JavaWrapper, self).__init__()
 self._java_obj = java_obj
 
+def __del__(self):
+if SparkContext._active_spark_context:
--- End diff --

Checking if there is an active SparkContext; I got this error after `quit()` in `pyspark`:
```
Exception ignored in: 
Traceback (most recent call last):
File "/Users/xx/Project/Spark/python/pyspark/ml/wrapper.py", line
37, in __del__
SparkContext._active_spark_context._gateway.detach(self._java_obj)
AttributeError: 'NoneType' object has no attribute '_gateway'
Exception ignored in: 
Traceback (most recent call last):
File "/Users/xx/Project/Spark/python/pyspark/ml/wrapper.py", line
37, in __del__
AttributeError: 'NoneType' object has no attribute '_gateway'
```


---



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...

2016-11-10 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley looks good, merged 👍 


---



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...

2016-11-10 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
@jkbradley yes, I did it for `JavaWrapper` first, but trying to run the tests with it gives
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68478/consoleFull


---



[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark String...

2016-11-10 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15843
  
cc: @jkbradley @davies @holdenk 


---



[GitHub] spark issue #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for ...

2016-11-10 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15817
  
cc: @sethah @marmbrus 


---



[GitHub] spark pull request #15843: [SPARK-18274] Memory leak in PySpark StringIndexe...

2016-11-10 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15843

[SPARK-18274] Memory leak in PySpark StringIndexer

## What changes were proposed in this pull request?
Make the Java gateway dereference the object in the destructor, using `SparkContext._gateway.detach` inside `JavaWrapper`'s destructor.

## How was this patch tested?
```python
import random, string
from pyspark.ml.feature import StringIndexer

# 700,000 random strings of 10 characters each
l = [(''.join(random.choice(string.ascii_uppercase) for _ in range(10)), ) for _ in range(int(7e5))]
df = spark.createDataFrame(l, ['string'])

for i in range(50):
    indexer = StringIndexer(inputCol='string', outputCol='index')
    indexer.fit(df)
```
Before: a strong reference to every `StringIndexer` was kept, causing GC issues, and the run halted midway.
After: garbage collection works as the objects are dereferenced, and the computation completes.

Also verified using a profiler.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-18274

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15843.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15843


commit a493c1961829000986446db11ce67f3103a79bea
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-10T16:16:13Z

[SPARK-18274] Memory leak in PySpark StringIndexer




---



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
cc: @dbtsai @mengxr 


---



[GitHub] spark pull request #15831: [SPARK-18385][ML] Make the transformer's natively...

2016-11-09 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15831

[SPARK-18385][ML] Make the transformers native in the ml framework to avoid extra conversion

## What changes were proposed in this pull request?
Transformers added natively in the ml framework to avoid extra conversion for:
ChiSqSelector
IDF
StandardScaler
PCA

## How was this patch tested?
Existing Tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark ml-transformer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15831.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15831


commit da3626168ce264719517a8d34afdc500991fb700
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T14:53:14Z

ChiSqSelector: make the transformer natively in ml framework to avoid extra 
conversion

commit 733394fb3d7f4ea6891a4f6b0e41a03c9a1abc38
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T15:40:24Z

add transformer for IDF

commit da437316879a6e2cb9df9549e28ea9b1b95b63d5
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T15:55:22Z

add StandardScaler transform

commit a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T16:03:01Z

add PCA transform




---



[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK] Add handleInvalid to Pyspa...

2016-11-08 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15817

[SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for QuantileDiscretizer 
and Bucketizer

## What changes were proposed in this pull request?
added the new handleInvalid param for these transformers to Python to 
maintain API parity.

## How was this patch tested?
existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-18366

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15817.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15817


commit 0e41b36493fcb5eee5f342f694b0d2bc2a1e6c41
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T00:03:21Z

add handleInvalid to QuantileDiscretizer

commit 3b5133cac34dc42db71fabbf12c0a8e44d0fb2ba
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T00:09:13Z

fix lint issues

commit 20bfd9b3e1028e54619a992a4b333b4fe8c694bc
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T00:15:10Z

handleInvalid to Bucketizer

commit 19224724350a6d6c1936b496784131309ce286b0
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T00:15:52Z

fix lint error

commit b4720aa49eb94092aa255dcaa47f3e52b44cd6d2
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-09T00:21:04Z

Merge branch 'master' into SPARK-18366




---



[GitHub] spark issue #15809: [SPARK-18268] ALS.run fail with better message if rating...

2016-11-08 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15809
  
@srowen done 👍 


---



[GitHub] spark pull request #15809: [SPARK-18268] ALS.run fail with better message if...

2016-11-08 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15809

[SPARK-18268] ALS.run fail with better message if ratings is empty rdd

## What changes were proposed in this pull request?
Make ALS.run fail with a better message if the ratings RDD is empty.
ALS.train and ALS.trainImplicit are also affected.

## How was this patch tested?
existing tests
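
A minimal sketch of the kind of guard this adds (the exact message and placement are assumptions; see the diff for the real change):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.Rating

// Failing fast on an empty ratings RDD surfaces a clear error instead of an
// obscure failure deep inside the factorization. Note that isEmpty() runs a job.
def validateRatings(ratings: RDD[Rating]): Unit =
  require(!ratings.isEmpty(), "ratings RDD is empty; ALS cannot train on empty input")
```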

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-18268

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15809.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15809


commit 8080583922dc8f274d559ec3ea985d1bc9d171b9
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-11-08T15:22:49Z

[SPARK-18268] ALS.run fail with better message if ratings is empty rdd

ALS.train and ALS.trainImplicit are also affected




---



[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...

2016-11-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15654
  
@mgummelt yes working on it.


---



[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...

2016-10-31 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15654
  
@mgummelt  done! 👍 


---



[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...

2016-10-28 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15654
  
cc: @mgummelt @srowen 


---



[GitHub] spark pull request #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use...

2016-10-26 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15654

[SPARK-16881][MESOS] Migrate Mesos configs to use ConfigEntry

## What changes were proposed in this pull request?
Migrate Mesos configs to use ConfigEntry

## How was this patch tested?
Jenkins Tests
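
For readers unfamiliar with `ConfigEntry`: the migration replaces raw `conf.get("spark.mesos...")` string lookups with typed, documented entries. A sketch of the pattern (the key, doc text, and default below are illustrative, not necessarily entries from this PR):

```scala
import org.apache.spark.internal.config.ConfigBuilder

object MesosConfigs {
  // A typed, self-documenting definition replaces scattered string constants.
  val COARSE_MODE = ConfigBuilder("spark.mesos.coarse")
    .doc("If true, run over Mesos in coarse-grained mode.")
    .booleanConf
    .createWithDefault(true)
}

// Typed access then applies the default automatically:
//   val coarse = sparkConf.get(MesosConfigs.COARSE_MODE)  // Boolean
```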

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-16881

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15654


commit 55ff640abd8703826590bde7d0d4f7604272142e
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-27T02:59:16Z

[SPARK-16881] Migrate Mesos configs to use ConfigEntry

commit af306bd3c2d182d890fd769dffb190da2c2620ab
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-27T02:59:57Z

Merge branch 'master' into SPARK-16881




---



[GitHub] spark pull request #15433: [SPARK-17822][SPARKR] Use weak reference in JVMOb...

2016-10-23 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/15433


---



[GitHub] spark issue #15433: [SPARK-17822][SPARKR] Use weak reference in JVMObjectTra...

2016-10-23 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15433
  
Closing this since it's maybe not the right way to do this.


---



[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...

2016-10-22 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/12913
  
@rxin can you review again, all comments addressed 👍 


---



[GitHub] spark pull request #12913: [SPARK-928][CORE] Add support for Unsafe-based se...

2016-10-21 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/12913#discussion_r84570678
  
--- Diff: core/src/test/scala/org/apache/spark/serializer/UnsafeKryoSerializerSuite.scala ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+class UnsafeKryoSerializerSuite extends KryoSerializerSuite {
+
+  // This test suite should run all tests in KryoSerializerSuite with kryo unsafe.
+
+  override def beforeAll() {
+super.beforeAll()
+conf.set("spark.kryo.unsafe", "true")
--- End diff --

Ohh yes, fixed and tested 👍 
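
The presumed shape of the fix (an assumption reconstructed from the review thread, not the verbatim commit): the property has to be set before `super.beforeAll()` starts the shared context, otherwise the running context never sees it:

```scala
// Assumed fix: set the flag before the parent suite starts the shared
// SparkContext; setting it afterwards has no effect on the running context.
class UnsafeKryoSerializerSuite extends KryoSerializerSuite {
  override def beforeAll() {
    conf.set("spark.kryo.unsafe", "true")
    super.beforeAll()
  }
}
```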


---



[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...

2016-10-20 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/12913
  
@mateiz updated 👍 


---



[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...

2016-10-19 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/12913
  
@mateiz updated the pr 👍 


---



[GitHub] spark issue #15433: [SPARK-17822][SPARKR] Use weak reference in JVMObjectTra...

2016-10-13 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15433
  
@shivaram @srowen not sure why it's failing; will try to fix this ASAP.


---



[GitHub] spark pull request #15433: [SPARK-17822] Use weak reference in JVMObjectTrac...

2016-10-12 Thread techaddict
GitHub user techaddict reopened a pull request:

https://github.com/apache/spark/pull/15433

[SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may 
leak JVM objects

## What changes were proposed in this pull request?
Use weak reference in JVMObjectTracker.objMap because it may leak JVM 
objects

## How was this patch tested?
existing tests
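
The idea as a minimal standalone sketch (a hypothetical simplification, not the PR's actual tracker code):

```scala
import java.lang.ref.WeakReference
import scala.collection.concurrent.TrieMap

// Holding objects via WeakReference lets the GC reclaim them once the R side
// drops its handle; a plain Map would keep them alive indefinitely.
object WeakObjectTracker {
  private val objMap = TrieMap.empty[String, WeakReference[Object]]

  def put(id: String, obj: Object): Unit =
    objMap.put(id, new WeakReference(obj))

  // Returns None when the referent has already been garbage collected.
  def get(id: String): Option[Object] =
    objMap.get(id).flatMap(ref => Option(ref.get()))
}
```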

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-17822

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15433


commit 7023c40a99eaa81ee7bcd202a4b74df811d0cfc7
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-10T11:54:34Z

[SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may 
leak JVM objects

commit 69845947df62187eb40f3cc6468b52e38bdab897
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-10T13:23:56Z

Merge branch 'master' into SPARK-17822

commit 995611d75351d24907ce2b22e7d33752cc803da3
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-11T13:13:09Z

Merge branch 'master' into SPARK-17822

commit 8e763bef78fe147e84e1771f237a75ff42780705
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-12T06:33:26Z

fix for failures

commit 7d50d84f90fcda9e5dec79c9be834870c83443c4
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-12T06:34:23Z

Merge branch 'master' into SPARK-17822




---



[GitHub] spark pull request #15433: [SPARK-17822] Use weak reference in JVMObjectTrac...

2016-10-12 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/15433


---



[GitHub] spark pull request #15433: [SPARK-17822] Use weak reference in JVMObjectTrac...

2016-10-11 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15433

[SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may 
leak JVM objects

## What changes were proposed in this pull request?
Use weak reference in JVMObjectTracker.objMap because it may leak JVM 
objects

## How was this patch tested?
existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-17822

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15433


commit 7023c40a99eaa81ee7bcd202a4b74df811d0cfc7
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-10T11:54:34Z

[SPARK-17822] Use weak reference in JVMObjectTracker.objMap because it may 
leak JVM objects

commit 69845947df62187eb40f3cc6468b52e38bdab897
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-10-10T13:23:56Z

Merge branch 'master' into SPARK-17822




---



[GitHub] spark pull request #13334: [SPARK-15576] Add back hive tests blacklisted by ...

2016-09-08 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/13334


---



[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables

2016-09-03 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13767
  
@srowen yes, the issue is still there.


---



[GitHub] spark pull request #13767: [MINOR][SQL] Not dropping all necessary tables

2016-09-03 Thread techaddict
GitHub user techaddict reopened a pull request:

https://github.com/apache/spark/pull/13767

[MINOR][SQL] Not dropping all necessary tables

## What changes were proposed in this pull request?
was not dropping table `parquet_t3`

## How was this patch tested?
tested `LogicalPlanToSQLSuite` locally
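
For context, a sketch of what the fix amounts to (the table set and teardown shape are assumptions based on the description above):

```scala
// Every table created for the suite must also be dropped in teardown;
// parquet_t3 was missing from the list.
protected override def afterAll(): Unit = {
  try {
    sql("DROP TABLE IF EXISTS parquet_t0")
    sql("DROP TABLE IF EXISTS parquet_t1")
    sql("DROP TABLE IF EXISTS parquet_t2")
    sql("DROP TABLE IF EXISTS parquet_t3")  // the previously missing drop
  } finally {
    super.afterAll()
  }
}
```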

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark minor-8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13767


commit a2bab62abf9de24b4f09f1c3a31bcc468f1af8a4
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-06-19T06:11:28Z

[MINOR][SQL] Not dropping all necessary tables

Not dropping table `parquet_t3`




---



[GitHub] spark pull request #13767: [MINOR][SQL] Not dropping all necessary tables

2016-09-03 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/13767


---



[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...

2016-09-02 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14924
  
@srowen yes, in stringExpressions the trim is done on the UTF8String.


---



[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...

2016-09-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14924
  
@rxin Done 👍 


---



[GitHub] spark pull request #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips ...

2016-09-01 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/14924

[SPARK-17299] TRIM/LTRIM/RTRIM should not strips characters other than 
spaces

## What changes were proposed in this pull request?
TRIM/LTRIM/RTRIM should not strip characters other than spaces; we were trimming all chars smaller than ASCII 0x20 (space).

## How was this patch tested?
fixed existing tests.
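
The core of the bug, as a simplified illustration (not the actual `UTF8String` code): `java.lang.String#trim` semantics strip every code point at or below 0x20, while SQL TRIM must strip only the space character:

```scala
// Strip only ' ' (0x20), not every control character <= 0x20 the way
// java.lang.String#trim does.
def sqlTrim(s: String): String = {
  var start = 0
  var end = s.length
  while (start < end && s.charAt(start) == ' ') start += 1
  while (end > start && s.charAt(end - 1) == ' ') end -= 1
  s.substring(start, end)
}

// A tab must survive SQL TRIM:
// sqlTrim(" \tabc ") == "\tabc"   whereas   " \tabc ".trim == "abc"
```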

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-17299

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14924


commit 58c2a5dae4fc372ed5b7f2ff8e47ab4d6bb9e76e
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-09-01T19:29:17Z

[SPARK-17299] TRIM/LTRIM/RTRIM should not strips characters other than 
spaces




---



[GitHub] spark issue #12913: [SPARK-928][CORE] Add support for Unsafe-based serialize...

2016-08-05 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/12913
  
@holdenk Updated the PR, ready for review again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #12913: [SPARK-928][CORE] Add support for Unsafe-based se...

2016-08-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/12913#discussion_r73454617
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -399,6 +399,14 @@ class KryoSerializerSuite extends SparkFunSuite with 
SharedSparkContext {
 assert(!ser2.getAutoReset)
   }
 
+  private def testBothUnsafeAndSafe(f: SparkConf => Unit): Unit = {
--- End diff --

Yes, will update the PR today.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...

2016-08-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r73330940
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 
---
@@ -220,8 +220,27 @@ class TaskMetrics private[spark] () extends 
Serializable {
*/
   @transient private[spark] lazy val externalAccums = new 
ArrayBuffer[AccumulatorV2[_, _]]
 
+   /**
+* All data property accumulators registered with this task.
+*/
+   @transient private lazy val dataPropertyAccums = new 
ArrayBuffer[AccumulatorV2[_, _]]
+
   private[spark] def registerAccumulator(a: AccumulatorV2[_, _]): Unit = {
 externalAccums += a
+if (a.dataProperty) {
+  dataPropertyAccums += a
+}
+  }
+
+  private[spark] def hasDataPropertyAccumulators(): Boolean = {
+!dataPropertyAccums.isEmpty
--- End diff --

nit: could be nonEmpty
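
i.e., a sketch of the suggested one-liner:

```scala
private[spark] def hasDataPropertyAccumulators(): Boolean = dataPropertyAccums.nonEmpty
```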


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...

2016-08-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r73330741
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 
---
@@ -220,8 +220,27 @@ class TaskMetrics private[spark] () extends 
Serializable {
*/
   @transient private[spark] lazy val externalAccums = new 
ArrayBuffer[AccumulatorV2[_, _]]
 
+   /**
+* All data property accumulators registered with this task.
+*/
+   @transient private lazy val dataPropertyAccums = new 
ArrayBuffer[AccumulatorV2[_, _]]
+
   private[spark] def registerAccumulator(a: AccumulatorV2[_, _]): Unit = {
 externalAccums += a
+if (a.dataProperty) {
+  dataPropertyAccums += a
+}
+  }
+
+  private[spark] def hasDataPropertyAccumulators(): Boolean = {
+!dataPropertyAccums.isEmpty
+  }
+
+  /**
+   * Mark an rdd/shuffle/and partition as fully processed for all 
dataProperty accumulators.
+   */
+  private[spark] def markFullyProcessed(rddId: Int, shuffleWriteId: Int, 
partitionId: Int) = {
+dataPropertyAccums.map(_.markFullyProcessed(rddId, shuffleWriteId, 
partitionId))
--- End diff --

should be `foreach` instead of `map`
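
i.e., something like this (a sketch of the suggested change; `map` would build and discard a result collection, while `foreach` just runs the side effect):

```scala
dataPropertyAccums.foreach(_.markFullyProcessed(rddId, shuffleWriteId, partitionId))
```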


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13334: [SPARK-15576] Add back hive tests blacklisted by SPARK-1...

2016-07-29 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13334
  
@andrewor14 ping.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14315: [HOTFIX][BUILD][SPARK-16287][SQL] Fix annotation argumen...

2016-07-21 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14315
  
@jaceklaskowski thanks for finding this out. It's weird that it passed 
locally too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-21 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@yhuai sure. Should the performance testing use a SQL query or an expression?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-18 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
@cloud-fan Comment addressed, test passed 👍


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-18 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
@cloud-fan np, 👍 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-14 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
@cloud-fan anything else, or is it good to merge?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-13 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r70639783
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -234,6 +234,7 @@ object FunctionRegistry {
 expression[Subtract]("-"),
 expression[Multiply]("*"),
 expression[Divide]("/"),
+expression[IntegerDivide]("div"),
--- End diff --

@lianhuiwang doing ```div(4,2)``` gives
```
hive> div(4, 2);
NoViableAltException(14@[])
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1099)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1249)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1295)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1178)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1166)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'div' '(' '4'
```

So it seems Hive only parses `div` as an infix operator, not as a function call.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-13 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@cloud-fan Done 👍 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-12 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@cloud-fan Updated the PR, all tests should pass now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-12 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r70563932
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -249,11 +241,12 @@ case class Divide(left: Expression, right: Expression)
   s"${eval2.value} == 0"
 }
 val javaType = ctx.javaType(dataType)
-val divide = if (dataType.isInstanceOf[DecimalType]) {
+val divide = if (dataType.isInstanceOf[DecimalType] || 
dataType.isInstanceOf[DoubleType]) {
   s"${eval1.value}.$decimalMethod(${eval2.value})"
 } else {
-  s"($javaType)(${eval1.value} $symbol ${eval2.value})"
+  s"($javaType)(${eval1.value} $decimalMethod ${eval2.value})"
--- End diff --

But what about the decimalMethod used in line 245? If we inline the 
operator `/` there too, it gives 
```Binary numeric promotion not possible on types 
"org.apache.spark.sql.types.Decimal" and "org.apache.spark.sql.types.Decimal"```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-12 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r70562875
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala ---
@@ -384,4 +384,39 @@ class StringFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }.getMessage
 assert(m.contains("Invalid number of arguments for function 
sentences"))
   }
+
+  test("str_to_map function") {
+val df1 = Seq(
+  ("a=1,b=2", "y"),
+  ("a=1,b=2,c=3", "y")
+).toDF("a", "b")
+
+checkAnswer(
+  df1.selectExpr("str_to_map(a,',','=')"),
+  Seq(
+Row(Map("a" -> "1", "b" -> "2")),
+Row(Map("a" -> "1", "b" -> "2", "c" -> "3"))
+  )
+)
+
+val df2 = Seq(("a:1,b:2,c:3", "y")).toDF("a", "b")
+
+checkAnswer(
+  df2.selectExpr("str_to_map(a)"),
+  Seq(Row(Map("a" -> "1", "b" -> "2", "c" -> "3")))
+)
+
+// All arguments should be string literals.
+val m1 = intercept[AnalysisException]{
+  sql("select str_to_map('a:1,b:2,c:3',null,null)").collect()
--- End diff --

It gives ```FAILED: SemanticException [Error 10014]: Line 1:7 Wrong 
arguments 'TOK_NULL': All argument should be string/character type```



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-12 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
@cloud-fan all comments addressed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-12 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r70559727
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +394,54 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after 
splitting the text " +
+"into key/value pairs using delimiters. " +
+"Default delimiters are ',' for pairDelim and ':' for keyValueDelim.",
+  extended = """ > SELECT _FUNC_('a:1,b:2,c:3',',',':');\n 
map("a":"1","b":"2","c":"3") """)
+case class StringToMap(text: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression with CodegenFallback{
+
+  def this(child: Expression, pairDelim: Expression) = {
+this(child, pairDelim, Literal(":"))
+  }
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal(":"))
+  }
+
+  override def children: Seq[Expression] = Seq(text, pairDelim, 
keyValueDelim)
+
+  override def dataType: DataType = MapType(StringType, StringType, 
valueContainsNull = false)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.map(_.dataType).forall(_ == StringType)) {
+  TypeCheckResult.TypeCheckSuccess
+} else if (Seq(pairDelim, keyValueDelim).forall(_.foldable)) {
+  TypeCheckResult.TypeCheckFailure(s"String To Map's all arguments 
must be of type string.")
+} else {
+  TypeCheckResult.TypeCheckFailure(
--- End diff --

First, all children should have dataType StringType; if they do, succeed. If 
they don't and the delimiters are foldable, fail with the "all arguments must 
be of type string" message; otherwise, fail because the delimiters are not foldable.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-12 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r70471772
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -237,6 +229,9 @@ case class Divide(left: Expression, right: Expression)
 }
   }
 
+  // Used by doGenCode
+  protected def divide(eval1: ExprCode, eval2: ExprCode, javaType: 
String): String
--- End diff --

@cloud-fan yes, but I'm getting ```A method named "$div" is not declared in any 
enclosing class nor any supertype, nor through a static import``` in the 
updated PR for `Code generation of (2.0 / 1.0)`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-12 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r70437720
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -237,6 +229,9 @@ case class Divide(left: Expression, right: Expression)
 }
   }
 
+  // Used by doGenCode
+  protected def divide(eval1: ExprCode, eval2: ExprCode, javaType: 
String): String
--- End diff --

I did it on purpose. We can't call `$div` on `byte`s, and if I try to 
call `value = value1 / value2;` for decimals, I get ```Binary numeric promotion 
not possible on types "org.apache.spark.sql.types.Decimal" and 
"org.apache.spark.sql.types.Decimal"```.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-12 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r70434309
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +394,84 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after 
splitting the text " +
+"into key/value pairs using delimiters. " +
+"Default delimiters are ',' for pairDelim and ':' for keyValueDelim.",
+  extended = """ > SELECT _FUNC_('a:1,b:2,c:3',',',':');\n 
map("a":"1","b":"2","c":"3") """)
+case class StringToMap(text: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression {
+
+  def this(child: Expression, pairDelim: Expression) = {
+this(child, pairDelim, Literal(":"))
+  }
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal(":"))
+  }
+
+  override def children: Seq[Expression] = Seq(text, pairDelim, 
keyValueDelim)
+
+  override def dataType: DataType = MapType(StringType, StringType, 
valueContainsNull = false)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.map(_.dataType).forall(_ == StringType)) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  TypeCheckResult.TypeCheckFailure(s"String To Map's all arguments 
should be string literal.")
--- End diff --

Should only `text` be foldable, or all three?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-10 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
@rxin no need, I will update this today.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-06 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@rxin @cloud-fan done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-06 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69772459
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -441,10 +452,15 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 UTF8String[] $keyArray = new UTF8String[$tempArray.length];
 UTF8String[] $valueArray = new UTF8String[$tempArray.length];
 
-for (int $i = 0; $i < $tempArray.length; $i ++) {
+for (int $i = 0; $i < $tempArray.length; $i++) {
   UTF8String[] $keyValue = ($tempArray[$i]).split($keyValueDelim, 
2);
   $keyArray[$i] = $keyValue[0];
-  $valueArray[$i] = $keyValue[1];
+  if ($keyValue.length < 2) {
+$valueArray[$i] = null;
+  }
+  else {
+$valueArray[$i] = $keyValue[1];
+  }
--- End diff --

I think that syntax is allowed in Java, and it's much more readable than 
the ternary operator.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-05 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69676815
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map 
after splitting the text into
--- End diff --

Not sure about the display: ```[Usage: str_to_map(text[, pairDelim, 
keyValueDelim]) - Creates a map after splitting the text into
key/value pairs using delimiters.
Default delimiters are ',' for pairDelim and '=' for keyValueDelim.]```
Added an example.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-05 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69675997
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map 
after splitting the text into
+key/value pairs using delimiters.
+Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
+case class StringToMap(text: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression with ExpectsInputTypes {
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal("="))
+  }
+
+  override def children: Seq[Expression] = Seq(text, pairDelim, 
keyValueDelim)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
+
+  override def dataType: DataType = MapType(StringType, StringType, 
valueContainsNull = false)
+
+  override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = {
+val array = str.asInstanceOf[UTF8String]
+  .split(delim1.asInstanceOf[UTF8String], -1)
+  .map{_.split(delim2.asInstanceOf[UTF8String], 2)}
+
+ArrayBasedMapData(array.map(_(0)), 
array.map(_(1))).asInstanceOf[MapData]
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val mapClass = classOf[ArrayBasedMapData].getName
+  val keyArray = ctx.freshName("keyArray")
+  val valueArray = ctx.freshName("valueArray")
+  ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = 
null;")
+  ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;")
--- End diff --

Oh yes, makes sense. Made the change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-05 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69675457
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map 
after splitting the text into
+key/value pairs using delimiters.
+Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
+case class StringToMap(text: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression with ExpectsInputTypes {
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal("="))
+  }
+
+  override def children: Seq[Expression] = Seq(text, pairDelim, 
keyValueDelim)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
+
+  override def dataType: DataType = MapType(StringType, StringType, 
valueContainsNull = false)
+
+  override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = {
+val array = str.asInstanceOf[UTF8String]
+  .split(delim1.asInstanceOf[UTF8String], -1)
+  .map{_.split(delim2.asInstanceOf[UTF8String], 2)}
+
+ArrayBasedMapData(array.map(_(0)), 
array.map(_(1))).asInstanceOf[MapData]
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val mapClass = classOf[ArrayBasedMapData].getName
+  val keyArray = ctx.freshName("keyArray")
+  val valueArray = ctx.freshName("valueArray")
+  ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = 
null;")
+  ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;")
--- End diff --

And we do similar stuff in `CreateMap`, e.g. 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L129


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-05 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69675325
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map 
after splitting the text into
+key/value pairs using delimiters.
+Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
+case class StringToMap(text: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression with ExpectsInputTypes {
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal("="))
+  }
+
+  override def children: Seq[Expression] = Seq(text, pairDelim, 
keyValueDelim)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
+
+  override def dataType: DataType = MapType(StringType, StringType, 
valueContainsNull = false)
+
+  override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = {
+val array = str.asInstanceOf[UTF8String]
+  .split(delim1.asInstanceOf[UTF8String], -1)
+  .map{_.split(delim2.asInstanceOf[UTF8String], 2)}
+
+ArrayBasedMapData(array.map(_(0)), 
array.map(_(1))).asInstanceOf[MapData]
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val mapClass = classOf[ArrayBasedMapData].getName
+  val keyArray = ctx.freshName("keyArray")
+  val valueArray = ctx.freshName("valueArray")
+  ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = 
null;")
+  ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;")
--- End diff --

I get ```Caused by: org.codehaus.commons.compiler.CompileException: File 
'generated.java', Line 56, Column 16: 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection"
 has no field "keyArray"```
for 
```java 
this.keyArray = null;
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-05 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69670913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,71 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map 
after splitting the text into
+key/value pairs using delimiters.
+Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
+case class StringToMap(text: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression with ExpectsInputTypes {
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal("="))
+  }
+
+  override def children: Seq[Expression] = Seq(text, pairDelim, 
keyValueDelim)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
+
+  override def dataType: DataType = MapType(StringType, StringType, 
valueContainsNull = false)
+
+  override def nullSafeEval(str: Any, delim1: Any, delim2: Any): Any = {
+val array = str.asInstanceOf[UTF8String]
+  .split(delim1.asInstanceOf[UTF8String], -1)
+  .map{_.split(delim2.asInstanceOf[UTF8String], 2)}
+
+ArrayBasedMapData(array.map(_(0)), 
array.map(_(1))).asInstanceOf[MapData]
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (text, delim1, delim2) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val mapClass = classOf[ArrayBasedMapData].getName
+  val keyArray = ctx.freshName("keyArray")
+  val valueArray = ctx.freshName("valueArray")
+  ctx.addMutableState("UTF8String[]", keyArray, s"this.$keyArray = 
null;")
+  ctx.addMutableState("UTF8String[]", valueArray, s"this.$valueArray = 
null;")
--- End diff --

It won't let me assign a value to these vars 
(https://github.com/apache/spark/pull/13990/files#diff-c1758d627a06084e577be0d33d47f44eR457).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-05 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69605110
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map 
after splitting the text into
+key/value pairs using delimiters.
+Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
+case class StringToMap(child: Expression, pairDelim: Expression, 
keyValueDelim: Expression)
+  extends TernaryExpression with ExpectsInputTypes {
+
+  def this(child: Expression) = {
+this(child, Literal(","), Literal("="))
+  }
+
+  override def children: Seq[Expression] = Seq(child, pairDelim, 
keyValueDelim)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
--- End diff --

not sure.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-05 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
@cloud-fan addressed all your comments 🎉 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13334: [SPARK-15576] Add back hive tests blacklisted by SPARK-1...

2016-07-04 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13334
  
@andrewor14 I've made the changes; can you take a look now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13334: [SPARK-15576] Add back hive tests blacklisted by ...

2016-07-04 Thread techaddict
GitHub user techaddict reopened a pull request:

https://github.com/apache/spark/pull/13334

[SPARK-15576] Add back hive tests blacklisted by SPARK-15539

## What changes were proposed in this pull request?
Add back hive tests blacklisted by SPARK-15539

## How was this patch tested?
Ran HiveCompatibilitySuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-15576

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13334


commit 6361ac5653733f89d9697101a4cac52f17901c61
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-05-26T19:34:48Z

[SPARK-15576] Add back hive tests blacklisted by SPARK-15539




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r69407742
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -285,6 +284,75 @@ case class Divide(left: Expression, right: Expression)
 }
 
 @ExpressionDescription(
+  usage = "a _FUNC_ b - Divides a by b.",
+  extended = "> SELECT 3 _FUNC_ 2;\n 1")
+case class IntegerDivide(left: Expression, right: Expression)
--- End diff --

Done 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r69392406
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -285,6 +284,75 @@ case class Divide(left: Expression, right: Expression)
 }
 
 @ExpressionDescription(
+  usage = "a _FUNC_ b - Divides a by b.",
+  extended = "> SELECT 3 _FUNC_ 2;\n 1")
+case class IntegerDivide(left: Expression, right: Expression)
--- End diff --

Let me try doing that 👍 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r69392402
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -234,6 +234,7 @@ object FunctionRegistry {
 expression[Subtract]("-"),
 expression[Multiply]("*"),
 expression[Divide]("/"),
+expression[IntegerDivide]("div"),
--- End diff --

I don't think so.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69392299
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after 
splitting the text into
--- End diff --

Yup, sounds much better; let me make the change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69392113
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after 
splitting the text into
--- End diff --

Used `delimiter1` and `delimiter2` because they're named that way in Hive.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...

2016-07-03 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13990#discussion_r69392080
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -393,3 +393,73 @@ case class CreateNamedStructUnsafe(children: 
Seq[Expression]) extends Expression
 
   override def prettyName: String = "named_struct_unsafe"
 }
+
+/**
+ * Creates a map after splitting the input text into key/value pairs using 
delimeters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after 
splitting the text into
--- End diff --

How about `pairDelim` and `pairSeparatorDelim`? I'm not very good with naming; 
what do you suggest?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function

2016-07-03 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13990
  
cc: @cloud-fan @rxin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...

2016-07-03 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/14036
  
cc: @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-03 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/14036

[SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast

## What changes were proposed in this pull request?
Add IntegerDivide to avoid unnecessary cast

Before:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
CAST((6 / 3) AS BIGINT): bigint
Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS
CAST((6 / 3) AS BIGINT)#5L]
+- OneRowRelation$
...
```

After:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
(6 / 3): int
Project [(6 / 3) AS (6 / 3)#11]
+- OneRowRelation$
...
```

## How was this patch tested?
Existing tests, plus newly added ones.
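
For reference, with this patch applied the operator can be exercised directly (hypothetical spark-shell session):

```scala
// Integral divide without the round-trip through double:
spark.sql("SELECT 6 div 3").show() // single row: 2
```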

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-16323

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14036


commit 067b788b05846e659615feef8613f8965573f150
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-07-03T09:00:11Z

[SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast

Before:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
CAST((6 / 3) AS BIGINT): bigint
Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS
CAST((6 / 3) AS BIGINT)#5L]
+- OneRowRelation$
...
```

After:
```
scala> spark.sql("select 6 div 3").explain(true)
...
== Analyzed Logical Plan ==
(6 / 3): int
Project [(6 / 3) AS (6 / 3)#11]
+- OneRowRelation$
...
```

commit e4e42c35b0236dff6aedf6468a7d94f80bc6023b
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-07-03T09:00:59Z

Merge branch 'master' into SPARK-16323




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14032: [Minor][SQL] Replace Parquet deprecations

2016-07-03 Thread techaddict
Github user techaddict closed the pull request at:

https://github.com/apache/spark/pull/14032


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14032: [Minor][SQL] Replace Parquet deprecations

2016-07-02 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/14032

[Minor][SQL] Replace Parquet deprecations

## What changes were proposed in this pull request?
1. Replace `Binary.fromByteArray` with `Binary.fromReusedByteArray`
2. Replace `ConversionPatterns.listType` with `ConversionPatterns.listOfElements`

## How was this patch tested?
Existing Tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark depre-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14032


commit 20aa871a02d08d45f716a9974abe479f077ccd30
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-07-03T04:45:54Z

[Minor][SQL] Replace Parquet deprecations

1. Replace `Binary.fromByteArray` with `Binary.fromReusedByteArray`
2. Replace `ConversionPatterns.listType` with `ConversionPatterns.listOfElements`




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables

2016-06-29 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13767
  
cc: @srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL ...

2016-06-29 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/13990

[SPARK-16287][SQL][WIP] Implement str_to_map SQL function

## What changes were proposed in this pull request?
This PR adds the `str_to_map` SQL function in order to remove the Hive fallback.
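
Expected usage once the function lands, assuming a spark-shell session where `spark` is predefined; the default delimiters below (',' between pairs, ':' between key and value) mirror Hive's `str_to_map` and are an assumption about the final patch:

```
// Default delimiters.
spark.sql("SELECT str_to_map('a:1,b:2,c:3')").show(truncate = false)
// expected: Map(a -> 1, b -> 2, c -> 3)

// Explicit pair and key/value delimiters.
spark.sql("SELECT str_to_map('a=1;b=2', ';', '=')").show(truncate = false)
// expected: Map(a -> 1, b -> 2)
```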

## How was this patch tested?
Passes the Jenkins tests, including the newly added ones.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-16287

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13990


commit af59f57cecd93de49ec5bd20058199d93a9f2445
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-06-30T03:54:05Z

First pass without arguments

commit dc6b1f439e32768828bdb7d1a10f8b8178fa4c13
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-06-30T04:32:54Z

Add delimiter options

commit a8e6631edf6d124f218b15589427664f5b454759
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-06-30T04:36:08Z

Merge master

commit 1f888abb532c905dac11b404819786fd2641e38f
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-06-30T04:37:13Z

merge fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables

2016-06-19 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13767
  
jenkins retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13767: [MINOR][SQL] Not dropping all necessary tables

2016-06-19 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/13767

[MINOR][SQL] Not dropping all necessary tables

## What changes were proposed in this pull request?
The suite was not dropping table `parquet_t3`; the fix adds the missing drop (sketched below).
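
A sketch of the cleanup pattern involved; the neighbouring table names and the exact `afterAll` body are assumptions for illustration, not copied from the suite:

```
// Inside LogicalPlanToSQLSuite (sketch): every table created in beforeAll
// needs a matching drop, or leftover state can leak into later suites.
protected override def afterAll(): Unit = {
  try {
    sql("DROP TABLE IF EXISTS parquet_t0")
    sql("DROP TABLE IF EXISTS parquet_t1")
    sql("DROP TABLE IF EXISTS parquet_t2")
    sql("DROP TABLE IF EXISTS parquet_t3") // the previously missing drop
  } finally {
    super.afterAll()
  }
}
```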

## How was this patch tested?
Tested by running `LogicalPlanToSQLSuite` locally.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark minor-8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13767


commit a2bab62abf9de24b4f09f1c3a31bcc468f1af8a4
Author: Sandeep Singh <sand...@techaddict.me>
Date:   2016-06-19T06:11:28Z

[MINOR][SQL] Not dropping all necessary tables

Not dropping table `parquet_t3`




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


