[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...

2016-07-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14077
  
@JustinPihony Honestly, I do not mind which PR gets merged in the end. Please clean up your PR as best you can, and I will review it when it is ready. Thanks for your work! Please continue to submit more PRs to improve Spark.

To reduce the code changes in your PR, I think we should not extend 
`SchemaRelationProvider`. Now, I think you can assume the copy location has 
been fixed.  

Since this is related to Data Source APIs, CC @rxin @yhuai 





[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14004
  
**[Test build #61895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61895/consoleFull)**
 for PR 14004 at commit 
[`8d7c3d4`](https://github.com/apache/spark/commit/8d7c3d400581332503f7cb831e4d0f850294b608).





[GitHub] spark issue #14068: enhanced simulate multiply

2016-07-06 Thread uzadude
Github user uzadude commented on the issue:

https://github.com/apache/spark/pull/14068
  
Hi srowen,
I have read the "how to contribute" wiki. I thought this enhancement was too small to open a JIRA for, and it passes the tests.





[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14004
  
Thank you, @cloud-fan .
I updated the PR according to your comments.





[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...

2016-07-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13876
  
Looks pretty good.

cc @cloud-fan for another look.






[GitHub] spark pull request #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer...

2016-07-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13876#discussion_r69854463
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -820,16 +820,24 @@ object ConstantFolding extends Rule[LogicalPlan] {
 }
 
 /**
- * Replaces [[In (value, seq[Literal])]] with optimized version [[InSet (value, HashSet[Literal])]]
- * which is much faster
+ * Optimize IN predicates:
+ * 1. Removes literal repetitions.
+ * 2. Replaces [[In (value, seq[Literal])]] with optimized version
+ *    [[InSet (value, HashSet[Literal])]] which is much faster.
  */
 case class OptimizeIn(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if !list.exists(!_.isInstanceOf[Literal]) &&
-          list.size > conf.optimizerInSetConversionThreshold =>
-        val hSet = list.map(e => e.eval(EmptyRow))
-        InSet(v, HashSet() ++ hSet)
+      case expr @ In(v, list) if expr.inSetConvertible =>
+        val newList = ExpressionSet(list).toSeq
+        if (newList.size > conf.optimizerInSetConversionThreshold) {
+          val hSet = newList.map(e => e.eval(EmptyRow))
+          InSet(v, HashSet() ++ hSet)
+        } else if (newList.size < list.size) {
+          expr.copy(value = v, list = newList)
--- End diff --

you don't need to copy value here, do you?
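A standalone sketch of the reviewer's point, using a hypothetical `In` case class rather than Catalyst's: `copy` keeps every field that is not named, so passing `value` explicitly is redundant.

```scala
// Toy stand-in for Catalyst's In expression, for illustration only.
case class In(value: String, list: Seq[Int])

object CopyDemo extends App {
  val expr = In("v", Seq(1, 1, 2))
  val dedup = expr.list.distinct
  // Naming only `list` leaves `value` untouched, so both forms are equivalent.
  assert(expr.copy(list = dedup) == expr.copy(value = expr.value, list = dedup))
  println(expr.copy(list = dedup)) // In(v,List(1, 2))
}
```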





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69854388
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
--- End diff --

Yeah, actually, why don't we simplify this function and require the part to be foldable rather than evaluating it per row? It's similar to the other reflect pull request.
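A toy standalone sketch of that idea (hypothetical names, not Spark's Expression API): when the part argument is a constant, the extraction logic can be resolved once up front instead of being re-dispatched for every row.

```scala
import java.net.URL

object FoldablePartDemo extends App {
  // Resolve which accessor to use once, based on a constant part name.
  def extractor(part: String): URL => String = part match {
    case "HOST"  => _.getHost
    case "PATH"  => _.getPath
    case "QUERY" => _.getQuery
    case _       => (_: URL) => null
  }

  val getHost = extractor("HOST") // chosen once, then reused for every row
  println(getHost(new URL("http://spark.apache.org/path?query=1"))) // spark.apache.org
}
```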




[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69854198
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -198,6 +203,66 @@ case class StringSplit(str: Expression, pattern: 
Expression)
   override def prettyName: String = "split"
 }
 
+/**
+ * Splits a string into arrays of sentences, where each sentence is an 
array of words.
+ * The 'lang' and 'country' arguments are optional, and if omitted, the 
default locale is used.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, lang, country) - Splits str into an array of array 
of words.",
+  extended = "> SELECT _FUNC_('Hi there! Good morning.');\n  
[['Hi','there'], ['Good','morning']]")
+case class Sentences(
+str: Expression,
+language: Expression = Literal(""),
+country: Expression = Literal(""))
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  def this(str: Expression) = this(str, Literal(""), Literal(""))
+  def this(str: Expression, language: Expression) = this(str, language, 
Literal(""))
+
+  override def nullable: Boolean = true
+  override def dataType: DataType =
+ArrayType(ArrayType(StringType, containsNull = false), containsNull = 
false)
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
+  override def children: Seq[Expression] = str :: language :: country :: 
Nil
+
+  override def eval(input: InternalRow): Any = {
+val string = str.eval(input)
+if (string == null) {
+  null
+} else {
+  val locale = try {
+new Locale(language.eval(input).asInstanceOf[UTF8String].toString,
+  country.eval(input).asInstanceOf[UTF8String].toString)
+  } catch {
+case _: NullPointerException | _: ClassCastException => 
Locale.getDefault
--- End diff --

It created the wrong locale, which was then compared against the system's available locales, so it was finally ignored. The following is the underlying code in `rt.jar`.
```
List var4 = Control.getControl(Control.FORMAT_DEFAULT).getCandidateLocales("", var1);
Iterator var5 = var4.iterator();

while(var5.hasNext()) {
    Locale var6 = (Locale)var5.next();
    if(!var6.equals(var1)) {
        var2 = findAdapter(var0, var6);
        if(var2 != null) {
            ((ConcurrentMap)var3).putIfAbsent(var1, var2);
            return var2;
        }
    }
}
```
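A small standalone check of the behaviour described above (it assumes a stock JDK): the `Locale` constructor accepts an unknown language and country without throwing, and `BreakIterator` quietly falls back to supported rules instead of failing.

```scala
import java.text.BreakIterator
import java.util.Locale

object LocaleFallbackDemo extends App {
  val text = "Hi there! Good morning."
  val bogus = new Locale("zz", "ZZ")                // the constructor never validates
  val it = BreakIterator.getSentenceInstance(bogus) // no exception, silent fallback
  it.setText(text)
  var start = it.first()
  var end = it.next()
  while (end != BreakIterator.DONE) {
    println(text.substring(start, end).trim)        // "Hi there!", "Good morning."
    start = end
    end = it.next()
  }
}
```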





[GitHub] spark pull request #14072: [SPARK-16398][CORE] Make cancelJob and cancelStag...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14072





[GitHub] spark pull request #14084: [SPARK-16021][test-maven] Fix the maven build

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14084





[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...

2016-07-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14075
  
Sure, let me know whether I need to submit another PR for backporting to 
2.0. Thanks!





[GitHub] spark pull request #13983: [SPARK-16021] Fill freed memory in test to help c...

2016-07-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13983#discussion_r69854073
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---
@@ -58,4 +61,17 @@ public void overlappingCopyMemory() {
   Assert.assertEquals((byte)i, data[i + 1]);
 }
   }
+
+  @Test
+  public void memoryDebugFillEnabledInTest() {
+Assert.assertTrue(MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED);
--- End diff --

Actually it's fixed by https://github.com/apache/spark/pull/14084





[GitHub] spark pull request #13983: [SPARK-16021] Fill freed memory in test to help c...

2016-07-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13983#discussion_r69854030
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---
@@ -58,4 +61,17 @@ public void overlappingCopyMemory() {
   Assert.assertEquals((byte)i, data[i + 1]);
 }
   }
+
+  @Test
+  public void memoryDebugFillEnabledInTest() {
+Assert.assertTrue(MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED);
--- End diff --

Yea I think the problem is that Maven doesn't have the property set.

cc @ericl  we need to set that in Maven too.
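A toy illustration of why the assertion could fail only in the Maven build (the property name below is hypothetical, not the flag this PR actually uses): if the flag is derived from a JVM system property, any build that never passes the corresponding `-D` option sees the default `false`.

```scala
object DebugFillFlagDemo extends App {
  // Hypothetical property name, for illustration only.
  val enabled = java.lang.Boolean.getBoolean("example.memory.debugFill")
  // If SBT sets -Dexample.memory.debugFill=true for tests and Maven does not,
  // the same assertion passes in one build and fails in the other.
  assert(enabled, "run with -Dexample.memory.debugFill=true")
}
```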






[GitHub] spark issue #14084: [SPARK-16021][test-maven] Fix the maven build

2016-07-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14084
  
LGTM - I'm going to merge this.






[GitHub] spark issue #14084: [SPARK-16021][test-maven] Fix the maven build

2016-07-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/14084
  
cc @rxin @ericl 





[GitHub] spark issue #14072: [SPARK-16398][CORE] Make cancelJob and cancelStage APIs ...

2016-07-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14072
  
Merging in master. Thanks.






[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...

2016-07-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14075
  
Thanks - then we should merge this in 2.0.






[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...

2016-07-06 Thread JustinPihony
Github user JustinPihony commented on the issue:

https://github.com/apache/spark/pull/14077
  
Then the best course of action would be to use my current implementation, as it works no matter the position of copy. I can add the additional tests if that would make it more amenable. Otherwise I'll push a reduced code set in the morning, but it would rely on the copy-location move PR.

> On Jul 7, 2016, at 1:27 AM, Hyukjin Kwon  wrote:
> 
> (Personally, I hope this does not get delayed because this usage was 
introduced in Spark Summit PPT and I guess users would try to use this API.)
> 






[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69853706
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = children(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
--- End diff --

`case Literal(url: UTF8String, _) if url != null` is needed here; `getUrl` does not handle null at the moment.
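A minimal standalone sketch of the suggested guard, with simplified stand-ins for `Literal` and `getUrl` rather than the PR's actual types: the cached value is only built when the literal is non-null, so the helper is never handed a null.

```scala
import java.net.{MalformedURLException, URL}

object NullGuardDemo extends App {
  // Simplified stand-in for a Catalyst string literal.
  case class Lit(value: String)

  def getUrl(s: String): URL =
    try new URL(s) catch { case _: MalformedURLException => null }

  def cachedUrl(expr: Any): URL = expr match {
    case Lit(url) if url != null => getUrl(url) // the added null guard
    case _ => null                              // non-literal or null literal: nothing cached
  }

  println(cachedUrl(Lit("http://spark.apache.org/path?query=1"))) // a URL
  println(cachedUrl(Lit(null)))                                   // null, no NPE
}
```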





[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69853682
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,41 @@ class StringExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper {
 checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 
0)
 checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 
0)
   }
+
+  test("Sentences") {
+val nullString = Literal.create(null, StringType)
+checkEvaluation(Sentences(nullString, nullString, nullString), null, 
EmptyRow)
+checkEvaluation(Sentences(nullString, nullString), null, EmptyRow)
+checkEvaluation(Sentences(nullString), null, EmptyRow)
+checkEvaluation(Sentences(Literal.create(null, NullType)), null, 
EmptyRow)
+checkEvaluation(Sentences("", nullString, nullString), Seq.empty, 
EmptyRow)
+checkEvaluation(Sentences("", nullString), Seq.empty, EmptyRow)
+checkEvaluation(Sentences(""), Seq.empty, EmptyRow)
+
+val correct_answer = Seq(
+  Seq("Hi", "there"),
+  Seq("The", "price", "was"),
+  Seq("But", "not", "now"))
+
+// Hive compatible test-cases.
+checkEvaluation(
+  Sentences("Hi there! The price was $1,234.56 But, not now."),
+  correct_answer,
+  EmptyRow)
--- End diff --

Thanks.





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69853668
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = children(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = children(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
--- End diff --

`case Literal(url: UTF8String, _) if url != null`, the `getUrl` doesn't 
handle null now.





[GitHub] spark issue #14084: [SPARK-16021][test-maven] Fix the maven build

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14084
  
**[Test build #61894 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61894/consoleFull)**
 for PR 14084 at commit 
[`a618729`](https://github.com/apache/spark/commit/a6187297351865ea4b6e0a62f4e105a9ef58e876).





[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13494
  
Hi @lianhuiwang, can you rebase your PR onto master? I think it's pretty close!





[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69853496
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala ---
@@ -347,4 +347,24 @@ class StringFunctionsSuite extends QueryTest with 
SharedSQLContext {
   df2.filter("b>0").selectExpr("format_number(a, b)"),
   Row("5.") :: Row("4.000") :: Row("4.000") :: Row("4.000") :: 
Row("3.00") :: Nil)
   }
+
+  test("string sentences function") {
+val df = Seq(("Hi there! The price was $1,234.56 But, not now.", 
"en", "US"))
+  .toDF("str", "language", "country")
+
+checkAnswer(
+  df.selectExpr("sentences(str, language, country)"),
+  Row(Seq(Seq("Hi", "there"), Seq("The", "price", "was"), Seq("But", 
"not", "now"
+
+// Type coercion
+checkAnswer(
+  df.selectExpr("sentences(null)", "sentences(10)", "sentences(3.14)"),
+  Row(null, Seq(Seq("10")), Seq(Seq("3.14"
+
+// Argument number exception
+val m = intercept[AnalysisException] {
+  df.selectExpr("sentences()")
+}.getMessage
+assert(m.contains("Invalid number of arguments"))
--- End diff --

It is `Invalid number of arguments for function sentences`. I'll update 
this, too.





[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13494#discussion_r69853489
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -1689,4 +1689,76 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
   )
 }
   }
+
+  test("spark-15752 optimize metadata only query for hive table") {
--- End diff --

Why is this test so different from the one in the sql core `SQLQuerySuite`?





[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14026
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14084: [SPARK-16021][test-maven] Fix the maven build

2016-07-06 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/14084

[SPARK-16021][test-maven] Fix the maven build

## What changes were proposed in this pull request?

Fixed the maven build for #13983

## How was this patch tested?

The existing tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark fix-maven

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14084.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14084


commit a6187297351865ea4b6e0a62f4e105a9ef58e876
Author: Shixiong Zhu 
Date:   2016-07-07T05:33:22Z

[SPARK-16021] Fix the maven build







[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14026
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61892/
Test PASSed.





[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14026
  
**[Test build #61892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61892/consoleFull)**
 for PR 14026 at commit 
[`f287722`](https://github.com/apache/spark/commit/f2877226ebe8c70d44f35a10bab056402bc9ffa9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69853099
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,52 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String,
+        urlStr: String,
+        partToExtract: String,
+        key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract), Literal(key))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")
+
+    // Null checking
+    checkParseUrl(null, null, "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
+    checkParseUrl(null, null, null)
+    checkParseUrl(null, "test", "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")
+
+    // exceptional cases
+    intercept[java.util.regex.PatternSyntaxException] {
+      evaluate(ParseUrl(Seq(Literal("http://spark.apache.org/path?"),
+        Literal("QUERY"), Literal("???"))))
+    }
+
+    // arguments checking
+    assert(ParseUrl(Seq(Literal("1"))).checkInputDataTypes().isFailure)
+    assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal("3"), Literal("4")))
--- End diff --

ah right, no need to bother here





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69853050
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
+}
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: 
UTF8String): UTF8String = {
+if (url != null && partToExtract != null) {
+  if (cachedUrl ne null) {
+extractFromUrl(cachedUrl, partToExtract)
+  } else {
+

[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...

2016-07-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14077
  
(Personally, I hope this does not get delayed because this usage was 
introduced in Spark Summit PPT and I guess users would try to use this API.)





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69851574
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,52 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String,
+        urlStr: String,
+        partToExtract: String,
+        key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract), Literal(key))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")
+
+    // Null checking
+    checkParseUrl(null, null, "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
+    checkParseUrl(null, null, null)
+    checkParseUrl(null, "test", "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")
+
+    // exceptional cases
+    intercept[java.util.regex.PatternSyntaxException] {
+      evaluate(ParseUrl(Seq(Literal("http://spark.apache.org/path?"),
+        Literal("QUERY"), Literal("???"))))
+    }
+
+    // arguments checking
+    assert(ParseUrl(Seq(Literal("1"))).checkInputDataTypes().isFailure)
+    assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal("3"), Literal("4")))
--- End diff --

As I declared `ParseUrl` with `ImplicitCastInputTypes`, I am not sure whether cases with invalid-type parameters are necessary.





[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69851306
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
+}
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: 
UTF8String): UTF8String = {
+if (url != null && partToExtract != null) {
+  if (cachedUrl ne null) {
+extractFromUrl(cachedUrl, partToExtract)
+  } else {
+val 

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69851163
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
--- End diff --

OK..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69851130
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,52 @@ class StringExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper {
 checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 
0)
 checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 
0)
   }
+
+  test("ParseUrl") {
+def checkParseUrl(expected: String, urlStr: String, partToExtract: 
String): Unit = {
+  checkEvaluation(
+ParseUrl(Seq(Literal(urlStr), Literal(partToExtract))), expected)
+}
+def checkParseUrlWithKey(
+expected: String,
+urlStr: String,
+partToExtract: String,
+key: String): Unit = {
+  checkEvaluation(
+ParseUrl(Seq(Literal(urlStr), Literal(partToExtract), 
Literal(key))), expected)
+}
+
+checkParseUrl("spark.apache.org", 
"http://spark.apache.org/path?query=1;, "HOST")
+checkParseUrl("/path", "http://spark.apache.org/path?query=1;, "PATH")
+checkParseUrl("query=1", "http://spark.apache.org/path?query=1;, 
"QUERY")
+checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref;, "REF")
+checkParseUrl("http", "http://spark.apache.org/path?query=1;, 
"PROTOCOL")
+checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1;, 
"FILE")
+checkParseUrl("spark.apache.org:8080", 
"http://spark.apache.org:8080/path?query=1;, "AUTHORITY")
+checkParseUrl("userinfo", 
"http://useri...@spark.apache.org/path?query=1;, "USERINFO")
+checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1;, 
"QUERY", "query")
+
+// Null checking
+checkParseUrl(null, null, "HOST")
+checkParseUrl(null, "http://spark.apache.org/path?query=1;, null)
+checkParseUrl(null, null, null)
+checkParseUrl(null, "test", "HOST")
+checkParseUrl(null, "http://spark.apache.org/path?query=1;, "NO")
+checkParseUrl(null, "http://spark.apache.org/path?query=1;, "USERINFO")
+checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, 
"HOST", "query")
+checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, 
"QUERY", "quer")
+checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, 
"QUERY", null)
+checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, 
"QUERY", "")
+
+// exceptional cases
+intercept[java.util.regex.PatternSyntaxException] {
+  evaluate(ParseUrl(Seq(Literal("http://spark.apache.org/path?"),
+Literal("QUERY"), Literal("???"))))
+}
+
+// arguments checking
+assert(ParseUrl(Seq(Literal("1"))).checkInputDataTypes().isFailure)
+assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal("3"), 
Literal("4")))
--- End diff --

also add some cases with invalid-type parameters?
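
For illustration, a hedged sketch of what such invalid-type cases could look 
like, reusing the `checkInputDataTypes` pattern already in the quoted test 
(the exact literals are an assumption, not something taken from the PR):

```scala
// Hypothetical additions to the "arguments checking" block above:
// a non-string argument should be rejected by checkInputDataTypes.
assert(ParseUrl(Seq(Literal(1), Literal("2"))).checkInputDataTypes().isFailure)
assert(ParseUrl(Seq(Literal("1"), Literal(2))).checkInputDataTypes().isFailure)
assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal(3))).checkInputDataTypes().isFailure)
```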


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread janplus
Github user janplus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69851065
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
+}
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: 
UTF8String): UTF8String = {
+if (url != null && partToExtract != null) {
+  if (cachedUrl ne null) {
+extractFromUrl(cachedUrl, partToExtract)
+  } else {
+val 

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69851040
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
+}
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: 
UTF8String): UTF8String = {
+if (url != null && partToExtract != null) {
+  if (cachedUrl ne null) {
+extractFromUrl(cachedUrl, partToExtract)
+  } else {
+

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69850990
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
--- End diff --

I don't think it's necessary... we only have 2 or 3 parameters...
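
A minimal sketch of the simplification being suggested, assuming the 
expression keeps `children: Seq[Expression]` and just indexes it directly 
(illustrative only, not the PR's code):

```scala
// Hypothetical: drop the cached array and index the 2- or 3-element Seq.
@transient private lazy val cachedUrl = children(0) match {
  case Literal(url: UTF8String, _) => getUrl(url)
  case _ => null
}

@transient private lazy val cachedPattern =
  if (children.size < 3) null
  else children(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }
```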


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...

2016-07-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14077
  
Thank you for confirming that it is a bug in another PR. 

Regarding the solution of this PR, it is not a true circular reference. The 
solution in this PR is to minimize duplicated code. I also think it makes 
sense to move the common logic from the `jdbc` API into the `createRelation` 
implementation of `CreatableRelationProvider`. The JDBC-specific logic should 
not be exposed through the `DataFrameWriter` APIs. 

If you want to do it in your PR, that is also fine with me. Please minimize 
the code changes and add the test cases introduced in this PR. Thanks!
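
As a rough sketch of that direction (class, option, and helper names here are 
assumptions for illustration, and `JdbcUtils.saveTable` is assumed to have 
this shape; the actual provider may differ):

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, RelationProvider}

// Hypothetical provider: the save-path logic lives in createRelation,
// so DataFrameWriter only resolves the data source by name.
class JdbcRelationProvider extends CreatableRelationProvider with RelationProvider {

  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    val url = parameters("url")        // assumed option names
    val table = parameters("dbtable")
    val props = new Properties()
    parameters.foreach { case (k, v) => props.setProperty(k, v) }
    // JDBC-specific handling of SaveMode (table exists, append, overwrite)
    // would be implemented here instead of inside DataFrameWriter.jdbc.
    JdbcUtils.saveTable(data, url, table, props)
    createRelation(sqlContext, parameters)   // return the read-side relation
  }

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    ???  // read path, elided in this sketch
  }
}
```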


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69850899
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
+}
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: 
UTF8String): UTF8String = {
+if (url != null && partToExtract != null) {
+  if (cachedUrl ne null) {
+extractFromUrl(cachedUrl, partToExtract)
+  } else {
+

[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69850820
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: 
Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, 
USERINFO.
+Key specifies which query to extract.
+Examples:
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+  'spark.apache.org'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+  'query=1'
+  > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 
'query')
+  '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = 
Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need 
to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+case Literal(url: UTF8String, _) => getUrl(url)
+case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't 
need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for 
every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+case Literal(key: UTF8String, _) => getPattern(key)
+case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.size > 3 || children.size < 2) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two 
or three arguments")
+} else {
+  super[ImplicitCastInputTypes].checkInputDataTypes()
+}
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+if (key != null) {
+  Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+} else {
+  null
+}
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+try {
+  new URL(url.toString)
+} catch {
+  case e: MalformedURLException => null
+}
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): 
UTF8String = {
+val m = pattern.matcher(query.toString)
+if (m.find()) {
+  UTF8String.fromString(m.group(2))
+} else {
+  null
+}
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): 
UTF8String = {
+if (partToExtract.equals(HOST)) {
+  UTF8String.fromString(url.getHost)
+} else if (partToExtract.equals(PATH)) {
+  UTF8String.fromString(url.getPath)
+} else if (partToExtract.equals(QUERY)) {
+  UTF8String.fromString(url.getQuery)
+} else if (partToExtract.equals(REF)) {
+  UTF8String.fromString(url.getRef)
+} else if (partToExtract.equals(PROTOCOL)) {
+  UTF8String.fromString(url.getProtocol)
+} else if (partToExtract.equals(FILE)) {
+  UTF8String.fromString(url.getFile)
+} else if (partToExtract.equals(AUTHORITY)) {
+  UTF8String.fromString(url.getAuthority)
+} else if (partToExtract.equals(USERINFO)) {
+  UTF8String.fromString(url.getUserInfo)
+} else {
+  null
--- End diff --

It's weird that we return null if users provide an invalid `part` string, 
while we have a fixed list of supported `part` strings. Even if it's Hive's 
rule, it still looks unreasonable to me. cc @rxin @yhuai 
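
One hedged alternative (illustrative only, not what the PR does) is to reject 
an unsupported literal `part` at analysis time rather than returning null for 
every row:

```scala
// Hypothetical stricter variant of checkInputDataTypes for ParseUrl.
override def checkInputDataTypes(): TypeCheckResult = {
  val supportedParts = Seq(HOST, PATH, QUERY, REF, PROTOCOL, FILE, AUTHORITY, USERINFO)
  if (children.size > 3 || children.size < 2) {
    TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
  } else {
    children(1) match {
      case Literal(part: UTF8String, StringType) if !supportedParts.contains(part) =>
        TypeCheckResult.TypeCheckFailure(s"Unsupported part to extract: $part")
      case _ => super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }
}
```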


---
If your project is set up for it, you can reply to this 

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850568
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+val rand = new Random(42)
+
+var intResult: Int = 0
+val intBuffer = Array.fill[Int](count) { rand.nextInt }
+val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
+val intInternalRow = intEncoder.toRow(intBuffer)
+val intUnsafeArray = intInternalRow.getArray(0)
+val readIntArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = intUnsafeArray.numElements
+var sum = 0.toInt
+var i = 0
+while (i < len) {
+  sum += intUnsafeArray.getInt(i)
+  i += 1
+}
+intResult = sum
+n += 1
+  }
+}
+
+var doubleResult: Double = 0
+val doubleBuffer = Array.fill[Double](count) { rand.nextDouble }
+val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind()
+val doubleInternalRow = doubleEncoder.toRow(doubleBuffer)
+val doubleUnsafeArray = doubleInternalRow.getArray(0)
+val readDoubleArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = doubleUnsafeArray.numElements
+var sum = 0.toDouble
+var i = 0
+while (i < len) {
+  sum += doubleUnsafeArray.getDouble(i)
+  i += 1
+}
+doubleResult = sum
+n += 1
+  }
+}
+
+val benchmark = new Benchmark("Read UnsafeArrayData", count * iters)
+benchmark.addCase("Int")(readIntArray)
+benchmark.addCase("Double")(readDoubleArray)
+benchmark.run
+/*
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.10.4
+Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
+
+Read UnsafeArrayData:     Best/Avg Time(ms)   Rate(M/s)  Per Row(ns)  Relative
+-------------------------------------------------------------------------------
+Int                              279 /  294       600.4          1.7      1.0X
+Double                           296 /  303       567.0          1.8      0.9X
+*/
+  }
+
+  def writeUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+
+val intUnsafeRow = new UnsafeRow(1)
+val intUnsafeArrayWriter = new UnsafeArrayWriter
+val intBufferHolder = new BufferHolder(intUnsafeRow, 64)
+intBufferHolder.reset()
+intUnsafeArrayWriter.initialize(intBufferHolder, count, 4)
+val intCursor = intBufferHolder.cursor
+val writeIntArray = { i: Int =>
+  var n = 0
+  while (n < 

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850526
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+val rand = new Random(42)
+
+var intResult: Int = 0
+val intBuffer = Array.fill[Int](count) { rand.nextInt }
+val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
+val intInternalRow = intEncoder.toRow(intBuffer)
+val intUnsafeArray = intInternalRow.getArray(0)
+val readIntArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = intUnsafeArray.numElements
+var sum = 0.toInt
+var i = 0
+while (i < len) {
+  sum += intUnsafeArray.getInt(i)
+  i += 1
+}
+intResult = sum
+n += 1
+  }
+}
+
+var doubleResult: Double = 0
+val doubleBuffer = Array.fill[Double](count) { rand.nextDouble }
+val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind()
+val doubleInternalRow = doubleEncoder.toRow(doubleBuffer)
+val doubleUnsafeArray = doubleInternalRow.getArray(0)
+val readDoubleArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = doubleUnsafeArray.numElements
+var sum = 0.toDouble
+var i = 0
+while (i < len) {
+  sum += doubleUnsafeArray.getDouble(i)
+  i += 1
+}
+doubleResult = sum
+n += 1
+  }
+}
+
+val benchmark = new Benchmark("Read UnsafeArrayData", count * iters)
+benchmark.addCase("Int")(readIntArray)
+benchmark.addCase("Double")(readDoubleArray)
+benchmark.run
+/*
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.10.4
+Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
+
+Read UnsafeArrayData:     Best/Avg Time(ms)   Rate(M/s)  Per Row(ns)  Relative
+-------------------------------------------------------------------------------
+Int                              279 /  294       600.4          1.7      1.0X
+Double                           296 /  303       567.0          1.8      0.9X
+*/
+  }
+
+  def writeUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+
+val intUnsafeRow = new UnsafeRow(1)
+val intUnsafeArrayWriter = new UnsafeArrayWriter
--- End diff --

have you seen my comment here? 
https://github.com/apache/spark/pull/13680/files#r69392823

Testing the array writer is quite low level, and people are more interested in 
writing the whole array. If you take a look at what `encoder.toRow` does, it 
generates a 
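
A hedged sketch of what a whole-array write benchmark via the encoder might 
look like (it assumes `ExpressionEncoder[Array[Int]].toRow` drives the full 
UnsafeArrayWriter path, and reuses the imports and `Benchmark` utility from 
the quoted file; not the PR's code):

```scala
// Hypothetical benchmark case: write the whole array through the encoder.
def writeIntArrayViaEncoder(iters: Int): Unit = {
  val count = 1024 * 1024 * 16
  val rand = new Random(42)
  val intBuffer = Array.fill[Int](count) { rand.nextInt }
  val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()

  val benchmark = new Benchmark("Write UnsafeArrayData via encoder", count.toLong * iters)
  benchmark.addCase("Int") { _ =>
    var n = 0
    while (n < iters) {
      intEncoder.toRow(intBuffer) // exercises the full write path for the array
      n += 1
    }
  }
  benchmark.run()
}
```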

[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13494
  
Build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61891/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13494
  
**[Test build #61891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61891/consoleFull)**
 for PR 13494 at commit 
[`9546b40`](https://github.com/apache/spark/commit/9546b40840e69166c563c491ef6720ccb2f1b2eb).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850344
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+val rand = new Random(42)
+
+var intResult: Int = 0
+val intBuffer = Array.fill[Int](count) { rand.nextInt }
+val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
+val intInternalRow = intEncoder.toRow(intBuffer)
+val intUnsafeArray = intInternalRow.getArray(0)
+val readIntArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = intUnsafeArray.numElements
+var sum = 0.toInt
+var i = 0
+while (i < len) {
+  sum += intUnsafeArray.getInt(i)
+  i += 1
+}
+intResult = sum
--- End diff --

did you see a very different performance result without this assignment?
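
For context, the assignment acts as a sink that keeps the JIT from treating 
the summation loop as dead code; a minimal, hypothetical illustration of the 
pattern (not taken from the PR):

```scala
// Without writing the result somewhere observable, an optimizing JIT is
// free to eliminate the whole loop, which skews the benchmark.
var sink: Int = 0
def benchmarkedCase(values: Array[Int], iters: Int): Unit = {
  var n = 0
  while (n < iters) {
    var sum = 0
    var i = 0
    while (i < values.length) { sum += values(i); i += 1 }
    sink = sum // keep `sum` (and therefore the reads) live
    n += 1
  }
}
```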


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850312
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+val rand = new Random(42)
+
+var intResult: Int = 0
+val intBuffer = Array.fill[Int](count) { rand.nextInt }
+val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
+val intInternalRow = intEncoder.toRow(intBuffer)
+val intUnsafeArray = intInternalRow.getArray(0)
+val readIntArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = intUnsafeArray.numElements
+var sum = 0.toInt
+var i = 0
+while (i < len) {
+  sum += intUnsafeArray.getInt(i)
+  i += 1
+}
+intResult = sum
+n += 1
+  }
+}
+
+var doubleResult: Double = 0
+val doubleBuffer = Array.fill[Double](count) { rand.nextDouble }
+val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind()
+val doubleInternalRow = doubleEncoder.toRow(doubleBuffer)
+val doubleUnsafeArray = doubleInternalRow.getArray(0)
+val readDoubleArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = doubleUnsafeArray.numElements
+var sum = 0.toDouble
--- End diff --

`var sum = 0L`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850289
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+val rand = new Random(42)
+
+var intResult: Int = 0
+val intBuffer = Array.fill[Int](count) { rand.nextInt }
+val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
+val intInternalRow = intEncoder.toRow(intBuffer)
+val intUnsafeArray = intInternalRow.getArray(0)
+val readIntArray = { i: Int =>
+  var n = 0
+  while (n < iters) {
+val len = intUnsafeArray.numElements
+var sum = 0.toInt
--- End diff --

unnecessary `toInt`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850272
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
+  }
+
+  def readUnsafeArray(iters: Int): Unit = {
+val count = 1024 * 1024 * 16
+val rand = new Random(42)
+
+var intResult: Int = 0
+val intBuffer = Array.fill[Int](count) { rand.nextInt }
+val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
+val intInternalRow = intEncoder.toRow(intBuffer)
+val intUnsafeArray = intInternalRow.getArray(0)
--- End diff --

combine these 2 lines
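
i.e., roughly (sketch):

```scala
// The two statements collapsed into one, as suggested above.
val intUnsafeArray = intEncoder.toRow(intBuffer).getArray(0)
```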


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...

2016-07-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14075
  
@rxin This is a regression. I tried it in Spark 1.6 and it works well. I 
think we need to fix it in Spark 2.0.

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850225
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
+val size = UnsafeArrayData.calculateHeaderPortionInBytes(count)
+size
--- End diff --

we can just return `UnsafeArrayData.calculateHeaderPortionInBytes(count)`
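
i.e. (sketch):

```scala
// The temporary val is not needed; return the computed size directly.
def calculateHeaderPortionInBytes(count: Int): Int =
  UnsafeArrayData.calculateHeaderPortionInBytes(count)
```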


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850211
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala
 ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import scala.util.Random
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, 
UnsafeArrayWriter}
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData
+ * To run this:
+ *  1. replace ignore(...) with test(...)
+ *  2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark"
+ *
+ * Benchmarks in this file are skipped in normal builds.
+ */
+class UnsafeArrayDataBenchmark extends BenchmarkBase {
+
+  def calculateHeaderPortionInBytes(count: Int) : Int = {
+// Use this assignment for SPARK-15962
+// val size = 4 + 4 * count
--- End diff --

remove this comment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14030: [SPARK-16350][SQL] Fix support for incremental planning ...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14030
  
**[Test build #61893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61893/consoleFull)**
 for PR 14030 at commit 
[`85e2352`](https://github.com/apache/spark/commit/85e2352c3010ed311603acd5007c6b3e05c25056).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850185
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala
 ---
@@ -18,27 +18,126 @@
 package org.apache.spark.sql.catalyst.util
 
 import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
 import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData
+import org.apache.spark.unsafe.Platform
 
 class UnsafeArraySuite extends SparkFunSuite {
 
-  test("from primitive int array") {
-val array = Array(1, 10, 100)
-val unsafe = UnsafeArrayData.fromPrimitiveArray(array)
-assert(unsafe.numElements == 3)
-assert(unsafe.getSizeInBytes == 4 + 4 * 3 + 4 * 3)
-assert(unsafe.getInt(0) == 1)
-assert(unsafe.getInt(1) == 10)
-assert(unsafe.getInt(2) == 100)
+  val booleanArray = Array(false, true)
+  val shortArray = Array(1.toShort, 10.toShort, 100.toShort)
+  val intArray = Array(1, 10, 100)
+  val longArray = Array(1.toLong, 10.toLong, 100.toLong)
+  val floatArray = Array(1.1.toFloat, 2.2.toFloat, 3.3.toFloat)
+  val doubleArray = Array(1.1, 2.2, 3.3)
+
+  val intMultiDimArray = Array(Array(1, 10), Array(2, 20, 200), Array(3, 
30, 300, 3000))
+  val doubleMultiDimArray = Array(
+Array(1.1, 11.1), Array(2.2, 22.2, 222.2), Array(3.3, 33.3, 333.3, 3333.3))
+
+  test("read array") {
+val unsafeBoolean = ExpressionEncoder[Array[Boolean]].resolveAndBind().
+  toRow(booleanArray).getArray(0)
+assert(unsafeBoolean.isInstanceOf[UnsafeArrayData])
+assert(unsafeBoolean.numElements == booleanArray.length)
+booleanArray.zipWithIndex.map { case (e, i) =>
+  assert(unsafeBoolean.getBoolean(i) == e)
+}
+
+val unsafeShort = ExpressionEncoder[Array[Short]].resolveAndBind().
+  toRow(shortArray).getArray(0)
+assert(unsafeShort.isInstanceOf[UnsafeArrayData])
+assert(unsafeShort.numElements == shortArray.length)
+shortArray.zipWithIndex.map { case (e, i) =>
+  assert(unsafeShort.getShort(i) == e)
+}
+
+val unsafeInt = ExpressionEncoder[Array[Int]].resolveAndBind().
+  toRow(intArray).getArray(0)
+assert(unsafeInt.isInstanceOf[UnsafeArrayData])
+assert(unsafeInt.numElements == intArray.length)
+intArray.zipWithIndex.map { case (e, i) =>
+  assert(unsafeInt.getInt(i) == e)
+}
+
+val unsafeLong = ExpressionEncoder[Array[Long]].resolveAndBind().
+  toRow(longArray).getArray(0)
+assert(unsafeLong.isInstanceOf[UnsafeArrayData])
+assert(unsafeLong.numElements == longArray.length)
+longArray.zipWithIndex.map { case (e, i) =>
+  assert(unsafeLong.getLong(i) == e)
+}
+
+val unsafeFloat = ExpressionEncoder[Array[Float]].resolveAndBind().
+  toRow(floatArray).getArray(0)
+assert(unsafeFloat.isInstanceOf[UnsafeArrayData])
+assert(unsafeFloat.numElements == floatArray.length)
+floatArray.zipWithIndex.map { case (e, i) =>
+  assert(unsafeFloat.getFloat(i) == e)
+}
+
+val unsafeDouble = ExpressionEncoder[Array[Double]].resolveAndBind().
+  toRow(doubleArray).getArray(0)
+assert(unsafeDouble.isInstanceOf[UnsafeArrayData])
+assert(unsafeDouble.numElements == doubleArray.length)
+doubleArray.zipWithIndex.map { case (e, i) =>
+  assert(unsafeDouble.getDouble(i) == e)
+}
+
+val unsafeMultiDimInt = 
ExpressionEncoder[Array[Array[Int]]].resolveAndBind().
+  toRow(intMultiDimArray).getArray(0)
+assert(unsafeMultiDimInt.isInstanceOf[UnsafeArrayData])
+assert(unsafeMultiDimInt.numElements == intMultiDimArray.length)
+intMultiDimArray.zipWithIndex.map { case (a, j) =>
+  val u = unsafeMultiDimInt.getArray(j)
+  assert(u.isInstanceOf[UnsafeArrayData])
+  assert(u.numElements == a.length)
+  a.zipWithIndex.map { case (e, i) =>
+assert(u.getInt(i) == e)
+  }
+}
+
+val unsafeMultiDimDouble = 
ExpressionEncoder[Array[Array[Double]]].resolveAndBind().
+  toRow(doubleMultiDimArray).getArray(0)
+assert(unsafeDouble.isInstanceOf[UnsafeArrayData])
+assert(unsafeMultiDimDouble.numElements == doubleMultiDimArray.length)
+doubleMultiDimArray.zipWithIndex.map { case (a, j) =>
+  val u = unsafeMultiDimDouble.getArray(j)
+  assert(u.isInstanceOf[UnsafeArrayData])
+  assert(u.numElements == a.length)
+  a.zipWithIndex.map { case (e, i) =>
+assert(u.getDouble(i) == e)
+  }
+}
+  }
+
+  test("from primitive array") {
+val unsafeInt = 

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850087
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala
 ---
@@ -18,27 +18,126 @@
 package org.apache.spark.sql.catalyst.util
 
 import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
 import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData
+import org.apache.spark.unsafe.Platform
 
 class UnsafeArraySuite extends SparkFunSuite {
 
-  test("from primitive int array") {
-val array = Array(1, 10, 100)
-val unsafe = UnsafeArrayData.fromPrimitiveArray(array)
-assert(unsafe.numElements == 3)
-assert(unsafe.getSizeInBytes == 4 + 4 * 3 + 4 * 3)
-assert(unsafe.getInt(0) == 1)
-assert(unsafe.getInt(1) == 10)
-assert(unsafe.getInt(2) == 100)
+  val booleanArray = Array(false, true)
+  val shortArray = Array(1.toShort, 10.toShort, 100.toShort)
+  val intArray = Array(1, 10, 100)
+  val longArray = Array(1.toLong, 10.toLong, 100.toLong)
+  val floatArray = Array(1.1.toFloat, 2.2.toFloat, 3.3.toFloat)
+  val doubleArray = Array(1.1, 2.2, 3.3)
--- End diff --

let's also test a string array, along the lines sketched below
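Something like this, mirroring the primitive cases above (the literal values are illustrative; `getUTF8String` is the accessor shown elsewhere in this patch for string elements):

```scala
val stringArray = Array("abc", "", "hello world")

val unsafeString = ExpressionEncoder[Array[String]].resolveAndBind().
  toRow(stringArray).getArray(0)
assert(unsafeString.isInstanceOf[UnsafeArrayData])
assert(unsafeString.numElements == stringArray.length)
stringArray.zipWithIndex.map { case (e, i) =>
  assert(unsafeString.getUTF8String(i).toString == e)
}
```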


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69850001
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -33,91 +38,144 @@
   // The offset of the global buffer where we start to write this array.
   private int startingOffset;
 
-  public void initialize(BufferHolder holder, int numElements, int 
fixedElementSize) {
-// We need 4 bytes to store numElements and 4 bytes each element to 
store offset.
-final int fixedSize = 4 + 4 * numElements;
+  // The number of elements in this array
+  private int numElements;
+
+  private int headerInBytes;
+
+  private void assertIndexIsValid(int index) {
+assert index >= 0 : "index (" + index + ") should >= 0";
+assert index < numElements : "index (" + index + ") should < " + 
numElements;
+  }
+
+  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+// We need 4 bytes to store numElements in header
+this.numElements = numElements;
+this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
 this.holder = holder;
 this.startingOffset = holder.cursor;
 
-holder.grow(fixedSize);
-Platform.putInt(holder.buffer, holder.cursor, numElements);
-holder.cursor += fixedSize;
+// Grows the global buffer ahead for header and fixed size data.
+holder.grow(headerInBytes + elementSize * numElements);
+
+// Write numElements and clear out null bits to header
+Platform.putInt(holder.buffer, startingOffset, numElements);
+for (int i = 4; i < headerInBytes; i += 8) {
+  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+}
+holder.cursor += (headerInBytes + elementSize * numElements);
+  }
+
+  private long getElementOffset(int ordinal, int elementSize) {
+return startingOffset + headerInBytes + ordinal * elementSize;
+  }
+
+  public void setOffset(int ordinal, int currentCursor) {
--- End diff --

do we always pass in `holder.cursor` as `currentCursor`?
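For what it's worth, the calling pattern the question refers to would look roughly like the sketch below: the caller snapshots `holder.cursor` right before writing a variable-length value. This is a hedged sketch (names like `arrayWriter` and `utf8Value` are placeholders), not quoted generated code:

```scala
// Sketch only: record where the variable-length value will start, then write it and
// advance the cursor. Whether any call site passes something other than holder.cursor
// is exactly the question above.
arrayWriter.setOffset(ordinal, holder.cursor)
val bytes = utf8Value.getBytes
holder.grow(bytes.length)
Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET,
  holder.buffer, holder.cursor, bytes.length)
holder.cursor += bytes.length
```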


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69849961
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -189,28 +189,29 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 
 val jt = ctx.javaType(et)
 
-val fixedElementSize = et match {
+val elementOrOffsetSize = et match {
   case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
   case _ if ctx.isPrimitiveType(jt) => et.defaultSize
-  case _ => 0
+  case _ => 8  // we need 8 bytes to store offset and length for 
variable-length types
 }
 
+val tmpCursor = ctx.freshName("tmpCursor")
--- End diff --

it's never used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69849874
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -189,28 +189,29 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 
 val jt = ctx.javaType(et)
 
-val fixedElementSize = et match {
+val elementOrOffsetSize = et match {
   case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
   case _ if ctx.isPrimitiveType(jt) => et.defaultSize
-  case _ => 0
+  case _ => 8  // we need 8 bytes to store offset and length for 
variable-length types
--- End diff --

It should be 4 now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13494
  
Build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61890/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13494
  
**[Test build #61890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61890/consoleFull)**
 for PR 13494 at commit 
[`88fd3bf`](https://github.com/apache/spark/commit/88fd3bfb13852e1739cef0146209437b417445f3).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14082: [SPARK-16381][SQL][SparkR] Update SQL examples and progr...

2016-07-06 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14082
  
@shivaram @mengxr It would be nice if any of you can help review this one, 
thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...

2016-07-06 Thread JustinPihony
Github user JustinPihony commented on the issue:

https://github.com/apache/spark/pull/14077
  
@gatorsmile If `copy` is a bug, then that is fine with me (I just commented 
my findings on this and will be curious to hear back). That said, it would make 
my implementation simpler. I'd be fine with simplifying it down to a basic 
save; however, I am still not OK with the circular reference. It adds confusion 
to debugging and future refactorings. And to fix that, you end up back at my 
PR. So, ultimately this still seems like a duplicate to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69849567
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -19,9 +19,14 @@
 
 import org.apache.spark.sql.types.Decimal;
 import org.apache.spark.unsafe.Platform;
+import org.apache.spark.unsafe.bitset.BitSetMethods;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
+import java.util.Arrays;
--- End diff --

it's still here...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69849512
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -341,63 +328,115 @@ public UnsafeArrayData copy() {
 return arrayCopy;
   }
 
-  public static UnsafeArrayData fromPrimitiveArray(int[] arr) {
-if (arr.length > (Integer.MAX_VALUE - 4) / 8) {
-  throw new UnsupportedOperationException("Cannot convert this array 
to unsafe format as " +
-"it's too big.");
-}
+  @Override
+  public boolean[] toBooleanArray() {
+int size = numElements();
+boolean[] values = new boolean[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.BOOLEAN_ARRAY_OFFSET, size);
+return values;
+  }
 
-final int offsetRegionSize = 4 * arr.length;
-final int valueRegionSize = 4 * arr.length;
-final int totalSize = 4 + offsetRegionSize + valueRegionSize;
-final byte[] data = new byte[totalSize];
+  @Override
+  public byte[] toByteArray() {
+int size = numElements();
+byte[] values = new byte[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.BYTE_ARRAY_OFFSET, size);
+return values;
+  }
 
-Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, arr.length);
+  @Override
+  public short[] toShortArray() {
+int size = numElements();
+short[] values = new short[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.SHORT_ARRAY_OFFSET, size * 2);
+return values;
+  }
 
-int offsetPosition = Platform.BYTE_ARRAY_OFFSET + 4;
-int valueOffset = 4 + offsetRegionSize;
-for (int i = 0; i < arr.length; i++) {
-  Platform.putInt(data, offsetPosition, valueOffset);
-  offsetPosition += 4;
-  valueOffset += 4;
-}
+  @Override
+  public int[] toIntArray() {
+int size = numElements();
+int[] values = new int[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.INT_ARRAY_OFFSET, size * 4);
+return values;
+  }
 
-Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data,
-  Platform.BYTE_ARRAY_OFFSET + 4 + offsetRegionSize, valueRegionSize);
+  @Override
+  public long[] toLongArray() {
+int size = numElements();
+long[] values = new long[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.LONG_ARRAY_OFFSET, size * 8);
+return values;
+  }
 
-UnsafeArrayData result = new UnsafeArrayData();
-result.pointTo(data, Platform.BYTE_ARRAY_OFFSET, totalSize);
-return result;
+  @Override
+  public float[] toFloatArray() {
+int size = numElements();
+float[] values = new float[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.FLOAT_ARRAY_OFFSET, size * 4);
+return values;
   }
 
-  public static UnsafeArrayData fromPrimitiveArray(double[] arr) {
-if (arr.length > (Integer.MAX_VALUE - 4) / 12) {
+  @Override
+  public double[] toDoubleArray() {
+int size = numElements();
+double[] values = new double[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.DOUBLE_ARRAY_OFFSET, size * 8);
+return values;
+  }
+
+  private static UnsafeArrayData fromPrimitiveArray(
+   Object arr, int offset, int length, int elementSize) {
+final long headerInBytes = calculateHeaderPortionInBytes(length);
+final long valueRegionInBytes = (long)elementSize * (long)length;
+final long totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) 
/ 8;
--- End diff --

we should declare `totalSizeInLongs` as an int
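A quick sketch of the arithmetic this value feeds (the header formula here is assumed; the rest mirrors the quoted diff):

```scala
// Sketch only: compute the rounded-up size in 8-byte words for a primitive array, then
// reject anything whose byte size would overflow an Int before allocating the backing long[].
def totalSizeInLongs(length: Int, elementSize: Int): Long = {
  val headerInBytes = 4L + ((length + 63) / 64) * 8L   // assumed header layout
  val valueRegionInBytes = elementSize.toLong * length
  (headerInBytes + valueRegionInBytes + 7) / 8
}

val sizeInLongs = totalSizeInLongs(length = 1 << 28, elementSize = 4)
require(sizeInLongs * 8 <= Int.MaxValue,
  "Cannot convert this array to unsafe format as it's too big.")
```

Once the guard passes, the value is known to fit in an Int, which is the point of declaring it as an int.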


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69849441
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -341,63 +328,115 @@ public UnsafeArrayData copy() {
 return arrayCopy;
   }
 
-  public static UnsafeArrayData fromPrimitiveArray(int[] arr) {
-if (arr.length > (Integer.MAX_VALUE - 4) / 8) {
-  throw new UnsupportedOperationException("Cannot convert this array 
to unsafe format as " +
-"it's too big.");
-}
+  @Override
+  public boolean[] toBooleanArray() {
+int size = numElements();
+boolean[] values = new boolean[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.BOOLEAN_ARRAY_OFFSET, size);
+return values;
+  }
 
-final int offsetRegionSize = 4 * arr.length;
-final int valueRegionSize = 4 * arr.length;
-final int totalSize = 4 + offsetRegionSize + valueRegionSize;
-final byte[] data = new byte[totalSize];
+  @Override
+  public byte[] toByteArray() {
+int size = numElements();
+byte[] values = new byte[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.BYTE_ARRAY_OFFSET, size);
+return values;
+  }
 
-Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, arr.length);
+  @Override
+  public short[] toShortArray() {
+int size = numElements();
+short[] values = new short[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.SHORT_ARRAY_OFFSET, size * 2);
+return values;
+  }
 
-int offsetPosition = Platform.BYTE_ARRAY_OFFSET + 4;
-int valueOffset = 4 + offsetRegionSize;
-for (int i = 0; i < arr.length; i++) {
-  Platform.putInt(data, offsetPosition, valueOffset);
-  offsetPosition += 4;
-  valueOffset += 4;
-}
+  @Override
+  public int[] toIntArray() {
+int size = numElements();
+int[] values = new int[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.INT_ARRAY_OFFSET, size * 4);
+return values;
+  }
 
-Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data,
-  Platform.BYTE_ARRAY_OFFSET + 4 + offsetRegionSize, valueRegionSize);
+  @Override
+  public long[] toLongArray() {
+int size = numElements();
+long[] values = new long[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.LONG_ARRAY_OFFSET, size * 8);
+return values;
+  }
 
-UnsafeArrayData result = new UnsafeArrayData();
-result.pointTo(data, Platform.BYTE_ARRAY_OFFSET, totalSize);
-return result;
+  @Override
+  public float[] toFloatArray() {
+int size = numElements();
+float[] values = new float[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.FLOAT_ARRAY_OFFSET, size * 4);
+return values;
   }
 
-  public static UnsafeArrayData fromPrimitiveArray(double[] arr) {
-if (arr.length > (Integer.MAX_VALUE - 4) / 12) {
+  @Override
+  public double[] toDoubleArray() {
+int size = numElements();
+double[] values = new double[size];
+Platform.copyMemory(
+  baseObject, baseOffset + headerInBytes, values, 
Platform.DOUBLE_ARRAY_OFFSET, size * 8);
+return values;
+  }
+
+  private static UnsafeArrayData fromPrimitiveArray(
+   Object arr, int offset, int length, int elementSize) {
+final long headerInBytes = calculateHeaderPortionInBytes(length);
+final long valueRegionInBytes = (long)elementSize * (long)length;
+final long totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) 
/ 8;
+if (totalSizeInLongs * 8> Integer.MAX_VALUE) {
--- End diff --

nit: a space after `8`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...

2016-07-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14083#discussion_r69849376
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -165,111 +169,99 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   def resolveQuoted(
   name: String,
   resolver: Resolver): Option[NamedExpression] = {
-resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver)
+
outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), 
resolver)
   }
 
   /**
-   * Resolve the given `name` string against the given attribute, 
returning either 0 or 1 match.
-   *
-   * This assumes `name` has multiple parts, where the 1st part is a 
qualifier
-   * (i.e. table name, alias, or subquery alias).
-   * See the comment above `candidates` variable in resolve() for 
semantics the returned data.
+   * Refreshes (or invalidates) any metadata/data cached in the plan 
recursively.
*/
-  private def resolveAsTableColumn(
-  nameParts: Seq[String],
-  resolver: Resolver,
-  attribute: Attribute): Option[(Attribute, List[String])] = {
-assert(nameParts.length > 1)
-if (attribute.qualifier.exists(resolver(_, nameParts.head))) {
-  // At least one qualifier matches. See if remaining parts match.
-  val remainingParts = nameParts.tail
-  resolveAsColumn(remainingParts, resolver, attribute)
-} else {
-  None
-}
+  def refresh(): Unit = children.foreach(_.refresh())
+}
+
+/**
+ * Helper class for (LogicalPlan) attribute resolution. This class indexes 
attributes by their
+ * case-in-sensitive name, and checks potential candidates using the given 
Resolver. Both qualified
+ * and direct resolution are supported.
+ */
+private[catalyst] class AttributeResolver(attributes: Seq[Attribute]) 
extends Logging {
+  private def unique[T](m: Map[T, Seq[Attribute]]): Map[T, Seq[Attribute]] 
= {
+m.mapValues(_.distinct).map(identity)
   }
 
-  /**
-   * Resolve the given `name` string against the given attribute, 
returning either 0 or 1 match.
-   *
-   * Different from resolveAsTableColumn, this assumes `name` does NOT 
start with a qualifier.
-   * See the comment above `candidates` variable in resolve() for 
semantics the returned data.
-   */
-  private def resolveAsColumn(
-  nameParts: Seq[String],
-  resolver: Resolver,
-  attribute: Attribute): Option[(Attribute, List[String])] = {
-if (!attribute.isGenerated && resolver(attribute.name, 
nameParts.head)) {
-  Option((attribute.withName(nameParts.head), nameParts.tail.toList))
-} else {
-  None
+  /** Map to use for direct case insensitive attribute lookups. */
+  private val direct: Map[String, Seq[Attribute]] = {
--- End diff --

You have a point: it is the secondary code path, so it is less likely to be 
used. I'll take a look at it on my next pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...

2016-07-06 Thread JustinPihony
Github user JustinPihony commented on the issue:

https://github.com/apache/spark/pull/14075
  
@rxin It does look like this might have been a regression introduced via 
[the initial creation of 
`DataSource`](https://github.com/apache/spark/blob/1e28840594b9d972c96d3922ca0bf0f76e313e82/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala)
 from 
[`ResolvedDataSource`](https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala).
 Although, @marmbrus could speak to it better as he was the one who wrote that 
code. Maybe there was a reason?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r69849225
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -237,62 +229,57 @@ public Decimal getDecimal(int ordinal, int precision, 
int scale) {
 
   @Override
   public UTF8String getUTF8String(int ordinal) {
-assertIndexIsValid(ordinal);
-final int offset = getElementOffset(ordinal);
-if (offset < 0) return null;
-final int size = getElementSize(offset, ordinal);
+if (isNullAt(ordinal)) return null;
+final int offset = getInt(ordinal);
+final int size = getSize(ordinal);
 return UTF8String.fromAddress(baseObject, baseOffset + offset, size);
   }
 
   @Override
   public byte[] getBinary(int ordinal) {
-assertIndexIsValid(ordinal);
-final int offset = getElementOffset(ordinal);
-if (offset < 0) return null;
-final int size = getElementSize(offset, ordinal);
+if (isNullAt(ordinal)) return null;
+final int offset = getInt(ordinal);
+final int size = getSize(ordinal);
 final byte[] bytes = new byte[size];
 Platform.copyMemory(baseObject, baseOffset + offset, bytes, 
Platform.BYTE_ARRAY_OFFSET, size);
 return bytes;
   }
 
   @Override
   public CalendarInterval getInterval(int ordinal) {
-assertIndexIsValid(ordinal);
-final int offset = getElementOffset(ordinal);
-if (offset < 0) return null;
+if (isNullAt(ordinal)) return null;
+final long offsetAndSize = getLong(ordinal);
--- End diff --

it's unused


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...

2016-07-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14083#discussion_r69849157
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -165,111 +169,99 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   def resolveQuoted(
   name: String,
   resolver: Resolver): Option[NamedExpression] = {
-resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver)
+
outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), 
resolver)
   }
 
   /**
-   * Resolve the given `name` string against the given attribute, 
returning either 0 or 1 match.
-   *
-   * This assumes `name` has multiple parts, where the 1st part is a 
qualifier
-   * (i.e. table name, alias, or subquery alias).
-   * See the comment above `candidates` variable in resolve() for 
semantics the returned data.
+   * Refreshes (or invalidates) any metadata/data cached in the plan 
recursively.
*/
-  private def resolveAsTableColumn(
-  nameParts: Seq[String],
-  resolver: Resolver,
-  attribute: Attribute): Option[(Attribute, List[String])] = {
-assert(nameParts.length > 1)
-if (attribute.qualifier.exists(resolver(_, nameParts.head))) {
-  // At least one qualifier matches. See if remaining parts match.
-  val remainingParts = nameParts.tail
-  resolveAsColumn(remainingParts, resolver, attribute)
-} else {
-  None
-}
+  def refresh(): Unit = children.foreach(_.refresh())
+}
+
+/**
+ * Helper class for (LogicalPlan) attribute resolution. This class indexes 
attributes by their
+ * case-in-sensitive name, and checks potential candidates using the given 
Resolver. Both qualified
+ * and direct resolution are supported.
+ */
+private[catalyst] class AttributeResolver(attributes: Seq[Attribute]) 
extends Logging {
+  private def unique[T](m: Map[T, Seq[Attribute]]): Map[T, Seq[Attribute]] 
= {
+m.mapValues(_.distinct).map(identity)
   }
 
-  /**
-   * Resolve the given `name` string against the given attribute, 
returning either 0 or 1 match.
-   *
-   * Different from resolveAsTableColumn, this assumes `name` does NOT 
start with a qualifier.
-   * See the comment above `candidates` variable in resolve() for 
semantics the returned data.
-   */
-  private def resolveAsColumn(
-  nameParts: Seq[String],
-  resolver: Resolver,
-  attribute: Attribute): Option[(Attribute, List[String])] = {
-if (!attribute.isGenerated && resolver(attribute.name, 
nameParts.head)) {
-  Option((attribute.withName(nameParts.head), nameParts.tail.toList))
-} else {
-  None
+  /** Map to use for direct case insensitive attribute lookups. */
+  private val direct: Map[String, Seq[Attribute]] = {
--- End diff --

nit: do we need to use lazy val?
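For context, the gist of the index being discussed, as I read the diff (sketch only; a `lazy val` would just defer the one-time grouping cost to the first lookup):

```scala
// Group the output attributes by lower-cased name once, so each resolve() becomes a
// hash lookup plus a Resolver check instead of a linear scan over all attributes.
val direct: Map[String, Seq[Attribute]] =
  attributes.groupBy(_.name.toLowerCase).mapValues(_.distinct).map(identity)
```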


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-06 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r69849058
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -236,29 +246,41 @@ private[spark] class TaskSchedulerImpl(
* given TaskSetManager have completed, so state associated with the 
TaskSetManager should be
* cleaned up.
*/
-  def taskSetFinished(manager: TaskSetManager): Unit = synchronized {
+  def taskSetFinished(manager: TaskSetManager, success: Boolean): Unit = 
synchronized {
 taskSetsByStageIdAndAttempt.get(manager.taskSet.stageId).foreach { 
taskSetsForStage =>
   taskSetsForStage -= manager.taskSet.stageAttemptId
   if (taskSetsForStage.isEmpty) {
 taskSetsByStageIdAndAttempt -= manager.taskSet.stageId
   }
 }
 manager.parent.removeSchedulable(manager)
-logInfo("Removed TaskSet %s, whose tasks have all completed, from pool 
%s"
-  .format(manager.taskSet.id, manager.parent.name))
+if (success) {
+  blacklistTracker.taskSetSucceeded(manager.taskSet.stageId, this)
+  logInfo(s"Removed TaskSet ${manager.taskSet.id}, whose tasks have 
all completed, from pool" +
+s" ${manager.parent.name}")
+} else {
+  blacklistTracker.taskSetFailed(manager.taskSet.stageId)
+  logInfo(s"Removed TaskSet ${manager.taskSet.id}, since it failed, 
from pool" +
+s" ${manager.parent.name}")
--- End diff --

Changing the log msg is unrelated to blacklisting, but this msg had always 
annoyed/confused me earlier, so I thought it was worth updating since I 
needed `success` anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14053: [SPARK-16374] [SQL] Remove Alias from MetastoreRe...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14053


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14053: [SPARK-16374] [SQL] Remove Alias from MetastoreRelation ...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14053
  
thanks, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14004
  
LGTM except some style comment, thanks for working on it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69848762
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala ---
@@ -347,4 +347,24 @@ class StringFunctionsSuite extends QueryTest with 
SharedSQLContext {
   df2.filter("b>0").selectExpr("format_number(a, b)"),
   Row("5.") :: Row("4.000") :: Row("4.000") :: Row("4.000") :: 
Row("3.00") :: Nil)
   }
+
+  test("string sentences function") {
+val df = Seq(("Hi there! The price was $1,234.56 But, not now.", 
"en", "US"))
+  .toDF("str", "language", "country")
+
+checkAnswer(
+  df.selectExpr("sentences(str, language, country)"),
+  Row(Seq(Seq("Hi", "there"), Seq("The", "price", "was"), Seq("But", 
"not", "now"
+
+// Type coercion
+checkAnswer(
+  df.selectExpr("sentences(null)", "sentences(10)", "sentences(3.14)"),
+  Row(null, Seq(Seq("10")), Seq(Seq("3.14"
+
+// Argument number exception
+val m = intercept[AnalysisException] {
+  df.selectExpr("sentences()")
+}.getMessage
+assert(m.contains("Invalid number of arguments"))
--- End diff --

btw what's the full error message here? I thought it would be `function not 
found` or something...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69848674
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,41 @@ class StringExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper {
 checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 
0)
 checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 
0)
   }
+
+  test("Sentences") {
+val nullString = Literal.create(null, StringType)
+checkEvaluation(Sentences(nullString, nullString, nullString), null, 
EmptyRow)
+checkEvaluation(Sentences(nullString, nullString), null, EmptyRow)
+checkEvaluation(Sentences(nullString), null, EmptyRow)
+checkEvaluation(Sentences(Literal.create(null, NullType)), null, 
EmptyRow)
+checkEvaluation(Sentences("", nullString, nullString), Seq.empty, 
EmptyRow)
+checkEvaluation(Sentences("", nullString), Seq.empty, EmptyRow)
+checkEvaluation(Sentences(""), Seq.empty, EmptyRow)
+
+val correct_answer = Seq(
+  Seq("Hi", "there"),
+  Seq("The", "price", "was"),
+  Seq("But", "not", "now"))
+
+// Hive compatible test-cases.
+checkEvaluation(
+  Sentences("Hi there! The price was $1,234.56 But, not now."),
+  correct_answer,
+  EmptyRow)
--- End diff --

`EmptyRow` is the default value, so we don't need to pass it.
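i.e. (the same call from the quoted diff, without the redundant argument):

```scala
checkEvaluation(
  Sentences("Hi there! The price was $1,234.56 But, not now."),
  correct_answer)
```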


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types for `t...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13517


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...

2016-07-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13517
  
NP :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14004#discussion_r69848618
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -198,6 +203,66 @@ case class StringSplit(str: Expression, pattern: 
Expression)
   override def prettyName: String = "split"
 }
 
+/**
+ * Splits a string into arrays of sentences, where each sentence is an 
array of words.
+ * The 'lang' and 'country' arguments are optional, and if omitted, the 
default locale is used.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, lang, country) - Splits str into an array of array 
of words.",
+  extended = "> SELECT _FUNC_('Hi there! Good morning.');\n  
[['Hi','there'], ['Good','morning']]")
+case class Sentences(
+str: Expression,
+language: Expression = Literal(""),
+country: Expression = Literal(""))
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  def this(str: Expression) = this(str, Literal(""), Literal(""))
+  def this(str: Expression, language: Expression) = this(str, language, 
Literal(""))
+
+  override def nullable: Boolean = true
+  override def dataType: DataType =
+ArrayType(ArrayType(StringType, containsNull = false), containsNull = 
false)
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType, StringType)
+  override def children: Seq[Expression] = str :: language :: country :: 
Nil
+
+  override def eval(input: InternalRow): Any = {
+val string = str.eval(input)
+if (string == null) {
+  null
+} else {
+  val locale = try {
+new Locale(language.eval(input).asInstanceOf[UTF8String].toString,
+  country.eval(input).asInstanceOf[UTF8String].toString)
+  } catch {
+case _: NullPointerException | _: ClassCastException => 
Locale.getDefault
--- End diff --

I'd like to use `if` to check for null explicitly. Also, `ClassCastException` 
will never happen since we have type checking.
BTW what will be the result of invalid language and country? e.g. `new 
Locale("abc", "xyz")`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...

2016-07-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13517
  
Oh... sorry... and thanks..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...

2016-07-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13517
  
@HyukjinKwon still on holiday...

LGTM - merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...

2016-07-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14083
  
yea. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...

2016-07-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14083
  
@viirya you mean I forgot to add `time(sql(query))`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...

2016-07-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14077
  
@JustinPihony Thanks for your review! Let me try to answer your concerns. 
- The `copy` function location is actually a bug. See another PR: 
https://github.com/apache/spark/pull/14075. 
- The trait `CreatableRelationProvider` was introduced for the `save` API. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r69848182
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -277,14 +268,52 @@ case class Divide(left: Expression, right: Expression)
   if (${eval1.isNull}) {
 ${ev.isNull} = true;
   } else {
-${ev.value} = $divide;
+${ev.value} = $division;
   }
 }""")
 }
   }
 }
 
 @ExpressionDescription(
+  usage = "a _FUNC_ b - Divides a by b.",
--- End diff --

we should mention that this is fractional division, i.e. the parameters must be 
of fractional type and the result is also fractional.
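For the doc text, an illustrative contrast (assuming the `div` keyword for the integral form, as in Hive; the exact syntax is an assumption on my part, not taken from the patch):

```scala
// `/` always performs fractional division, even on integer inputs:
spark.sql("SELECT 7 / 2").show()    // 3.5
// the integral form truncates toward zero:
spark.sql("SELECT 7 div 2").show()  // 3
```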


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...

2016-07-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14083
  
The code in the description seems incomplete?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...

2016-07-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13517
  
(@hvanhovell I just addressed your comments!)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14036#discussion_r69848124
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -249,11 +244,7 @@ case class Divide(left: Expression, right: Expression)
   s"${eval2.value} == 0"
 }
 val javaType = ctx.javaType(dataType)
-val divide = if (dataType.isInstanceOf[DecimalType]) {
--- End diff --

Does it already cover both fractional and integral division?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13890: [SPARK-16189][SQL] Add ExistingRDD logical plan f...

2016-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13890#discussion_r69848022
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -74,13 +74,71 @@ object RDDConversions {
   }
 }
 
+private[sql] object ExistingRDD {
+
+  def apply[T: Encoder](rdd: RDD[T])(session: SparkSession): LogicalPlan = 
{
+val exisitingRdd = ExistingRDD(CatalystSerde.generateObjAttr[T], 
rdd)(session)
+CatalystSerde.serialize[T](exisitingRdd)
+  }
+}
+
 /** Logical plan node for scanning data from an RDD. */
+private[sql] case class ExistingRDD[T](
+outputObjAttr: Attribute,
+rdd: RDD[T])(session: SparkSession)
+  extends LeafNode with ObjectProducer with MultiInstanceRelation {
+
+  override protected final def otherCopyArgs: Seq[AnyRef] = session :: Nil
+
+  override def newInstance(): ExistingRDD.this.type =
+ExistingRDD(outputObjAttr.newInstance(), 
rdd)(session).asInstanceOf[this.type]
+
+  override def sameResult(plan: LogicalPlan): Boolean = {
+plan.canonicalized match {
+  case ExistingRDD(_, otherRDD) => rdd.id == otherRDD.id
+  case _ => false
+}
+  }
+
+  override protected def stringArgs: Iterator[Any] = Iterator(output)
+
+  @transient override lazy val statistics: Statistics = Statistics(
+// TODO: Instead of returning a default value here, find a way to 
return a meaningful size
+// estimate for RDDs. See PR 1238 for more discussions.
+sizeInBytes = BigInt(session.sessionState.conf.defaultSizeInBytes)
+  )
+}
+
+/** Physical plan node for scanning data from an RDD. */
+private[sql] case class ExistingRDDScanExec[T](
--- End diff --

how about ExternalRDDScan?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2016-07-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13988
  
(@rxin gentle ping..)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14026
  
**[Test build #61892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61892/consoleFull)** for PR 14026 at commit [`f287722`](https://github.com/apache/spark/commit/f2877226ebe8c70d44f35a10bab056402bc9ffa9).





[GitHub] spark pull request #8066: [SPARK-9778][SQL] remove unnecessary evaluation fr...

2016-07-06 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/8066





[GitHub] spark issue #14072: [SPARK-16398][CORE] Make cancelJob and cancelStage APIs ...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14072
  
**[Test build #3167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3167/consoleFull)** for PR 14072 at commit [`e2d2cb1`](https://github.com/apache/spark/commit/e2d2cb12af4f30c7fc8f2ff822e49644cfba149d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13876
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...

2016-07-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61888/
Test PASSed.





[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...

2016-07-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13876
  
**[Test build #61888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61888/consoleFull)** for PR 13876 at commit [`ccf972d`](https://github.com/apache/spark/commit/ccf972dc9c258651a5d9977a4e76d87805f4c1c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-07-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13701
  
I will update this soon..





[GitHub] spark pull request #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of P...

2016-07-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13778#discussion_r69844600
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -374,13 +407,15 @@ object MapObjects {
  * @param lambdaFunction A function that take the `loopVar` as input, and used as lambda function
  *   to handle collection elements.
  * @param inputData An expression that when evaluated returns a collection object.
+ * @param inputDataType The dataType of inputData.
--- End diff --

OK.





  1   2   3   4   5   6   7   8   >