[GitHub] spark pull request: [SPARK-14321][SQL] Reduce date format cost and...

2016-05-29 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-222408035
  
Sorry about the delay in responding to this. Will try to rebase and post 
the patch asap.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14321][SQL] Reduce date format cost and...

2016-05-24 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-221437247
  
@rajeshbalamohan thanks again for working on this; it'd be great to have 
this in 2.0. Do you have cycles to bring to this up to date and address 
reviewer comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14321][SQL] Reduce date format cost and...

2016-05-20 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-220533781
  
hi @rajeshbalamohan , sorry for the delay,  do you wanna try this again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14321][SQL] Reduce date format cost and...

2016-05-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r63999680
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -391,21 +393,24 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
   case StringType if right.foldable =>
 val sdf = classOf[SimpleDateFormat].getName
 val fString = if (constFormat == null) null else 
constFormat.toString
-val formatter = ctx.freshName("formatter")
 if (fString == null) {
   s"""
 boolean ${ev.isNull} = true;
 ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
   """
 } else {
+  val formatter = ctx.freshName("formatter")
+  ctx.addMutableState(sdf, formatter, s"""$formatter = null;""")
--- End diff --

is it possible to create the formatter here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-212182055
  
This is trivial but would be great if the title follows 
`[SPARK-X][SQL]` format as described in 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-18 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-211286426
  
In the generated code, it returns null if constFormat == null.  So it is 
not required to change the generated code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-18 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r60020044
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -391,21 +395,24 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
   case StringType if right.foldable =>
 val sdf = classOf[SimpleDateFormat].getName
 val fString = if (constFormat == null) null else 
constFormat.toString
-val formatter = ctx.freshName("formatter")
 if (fString == null) {
   s"""
 boolean ${ev.isNull} = true;
 ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
   """
 } else {
+  val formatter = ctx.freshName("formatter")
+  ctx.addMutableState(sdf, formatter, s"""$formatter = null;""")
   val eval1 = left.gen(ctx)
   s"""
 ${eval1.code}
 boolean ${ev.isNull} = ${eval1.isNull};
 ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
 if (!${ev.isNull}) {
   try {
-$sdf $formatter = new $sdf("$fString");
+if ($formatter == null) {
+  $formatter = new $sdf("$fString");
--- End diff --

Hm, but doesn't this generated code then need the same guards as above? I 
think I may not know enough to review this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-17 Thread rajeshbalamohan
Github user rajeshbalamohan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r60001514
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -368,7 +369,10 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
   t.asInstanceOf[Long] / 100L
 case StringType if right.foldable =>
   if (constFormat != null) {
-Try(new SimpleDateFormat(constFormat.toString).parse(
+if (formatter == null) {
+  formatter = Try(new 
SimpleDateFormat(constFormat.toString)).getOrElse(null)
--- End diff --

Didn't want to throw the error back as it would break the earlier 
functionality.  Eearlier it was returning null when any exception (i.e, could 
be constFormat being null, or parsing error) was thrown.  

Creating the formatter upfront in the recent commit and handling null 
earlier itself, to have minimal changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r59864380
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -355,6 +355,7 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
   override def nullable: Boolean = true
 
   private lazy val constFormat: UTF8String = 
right.eval().asInstanceOf[UTF8String]
+  private var formatter: SimpleDateFormat = _
--- End diff --

Not sure this is worth changing, but `...: DateFormat = null`? I actually 
am not familiar with the convention for when to use _ vs null. I thought = _ 
was mostly for defs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r59864421
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -368,7 +369,10 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
   t.asInstanceOf[Long] / 100L
 case StringType if right.foldable =>
   if (constFormat != null) {
-Try(new SimpleDateFormat(constFormat.toString).parse(
+if (formatter == null) {
+  formatter = Try(new 
SimpleDateFormat(constFormat.toString)).getOrElse(null)
--- End diff --

`.orNull` is available? but, this will just cause an NPE in the next line, 
which will also get squashed. Do we really want to hide both of these? why not 
let the error fly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-15 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-210384204
  
Revised the patch addressing comments. Fixed eval() of UnixTime, 
FromUnixTime.  Haven't changed eval in DateFormatClass as i am not sure if 
format can change in between. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-14 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-210202377
  
Sorry about the delay. I will share the update patch today


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-14 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-20986
  
@rajeshbalamohan are you still working on this? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-10 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-208000318
  
Ping @rajeshbalamohan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-206476854
  
@rajeshbalamohan Could you also update the eval() of UnixTime, FromUnixTime 
and DateFormatClass?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r58746649
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -519,7 +530,10 @@ case class FromUnixTime(sec: Expression, format: 
Expression)
   ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
   if (!${ev.isNull}) {
 try {
-  ${ev.value} = UTF8String.fromString(new 
$sdf("${constFormat.toString}").format(
+  if ($sdfTerm == null) {
--- End diff --

format is an constant here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r58745694
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -416,11 +419,17 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
 }
   case StringType =>
 val sdf = classOf[SimpleDateFormat].getName
+val formatter = ctx.freshName("sdf")
+ctx.addMutableState(sdf, formatter, s"""$formatter = null;""")
 nullSafeCodeGen(ctx, ev, (string, format) => {
   s"""
 try {
+  if ($formatter == null ||
+  !$formatter.toPattern().equals("$format.toString()")) {
--- End diff --

This optimization seems useless to me too, I'd like to not have it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r58716090
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -519,7 +530,10 @@ case class FromUnixTime(sec: Expression, format: 
Expression)
   ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
   if (!${ev.isNull}) {
 try {
-  ${ev.value} = UTF8String.fromString(new 
$sdf("${constFormat.toString}").format(
+  if ($sdfTerm == null) {
--- End diff --

Does this not need the same guard above to check the format? I suspect I 
might misunderstand the logic. CC @davies for a look too


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r58715959
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -416,11 +419,17 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
 }
   case StringType =>
 val sdf = classOf[SimpleDateFormat].getName
+val formatter = ctx.freshName("sdf")
+ctx.addMutableState(sdf, formatter, s"""$formatter = null;""")
 nullSafeCodeGen(ctx, ev, (string, format) => {
   s"""
 try {
+  if ($formatter == null ||
+  !$formatter.toPattern().equals("$format.toString()")) {
--- End diff --

So, this raises an interesting point ... the SimpleDateFormat you're 
caching is a reference used across logically different code branches that need 
different formats. Are we sure this is a win? because every time a different 
format is needed you have to switch anyway. I might just be paranoid


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r58715478
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -391,13 +391,14 @@ abstract class UnixTime extends BinaryExpression with 
ExpectsInputTypes {
   case StringType if right.foldable =>
 val sdf = classOf[SimpleDateFormat].getName
 val fString = if (constFormat == null) null else 
constFormat.toString
-val formatter = ctx.freshName("formatter")
 if (fString == null) {
   s"""
 boolean ${ev.isNull} = true;
 ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
   """
 } else {
+  val formatter = ctx.freshName("sdf")
--- End diff --

Pardon my ignorance but is "sdf" the right name here? I admit I am not 
familiar with how the code gen works, but just looking at how it seems to 
operate, formatter seems to be a different reference than sdf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-03 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-205122837
  
Agreed. Thanks @srowen . Reverted calendar changes in DateTimeUtils in 
recent commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-03 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12105#discussion_r58320790
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -59,6 +59,8 @@ object DateTimeUtils {
   final val toYearZero = to2001 + 7304850
   final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
 
+  lazy val c = Calendar.getInstance(TimeZone.getTimeZone("GMT"))
--- End diff --

Yes if SimpleDateFormat isn't used in multiple threads and you're sure it 
can't be, then caching it in the instance is OK. 

This however isn't thread-safe. The Calendar would be shared across threads.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-03 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-205082821
  
SDF declared in the generated code is not shared in multiple threads.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-01 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-204441304
  
The problem is that this object is not thread-safe. Unless you know this 
instance is only accessed by one thread, this potentially fails when accessed 
concurrently. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-204296880
  
**[Test build #2721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2721/consoleFull)**
 for PR 12105 at commit 
[`6fd07db`](https://github.com/apache/spark/commit/6fd07db11b5c9eed795dde11177f1c245a6fef16).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-04-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-204267311
  
**[Test build #2721 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2721/consoleFull)**
 for PR 12105 at commit 
[`6fd07db`](https://github.com/apache/spark/commit/6fd07db11b5c9eed795dde11177f1c245a6fef16).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-03-31 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-204228144
  
Jenkins, OK to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12105#issuecomment-204227649
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...

2016-03-31 Thread rajeshbalamohan
GitHub user rajeshbalamohan opened a pull request:

https://github.com/apache/spark/pull/12105

SPARK-14321. [SQL]  Reduce date format cost and string-to-date cost i…

## What changes were proposed in this pull request?

Here is the generated code snippet when executing date functions. 
SimpleDateFormat is fairly expensive and can show up bottleneck when processing 
millions of records. It would be better to instantiate it once.

```
/* 066 */ UTF8String primitive5 = null;
/* 067 */ if (!isNull4) {
/* 068 */   try {
/* 069 */ primitive5 = UTF8String.fromString(new 
java.text.SimpleDateFormat("-MM-dd HH:mm:ss").format(
/* 070 */ new java.util.Date(primitive7 * 1000L)));
/* 071 */   } catch (java.lang.Throwable e) {
/* 072 */ isNull4 = true;
/* 073 */   }
/* 074 */ }
```

With modified code, here is the generated code
```
/* 010 */   private java.text.SimpleDateFormat sdf2;
/* 011 */   private UnsafeRow result13;
/* 012 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder bufferHolder14;
/* 013 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter15;
/* 014 */
...
...
/* 065 */ boolean isNull0 = isNull3;
/* 066 */ UTF8String primitive1 = null;
/* 067 */ if (!isNull0) {
/* 068 */   try {
/* 069 */ if (sdf2 == null) {
/* 070 */   sdf2 = new java.text.SimpleDateFormat("-MM-dd 
HH:mm:ss");
/* 071 */ }
/* 072 */ primitive1 = UTF8String.fromString(sdf2.format(
/* 073 */ new java.util.Date(primitive4 * 1000L)));
/* 074 */   } catch (java.lang.Throwable e) {
/* 075 */ isNull0 = true;
/* 076 */   }
/* 077 */ }
```

Similarly Calendar.getInstance was used in DateTimeUtils which can be 
lazily inited.


## How was this patch tested?


org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite,org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite
Also tried with couple of sample SQL queries with single executor (6GB) 
which showed good improvement with the fix.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajeshbalamohan/spark SPARK-14321

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12105


commit 6fd07db11b5c9eed795dde11177f1c245a6fef16
Author: Rajesh Balamohan 
Date:   2016-04-01T02:41:07Z

SPARK-14321. [SQL]  Reduce date format cost and string-to-date cost in date 
functions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org