subject:"spark git commit\: \[SPARK\-11080\] \[SQL\] Incorporate per\-JVM id into ExprId to prevent unsafe cross\-JVM comparisions"

spark git commit: [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisions

2015-12-11 Thread davies

Repository: spark
Updated Branches:
  refs/heads/branch-1.5 cb0246c93 -> 5e603a51c


[SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe 
cross-JVM comparisions

In the current implementation of named expressions' `ExprIds`, we rely on a 
per-JVM AtomicLong to ensure that expression ids are unique within a JVM. 
However, these expression ids will not be _globally_ unique. This opens the 
potential for id collisions if new expression ids happen to be created inside 
of tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids, which 
happen to be safe because those expressions are never compared to expressions 
created on the driver. In order to guard against the introduction of invalid 
comparisons between driver-created and executor-created expression ids, this 
patch extends `ExprId` to incorporate a UUID to identify the JVM that created 
the id, which prevents collisions.

Author: Josh Rosen 

Closes #9093 from JoshRosen/SPARK-11080.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5e603a51
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5e603a51
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5e603a51

Branch: refs/heads/branch-1.5
Commit: 5e603a51c09a94280c346bee12def0c49479d069
Parents: cb0246c
Author: Josh Rosen 
Authored: Tue Oct 13 15:09:31 2015 -0700
Committer: Davies Liu 
Committed: Fri Dec 11 12:37:54 2015 -0800

--
 .../sql/catalyst/expressions/namedExpressions.scala  | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5e603a51/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index 5768c60..8957df0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
+import java.util.UUID
+
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
@@ -24,16 +26,23 @@ import org.apache.spark.sql.types._
 
 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
-  def newExprId: ExprId = ExprId(curId.getAndIncrement())
+  private[expressions] val jvmId = UUID.randomUUID()
+  def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = 
Some(expr.name, expr.dataType)
 }
 
 /**
- * A globally unique (within this JVM) id for a given named expression.
+ * A globally unique id for a given named expression.
  * Used to identify which attribute output by a relation is being
  * referenced in a subsequent computation.
+ *
+ * The `id` field is unique within a given JVM, while the `uuid` is used to 
uniquely identify JVMs.
  */
-case class ExprId(id: Long)
+case class ExprId(id: Long, jvmId: UUID)
+
+object ExprId {
+  def apply(id: Long): ExprId = ExprId(id, NamedExpression.jvmId)
+}
 
 /**
  * An [[Expression]] that is named.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisions

2015-10-13 Thread marmbrus

Repository: spark
Updated Branches:
  refs/heads/master 0d1b73b78 -> ef72673b2


[SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe 
cross-JVM comparisions

In the current implementation of named expressions' `ExprIds`, we rely on a 
per-JVM AtomicLong to ensure that expression ids are unique within a JVM. 
However, these expression ids will not be _globally_ unique. This opens the 
potential for id collisions if new expression ids happen to be created inside 
of tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids, which 
happen to be safe because those expressions are never compared to expressions 
created on the driver. In order to guard against the introduction of invalid 
comparisons between driver-created and executor-created expression ids, this 
patch extends `ExprId` to incorporate a UUID to identify the JVM that created 
the id, which prevents collisions.

Author: Josh Rosen 

Closes #9093 from JoshRosen/SPARK-11080.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef72673b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef72673b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef72673b

Branch: refs/heads/master
Commit: ef72673b234579c161b8cbb6cafc851d9eba1bfb
Parents: 0d1b73b
Author: Josh Rosen 
Authored: Tue Oct 13 15:09:31 2015 -0700
Committer: Michael Armbrust 
Committed: Tue Oct 13 15:09:31 2015 -0700

--
 .../sql/catalyst/expressions/namedExpressions.scala  | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ef72673b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index 5768c60..8957df0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
+import java.util.UUID
+
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
@@ -24,16 +26,23 @@ import org.apache.spark.sql.types._
 
 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
-  def newExprId: ExprId = ExprId(curId.getAndIncrement())
+  private[expressions] val jvmId = UUID.randomUUID()
+  def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = 
Some(expr.name, expr.dataType)
 }
 
 /**
- * A globally unique (within this JVM) id for a given named expression.
+ * A globally unique id for a given named expression.
  * Used to identify which attribute output by a relation is being
  * referenced in a subsequent computation.
+ *
+ * The `id` field is unique within a given JVM, while the `uuid` is used to 
uniquely identify JVMs.
  */
-case class ExprId(id: Long)
+case class ExprId(id: Long, jvmId: UUID)
+
+object ExprId {
+  def apply(id: Long): ExprId = ExprId(id, NamedExpression.jvmId)
+}
 
 /**
  * An [[Expression]] that is named.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisions

spark git commit: [SPARK-11080] [SQL] Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisions

2 matches

Site Navigation

Mail list logo

Footer information