[GitHub] spark issue #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apache.spa...
Github user obermeier commented on the issue: https://github.com/apache/spark/pull/19408

If Spark runs in a YARN cluster, this issue still exists.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
GitHub user obermeier reopened a pull request: https://github.com/apache/spark/pull/19408

[SPARK-22180][CORE] Allow IPv6 address in org.apache.spark.util.Utils.parseHostPort

External applications like Apache Cassandra are able to deal with IPv6 addresses. Libraries like spark-cassandra-connector combine Apache Cassandra with Apache Spark. This combination is very useful IMHO. One problem is that `org.apache.spark.util.Utils.parseHostPort(hostPort: String)` takes the last colon to separate the port from the host part. This conflicts with literal IPv6 addresses. I think we can take `hostPort` as a literal IPv6 address if it contains two or more colons. If IPv6 addresses are enclosed in square brackets, a port definition is still possible.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/obermeier/spark issue/SPARK-22180

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19408.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19408

commit 453e104c3e6ed6a5ca310f599c274d6c66a3d3c8
Author: Stefan Obermeier
Date: 2017-10-01T14:28:58Z

[SPARK-22180][CORE] Allow IPv6 address in org.apache.spark.util.Utils.parseHostPort

## What changes were proposed in this pull request?

Take `hostPort` as a literal IPv6 address if it contains two or more colons. If IPv6 addresses are enclosed in square brackets, a port definition is still possible.

## How was this patch tested?
Added a new test case into UtilsSuite

Remove comment

commit 1e6623f518b43df4079bdc2680288063c6be13c6
Author: obermeier
Date: 2017-12-14T22:52:21Z
Merge branch 'master' into issue/SPARK-22180

commit 1400299808631da0196a61f3588ede786dd0b041
Author: Stefan Obermeier
Date: 2017-12-15T07:39:45Z
Fix build problem

commit 68c322129305a35c9d3e04f8cacc011be5fbaec4
Author: Stefan Obermeier
Date: 2017-12-18T13:51:04Z
Fix style checks violation. Remove whitespace at end of line.

commit 8220d95a99f4564e735f22947cb1cb698613efa5
Author: Stefan Obermeier
Date: 2018-10-09T21:54:20Z
Add log message if hostname:port is not valid
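The parsing rule proposed in the PR description can be sketched as follows. This is a hedged reconstruction in Java, not Spark's actual Scala code: the class and method names are illustrative, but the regex is the one quoted in the review diff, and the behavior matches the PR's test expectations (bare IPv6 literals get port 0, bracketed literals keep their port).

```java
import java.util.regex.Pattern;

public class HostPortParser {
    // Heuristic from the PR: a string matching two or more "hex:hex" groups
    // is treated as a bare IPv6 literal with no port. Bracketed forms like
    // [::1]:123 do not match (brackets are not hex digits), so they fall
    // through to the last-colon split.
    static final Pattern IPV6_LITERAL =
            Pattern.compile("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}");

    static String[] parseHostPort(String hostPort) {
        int idx = IPV6_LITERAL.matcher(hostPort).matches()
                ? -1                          // bare IPv6 literal: no port part
                : hostPort.lastIndexOf(':');  // last colon starts the port
        if (idx < 0) {
            return new String[] { hostPort, "0" };
        }
        return new String[] { hostPort.substring(0, idx), hostPort.substring(idx + 1) };
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", parseHostPort("example.com:123"))); // example.com/123
        System.out.println(String.join("/", parseHostPort("2001:db8::1")));     // 2001:db8::1/0
        System.out.println(String.join("/", parseHostPort("[::1]:123")));       // [::1]/123
    }
}
```

Note that the `{2,}` repetition is what keeps `abc:123` (one colon, all-hex hostname) on the hostname-plus-port path rather than the IPv6 path.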
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier closed the pull request at: https://github.com/apache/spark/pull/19408
[GitHub] spark issue #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apache.spa...
Github user obermeier commented on the issue: https://github.com/apache/spark/pull/19408 This issue seems to be fixed in Spark 2.3.2
[GitHub] spark issue #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apache.spa...
Github user obermeier commented on the issue: https://github.com/apache/spark/pull/19408

I totally agree with you. What do you think about just adding a log message if the given string is obviously not a valid host name? The NumberFormatException, thrown much later and far away from the parsing code, was a little bit confusing:

```
java.lang.NumberFormatException: For input string: "f904"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:935)
    at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:36)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:206)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:187)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$addPendingTask(TaskSetManager.scala:187)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$1.apply$mcVI$sp(TaskSetManager.scala:166)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:165)
    at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:205)
    at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:169)
    at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1058)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:933)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:873)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1626)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
```
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150909772

--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -1146,6 +1146,20 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     }
   }

+  test("parseHostPort") {
+    assert(Utils.parseHostPort("abc:123") === (("abc", 123)))
+    assert(Utils.parseHostPort("example.com") === (("example.com", 0)))
+    assert(Utils.parseHostPort("example.com:123") === (("example.com", 123)))
+    assert(Utils.parseHostPort("127.0.0.1") === (("127.0.0.1", 0)))
+    assert(Utils.parseHostPort("127.0.0.1:123") === (("127.0.0.1", 123)))
+    assert(Utils.parseHostPort("2001:db8::1") === (("2001:db8::1", 0)))
+    assert(Utils.parseHostPort("2001:DB8::1") === (("2001:DB8::1", 0)))
+    assert(Utils.parseHostPort("2001:dB8::1") === (("2001:dB8::1", 0)))
+    assert(Utils.parseHostPort("0:0:0:0:0:0:0:0") === (("0:0:0:0:0:0:0:0", 0)))
+    assert(Utils.parseHostPort("::1") === (("::1", 0)))
+    assert(Utils.parseHostPort("[::1]:123") === (("[::1]", 123)))
+    assert(Utils.parseHostPort("[2001:db8:42::1]:123") === (("[2001:db8:42::1]", 123)))
--- End diff --

What is the preferred way to handle this kind of parse error in Spark? Changing the signature of this method to something like `Try[..]` or `Option[..]` is not an option!? Error log messages? Unchecked exceptions? ...
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150907312

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -982,7 +982,19 @@ private[spark] object Utils extends Logging {
       return cached
     }

-    val indx: Int = hostPort.lastIndexOf(':')
+    val indx: Int =
+      // Interpret hostPort as literal IPv6 address if it contains multiple colons.
+      // IPv6 addresses enclosed in square brackets like [::1]:123 are not included.
+      // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+      if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) {
+        // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace
+        -1
+      } else {
+        // Else last colon defines start of port definition
+        hostPort.lastIndexOf(':')
+      }
+
     // This is potentially broken - when dealing with ipv6 addresses for example, sigh ...
     // but then hadoop does not support ipv6 right now.
     // For now, we assume that if port exists, then it is valid - not check if it is an int > 0
--- End diff --

I removed this comment
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150798868

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -982,7 +982,19 @@ private[spark] object Utils extends Logging {
       return cached
     }

-    val indx: Int = hostPort.lastIndexOf(':')
+    val indx: Int =
+      // Interpret hostPort as literal IPv6 address if it contains multiple colons.
+      // IPv6 addresses enclosed in square brackets like [::1]:123 are not included.
+      // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+      if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) {
+        // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace
+        -1
+      } else {
+        // Else last colon defines start of port definition
+        hostPort.lastIndexOf(':')
+      }
+
     // This is potentially broken - when dealing with ipv6 addresses for example, sigh ...
     // but then hadoop does not support ipv6 right now.
     // For now, we assume that if port exists, then it is valid - not check if it is an int > 0
--- End diff --

I think yes, because some checks are not implemented. E.g. there is no check on how many colons are used, the double colon should appear only once, ...
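The review above quotes two revisions of the IPv6 heuristic: an earlier one matching only lowercase hex, and the later one using `[0-9a-fA-F]`. A small Java check (a hypothetical harness, not part of the PR) shows why the character class was widened: uppercase hex digits are valid in IPv6 literals, as the `2001:DB8::1` test input demonstrates, and neither pattern matches the bracketed form, so `[::1]:123` still falls through to the last-colon port split.

```java
import java.util.regex.Pattern;

public class Ipv6RegexCheck {
    // Earlier revision of the PR matched lowercase hex digits only.
    static final Pattern OLD = Pattern.compile("(([0-9a-f]*):([0-9a-f]*)){2,}");
    // Later revision accepts both cases, covering inputs like 2001:DB8::1.
    static final Pattern NEW = Pattern.compile("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}");

    public static void main(String[] args) {
        String upper = "2001:DB8::1";
        System.out.println(OLD.matcher(upper).matches()); // false: 'D', 'B' rejected
        System.out.println(NEW.matcher(upper).matches()); // true
        // Brackets are not in the character class, so bracketed literals do
        // not match and keep their explicit port via lastIndexOf(':').
        System.out.println(NEW.matcher("[::1]:123").matches()); // false
        // A single-colon host:port pair also fails the {2,} repetition.
        System.out.println(NEW.matcher("abc:123").matches());   // false
    }
}
```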
[GitHub] spark issue #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apache.spa...
Github user obermeier commented on the issue: https://github.com/apache/spark/pull/19408

I chose this function because I got exceptions like the one below [1] when I used IPv6 hosts. In this example `org.apache.spark.util.Utils$.parseHostPort` decided to use f904 as the port, but it was the last 16-bit chunk of an IPv6 address. I do not have an overview of the Spark code, so I am currently not able to provide a general IPv6 solution. I think this code snippet will improve the situation and enable us to use IPv6 hosts in some cases.

[1]
```
java.lang.NumberFormatException: For input string: "f904"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:935)
    at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:36)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:206)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:187)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$addPendingTask(TaskSetManager.scala:187)
    at org.apache.spark.scheduler.TaskSetManager$$anonfun$1.apply$mcVI$sp(TaskSetManager.scala:166)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:165)
    at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:205)
    at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:169)
    at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1058)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:933)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:873)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1626)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
```
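The failure mode in the stack trace can be reproduced in isolation. This sketch uses a hypothetical IPv6 literal ending in the group `f904` (the input string named in the trace): a naive last-colon split hands that final 16-bit group to `Integer.parseInt` as if it were a port number, which is exactly the `NumberFormatException` reported.

```java
public class PortParseFailure {
    public static void main(String[] args) {
        // Hypothetical IPv6 address whose last 16-bit group is "f904".
        String hostPort = "fe80::f904";
        // Splitting at the last colon treats the final group as a port.
        String wouldBePort = hostPort.substring(hostPort.lastIndexOf(':') + 1); // "f904"
        try {
            Integer.parseInt(wouldBePort); // same call that fails inside parseHostPort
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "f904"
        }
    }
}
```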
[GitHub] spark issue #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apache.spa...
Github user obermeier commented on the issue: https://github.com/apache/spark/pull/19408 Done
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r142047496

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -981,7 +981,13 @@ private[spark] object Utils extends Logging {
       return cached
     }

-    val indx: Int = hostPort.lastIndexOf(':')
+    val indx: Int =
+      // Interpret hostPort as literal IPv6 address if it contains two or more colons
+      // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+      if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1
--- End diff --

Yes, a real parser would be much better!! I hope the methods will check the input, like the name resolver... At this point I thought more about the separation of the port. I think it is important to check that two colons exist; otherwise this expression would accept hostnames like abc:123 as IPv6 literals.
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6
GitHub user obermeier opened a pull request: https://github.com/apache/spark/pull/19408

[SPARK-22180][CORE] Allow IPv6

External applications like Apache Cassandra are able to deal with IPv6 addresses. Libraries like spark-cassandra-connector combine Apache Cassandra with Apache Spark. This combination is very useful IMHO. One problem is that `org.apache.spark.util.Utils.parseHostPort(hostPort: String)` takes the last colon to separate the port from the host part. This conflicts with literal IPv6 addresses. I think we can take `hostPort` as a literal IPv6 address if it contains two or more colons. If IPv6 addresses are enclosed in square brackets, a port definition is still possible.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/obermeier/spark issue/SPARK-22180

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19408.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19408

commit 3783f85b18540ed8746c078ad9c2f12d7167be9d
Author: Stefan Obermeier
Date: 2017-10-01T14:28:58Z
[SPARK-22180][CORE] Allow IPv6