Re: Network-related environmental problem when running JDBCSuite
Thanks for everyone's patience with this email thread. I have fixed my environmental problem and my tests run cleanly now. This seems to be a problem which afflicts modern JVMs on Mac OS X (and maybe other Unix variants). The following can happen on these platforms:

InetAddress.getLocalHost().isReachable( 2000 ) == false

If this happens to you, the fix is to add the following line to /etc/hosts:

127.0.0.1 localhost $yourMachineName

where $yourMachineName is the result of the hostname command. For more information, see http://stackoverflow.com/questions/1881546/inetaddress-getlocalhost-throws-unknownhostexception

Thanks,
-Rick

Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 11:15:29 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Dev
> Date: 10/15/2015 11:16 AM
> Subject: Re: Network-related environmental problem when running JDBCSuite
>
> Continuing this lively conversation with myself (hopefully this
> archived thread may be useful to someone else in the future):
>
> I set the following environment variable as recommended by this page:
> http://stackoverflow.com/questions/29906686/failed-to-bind-to-spark-master-using-a-remote-cluster-with-two-workers
>
> export SPARK_LOCAL_IP=127.0.0.1
>
> Then I got errors related to booting the metastore_db. So I deleted
> that directory. After that I was able to run spark-shell again.
>
> Now let's see if this hack fixes the tests...
>
> Thanks,
> Rick Hillegas
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 10:50:55 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev
> > Date: 10/15/2015 10:51 AM
> > Subject: Re: Network-related environmental problem when running JDBCSuite
> >
> > For the record, I get the same error when I simply try to boot the
> > spark shell:
> >
> > bash-3.2$ bin/spark-shell
> > log4j:WARN No appenders could be found for logger
> > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> > log4j:WARN Please initialize the log4j system properly.
> > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> > Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
> > To adjust logging level use sc.setLogLevel("INFO")
> > Welcome to
> >       ____              __
> >      / __/__  ___ _____/ /__
> >     _\ \/ _ \/ _ `/ __/  '_/
> >    /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
> >       /_/
> >
> > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
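For anyone who wants to confirm this diagnosis before editing /etc/hosts, the failing check can be reproduced with plain JDK calls. A minimal sketch in Scala (only java.net is used, nothing Spark-specific; the object name is illustrative):

    // Reproduce the localhost check that fails on a misconfigured machine.
    // InetAddress.getLocalHost throws UnknownHostException when the hostname
    // is not resolvable, e.g. because it is missing from /etc/hosts.
    import java.net.InetAddress

    object LocalhostCheck {
      def main(args: Array[String]): Unit = {
        val local = InetAddress.getLocalHost
        val reachable = local.isReachable(2000) // same 2-second timeout as above
        println(s"host=${local.getHostName} addr=${local.getHostAddress} reachable=$reachable")
      }
    }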
Re: Network-related environmental problem when running JDBCSuite
Continuing this lively conversation with myself (hopefully this archived thread may be useful to someone else in the future):

I set the following environment variable as recommended by this page: http://stackoverflow.com/questions/29906686/failed-to-bind-to-spark-master-using-a-remote-cluster-with-two-workers

export SPARK_LOCAL_IP=127.0.0.1

Then I got errors related to booting the metastore_db. So I deleted that directory. After that I was able to run spark-shell again.

Now let's see if this hack fixes the tests...

Thanks,
Rick Hillegas

Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 10:50:55 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev
> Date: 10/15/2015 10:51 AM
> Subject: Re: Network-related environmental problem when running JDBCSuite
>
> For the record, I get the same error when I simply try to boot the
> spark shell:
>
> bash-3.2$ bin/spark-shell
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been terminated abrubtly. Attempting to shut down transports
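A quick way to see which address the driver will use, with and without the SPARK_LOCAL_IP override described above, is the following sketch. It uses only the JDK; the environment variable name is the real Spark one, everything else is illustrative:

    // Show what SPARK_LOCAL_IP resolves to versus the JVM's default local
    // host. Plain JDK calls; nothing here goes through Spark itself.
    import java.net.InetAddress

    val overrideAddr = sys.env.get("SPARK_LOCAL_IP").map(InetAddress.getByName)
    val defaultAddr  = InetAddress.getLocalHost
    println(s"SPARK_LOCAL_IP -> ${overrideAddr.getOrElse("(unset)")}")
    println(s"default local host -> $defaultAddr")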
Re: Network-related environmental problem when running JDBCSuite
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

java.lang.NullPointerException
  at org.apache.spark.sql.SQLContext$.createListenerAndUI(SQLContext.scala:1323)
  at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:100)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
  at $iwC$$iwC.<init>(<console>:9)
  at $iwC.<init>(<console>:18)
  at <init>(<console>:20)
  at .<init>(<console>:24)
  at .<clinit>(<console>)
  at .<init>(<console>:7)
  at .<clinit>(<console>)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
  at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
  at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
  at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
  at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
  at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
  at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
  at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
  at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
  at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
  at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
  at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
  at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
  at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
  at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:680)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql

Thanks,
Rick Hillegas

Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 09:47:22 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Dev
> Date: 10/15/2015 09:47 AM
> Subject: Network-related environmental problem when running JDBCSuite
>
> I am seeing what look like environmental errors when I try to run a
> test on a clean local branch which has been sync'd to the head of
> the development trunk. I would ap
Network-related environmental problem when running JDBCSuite
I am seeing what look like environmental errors when I try to run a test on a clean local branch which has been sync'd to the head of the development trunk. I would appreciate advice about how to debug or hack around this problem. For the record, the test ran cleanly last week. This is the experiment I am running:

# build
mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package

# run one suite
mvn -Dhadoop.version=2.4.0 -DwildcardSuites=JDBCSuite

The test bombs out before getting to JDBCSuite. I see this summary at the end...

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ................... SUCCESS [  2.023 s]
[INFO] Spark Project Test Tags .................... SUCCESS [  1.924 s]
[INFO] Spark Project Launcher ..................... SUCCESS [  5.837 s]
[INFO] Spark Project Networking ................... SUCCESS [ 12.498 s]
[INFO] Spark Project Shuffle Streaming Service .... SUCCESS [01:28 min]
[INFO] Spark Project Unsafe ....................... SUCCESS [01:09 min]
[INFO] Spark Project Core ......................... SUCCESS [02:45 min]
[INFO] Spark Project Bagel ........................ SUCCESS [ 30.182 s]
[INFO] Spark Project GraphX ....................... SUCCESS [ 59.002 s]
[INFO] Spark Project Streaming .................... FAILURE [06:21 min]
[INFO] Spark Project Catalyst ..................... SKIPPED
[INFO] Spark Project SQL .......................... SKIPPED
[INFO] Spark Project ML Library ................... SKIPPED
[INFO] Spark Project Tools ........................ SKIPPED
[INFO] Spark Project Hive ......................... SKIPPED
[INFO] Spark Project REPL ......................... SKIPPED
[INFO] Spark Project Assembly ..................... SKIPPED
[INFO] Spark Project External Twitter ............. SKIPPED
[INFO] Spark Project External Flume Sink .......... SKIPPED
[INFO] Spark Project External Flume ............... SKIPPED
[INFO] Spark Project External Flume Assembly ...... SKIPPED
[INFO] Spark Project External MQTT ................ SKIPPED
[INFO] Spark Project External MQTT Assembly ....... SKIPPED
[INFO] Spark Project External ZeroMQ .............. SKIPPED
[INFO] Spark Project External Kafka ............... SKIPPED
[INFO] Spark Project Examples ..................... SKIPPED
[INFO] Spark Project External Kafka Assembly ...... SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 13:37 min
[INFO] Finished at: 2015-10-15T09:03:06-07:00
[INFO] Final Memory: 69M/793M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project spark-streaming_2.10: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/rhillegas/spark/spark/streaming/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :spark-streaming_2.10

From the logs in streaming/target/surefire-reports, it appears that the following tests failed...

org.apache.spark.streaming.JavaAPISuite.txt
org.apache.spark.streaming.JavaReceiverAPISuite.txt

...with this error:

java.net.BindException: Failed to bind to: /9.52.158.156:0: Service 'sparkDriver' failed after 100 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecu
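One way to tell whether a bind failure like this is environmental rather than a Spark bug is to attempt the same bind by hand. A sketch with plain JDK sockets — the address is the one from the log above, and a BindException here would point at the machine's interface/address configuration rather than at Spark:

    // Try to bind an ephemeral port on the address from the log, which is
    // essentially the operation Netty performs. Plain JDK; no Spark involved.
    import java.net.{InetAddress, ServerSocket}

    val addr = InetAddress.getByName("9.52.158.156") // address from the log above
    val socket = new ServerSocket(0, 50, addr)       // port 0 = pick any free port
    println(s"bound ${socket.getLocalSocketAddress}")
    socket.close()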
Re: unsubscribe
Hi Sukesh,

To unsubscribe from the dev list, please send a message to dev-unsubscr...@spark.apache.org. To unsubscribe from the user list, please send a message to user-unsubscr...@spark.apache.org. Please see: http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

sukesh kumar wrote on 09/28/2015 11:39:01 PM:

> From: sukesh kumar
> To: "u...@spark.apache.org" , "dev@spark.apache.org"
> Date: 09/28/2015 11:39 PM
> Subject: unsubscribe
>
> unsubscribe
>
> --
> Thanks & Best Regards
> Sukesh Kumar
Re: [Discuss] NOTICE file for transitive "NOTICE"s
Thanks, Sean!

Sean Owen wrote on 09/25/2015 06:35:46 AM:

> From: Sean Owen
> To: Reynold Xin , Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org"
> Date: 09/25/2015 07:21 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Work underway at ...
>
> https://issues.apache.org/jira/browse/SPARK-10833
> https://github.com/apache/spark/pull/8919
>
> On Fri, Sep 25, 2015 at 8:54 AM, Sean Owen wrote:
> > Update: I *think* the conclusion was indeed that nothing needs to
> > happen with NOTICE.
> > However, along the way in https://issues.apache.org/jira/browse/LEGAL-226
> > it emerged that the BSD/MIT licenses should be inlined into LICENSE
> > (or copied in the distro somewhere). I can get on that -- just some
> > grunt work to copy and paste it all.
> >
> > On Thu, Sep 24, 2015 at 6:55 PM, Reynold Xin wrote:
> >> Richard,
> >>
> >> Thanks for bringing this up and this is a great point. Let's start another
> >> thread for it so we don't hijack the release thread.
> >>
> >> On Thu, Sep 24, 2015 at 10:51 AM, Sean Owen wrote:
> >>>
> >>> On Thu, Sep 24, 2015 at 6:45 PM, Richard Hillegas wrote:
> >>> > Under your guidance, I would be happy to help compile a NOTICE file
> >>> > which follows the pattern used by Derby and the JDK. This effort might
> >>> > proceed in parallel with vetting 1.5.1 and could be targeted at a later
> >>> > release vehicle. I don't think that the ASF's exposure is greatly
> >>> > increased by one more release which follows the old pattern.
> >>>
> >>> I'd prefer to use the ASF's preferred pattern, no? That's what we've
> >>> been trying to do and seems like we're even required to do so, not
> >>> follow a different convention. There is some specific guidance there
> >>> about what to add, and not add, to these files. Specifically, because
> >>> the AL2 requires downstream projects to embed the contents of NOTICE,
> >>> the guidance is to only include elements in NOTICE that must appear
> >>> there.
> >>>
> >>> Put it this way -- what would you like to change specifically? (you
> >>> can start another thread for that)
> >>>
> >>> >> My assessment (just looked before I saw Sean's email) is the same as
> >>> >> his. The NOTICE file embeds other projects' licenses.
> >>> >
> >>> > This may be where our perspectives diverge. I did not find those
> >>> > licenses embedded in the NOTICE file. As I see it, the licenses are
> >>> > cited but not included.
> >>>
> >>> Pretty sure that was meant to say that NOTICE embeds other projects'
> >>> "notices", not licenses. And those notices can have all kinds of
> >>> stuff, including licenses.
Re: [Discuss] NOTICE file for transitive "NOTICE"s
Hi Sean,

My reading would be that a separate copy of the BSD license, with copyright years filled in, is required for each BSD-licensed dependency. Same for MIT-licensed dependencies. Hopefully, we will receive some guidance on https://issues.apache.org/jira/browse/LEGAL-226

Thanks,
-Rick

Sean Owen wrote on 09/24/2015 12:40:12 PM:

> From: Sean Owen
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org"
> Date: 09/24/2015 12:40 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Yes, the issue of where 3rd-party license information goes is
> different, and varies by license. I think the BSD/MIT licenses are all
> already listed in LICENSE accordingly. Let me know if you spy an
> omission.
>
> On Thu, Sep 24, 2015 at 8:36 PM, Richard Hillegas wrote:
> > Thanks for that pointer, Sean. It may be that Derby is putting the
> > license information in the wrong place, viz. in the NOTICE file. But
> > the 3rd party license text may need to go somewhere else. See for
> > instance the advice a little further up the page at
> > http://www.apache.org/dev/licensing-howto.html#permissive-deps
> >
> > Thanks,
> > -Rick
> >
> > Sean Owen wrote on 09/24/2015 12:07:01 PM:
> >
> >> From: Sean Owen
> >> To: Richard Hillegas/San Francisco/IBM@IBMUS
> >> Cc: "dev@spark.apache.org"
> >> Date: 09/24/2015 12:08 PM
> >> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
> >>
> >> Have a look at http://www.apache.org/dev/licensing-howto.html#mod-notice
> >> though, which makes a good point about limiting what goes into NOTICE
> >> to what is required. That's what makes me think we shouldn't do this.
> >>
> >> On Thu, Sep 24, 2015 at 7:24 PM, Richard Hillegas wrote:
> >> > To answer Sean's question on the previous email thread, I would
> >> > propose making changes like the following to the NOTICE file:
Re: [Discuss] NOTICE file for transitive "NOTICE"s
Thanks for that pointer, Sean. It may be that Derby is putting the license information in the wrong place, viz. in the NOTICE file. But the 3rd party license text may need to go somewhere else. See for instance the advice a little further up the page at http://www.apache.org/dev/licensing-howto.html#permissive-deps

Thanks,
-Rick

Sean Owen wrote on 09/24/2015 12:07:01 PM:

> From: Sean Owen
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org"
> Date: 09/24/2015 12:08 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Have a look at http://www.apache.org/dev/licensing-howto.html#mod-notice
> though, which makes a good point about limiting what goes into NOTICE
> to what is required. That's what makes me think we shouldn't do this.
>
> On Thu, Sep 24, 2015 at 7:24 PM, Richard Hillegas wrote:
> > To answer Sean's question on the previous email thread, I would propose
> > making changes like the following to the NOTICE file:
Re: [Discuss] NOTICE file for transitive "NOTICE"s
Thanks for forking the new email thread, Reynold. It is entirely possible that I am being overly skittish. I have posed a question for our legal experts: https://issues.apache.org/jira/browse/LEGAL-226

To answer Sean's question on the previous email thread, I would propose making changes like the following to the NOTICE file. Replace a stanza like this...

"This product contains a modified version of 'JZlib', a re-implementation of
zlib in pure Java, which can be obtained at:

  * LICENSE:
    * license/LICENSE.jzlib.txt (BSD Style License)
  * HOMEPAGE:
    * http://www.jcraft.com/jzlib/"

...with full license text like this:

"This product contains a modified version of 'JZlib', a re-implementation of
zlib in pure Java, which can be obtained at:

  * HOMEPAGE:
    * http://www.jcraft.com/jzlib/

The ZLIB license text follows:

JZlib 0.0.* were released under the GNU LGPL license. Later, we have
switched over to a BSD-style license.

------------------------------------------------------------------------------
Copyright (c) 2000-2011 ymnk, JCraft,Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

3. The names of the authors may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL JCRAFT,
INC. OR ANY CONTRIBUTORS TO THIS SOFTWARE BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."

Thanks,
-Rick

Reynold Xin wrote on 09/24/2015 10:55:53 AM:

> From: Reynold Xin
> To: Sean Owen
> Cc: Richard Hillegas/San Francisco/IBM@IBMUS, "dev@spark.apache.org"
> Date: 09/24/2015 10:56 AM
> Subject: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Richard,
>
> Thanks for bringing this up and this is a great point. Let's start
> another thread for it so we don't hijack the release thread.
>
> On Thu, Sep 24, 2015 at 10:51 AM, Sean Owen wrote:
> On Thu, Sep 24, 2015 at 6:45 PM, Richard Hillegas wrote:
> > Under your guidance, I would be happy to help compile a NOTICE file
> > which follows the pattern used by Derby and the JDK. This effort might
> > proceed in parallel with vetting 1.5.1 and could be targeted at a later
> > release vehicle. I don't think that the ASF's exposure is greatly
> > increased by one more release which follows the old pattern.
>
> I'd prefer to use the ASF's preferred pattern, no? That's what we've
> been trying to do and seems like we're even required to do so, not
> follow a different convention. There is some specific guidance there
> about what to add, and not add, to these files. Specifically, because
> the AL2 requires downstream projects to embed the contents of NOTICE,
> the guidance is to only include elements in NOTICE that must appear
> there.
>
> Put it this way -- what would you like to change specifically? (you
> can start another thread for that)
>
> >> My assessment (just looked before I saw Sean's email) is the same as
> >> his. The NOTICE file embeds other projects' licenses.
> >
> > This may be where our perspectives diverge. I did not find those
> > licenses embedded in the NOTICE file. As I see it, the licenses are
> > cited but not included.
>
> Pretty sure that was meant to say that NOTICE embeds other projects'
> "notices", not licenses. And those notices can have all kinds of
> stuff, including licenses.
Re: [VOTE] Release Apache Spark 1.5.1 (RC1)
Hi Sean and Wendell,

I share your concerns about how difficult and important it is to get this right. I think that the Spark community has compiled a very readable and well organized NOTICE file. A lot of careful thought went into gathering together 3rd party projects which share the same license text.

All I can offer is my own experience of having served as a release manager for a sister Apache project (Derby) over the past ten years. The Derby NOTICE file recites 3rd party licenses verbatim. This is also the approach taken by the THIRDPARTYLICENSEREADME.txt in the JDK. I am not a lawyer. However, I have great respect for the experience and legal sensitivities of the people who compile that JDK license file.

Under your guidance, I would be happy to help compile a NOTICE file which follows the pattern used by Derby and the JDK. This effort might proceed in parallel with vetting 1.5.1 and could be targeted at a later release vehicle. I don't think that the ASF's exposure is greatly increased by one more release which follows the old pattern.

Another comment inline...

Patrick Wendell wrote on 09/24/2015 10:24:25 AM:

> From: Patrick Wendell
> To: Sean Owen
> Cc: Richard Hillegas/San Francisco/IBM@IBMUS, "dev@spark.apache.org"
> Date: 09/24/2015 10:24 AM
> Subject: Re: [VOTE] Release Apache Spark 1.5.1 (RC1)
>
> Hey Richard,
>
> My assessment (just looked before I saw Sean's email) is the same as
> his. The NOTICE file embeds other projects' licenses.

This may be where our perspectives diverge. I did not find those licenses embedded in the NOTICE file. As I see it, the licenses are cited but not included.

Thanks,
-Rick

> If those licenses themselves have pointers to other files or
> dependencies, we don't embed them. I think this is standard practice.
>
> - Patrick
>
> On Thu, Sep 24, 2015 at 10:00 AM, Sean Owen wrote:
> > Hi Richard, those are messages reproduced from other projects' NOTICE
> > files, not created by Spark. They need to be reproduced in Spark's
> > NOTICE file to comply with the license, but their text may or may not
> > apply to Spark's distribution. The intent is that users would track
> > this back to the source project if interested to investigate what the
> > upstream notice is about.
> >
> > Requirements vary by license, but I do not believe there is additional
> > requirement to reproduce these other files. Their license information
> > is already indicated in accordance with the license terms.
> >
> > What licenses are you looking for in LICENSE that you believe should be there?
> >
> > Getting all this right is both difficult and important. I've made some
> > efforts over time to strictly comply with the Apache take on
> > licensing, which is at http://www.apache.org/legal/resolved.html It's
> > entirely possible there's still a mistake somewhere in here (possibly
> > a new dependency, etc). Please point it out if you see such a thing.
> >
> > But so far what you describe is "working as intended", as far as I
> > know, according to Apache.
> >
> > On Thu, Sep 24, 2015 at 5:52 PM, Richard Hillegas wrote:
> >> -1 (non-binding)
> >>
> >> I was able to build Spark cleanly from the source distribution using
> >> the command in README.md:
> >>
> >> build/mvn -DskipTests clean package
> >>
> >> However, while I was waiting for the build to complete, I started
> >> going through the NOTICE file. I was confused about where to find
> >> licenses for 3rd party software bundled with Spark. About halfway
> >> through the NOTICE file, starting with Java Collections Framework,
> >> there is a list of licenses of the form
> >>
> >>    license/*.txt
> >>
> >> But there is no license subdirectory in the source distro. I couldn't
> >> find the *.txt license files for Java Collections Framework, Base64
> >> Encoder, or JZlib anywhere in the source distro. I couldn't find those
> >> files in license subdirectories at the indicated home pages for those
> >> projects. (I did find the license for JZLIB somewhere else, however:
> >> http://www.jcraft.com/jzlib/LICENSE.txt.)
> >>
> >> In addition, I couldn't find licenses for those projects in the master
> >> LICENSE file.
> >>
> >> Are users supposed to get licenses from the indicated 3rd party web
> >> sites? Those online licenses could change. I would feel more
> >> comfortable if the ASF were protected by our bundling the licenses
> >> inside our source distros.
Re: [VOTE] Release Apache Spark 1.5.1 (RC1)
-1 (non-binding)

I was able to build Spark cleanly from the source distribution using the command in README.md:

build/mvn -DskipTests clean package

However, while I was waiting for the build to complete, I started going through the NOTICE file. I was confused about where to find licenses for 3rd party software bundled with Spark. About halfway through the NOTICE file, starting with Java Collections Framework, there is a list of licenses of the form

   license/*.txt

But there is no license subdirectory in the source distro. I couldn't find the *.txt license files for Java Collections Framework, Base64 Encoder, or JZlib anywhere in the source distro. I couldn't find those files in license subdirectories at the indicated home pages for those projects. (I did find the license for JZLIB somewhere else, however: http://www.jcraft.com/jzlib/LICENSE.txt.)

In addition, I couldn't find licenses for those projects in the master LICENSE file.

Are users supposed to get licenses from the indicated 3rd party web sites? Those online licenses could change. I would feel more comfortable if the ASF were protected by our bundling the licenses inside our source distros.

After looking for those three licenses, I stopped reading the NOTICE file. Maybe I'm confused about how to read the NOTICE file. Where should users expect to find the 3rd party licenses?

Thanks,
-Rick

Reynold Xin wrote on 09/24/2015 12:27:25 AM:

> From: Reynold Xin
> To: "dev@spark.apache.org"
> Date: 09/24/2015 12:28 AM
> Subject: [VOTE] Release Apache Spark 1.5.1 (RC1)
>
> Please vote on releasing the following candidate as Apache Spark
> version 1.5.1. The vote is open until Sun, Sep 27, 2015 at 10:00 UTC
> and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.5.1
> [ ] -1 Do not release this package because ...
>
> The release fixes 81 known issues in Spark 1.5.0, listed here:
> http://s.apache.org/spark-1.5.1
>
> The tag to be voted on is v1.5.1-rc1:
> https://github.com/apache/spark/commit/4df97937dbf68a9868de58408b9be0bf87dbbb94
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release (1.5.1) can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1148/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-docs/
>
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate,
> then reporting any regressions.
>
> ===
> What justifies a -1 vote for this release?
> ===
> -1 vote should occur for regressions from Spark 1.5.0. Bugs already
> present in 1.5.0 will not block this release.
>
> ===
> What should happen to JIRA tickets still targeting 1.5.1?
> ===
> Please target 1.5.2 or 1.6.0.
Re: column identifiers in Spark SQL
Thanks for that additional tip, Michael. Backticks fix the problem query in which an identifier was transformed into a string literal. So this works now...

// now correctly resolves the unnormalized column id
sqlContext.sql("""select `b` from test_data""").show

Any suggestion about how to escape an embedded double quote?

// java.sql.SQLSyntaxErrorException: Syntax error: Encountered "\"" at line 1, column 12.
sqlContext.sql("""select `c"d` from test_data""").show

// org.apache.spark.sql.AnalysisException: cannot resolve 'c\"d' given input columns A, b, c"d; line 1 pos 7
sqlContext.sql("""select `c\"d` from test_data""").show

Thanks,
-Rick

Michael Armbrust wrote on 09/22/2015 01:16:12 PM:

> From: Michael Armbrust
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev
> Date: 09/22/2015 01:16 PM
> Subject: Re: column identifiers in Spark SQL
>
> HiveQL uses `backticks` for quoted identifiers.
>
> On Tue, Sep 22, 2015 at 1:06 PM, Richard Hillegas wrote:
> Thanks for that tip, Michael. I think that my sqlContext was a raw
> SQLContext originally. I have rebuilt Spark like so...
>
> sbt/sbt -Phive assembly/assembly
>
> Now I see that my sqlContext is a HiveContext. That fixes one of the
> queries. Now unnormalized column names work:
>
> // ...unnormalized column names work now
> sqlContext.sql("""select a from test_data""").show
>
> However, quoted identifiers are still treated as string literals:
>
> // this still returns rows consisting of the string literal "b"
> sqlContext.sql("""select "b" from test_data""").show
>
> And embedded quotes inside quoted identifiers are swallowed up:
>
> // this now returns rows consisting of the string literal "cd"
> sqlContext.sql("""select "c""d" from test_data""").show
>
> Thanks,
> -Rick
>
> Michael Armbrust wrote on 09/22/2015 10:58:36 AM:
>
> > From: Michael Armbrust
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev
> > Date: 09/22/2015 10:59 AM
> > Subject: Re: column identifiers in Spark SQL
> >
> > Are you using a SQLContext or a HiveContext? The programming guide
> > suggests the latter, as the former is really only there because some
> > applications may have conflicts with Hive dependencies. SQLContext
> > is case sensitive by default where as the HiveContext is not. The
> > parser in HiveContext is also a lot better.
> >
> > On Tue, Sep 22, 2015 at 10:53 AM, Richard Hillegas wrote:
> > I am puzzled by the behavior of column identifiers in Spark SQL. I
> > don't find any guidance in the "Spark SQL and DataFrame Guide" at
> > http://spark.apache.org/docs/latest/sql-programming-guide.html. I am
> > seeing odd behavior related to case-sensitivity and to delimited
> > (quoted) identifiers.
> >
> > Consider the following declaration of a table in the Derby
> > relational database, whose dialect hews closely to the SQL Standard:
> >
> > create table app.t( a int, "b" int, "c""d" int );
> >
> > Now let's load that table into Spark like this:
> >
> > import org.apache.spark.sql._
> > import org.apache.spark.sql.types._
> >
> > val df = sqlContext.read.format("jdbc").options(
> >   Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
> >   "dbtable" -> "app.t")).load()
> > df.registerTempTable("test_data")
> >
> > The following query runs fine because the column name matches the
> > normalized form in which it is stored in the metadata catalogs of
> > the relational database:
> >
> > // normalized column names are recognized
> > sqlContext.sql(s"""select A from test_data""").show
> >
> > But the following query fails during name resolution. This puzzles
> > me because non-delimited identifiers are case-insensitive in the
> > ANSI/ISO Standard. They are also supposed to be case-insensitive in
> > HiveQL, at least according to section 2.3.1 of the
> > QuotedIdentifier.html webpage attached to
> > https://issues.apache.org/jira/browse/HIVE-6013:
> >
> > // ...unnormalized column names raise this error:
> > // org.apache.spark.sql.AnalysisException: cannot resolve 'a' given
> > // input columns A, b, c"d;
> > sqlContext.sql("""select a from test_data""").show
> >
> > Delimited (quoted) identifiers are treated as string literals.
> > Again, non-Standard behavior:
> >
> > // this returns rows consisting of the string literal "b"
> > sqlContext.sql("""select "b" from test_data""").show
> >
> > Embedded quotes in delimited identifiers won't even parse:
> >
> > // embedded quotes raise this error: java.lang.RuntimeException:
> > // [1.11] failure: ``union'' expected but "d" found
> > sqlContext.sql("""select "c""d" from test_data""").show
> >
> > This behavior is non-Standard and it strikes me as hard to describe
> > to users concisely. Would the community support an effort to bring
> > the handling of column identifiers into closer conformance with the
> > Standard? Would backward compatibility concerns even allow us to do that?
> >
> > Thanks,
> > -Rick
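For readers skimming this thread, the backtick behavior discussed above boils down to the following sketch. It assumes a Spark 1.x spark-shell where sqlContext is a HiveContext and test_data is registered as in the quoted messages:

    // Backticks delimit identifiers in HiveQL; double quotes delimit string
    // literals. So this resolves the column declared as "b" in Derby:
    sqlContext.sql("select `b` from test_data").show()

    // ...while this returns the constant string "b" for every row, because
    // the double-quoted token is parsed as a literal, not an identifier:
    sqlContext.sql("""select "b" from test_data""").show()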
Re: Derby version in Spark
Thanks, Ted. I'll follow up with the Hive folks.

Cheers,
-Rick

Ted Yu wrote on 09/22/2015 03:41:12 PM:

> From: Ted Yu
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev
> Date: 09/22/2015 03:41 PM
> Subject: Re: Derby version in Spark
>
> I cloned Hive 1.2 code base and saw:
>
> 10.10.2.0
>
> So the version used by Spark is quite close to what Hive uses.
>
> On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu wrote:
> I see.
> I use maven to build so I observe different contents under
> lib_managed directory.
>
> Here is snippet of dependency tree:
>
> [INFO] |  +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark:compile
> [INFO] |  |  +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
> [INFO] |  |  +- org.apache.derby:derby:jar:10.10.1.1:compile
>
> On Tue, Sep 22, 2015 at 3:21 PM, Richard Hillegas wrote:
> Thanks, Ted. I'm working on my master branch. The lib_managed/jars
> directory has a lot of jarballs, including hadoop and hive. Maybe
> these were faulted in when I built with the following command?
>
> sbt/sbt -Phive assembly/assembly
>
> The Derby jars seem to be used in order to manage the metastore_db
> database. Maybe my question should be directed to the Hive community?
>
> Thanks,
> -Rick
>
> Here are the gory details:
>
> bash-3.2$ ls lib_managed/jars
> FastInfoset-1.2.12.jar  curator-test-2.4.0.jar  jersey-test-framework-grizzly2-1.9.jar  parquet-format-2.3.0-incubating.jar
> JavaEWAH-0.3.2.jar  datanucleus-api-jdo-3.2.6.jar  jets3t-0.7.1.jar  parquet-generator-1.7.0.jar
> ST4-4.0.4.jar  datanucleus-core-3.2.10.jar  jetty-continuation-8.1.14.v20131031.jar  parquet-hadoop-1.7.0.jar
> activation-1.1.jar  datanucleus-rdbms-3.2.9.jar  jetty-http-8.1.14.v20131031.jar  parquet-hadoop-bundle-1.6.0.jar
> akka-actor_2.10-2.3.11.jar  derby-10.10.1.1.jar  jetty-io-8.1.14.v20131031.jar  parquet-jackson-1.7.0.jar
> akka-remote_2.10-2.3.11.jar  derby-10.10.2.0.jar  jetty-jndi-8.1.14.v20131031.jar  platform-3.4.0.jar
> akka-slf4j_2.10-2.3.11.jar  genjavadoc-plugin_2.10.4-0.9-spark0.jar  jetty-plus-8.1.14.v20131031.jar  pmml-agent-1.1.15.jar
> akka-testkit_2.10-2.3.11.jar  groovy-all-2.1.6.jar  jetty-security-8.1.14.v20131031.jar  pmml-model-1.1.15.jar
> antlr-2.7.7.jar  guava-11.0.2.jar  jetty-server-8.1.14.v20131031.jar  pmml-schema-1.1.15.jar
> antlr-runtime-3.4.jar  guice-3.0.jar  jetty-servlet-8.1.14.v20131031.jar  postgresql-9.3-1102-jdbc41.jar
> aopalliance-1.0.jar  h2-1.4.183.jar  jetty-util-6.1.26.jar  py4j-0.8.2.1.jar
> arpack_combined_all-0.1-javadoc.jar  hadoop-annotations-2.2.0.jar  jetty-util-8.1.14.v20131031.jar  pyrolite-4.4.jar
> arpack_combined_all-0.1.jar  hadoop-auth-2.2.0.jar  jetty-webapp-8.1.14.v20131031.jar  quasiquotes_2.10-2.0.0.jar
> asm-3.2.jar  hadoop-client-2.2.0.jar  jetty-websocket-8.1.14.v20131031.jar  reflectasm-1.07-shaded.jar
> avro-1.7.4.jar  hadoop-common-2.2.0.jar  jetty-xml-8.1.14.v20131031.jar  sac-1.3.jar
> avro-1.7.7.jar  hadoop-hdfs-2.2.0.jar  jline-0.9.94.jar  scala-compiler-2.10.0.jar
> avro-ipc-1.7.7-tests.jar  hadoop-mapreduce-client-app-2.2.0.jar  jline-2.10.4.jar  scala-compiler-2.10.4.jar
> avro-ipc-1.7.7.jar  hadoop-mapreduce-client-common-2.2.0.jar  jline-2.12.jar  scala-library-2.10.4.jar
> avro-mapred-1.7.7-hadoop2.jar  hadoop-mapreduce-client-core-2.2.0.jar  jna-3.4.0.jar  scala-reflect-2.10.4.jar
> breeze-macros_2.10-0.11.2.jar  hadoop-mapreduce-client-jobclient-2.2.0.jar  joda-time-2.5.jar  scalacheck_2.10-1.11.3.jar
> breeze_2.10-0.11.2.jar  hadoop-mapreduce-client-shuffle-2.2.0.jar  jodd-core-3.5.2.jar  scalap-2.10.0.jar
> calcite-avatica-1.2.0-incubating.jar  hadoop-yarn-api-2.2.0.jar  json-20080701.jar  selenium-api-2.42.2.jar
> calcite-core-1.2.0-incubating.jar  hadoop-yarn-client-2.2.0.jar  json-20090211.jar  selenium-chrome-driver-2.42.2.jar
> calcite-linq4j-1.2.0-incubating.jar  hadoop-yarn-common-2.2.0.jar  json4s-ast_2.10-3.2.10.jar  selenium-firefox-driver-2.42.2.jar
> cglib-2.2.1-v20090111.jar  hadoop-yarn-server-common-2.2.0.jar  json4s-core_2.10-3.2.10.jar  selenium-htmlunit-driver-2.42.2.jar
> cglib-nodep-2.1_3.jar  hadoop-yarn-server-nodemanager-2.2.0.jar  json4s-jackson_2.10-3.2.10.jar  selenium-ie-driver-2.42.2.jar
> chill-java-0.5.0.jar  hamcrest-core-1.1.jar  jsr173_api-1.0.jar  selenium-java-2.42.2.jar
> chill_2.10-0.5.0.jar  hamcrest-core-1.3.jar  jsr305-1.3.9.jar  selenium-remote-driver-2.42.2.jar
> commons-beanutils-1.7.0.jar  hamcrest-library-1.3.jar  jsr305-2.0.1.jar  selenium-safari-driver-2.42.2.jar
> commons-beanutils-core-1.8.0.jar  hive-exec-1.2.1.spark.jar  jta-1.1.jar  selenium-support-2.42.2.jar
> commons-cli-1.2.jar  hive-metastore-1.2.1.spark.jar  jtransforms-2.4.0.jar  serializer-2.7.1.jar
> commons-codec-
Re: Derby version in Spark
commons-codec-1.10.jar  htmlunit-2.14.jar  jul-to-slf4j-1.7.10.jar  slf4j-api-1.7.10.jar
commons-codec-1.4.jar  htmlunit-core-js-2.14.jar  junit-4.10.jar  slf4j-log4j12-1.7.10.jar
commons-codec-1.5.jar  httpclient-4.3.2.jar  junit-dep-4.10.jar  snappy-0.2.jar
commons-codec-1.9.jar  httpcore-4.3.1.jar  junit-dep-4.8.2.jar  spire-macros_2.10-0.7.4.jar
commons-collections-3.2.1.jar  httpmime-4.3.2.jar  junit-interface-0.10.jar  spire_2.10-0.7.4.jar
commons-compiler-2.7.8.jar  istack-commons-runtime-2.16.jar  junit-interface-0.9.jar  stax-api-1.0.1.jar
commons-compress-1.4.1.jar  ivy-2.4.0.jar  libfb303-0.9.2.jar  stream-2.7.0.jar
commons-configuration-1.6.jar  jackson-core-asl-1.8.8.jar  libthrift-0.9.2.jar  stringtemplate-3.2.1.jar
commons-dbcp-1.4.jar  jackson-core-asl-1.9.13.jar  lz4-1.3.0.jar  tachyon-client-0.7.1.jar
commons-digester-1.8.jar  jackson-jaxrs-1.8.8.jar  mesos-0.21.1-shaded-protobuf.jar  tachyon-underfs-hdfs-0.7.1.jar
commons-exec-1.1.jar  jackson-mapper-asl-1.9.13.jar  minlog-1.2.jar  tachyon-underfs-local-0.7.1.jar
commons-httpclient-3.1.jar  jackson-xc-1.8.8.jar  mockito-core-1.9.5.jar  test-interface-0.5.jar
commons-io-2.1.jar  janino-2.7.8.jar  mysql-connector-java-5.1.34.jar  test-interface-1.0.jar
commons-io-2.4.jar  jansi-1.4.jar  nekohtml-1.9.20.jar  uncommons-maths-1.2.2a.jar
commons-lang-2.5.jar  javassist-3.15.0-GA.jar  netty-all-4.0.29.Final.jar  unused-1.0.0.jar
commons-lang-2.6.jar  javax.inject-1.jar  objenesis-1.0.jar  webbit-0.4.14.jar
commons-lang3-3.3.2.jar  jaxb-api-2.2.2.jar  objenesis-1.2.jar  xalan-2.7.1.jar
commons-logging-1.1.3.jar  jaxb-api-2.2.7.jar  opencsv-2.3.jar  xercesImpl-2.11.0.jar
commons-math-2.1.jar  jaxb-core-2.2.7.jar  oro-2.0.8.jar  xml-apis-1.4.01.jar
commons-math-2.2.jar  jaxb-impl-2.2.3-1.jar  paranamer-2.3.jar  xmlenc-0.52.jar
commons-math3-3.4.1.jar  jaxb-impl-2.2.7.jar  paranamer-2.6.jar  xz-1.0.jar
commons-net-3.1.jar  jblas-1.2.4.jar  parquet-avro-1.7.0.jar  zookeeper-3.4.5.jar
commons-pool-1.5.4.jar  jcl-over-slf4j-1.7.10.jar  parquet-column-1.7.0.jar
core-1.1.2.jar  jdo-api-3.0.1.jar  parquet-common-1.7.0.jar
cssparser-0.9.13.jar  jersey-guice-1.9.jar  parquet-encoding-1.7.0.jar

Ted Yu wrote on 09/22/2015 01:32:39 PM:

> From: Ted Yu
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev
> Date: 09/22/2015 01:33 PM
> Subject: Re: Derby version in Spark
>
> Which Spark release are you building ?
>
> For master branch, I get the following:
>
> lib_managed/jars/datanucleus-api-jdo-3.2.6.jar
> lib_managed/jars/datanucleus-core-3.2.10.jar
> lib_managed/jars/datanucleus-rdbms-3.2.9.jar
>
> FYI
>
> On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas wrote:
> I see that lib_managed/jars holds these old Derby versions:
>
> lib_managed/jars/derby-10.10.1.1.jar
> lib_managed/jars/derby-10.10.2.0.jar
>
> The Derby 10.10 release family supports some ancient JVMs: Java SE 5
> and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone
> running Spark on the resource-constrained Java ME platform. Is Spark
> really deployed on Java SE 5? Is there some other reason that Spark
> uses the 10.10 Derby family?
>
> If no-one needs those ancient JVMs, maybe we could consider changing
> the Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1
> release (both run on Java 6 and up).
>
> Thanks,
> -Rick
Derby version in Spark
I see that lib_managed/jars holds these old Derby versions:

lib_managed/jars/derby-10.10.1.1.jar
lib_managed/jars/derby-10.10.2.0.jar

The Derby 10.10 release family supports some ancient JVMs: Java SE 5 and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone running Spark on the resource-constrained Java ME platform. Is Spark really deployed on Java SE 5? Is there some other reason that Spark uses the 10.10 Derby family?

If no-one needs those ancient JVMs, maybe we could consider changing the Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1 release (both run on Java 6 and up).

Thanks,
-Rick
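For anyone who wants to experiment locally in the meantime, one way to try a newer Derby is to override the transitive version in the sbt build. This is a hypothetical, untested sketch — the Maven coordinates are real, but whether Spark's build honors the override is not verified here:

    // Force the Derby version pulled in transitively (via hive-metastore)
    // up to 10.11.1.1. Untested against the Spark build; illustration only.
    dependencyOverrides += "org.apache.derby" % "derby" % "10.11.1.1"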
Re: column identifiers in Spark SQL
Thanks for that tip, Michael. I think that my sqlContext was a raw SQLContext originally. I have rebuilt Spark like so...

sbt/sbt -Phive assembly/assembly

Now I see that my sqlContext is a HiveContext. That fixes one of the queries. Now unnormalized column names work:

// ...unnormalized column names work now
sqlContext.sql("""select a from test_data""").show

However, quoted identifiers are still treated as string literals:

// this still returns rows consisting of the string literal "b"
sqlContext.sql("""select "b" from test_data""").show

And embedded quotes inside quoted identifiers are swallowed up:

// this now returns rows consisting of the string literal "cd"
sqlContext.sql("""select "c""d" from test_data""").show

Thanks,
-Rick

Michael Armbrust wrote on 09/22/2015 10:58:36 AM:

> From: Michael Armbrust
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev
> Date: 09/22/2015 10:59 AM
> Subject: Re: column identifiers in Spark SQL
>
> Are you using a SQLContext or a HiveContext? The programming guide
> suggests the latter, as the former is really only there because some
> applications may have conflicts with Hive dependencies. SQLContext
> is case sensitive by default where as the HiveContext is not. The
> parser in HiveContext is also a lot better.
>
> On Tue, Sep 22, 2015 at 10:53 AM, Richard Hillegas wrote:
> I am puzzled by the behavior of column identifiers in Spark SQL. I
> don't find any guidance in the "Spark SQL and DataFrame Guide" at
> http://spark.apache.org/docs/latest/sql-programming-guide.html. I am
> seeing odd behavior related to case-sensitivity and to delimited
> (quoted) identifiers.
>
> Consider the following declaration of a table in the Derby
> relational database, whose dialect hews closely to the SQL Standard:
>
> create table app.t( a int, "b" int, "c""d" int );
>
> Now let's load that table into Spark like this:
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
>
> val df = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
>   "dbtable" -> "app.t")).load()
> df.registerTempTable("test_data")
>
> The following query runs fine because the column name matches the
> normalized form in which it is stored in the metadata catalogs of
> the relational database:
>
> // normalized column names are recognized
> sqlContext.sql(s"""select A from test_data""").show
>
> But the following query fails during name resolution. This puzzles
> me because non-delimited identifiers are case-insensitive in the
> ANSI/ISO Standard. They are also supposed to be case-insensitive in
> HiveQL, at least according to section 2.3.1 of the
> QuotedIdentifier.html webpage attached to
> https://issues.apache.org/jira/browse/HIVE-6013:
>
> // ...unnormalized column names raise this error:
> // org.apache.spark.sql.AnalysisException: cannot resolve 'a' given
> // input columns A, b, c"d;
> sqlContext.sql("""select a from test_data""").show
>
> Delimited (quoted) identifiers are treated as string literals.
> Again, non-Standard behavior:
>
> // this returns rows consisting of the string literal "b"
> sqlContext.sql("""select "b" from test_data""").show
>
> Embedded quotes in delimited identifiers won't even parse:
>
> // embedded quotes raise this error: java.lang.RuntimeException:
> // [1.11] failure: ``union'' expected but "d" found
> sqlContext.sql("""select "c""d" from test_data""").show
>
> This behavior is non-Standard and it strikes me as hard to describe
> to users concisely. Would the community support an effort to bring
> the handling of column identifiers into closer conformance with the
> Standard? Would backward compatibility concerns even allow us to do that?
>
> Thanks,
> -Rick
column identifiers in Spark SQL
I am puzzled by the behavior of column identifiers in Spark SQL. I don't find any guidance in the "Spark SQL and DataFrame Guide" at http://spark.apache.org/docs/latest/sql-programming-guide.html. I am seeing odd behavior related to case-sensitivity and to delimited (quoted) identifiers.

Consider the following declaration of a table in the Derby relational database, whose dialect hews closely to the SQL Standard:

create table app.t( a int, "b" int, "c""d" int );

Now let's load that table into Spark like this:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val df = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
  "dbtable" -> "app.t")).load()
df.registerTempTable("test_data")

The following query runs fine because the column name matches the normalized form in which it is stored in the metadata catalogs of the relational database:

// normalized column names are recognized
sqlContext.sql(s"""select A from test_data""").show

But the following query fails during name resolution. This puzzles me because non-delimited identifiers are case-insensitive in the ANSI/ISO Standard. They are also supposed to be case-insensitive in HiveQL, at least according to section 2.3.1 of the QuotedIdentifier.html webpage attached to https://issues.apache.org/jira/browse/HIVE-6013:

// ...unnormalized column names raise this error:
// org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input columns A, b, c"d;
sqlContext.sql("""select a from test_data""").show

Delimited (quoted) identifiers are treated as string literals. Again, non-Standard behavior:

// this returns rows consisting of the string literal "b"
sqlContext.sql("""select "b" from test_data""").show

Embedded quotes in delimited identifiers won't even parse:

// embedded quotes raise this error: java.lang.RuntimeException:
// [1.11] failure: ``union'' expected but "d" found
sqlContext.sql("""select "c""d" from test_data""").show

This behavior is non-Standard and it strikes me as hard to describe to users concisely. Would the community support an effort to bring the handling of column identifiers into closer conformance with the Standard? Would backward compatibility concerns even allow us to do that?

Thanks,
-Rick
Re: Unsubscribe
To unsubscribe from the dev list, please send a message to dev-unsubscr...@spark.apache.org as described here: http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

Dulaj Viduranga wrote on 09/21/2015 10:15:58 AM:

> From: Dulaj Viduranga
> To: dev@spark.apache.org
> Date: 09/21/2015 10:16 AM
> Subject: Unsubscribe
>
> Unsubscribe