[jira] [Created] (SPARK-18001) Broken link to R DataFrame in sql-programming-guide
Tommy Yu created SPARK-18001:
--------------------------------

             Summary: Broken link to R DataFrame in sql-programming-guide
                 Key: SPARK-18001
                 URL: https://issues.apache.org/jira/browse/SPARK-18001
             Project: Spark
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 2.0.1
            Reporter: Tommy Yu
            Priority: Trivial

In http://spark.apache.org/docs/latest/sql-programming-guide.html, section "Untyped Dataset Operations (aka DataFrame Operations)", the link to the R API does not work; it returns "The requested URL /docs/latest/api/R/DataFrame.html was not found on this server."

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13597) Python API for GeneralizedLinearRegression
[ https://issues.apache.org/jira/browse/SPARK-13597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175239#comment-15175239 ]

Tommy Yu commented on SPARK-13597:
----------------------------------

I can work on this and expose the Python API for GeneralizedLinearRegression.

> Python API for GeneralizedLinearRegression
> ------------------------------------------
>
>                 Key: SPARK-13597
>                 URL: https://issues.apache.org/jira/browse/SPARK-13597
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> After SPARK-12811, we should add a Python API for generalized linear regression.
[jira] [Updated] (SPARK-13153) PySpark ML persistence fails when handling a parameter with no default value
[ https://issues.apache.org/jira/browse/SPARK-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Yu updated SPARK-13153:
-----------------------------
    Priority: Minor  (was: Major)

> PySpark ML persistence fails when handling a parameter with no default value
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-13153
>                 URL: https://issues.apache.org/jira/browse/SPARK-13153
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 1.6.0
>            Reporter: Tommy Yu
>            Priority: Minor
>
> This defect was found while implementing SPARK-13033, when the code below was added to a doctest. It looks like _transfer_params_from_java does not consider params that have no default value; we should handle them.
>
> >>> import os, tempfile
> >>> path = tempfile.mkdtemp()
> >>> aftsr_path = path + "/aftsr"
> >>> aftsr.save(aftsr_path)
> >>> aftsr2 = AFTSurvivalRegression.load(aftsr_path)
>
> Exception detail:
>
> ir2 = IsotonicRegression.load(ir_path)
> Exception raised:
> Traceback (most recent call last):
>   File "C:\Python27\lib\doctest.py", line 1289, in run
>     compileflags, 1) in test.globs
>   File "", line 1, in
>     ir2 = IsotonicRegression.load(ir_path)
>   File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py", line 194, in load
>     return cls.read().load(path)
>   File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py", line 148, in load
>     instance.transfer_params_from_java()
>   File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\wrapper.py", line 82, in transfer_params_from_java
>     value = _java2py(sc, self._java_obj.getOrDefault(java_param))
>   File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py", line 813, in __call__
>     answer, self.gateway_client, self.target_id, self.name)
>   File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\sql\utils.py", line 45, in deco
>     return f(a, *kw)
>   File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
>     format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o351.getOrDefault.
> : java.util.NoSuchElementException: Failed to find a default value for weightCol
>     at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
>     at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:646)
>     at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:43)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:209)
>     at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (SPARK-13153) PySpark ML persistence fails when handling a parameter with no default value
Tommy Yu created SPARK-13153:
--------------------------------

             Summary: PySpark ML persistence fails when handling a parameter with no default value
                 Key: SPARK-13153
                 URL: https://issues.apache.org/jira/browse/SPARK-13153
             Project: Spark
          Issue Type: Bug
          Components: ML, PySpark
    Affects Versions: 1.6.0
            Reporter: Tommy Yu

This defect was found while implementing SPARK-13033, when the code below was added to a doctest. It looks like _transfer_params_from_java does not consider params that have no default value; we should handle them.

>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> aftsr_path = path + "/aftsr"
>>> aftsr.save(aftsr_path)
>>> aftsr2 = AFTSurvivalRegression.load(aftsr_path)

Exception detail:

ir2 = IsotonicRegression.load(ir_path)
Exception raised:
Traceback (most recent call last):
  File "C:\Python27\lib\doctest.py", line 1289, in run
    compileflags, 1) in test.globs
  File "", line 1, in
    ir2 = IsotonicRegression.load(ir_path)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py", line 194, in load
    return cls.read().load(path)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py", line 148, in load
    instance.transfer_params_from_java()
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\wrapper.py", line 82, in transfer_params_from_java
    value = _java2py(sc, self._java_obj.getOrDefault(java_param))
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py", line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\sql\utils.py", line 45, in deco
    return f(a, *kw)
  File "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o351.getOrDefault.
: java.util.NoSuchElementException: Failed to find a default value for weightCol
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:646)
    at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:43)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
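The failure above suggests a guard: skip params for which the Java side has neither a set value nor a default, instead of calling getOrDefault unconditionally. Below is a minimal pure-Python sketch of that idea; the mock class and helper names are hypothetical stand-ins for the JVM-side Params object, not PySpark's actual API.

```python
class JavaParamsMock:
    """Hypothetical stand-in for the JVM-side Params object (not py4j)."""

    def __init__(self, values, defaults):
        self._values = values      # params with explicitly set values
        self._defaults = defaults  # params with default values

    def isSet(self, name):
        return name in self._values

    def hasDefault(self, name):
        return name in self._defaults

    def getOrDefault(self, name):
        if name in self._values:
            return self._values[name]
        if name in self._defaults:
            return self._defaults[name]
        # mirrors the NoSuchElementException in the report
        raise KeyError("Failed to find a default value for " + name)


def transfer_params(java_obj, param_names):
    """Copy only params that are set or have a default on the Java side."""
    out = {}
    for name in param_names:
        # The guard: without it, a param like 'weightCol' (no value, no
        # default) makes getOrDefault raise, which is the bug reported here.
        if java_obj.isSet(name) or java_obj.hasDefault(name):
            out[name] = java_obj.getOrDefault(name)
    return out
```

With a param set, a param defaulted, and 'weightCol' neither set nor defaulted, the guarded transfer simply skips 'weightCol' instead of raising.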
[jira] [Commented] (SPARK-13033) PySpark ml.regression support export/import
[ https://issues.apache.org/jira/browse/SPARK-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119233#comment-15119233 ]

Tommy Yu commented on SPARK-13033:
----------------------------------

I can take this one, but it may need SPARK-13032 merged first.

> PySpark ml.regression support export/import
> -------------------------------------------
>
>                 Key: SPARK-13033
>                 URL: https://issues.apache.org/jira/browse/SPARK-13033
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>            Reporter: Yanbo Liang
>            Priority: Minor
>
> Add export/import for all estimators and transformers (which have a Scala implementation) under pyspark/ml/regression.py. Please refer to the implementation in SPARK-13032.
[jira] [Commented] (SPARK-5865) Add doc warnings for methods that return local data structures
[ https://issues.apache.org/jira/browse/SPARK-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111317#comment-15111317 ]

Tommy Yu commented on SPARK-5865:
---------------------------------

I will work on this task. Thanks.

> Add doc warnings for methods that return local data structures
> --------------------------------------------------------------
>
>                 Key: SPARK-5865
>                 URL: https://issues.apache.org/jira/browse/SPARK-5865
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>            Reporter: Nicholas Chammas
>            Priority: Minor
>              Labels: starter
>
> We should include a note in the doc string for any method that collects an RDD to the driver, so that users have some hint of why their call might be OOMing.
>
> {{RDD.take()}}
> {{RDD.collect()}}
> * [Scala|https://github.com/apache/spark/blob/d8adefefcc2a4af32295440ed1d4917a6968f017/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L803-L806]
> * [Python|https://github.com/apache/spark/blob/d8adefefcc2a4af32295440ed1d4917a6968f017/python/pyspark/rdd.py#L680-L683]
>
> {{DataFrame.head()}}
> {{DataFrame.toPandas()}}
> * [Python|https://github.com/apache/spark/blob/c76da36c2163276b5c34e59fbb139eeb34ed0faa/python/pyspark/sql/dataframe.py#L637-L645]
>
> {{Column.toPandas()}}
> * [Python|https://github.com/apache/spark/blob/c76da36c2163276b5c34e59fbb139eeb34ed0faa/python/pyspark/sql/dataframe.py#L965-L973]
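One lightweight way to apply such a note consistently is a decorator that appends the warning to each method's docstring. This is an illustrative sketch only; the decorator name and note wording are hypothetical and not Spark's actual implementation.

```python
# Hypothetical warning text; the wording is an assumption, not Spark's.
DRIVER_NOTE = (
    "\n\n.. note:: This method loads all of the data into the driver's "
    "memory; use it only when the result is expected to be small, "
    "otherwise the driver may run out of memory (OOM)."
)


def collects_to_driver(func):
    """Append the driver-memory warning to the wrapped method's docstring."""
    func.__doc__ = (func.__doc__ or "") + DRIVER_NOTE
    return func


@collects_to_driver
def collect(self):
    """Return a list that contains all of the elements in this RDD."""
    raise NotImplementedError  # placeholder body for the sketch
```

Applied to each of the methods listed above (take, collect, head, toPandas), the note then shows up automatically in generated API docs and in help().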
[jira] [Commented] (SPARK-10262) Add @Since annotation to ml.attribute
[ https://issues.apache.org/jira/browse/SPARK-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110743#comment-15110743 ]

Tommy Yu commented on SPARK-10262:
----------------------------------

Hi Xiangrui Meng,

I took a look at all the classes under ml.attribute; they are all developer APIs. The only non-developer API is "AttributeFactory", but it is private to the package as well. I think this task should be closed without needing a PR. Can you please take a look?

Regards,
Yu Wenpei

> Add @Since annotation to ml.attribute
> -------------------------------------
>
>                 Key: SPARK-10262
>                 URL: https://issues.apache.org/jira/browse/SPARK-10262
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, ML
>            Reporter: Xiangrui Meng
>            Priority: Minor
>              Labels: starter
[jira] [Commented] (SPARK-10264) Add @Since annotation to ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096218#comment-15096218 ]

Tommy Yu commented on SPARK-10264:
----------------------------------

Thanks, I will work on this.

> Add @Since annotation to ml.recommendation
> ------------------------------------------
>
>                 Key: SPARK-10264
>                 URL: https://issues.apache.org/jira/browse/SPARK-10264
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, ML
>            Reporter: Xiangrui Meng
>            Assignee: Tijo Thomas
>            Priority: Minor
>              Labels: starter
[jira] [Commented] (SPARK-10264) Add @Since annotation to ml.recommendation
[ https://issues.apache.org/jira/browse/SPARK-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095844#comment-15095844 ]

Tommy Yu commented on SPARK-10264:
----------------------------------

The original PR for this issue has not been updated for a long time. May I work on it?

> Add @Since annotation to ml.recommendation
> ------------------------------------------
>
>                 Key: SPARK-10264
>                 URL: https://issues.apache.org/jira/browse/SPARK-10264
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, ML
>            Reporter: Xiangrui Meng
>            Assignee: Tijo Thomas
>            Priority: Minor
>              Labels: starter
[jira] [Commented] (SPARK-12422) Binding Spark Standalone Master to public IP fails
[ https://issues.apache.org/jira/browse/SPARK-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15084913#comment-15084913 ]

Tommy Yu commented on SPARK-12422:
----------------------------------

Hi,

For Docker images, can you please check the /etc/hosts file and remove the first line, which maps the container's IP to its hostname?

If you want to set up a cluster environment based on Docker, I suggest taking a look at this doc: sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html

Regards.

> Binding Spark Standalone Master to public IP fails
> --------------------------------------------------
>
>                 Key: SPARK-12422
>                 URL: https://issues.apache.org/jira/browse/SPARK-12422
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 1.5.2
>         Environment: Fails on direct deployment on Mac OSX and also in a Docker environment (running on OSX or Ubuntu)
>            Reporter: Bennet Jeutter
>            Priority: Blocker
>
> The start of the Spark Standalone Master fails when the host specified equals the public IP address. For example, I created a Docker machine with public IP 192.168.99.100, then ran:
>
> /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h 192.168.99.100
>
> It fails with:
>
> Exception in thread "main" java.net.BindException: Failed to bind to: /192.168.99.100:7093: Service 'sparkMaster' failed after 16 retries!
>     at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>     at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>     at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>     at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>     at scala.util.Try$.apply(Try.scala:161)
>     at scala.util.Success.map(Try.scala:206)
>     at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>     at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>     at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>     at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>     at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>     at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>     at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>     at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> So I thought, oh well, let's just bind to the local IP and access it via the public IP. This doesn't work either; it gives:
>
> dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://sparkMaster@192.168.99.100:7077/]] arriving at [akka.tcp://sparkMaster@192.168.99.100:7077] inbound addresses are [akka.tcp://sparkMaster@spark-master:7077]
>
> So there is currently no way to run all this. Related Stack Overflow issues:
> * http://stackoverflow.com/questions/31659228/getting-java-net-bindexception-when-attempting-to-start-spark-master-on-ec2-node
> * http://stackoverflow.com/questions/33768029/access-apache-spark-standalone-master-via-ip
[jira] [Created] (SPARK-12638) Parameter explanation not accurate for RDD function "aggregate"
Tommy Yu created SPARK-12638:
--------------------------------

             Summary: Parameter explanation not accurate for RDD function "aggregate"
                 Key: SPARK-12638
                 URL: https://issues.apache.org/jira/browse/SPARK-12638
             Project: Spark
          Issue Type: Bug
          Components: Documentation, Spark Core
    Affects Versions: 1.5.2
            Reporter: Tommy Yu
            Priority: Trivial

Currently, the parameters of the RDD function aggregate are not explained well, especially the parameter "zeroValue". The documentation should make clear to junior Scala users that "zeroValue" participates in both the "seqOp" and "combOp" phases.
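The point is easiest to see with a pure-Python model of aggregate's semantics (a sketch, not Spark's implementation): zeroValue seeds every per-partition seqOp fold and then seeds the final combOp fold as well, so a non-neutral zeroValue is counted once per partition plus once more in the merge.

```python
from functools import reduce


def aggregate(partitions, zero_value, seq_op, comb_op):
    """Pure-Python model of RDD.aggregate semantics (illustrative sketch)."""
    # seqOp phase: each partition folds its elements starting from zeroValue
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    # combOp phase: the partition results fold starting from zeroValue again
    return reduce(comb_op, partials, zero_value)


# With zeroValue = 1 over two partitions of [1, 2, 3, 4] and addition for
# both ops: seqOp partials are 1+1+2 = 4 and 1+3+4 = 8, then combOp gives
# 1+4+8 = 13, not 10 -- zeroValue was counted three times, which is exactly
# the behavior the documentation should warn about.
result = aggregate([[1, 2], [3, 4]], 1, lambda a, b: a + b, lambda a, b: a + b)
```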