[jira] [Created] (SPARK-18001) Broken link to R DataFrame in sql-programming-guide

2016-10-18 Thread Tommy Yu (JIRA)
Tommy Yu created SPARK-18001:


 Summary: Broken link to R DataFrame in sql-programming-guide
 Key: SPARK-18001
 URL: https://issues.apache.org/jira/browse/SPARK-18001
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.0.1
Reporter: Tommy Yu
Priority: Trivial


In http://spark.apache.org/docs/latest/sql-programming-guide.html, section 
"Untyped Dataset Operations (aka DataFrame Operations)", the link to the R API 
does not work; it returns:

The requested URL /docs/latest/api/R/DataFrame.html was not found on this 
server.






[jira] [Commented] (SPARK-13597) Python API for GeneralizedLinearRegression

2016-03-02 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175239#comment-15175239
 ] 

Tommy Yu commented on SPARK-13597:
--

I can work on this to expose the Python API for GeneralizedLinearRegression.

> Python API for GeneralizedLinearRegression
> --
>
> Key: SPARK-13597
> URL: https://issues.apache.org/jira/browse/SPARK-13597
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Xiangrui Meng
>Priority: Critical
>
> After SPARK-12811, we should add a Python API for generalized linear regression.
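
A minimal sketch of how the requested Python wrapper might be used once added 
under pyspark.ml.regression, assuming it mirrors the Scala estimator's params; 
`df` (a DataFrame with "features"/"label" columns) is hypothetical:

    >>> from pyspark.ml.regression import GeneralizedLinearRegression
    >>> glr = GeneralizedLinearRegression(family="gaussian", link="identity",
    ...                                   maxIter=10, regParam=0.3)
    >>> model = glr.fit(df)       # df: DataFrame with features/label columns
    >>> model.coefficients        # fitted coefficients, as in the Scala API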






[jira] [Updated] (SPARK-13153) PySpark ML persistence fails when handling a parameter with no default value

2016-02-02 Thread Tommy Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Yu updated SPARK-13153:
-
Priority: Minor  (was: Major)

> PySpark ML persistence fails when handling a parameter with no default value
> 
>
> Key: SPARK-13153
> URL: https://issues.apache.org/jira/browse/SPARK-13153
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.6.0
>Reporter: Tommy Yu
>Priority: Minor
>
> This defect was found while implementing task SPARK-13033, when adding the 
> code below to a doctest. 
> It looks like _transfer_params_from_java does not consider params that have 
> no default value; we should handle them. 
> >>> import os, tempfile
> >>> path = tempfile.mkdtemp()
> >>> aftsr_path = path + "/aftsr"
> >>> aftsr.save(aftsr_path)
> >>> aftsr2 = AFTSurvivalRegression.load(aftsr_path)
> Exception detail.
> ir2 = IsotonicRegression.load(ir_path)
> Exception raised:
> Traceback (most recent call last):
> File "C:\Python27\lib\doctest.py", line 1289, in __run
> compileflags, 1) in test.globs
> File "", line 1, in
> ir2 = IsotonicRegression.load(ir_path)
> File 
> "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py",
>  line 194, in load
> return cls.read().load(path)
> File 
> "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py",
>  line 148, in load
> instance._transfer_params_from_java()
> File 
> "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\wrapper.py",
>  line 82, in _transfer_params_from_java
> value = _java2py(sc, self._java_obj.getOrDefault(java_param))
> File 
> "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py",
>  line 813, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File 
> "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\sql\utils.py",
>  line 45, in deco
> return f(*a, **kw)
> File 
> "C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py",
>  line 308, in get_return_value
> format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o351.getOrDefault.
> : java.util.NoSuchElementException: Failed to find a default value for 
> weightCol
> at 
> org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
> at 
> org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:646)
> at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:43)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:209)
> at java.lang.Thread.run(Thread.java:745)
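
A hedged sketch of the kind of guard that could fix this in 
_transfer_params_from_java (pyspark/ml/wrapper.py); the isDefined check is an 
assumption about the eventual patch, not the merged code, and the method 
relies on wrapper.py's existing imports (SparkContext, _java2py):

    def _transfer_params_from_java(self):
        """Transfer param values from the companion Java object to Python."""
        sc = SparkContext._active_spark_context
        for param in self.params:
            if self._java_obj.hasParam(param.name):
                java_param = self._java_obj.getParam(param.name)
                # Assumed guard: only read params that are explicitly set or
                # have a default, so params like weightCol no longer raise
                # java.util.NoSuchElementException from getOrDefault.
                if self._java_obj.isDefined(java_param):
                    value = _java2py(sc, self._java_obj.getOrDefault(java_param))
                    self._paramMap[param] = value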






[jira] [Created] (SPARK-13153) PySpark ML persistence fails when handling a parameter with no default value

2016-02-02 Thread Tommy Yu (JIRA)
Tommy Yu created SPARK-13153:


 Summary: PySpark ML persistence fails when handling a parameter with no 
default value
 Key: SPARK-13153
 URL: https://issues.apache.org/jira/browse/SPARK-13153
 Project: Spark
  Issue Type: Bug
  Components: ML, PySpark
Affects Versions: 1.6.0
Reporter: Tommy Yu


This defect was found while implementing task SPARK-13033, when adding the 
code below to a doctest. 
It looks like _transfer_params_from_java does not consider params that have no 
default value; we should handle them. 

>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> aftsr_path = path + "/aftsr"
>>> aftsr.save(aftsr_path)
>>> aftsr2 = AFTSurvivalRegression.load(aftsr_path)


Exception detail.
ir2 = IsotonicRegression.load(ir_path)
Exception raised:
Traceback (most recent call last):
File "C:\Python27\lib\doctest.py", line 1289, in __run
compileflags, 1) in test.globs
File "", line 1, in
ir2 = IsotonicRegression.load(ir_path)
File 
"C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py",
 line 194, in load
return cls.read().load(path)
File 
"C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\util.py",
 line 148, in load
instance._transfer_params_from_java()
File 
"C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\ml\wrapper.py",
 line 82, in _transfer_params_from_java
value = _java2py(sc, self._java_obj.getOrDefault(java_param))
File 
"C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py",
 line 813, in __call__
answer, self.gateway_client, self.target_id, self.name)
File 
"C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\sql\utils.py",
 line 45, in deco
return f(*a, **kw)
File 
"C:\aWorkFolder\spark\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py",
 line 308, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o351.getOrDefault.
: java.util.NoSuchElementException: Failed to find a default value for weightCol
at 
org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
at 
org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:647)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:646)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)






[jira] [Commented] (SPARK-13033) PySpark ml.regression support export/import

2016-01-27 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119233#comment-15119233
 ] 

Tommy Yu commented on SPARK-13033:
--

I can take this one, but it may need SPARK-13032 to be merged first.

> PySpark ml.regression support export/import
> ---
>
> Key: SPARK-13033
> URL: https://issues.apache.org/jira/browse/SPARK-13033
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> Add export/import for all estimators and transformers (which have a Scala 
> implementation) under pyspark/ml/regression.py. Please refer to the 
> implementation in SPARK-13032. 
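
A minimal sketch of the intended round trip, assuming the SPARK-13032 pattern 
carries over and the regression classes gain the writable/readable mixins from 
pyspark.ml.util; `df` (a DataFrame with features/label columns) and `path` are 
hypothetical:

    >>> from pyspark.ml.regression import LinearRegression, LinearRegressionModel
    >>> lr = LinearRegression(maxIter=5)
    >>> model = lr.fit(df)
    >>> model.save(path + "/lr_model")            # provided by the writable mixin
    >>> loaded = LinearRegressionModel.load(path + "/lr_model")
    >>> loaded.coefficients == model.coefficients
    True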






[jira] [Commented] (SPARK-5865) Add doc warnings for methods that return local data structures

2016-01-21 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111317#comment-15111317
 ] 

Tommy Yu commented on SPARK-5865:
-

I will work on this task. Thanks.

> Add doc warnings for methods that return local data structures
> --
>
> Key: SPARK-5865
> URL: https://issues.apache.org/jira/browse/SPARK-5865
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: starter
>
> We should include a note in the doc string for any method that collects an 
> RDD to the driver so that users have some hint of why their call might be 
> OOMing (a sketch follows the list of links below).
> {{RDD.take()}}
> {{RDD.collect()}}
> * 
> [Scala|https://github.com/apache/spark/blob/d8adefefcc2a4af32295440ed1d4917a6968f017/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L803-L806]
> * 
> [Python|https://github.com/apache/spark/blob/d8adefefcc2a4af32295440ed1d4917a6968f017/python/pyspark/rdd.py#L680-L683]
> {{DataFrame.head()}}
> {{DataFrame.toPandas()}}
> * 
> [Python|https://github.com/apache/spark/blob/c76da36c2163276b5c34e59fbb139eeb34ed0faa/python/pyspark/sql/dataframe.py#L637-L645]
> {{Column.toPandas()}}
> * 
> [Python|https://github.com/apache/spark/blob/c76da36c2163276b5c34e59fbb139eeb34ed0faa/python/pyspark/sql/dataframe.py#L965-L973]
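
A hedged sketch of the kind of docstring note proposed above, applied to 
RDD.collect; the wording is illustrative, not the merged text:

    def collect(self):
        """
        Return a list that contains all of the elements in this RDD.

        Note: this method should only be used if the resulting list is
        expected to be small, as all the data is loaded into the driver's
        memory.
        """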






[jira] [Commented] (SPARK-10262) Add @Since annotation to ml.attribute

2016-01-21 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110743#comment-15110743
 ] 

Tommy Yu commented on SPARK-10262:
--

Hi Xiangrui Meng,

I took a look at all the classes under ml.attribute; they are all developer 
APIs. The only non-developer API is "AttributeFactory", and it is private to 
the package as well.

I think this task should be closed without needing a PR.

Can you please take a look?

Regards,
Yu Wenpei

> Add @Since annotation to ml.attribute
> -
>
> Key: SPARK-10262
> URL: https://issues.apache.org/jira/browse/SPARK-10262
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>







[jira] [Commented] (SPARK-10264) Add @Since annotation to ml.recommendation

2016-01-13 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096218#comment-15096218
 ] 

Tommy Yu commented on SPARK-10264:
--

Thanks, I will work on this.

> Add @Since annotation to ml.recommendation
> --
>
> Key: SPARK-10264
> URL: https://issues.apache.org/jira/browse/SPARK-10264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Assignee: Tijo Thomas
>Priority: Minor
>  Labels: starter
>







[jira] [Commented] (SPARK-10264) Add @Since annotation to ml.recommendation

2016-01-13 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095844#comment-15095844
 ] 

Tommy Yu commented on SPARK-10264:
--

The original PR for this defect has not been updated in a long time; may I work on this?

> Add @Since annotation to ml.recommendation
> --
>
> Key: SPARK-10264
> URL: https://issues.apache.org/jira/browse/SPARK-10264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Assignee: Tijo Thomas
>Priority: Minor
>  Labels: starter
>







[jira] [Commented] (SPARK-12422) Binding Spark Standalone Master to public IP fails

2016-01-05 Thread Tommy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15084913#comment-15084913
 ] 

Tommy Yu commented on SPARK-12422:
--

Hi,

For the Docker images, can you please check the /etc/hosts file and remove the 
first line that maps the container IP to the hostname?

I suggest taking a look at the doc below if you want to set up a cluster 
environment based on Docker:

sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html

Regards.
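
For illustration only, a hypothetical /etc/hosts inside such a container; the 
first mapping is the line the comment suggests removing (the IP and hostname 
here are made up):

    172.17.0.2   spark-master    <- the IP-to-hostname mapping to remove
    127.0.0.1    localhost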

> Binding Spark Standalone Master to public IP fails
> --
>
> Key: SPARK-12422
> URL: https://issues.apache.org/jira/browse/SPARK-12422
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.5.2
> Environment: Fails on direct deployment on Mac OSX and also in Docker 
> Environment (running on OSX or Ubuntu)
>Reporter: Bennet Jeutter
>Priority: Blocker
>
> The start of the Spark Standalone Master fails when the host specified 
> equals the public IP address. For example, I created a Docker Machine with 
> public IP 192.168.99.100, then I run:
> /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h 
> 192.168.99.100
> It'll fail with:
> Exception in thread "main" java.net.BindException: Failed to bind to: 
> /192.168.99.100:7093: Service 'sparkMaster' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> So I thought oh well, let's just bind to the local IP and access it via the 
> public IP - this doesn't work; it will give:
> dropping message [class akka.actor.ActorSelectionMessage] for non-local 
> recipient [Actor[akka.tcp://sparkMaster@192.168.99.100:7077/]] arriving at 
> [akka.tcp://sparkMaster@192.168.99.100:7077] inbound addresses are 
> [akka.tcp://sparkMaster@spark-master:7077]
> So there is currently no way to run this at all. Related Stack Overflow 
> issues:
> * 
> http://stackoverflow.com/questions/31659228/getting-java-net-bindexception-when-attempting-to-start-spark-master-on-ec2-node
> * 
> http://stackoverflow.com/questions/33768029/access-apache-spark-standalone-master-via-ip






[jira] [Created] (SPARK-12638) Parameter explanation not very accurate for RDD function "aggregate"

2016-01-04 Thread Tommy Yu (JIRA)
Tommy Yu created SPARK-12638:


 Summary: Parameter explanation not very accurate for RDD function 
"aggregate"
 Key: SPARK-12638
 URL: https://issues.apache.org/jira/browse/SPARK-12638
 Project: Spark
  Issue Type: Bug
  Components: Documentation, Spark Core
Affects Versions: 1.5.2
Reporter: Tommy Yu
Priority: Trivial


Currently, the parameters of the RDD function aggregate are not explained 
well, especially "zeroValue". 
Junior Scala users should be told that "zeroValue" participates in both the 
"seqOp" and the "combOp" phase, as the sketch below illustrates.

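A minimal PySpark sketch of that behavior, assuming a live SparkContext `sc`: 
with two partitions and zeroValue=1, seqOp applies the zero value once per 
partition and combOp applies it once more while merging, so the plain sum 10 
becomes 13 rather than 11:

    >>> rdd = sc.parallelize([1, 2, 3, 4], 2)
    >>> seqOp = lambda acc, x: acc + x   # folds each partition, seeded with zeroValue
    >>> combOp = lambda a, b: a + b      # merges partition results, also seeded with zeroValue
    >>> rdd.aggregate(1, seqOp, combOp)  # (1+1+2) + (1+3+4) folded into 1 => 13
    13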

