[jira] [Closed] (SPARK-9965) Scala, Python SQLContext input methods' deprecation statuses do not match

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-9965. Resolution: Resolved Fix Version/s: 2.0.0 These methods were removed in 77ab49b8575d2ebd678065fa70b0343d5

[jira] [Commented] (SPARK-9931) Flaky test: mllib/tests.py StreamingLogisticRegressionWithSGDTests. test_training_and_prediction

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579889#comment-15579889 ] holdenk commented on SPARK-9931: Is this still a test people are finding flaky or did [~j

[jira] [Commented] (SPARK-7653) ML Pipeline and meta-algs should take random seed param

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579884#comment-15579884 ] holdenk commented on SPARK-7653: I think the simplest workaround would be exposing HasSeed

[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579878#comment-15579878 ] holdenk commented on SPARK-7941: So if it's ok - since I don't see other reports of this -

[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579874#comment-15579874 ] holdenk commented on SPARK-7721: [~joshrosen] is this something you're still looking at/inter

[jira] [Commented] (SPARK-3981) Consider a better approach to initialize SerDe on executors

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579868#comment-15579868 ] holdenk commented on SPARK-3981: That's a good question. It seems like much of the code th

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579866#comment-15579866 ] holdenk commented on SPARK-2868: ping [~davies] - would you be available to review if I go

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579847#comment-15579847 ] holdenk commented on SPARK-9487: Great, thanks for taking this issue on :)

[jira] [Created] (SPARK-17960) Upgrade to Py4J 0.10.4

2016-10-16 Thread holdenk (JIRA)
holdenk created SPARK-17960: Summary: Upgrade to Py4J 0.10.4 Key: SPARK-17960 URL: https://issues.apache.org/jira/browse/SPARK-17960 Project: Spark Issue Type: Improvement Components: Py

[jira] [Commented] (SPARK-14212) Add configuration element for --packages option

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573381#comment-15573381 ] holdenk commented on SPARK-14212: Please do! I think I've outlined the basic steps in my

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573380#comment-15573380 ] holdenk commented on SPARK-9487: +1 to [~srowen]'s comment. I would not be surprised to se

[jira] [Commented] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573371#comment-15573371 ] holdenk commented on SPARK-12916: +1 with Hyukjin, I'll go ahead and close this as a "Wo

[jira] [Closed] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12916. Resolution: Won't Fix Since Row is now a subclass of tuple, we don't really need this anymore.

[jira] [Commented] (SPARK-16720) Loading CSV file with 2k+ columns fails during attribute resolution on action

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573367#comment-15573367 ] holdenk commented on SPARK-16720: Sounds good - go ahead and close this :)

[jira] [Commented] (SPARK-10972) UDFs in SQL joins

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573364#comment-15573364 ] holdenk commented on SPARK-10972: I don't think that actually solves the problem the use

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573360#comment-15573360 ] holdenk commented on SPARK-650: Would people feel ok if we marked this as a duplicate of 636

[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-10-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573287#comment-15573287 ] holdenk commented on SPARK-15369: I can understand the hesitancy to adopt this long term

[jira] [Commented] (SPARK-4630) Dynamically determine optimal number of partitions

2016-10-10 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564071#comment-15564071 ] holdenk commented on SPARK-4630: I also agree this would be really good to revisit, from t

[jira] [Commented] (SPARK-11758) Missing Index column while creating a DataFrame from Pandas

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559240#comment-15559240 ] holdenk commented on SPARK-11758: I believe dropping the index field is intentional (but

[jira] [Updated] (SPARK-14420) keepLastCheckpoint Param for Python LDA with EM

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-14420: Fix Version/s: 2.0.0

[jira] [Closed] (SPARK-14420) keepLastCheckpoint Param for Python LDA with EM

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-14420. Resolution: Duplicate

[jira] [Updated] (SPARK-14212) Add configuration element for --packages option

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-14212: Labels: config starter (was: config fun happy pants spark-shell)

[jira] [Updated] (SPARK-14212) Add configuration element for --packages option

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-14212: Priority: Trivial (was: Major)

[jira] [Updated] (SPARK-14212) Add configuration element for --packages option

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-14212: Component/s: Documentation (was: Spark Shell, Spark Core)

[jira] [Commented] (SPARK-14212) Add configuration element for --packages option

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558540#comment-15558540 ] holdenk commented on SPARK-14212: So I think this would be a good option to document for

[jira] [Closed] (SPARK-14017) dataframe.dtypes -> pyspark.sql.types aliases

2016-10-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-14017. Resolution: Won't Fix Thanks for bringing this issue up - I don't think we necessarily want to add these ali

[jira] [Closed] (SPARK-14229) PySpark DataFrame.rdd's can't be saved to an arbitrary Hadoop OutputFormat

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-14229. Resolution: Won't Fix I don't think this is really a bug - if you want to save from dataframes there is the

[jira] [Commented] (SPARK-13585) addPyFile behavior change between 1.6 and before

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557248#comment-15557248 ] holdenk commented on SPARK-13585: What is the use case for overwriting the old pyFile? T

[jira] [Commented] (SPARK-13606) Error from python worker: /usr/local/bin/python2.7: undefined symbol: _PyCodec_LookupTextEncoding

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557244#comment-15557244 ] holdenk commented on SPARK-13606: Are you still experiencing this?

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557242#comment-15557242 ] holdenk commented on SPARK-13534: For people following along, Arrow is in the middle of v

[jira] [Closed] (SPARK-13368) PySpark JavaModel fails to extract params from Spark side automatically

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-13368. Resolution: Fixed

[jira] [Commented] (SPARK-13368) PySpark JavaModel fails to extract params from Spark side automatically

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557239#comment-15557239 ] holdenk commented on SPARK-13368: It seems that we don't have this in the example anymor

[jira] [Commented] (SPARK-13303) Spark fails with pandas import error when pandas is not explicitly imported by user

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557230#comment-15557230 ] holdenk commented on SPARK-13303: What if we added a requirements file? We have on

[jira] [Commented] (SPARK-11722) Rdds could be different between orginal one and save-out-then-read-in one

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557226#comment-15557226 ] holdenk commented on SPARK-11722: Is this still an issue you are experiencing and if so

[jira] [Commented] (SPARK-12776) Implement Python API for Datasets

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557224#comment-15557224 ] holdenk commented on SPARK-12776: Just re-opening discussion here - the migration to dat

[jira] [Commented] (SPARK-12100) bug in spark/python/pyspark/rdd.py portable_hash()

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557219#comment-15557219 ] holdenk commented on SPARK-12100: Just noting related progress in https://github.com/apa

[jira] [Commented] (SPARK-11874) DistributedCache for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557217#comment-15557217 ] holdenk commented on SPARK-11874: I think this is not intended to be supported, although

[jira] [Closed] (SPARK-12774) DataFrame.mapPartitions apply function operates on Pandas DataFrame instead of a generator or rows

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-12774. Resolution: Won't Fix In some ways, yes, avoiding unnecessary iteration can be good, but allowing Spark to spil
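A minimal sketch of the iterator-based style this resolution favours, assuming PySpark's `RDD.mapPartitions`, which hands each partition to a function as an iterator and expects an iterable back. The function name `running_totals` and the sample data are hypothetical; the partition is simulated with a plain Python iterator so the sketch runs without Spark.

```python
from typing import Iterable, Iterator, Tuple


def running_totals(rows: Iterator[Tuple[str, int]]) -> Iterable[Tuple[str, int]]:
    """Hypothetical per-partition function in the shape rdd.mapPartitions expects.

    It pulls one row at a time from the iterator and yields results lazily,
    so the whole partition is never held in a list at once.
    """
    total = 0
    for key, value in rows:
        total += value
        yield key, total


# Simulate one partition locally; with PySpark this would be
# rdd.mapPartitions(running_totals).
partition = iter([("a", 1), ("b", 2), ("c", 3)])
print(list(running_totals(partition)))  # [('a', 1), ('b', 3), ('c', 6)]
```

Because the function is a generator, rows are consumed one at a time, whereas converting the partition to a pandas DataFrame first would require materializing all of it in memory.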

[jira] [Commented] (SPARK-11571) Twitter Api for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557205#comment-15557205 ] holdenk commented on SPARK-11571: Is there anything you are looking to do with this API?

[jira] [Commented] (SPARK-3600) RDD[Double] doesn't use primitive arrays for caching

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557195#comment-15557195 ] holdenk commented on SPARK-3600: Is this something we still want to work on or does `Datas

[jira] [Commented] (SPARK-3513) Provide a utility for running a function once on each executor

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557193#comment-15557193 ] holdenk commented on SPARK-3513: This seems closely related to SPARK-650 and SPARK-636 as

[jira] [Commented] (SPARK-3348) Support user-defined SparkListeners properly

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557186#comment-15557186 ] holdenk commented on SPARK-3348: Is there still interest in seeing this happen? Should we

[jira] [Closed] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-3312. Resolution: Won't Fix

[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557182#comment-15557182 ] holdenk commented on SPARK-3312: I'm going to go ahead and close this, now that `Datasets`

[jira] [Commented] (SPARK-3132) Avoid serialization for Array[Byte] in TorrentBroadcast

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557178#comment-15557178 ] holdenk commented on SPARK-3132: Is there any progress on this or would it be ok for me to

[jira] [Commented] (SPARK-2722) Mechanism for escaping spark configs is not consistent

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557158#comment-15557158 ] holdenk commented on SPARK-2722: I think at this point trying to change the escaping of th

[jira] [Commented] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557153#comment-15557153 ] holdenk commented on SPARK-2032: I'm assuming since there hasn't been any activity for awh

[jira] [Commented] (SPARK-1865) Improve behavior of cleanup of disk state

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557149#comment-15557149 ] holdenk commented on SPARK-1865: So ALS specifically has a workaround for this with clean

[jira] [Commented] (SPARK-1792) Missing Spark-Shell Configure Options

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557146#comment-15557146 ] holdenk commented on SPARK-1792: It feels like we've already got a pretty good mechanism f

[jira] [Commented] (SPARK-1762) Add functionality to pin RDDs in cache

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557133#comment-15557133 ] holdenk commented on SPARK-1762: Is this something we are still interested in? I could see

[jira] [Commented] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557069#comment-15557069 ] holdenk commented on SPARK-10161: That being said - I'm not sure I see the value of this

[jira] [Commented] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557068#comment-15557068 ] holdenk commented on SPARK-10161: I think this is an issue across cluster modes, maybe

[jira] [Updated] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-9487: Labels: starter (was: )

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557018#comment-15557018 ] holdenk commented on SPARK-9487: This will maybe break some tests in the process but it wo

[jira] [Closed] (SPARK-8760) allow moving and symlinking binaries

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8760. Resolution: Fixed This is really "partially fixed", but I think Fixed is a close enough description. We don't use rea

[jira] [Closed] (SPARK-8757) Check missing and add user guide for MLlib Python API

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8757. Resolution: Fixed All sub-issues are fixed, and we are well past the 1.5 release.

[jira] [Closed] (SPARK-8719) Adding Python support for 1-sample, 2-sided Kolmogorov Smirnov Test

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-8719. Resolution: Duplicate

[jira] [Updated] (SPARK-8605) Exclude files in StreamingContext. textFileStream(directory)

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-8605: Component/s: Streaming (was: PySpark)

[jira] [Commented] (SPARK-8605) Exclude files in StreamingContext. textFileStream(directory)

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556785#comment-15556785 ] holdenk commented on SPARK-8605: This is semi-documented (namely only atomic moves are sup
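A minimal sketch of the atomic-move pattern, assuming the requirement stated in the Spark Streaming programming guide that files must be created in the monitored directory by atomically moving or renaming them into it. The helper `publish_atomically` and the directory names are hypothetical. It also assumes the staging directory sits on the same filesystem as the watched directory, because `os.replace` is only an atomic rename within a single filesystem.

```python
import os
import tempfile


def publish_atomically(watched_dir: str, staging_dir: str, name: str, text: str) -> str:
    """Write `text` to a staging file, then rename it into `watched_dir`.

    Hypothetical helper: a directory watcher such as textFileStream should
    then only ever see the complete file, never a half-written one.
    """
    staging_path = os.path.join(staging_dir, name)
    with open(staging_path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
    final_path = os.path.join(watched_dir, name)
    os.replace(staging_path, final_path)  # atomic rename on the same filesystem
    return final_path


# Usage sketch: two sibling directories under one temporary root.
root = tempfile.mkdtemp()
watched = os.path.join(root, "incoming")
staging = os.path.join(root, "_staging")
os.makedirs(watched)
os.makedirs(staging)
path = publish_atomically(watched, staging, "batch-0001.txt", "hello\nworld\n")
print(os.listdir(watched))  # ['batch-0001.txt']
```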

[jira] [Commented] (SPARK-7177) Create standard way to wrap Spark CLI scripts for external projects

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556780#comment-15556780 ] holdenk commented on SPARK-7177: I've run into similar challenges when working on Sparklin

[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556775#comment-15556775 ] holdenk commented on SPARK-7941: Are you still experiencing this issue [~cqnguyen] or woul

[jira] [Updated] (SPARK-8780) Move Python doctest code example from models to algorithms

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-8780: Labels: starter (was: )

[jira] [Commented] (SPARK-8780) Move Python doctest code example from models to algorithms

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556763#comment-15556763 ] holdenk commented on SPARK-8780: Is this something we still want to do? This could be a gr

[jira] [Commented] (SPARK-6831) Document how to use external data sources

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556758#comment-15556758 ] holdenk commented on SPARK-6831: Is this something we are planning to do at all? It doesn'

[jira] [Closed] (SPARK-6780) Add saveAsTextFileByKey method for PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-6780. Resolution: Won't Fix Since SPARK-3533 is Won't Fix, this one should be too.

[jira] [Closed] (SPARK-7613) Serialization fails in pyspark for lambdas referencing class data members

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-7613. Resolution: Won't Fix I believe this is expected behaviour and the current best practice is simply to make a lo
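The recommendation above is cut off, so the following is an assumption: the Spark programming guide's section on passing functions to Spark advises copying an object's field into a local variable rather than referencing it through the object. This sketch illustrates that pattern with a hypothetical `Searcher` class, using plain Python closures so it runs without Spark; `__code__.co_freevars` shows which variables each lambda captures.

```python
class Searcher:
    """Hypothetical class whose methods build functions to pass to RDD.filter."""

    def __init__(self, query: str):
        self.query = query

    def make_filter_capturing_self(self):
        # The lambda references self.query, so it closes over `self`; a
        # closure serializer would then have to ship the whole object.
        return lambda line: self.query in line

    def make_filter_local_copy(self):
        # Copy the attribute into a local variable first; the lambda now
        # closes over only that string, not the enclosing object.
        query = self.query
        return lambda line: query in line


s = Searcher("spark")
f_self = s.make_filter_capturing_self()
f_local = s.make_filter_local_copy()
print(f_self.__code__.co_freevars)   # ('self',)
print(f_local.__code__.co_freevars)  # ('query',)
```

In PySpark, passing the first lambda means the whole `Searcher` instance must be pickled (and serialization fails if any of its attributes cannot be), while the second only needs the string.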

[jira] [Commented] (SPARK-7638) Python API for pmml.export

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556732#comment-15556732 ] holdenk commented on SPARK-7638: Do we still want to do this or focus on adding PMML expor

[jira] [Commented] (SPARK-6174) Improve doc: Python ALS, MatrixFactorizationModel

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556720#comment-15556720 ] holdenk commented on SPARK-6174: I think Bryan did a good job of this; I'd be in favour of

[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556714#comment-15556714 ] holdenk commented on SPARK-5981: I'm not sure porting the models to Python sounds like a g

[jira] [Resolved] (SPARK-4851) "Uninitialized staticmethod object" error in PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-4851. Resolution: Fixed The provided repro now runs (although we need to provide it with the correct number of ar

[jira] [Commented] (SPARK-1425) PySpark can crash Executors if worker.py fails while serializing data

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556680#comment-15556680 ] holdenk commented on SPARK-1425: Is this still an issue or do we have a repro case for it?

[jira] [Closed] (SPARK-5160) Python module in jars

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk closed SPARK-5160. Resolution: Fixed This is now supported.

[jira] [Commented] (SPARK-4488) Add control over map-side aggregation

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556650#comment-15556650 ] holdenk commented on SPARK-4488: So while the associated PR is closed, we ended up adding

[jira] [Resolved] (SPARK-2999) Compress all the serialized data

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-2999. Resolution: Fixed Fixed in b5c51c8df480f1a82a82e4d597d8eea631bffb4e

[jira] [Commented] (SPARK-2868) Support named accumulators in Python

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556582#comment-15556582 ] holdenk commented on SPARK-2868: Is this something we are still interested in pursuing (cc

[jira] [Resolved] (SPARK-2654) Leveled logging in PySpark

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-2654. Resolution: Fixed This has been fixed in SPARK-3444 / ae98eec730125c1153dcac9ea941959cc79e4f42

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556324#comment-15556324 ] holdenk commented on SPARK-650: I think this is a duplicate of SPARK-636, yes?

[jira] [Commented] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556322#comment-15556322 ] holdenk commented on SPARK-636: Does broadcasting get us close enough to handling this or is

[jira] [Commented] (SPARK-15611) Got the same sequence random number in every forked worker.

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556249#comment-15556249 ] holdenk commented on SPARK-15611: So this is marked as resolved but there is an open PR
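The ticket title describes workers forked from the same parent process producing identical random sequences. One common mitigation, sketched here as an assumption and not necessarily what the linked PR does, is to give each partition its own `random.Random` instance seeded from the partition index, in the `(index, iterator) -> iterable` shape that PySpark's `RDD.mapPartitionsWithIndex` expects. The function name `sample_partition` and the base seed are hypothetical.

```python
import random
from typing import Iterable, Iterator


def sample_partition(index: int, rows: Iterator[int], base_seed: int = 42) -> Iterable[float]:
    """Hypothetical per-partition function for rdd.mapPartitionsWithIndex.

    Each partition builds its own Random instance seeded from the partition
    index, so workers forked from one parent (and therefore sharing the same
    inherited global `random` state) still draw different sequences.
    """
    rng = random.Random(base_seed + index)
    for _ in rows:
        yield rng.random()


# Simulate two partitions locally.
p0 = list(sample_partition(0, iter(range(3))))
p1 = list(sample_partition(1, iter(range(3))))
print(p0 != p1)  # True
```

Deriving the seed from the partition index also keeps the output reproducible if a partition is recomputed.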

[jira] [Commented] (SPARK-14503) spark.ml API for FPGrowth

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1198#comment-1198 ] holdenk commented on SPARK-14503: +1 for porting the current functionality then updating

[jira] [Commented] (SPARK-14503) spark.ml API for FPGrowth

2016-10-07 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1202#comment-1202 ] holdenk commented on SPARK-14503: [~jeffzhang] & [~yuhaoyan] are you still working on th

[jira] [Commented] (SPARK-15902) Add a deprecation warning for Python 2.6

2016-10-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552519#comment-15552519 ] holdenk commented on SPARK-15902: Something like that could work - if you're interested in
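A minimal sketch of such a deprecation check using Python's standard `warnings` module. The function name `warn_if_deprecated_python` and the message text are hypothetical and not what Spark shipped; the version is a parameter so the check can be exercised from any interpreter.

```python
import sys
import warnings


def warn_if_deprecated_python(version_info=sys.version_info) -> bool:
    """Hypothetical start-up check: emit a DeprecationWarning on Python 2.6.

    Returns True if a warning was issued. `version_info` defaults to the
    running interpreter's version but can be overridden for testing.
    """
    if tuple(version_info[:2]) == (2, 6):
        warnings.warn(
            "Support for Python 2.6 is deprecated and may be removed "
            "in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False


print(warn_if_deprecated_python((3, 8, 0)))  # False
```

`DeprecationWarning` is ignored by default in many contexts, so a project that wants every user to see the notice might choose a more visible category such as `UserWarning`.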

[jira] [Commented] (SPARK-15130) PySpark shared params should include default values to match Scala

2016-10-06 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552513#comment-15552513 ] holdenk commented on SPARK-15130: Now that 2.0.1 has gone out, maybe we should take t

[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-10-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550012#comment-15550012 ] holdenk commented on SPARK-15369: Certainly we can investigate speeding up the serializa

[jira] [Commented] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-10-03 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543349#comment-15543349 ] holdenk commented on SPARK-16589: Is this something you are still investigating/working

[jira] [Commented] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-09-17 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498394#comment-15498394 ] holdenk commented on SPARK-16407: I'm not sure I understand - this really isn't exposing

[jira] [Commented] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-09-17 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498389#comment-15498389 ] holdenk commented on SPARK-16407: - The current sink interface output depends on DataFrame

[jira] [Commented] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-09-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493936#comment-15493936 ] holdenk commented on SPARK-16407: - I think it's important to keep in mind that these APIs

[jira] [Commented] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-09-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491787#comment-15491787 ] holdenk commented on SPARK-16407: - That's part of why I decided to just use the ForeachRD

[jira] [Commented] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-09-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491784#comment-15491784 ] holdenk commented on SPARK-16407: - It's true it doesn't work in SQL - but I don't think t

[jira] [Commented] (SPARK-16407) Allow users to supply custom StreamSinkProviders

2016-09-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491772#comment-15491772 ] holdenk commented on SPARK-16407: - Right the simplest example where you need to use the t

[jira] [Commented] (SPARK-16424) Add support for Structured Streaming to the ML Pipeline API

2016-09-14 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491250#comment-15491250 ] holdenk commented on SPARK-16424: - Just an update - we have a really early proof of conce

[jira] [Commented] (SPARK-12072) python dataframe ._jdf.schema().json() breaks on large metadata dataframes

2016-08-23 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432470#comment-15432470 ] holdenk commented on SPARK-12072: - My guess is probably not a directly related issue - yo

[jira] [Commented] (SPARK-17116) Allow params to be a {string, value} dict at fit time

2016-08-17 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425507#comment-15425507 ] holdenk commented on SPARK-17116: - So it seems like doing the second one is more likely t

[jira] [Commented] (SPARK-16921) RDD/DataFrame persist() and cache() should return Python context managers

2016-08-09 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414176#comment-15414176 ] holdenk commented on SPARK-16921: - Sounds good :) Was thinking it might make sense to do

[jira] [Commented] (SPARK-16921) RDD/DataFrame persist() and cache() should return Python context managers

2016-08-09 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413935#comment-15413935 ] holdenk commented on SPARK-16921: - [~nchammas] are you planning on working on this yourse

[jira] [Created] (SPARK-16861) Refactor PySpark accumulator API to be on top of AccumulatorV2 API

2016-08-02 Thread holdenk (JIRA)
holdenk created SPARK-16861: --- Summary: Refactor PySpark accumulator API to be on top of AccumulatorV2 API Key: SPARK-16861 URL: https://issues.apache.org/jira/browse/SPARK-16861 Project: Spark Iss

[jira] [Commented] (SPARK-16775) Reduce internal warnings from deprecated accumulator API

2016-08-01 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402592#comment-15402592 ] holdenk commented on SPARK-16775: - Yes so my plan is to replace it with the new API in al

[jira] [Created] (SPARK-16838) Add PMML export for ML KMeans in PySpark

2016-08-01 Thread holdenk (JIRA)
holdenk created SPARK-16838: --- Summary: Add PMML export for ML KMeans in PySpark Key: SPARK-16838 URL: https://issues.apache.org/jira/browse/SPARK-16838 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-16775) Reduce internal warnings from deprecated accumulator API

2016-07-31 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401417#comment-15401417 ] holdenk commented on SPARK-16775: - I'm going to get started on this one tomorrow unless s