[jira] [Comment Edited] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-04-02 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223006#comment-15223006 ] Shubhanshu Mishra edited comment on SPARK-14103 at 4/2/16 6:51 PM: ---

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-04-02 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223006#comment-15223006 ] Shubhanshu Mishra commented on SPARK-14103: --- [~hyukjin.kwon] thanks for pointing this out. I

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-30 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217981#comment-15217981 ] Shubhanshu Mishra commented on SPARK-14103: --- [~hyukjin.kwon] the temp.txt file is actually just

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216660#comment-15216660 ] Shubhanshu Mishra commented on SPARK-14103: --- [~srowen] yes, you are right. The issue is not

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216547#comment-15216547 ] Shubhanshu Mishra commented on SPARK-14103: --- Another issue with your [#comment-15216228] is

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216541#comment-15216541 ] Shubhanshu Mishra commented on SPARK-14103: --- Ok I tried your suggestion of increasing

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216437#comment-15216437 ] Shubhanshu Mishra commented on SPARK-14103: --- I just double checked using the following code and

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216394#comment-15216394 ] Shubhanshu Mishra commented on SPARK-14103: --- [~srowen] In [#comment-15215064] comment I did

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216350#comment-15216350 ] Shubhanshu Mishra commented on SPARK-14103: --- [~srowen] As I have mentioned above the maximum

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-28 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215076#comment-15215076 ] Shubhanshu Mishra commented on SPARK-14103: --- [~srowen] I just checked the Spark Code on github

[jira] [Reopened] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-28 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra reopened SPARK-14103: --- [~srowen] I am reopening the issue as it is not yet resolved. I have added more details

[jira] [Comment Edited] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-28 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215064#comment-15215064 ] Shubhanshu Mishra edited comment on SPARK-14103 at 3/28/16 11:08 PM: -

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-28 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215064#comment-15215064 ] Shubhanshu Mishra commented on SPARK-14103: --- [~srowen] thanks for the reply. As I have mentioned

[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-28 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215027#comment-15215027 ] Shubhanshu Mishra commented on SPARK-14103: --- Yes, the error does say so. However, I have

[jira] [Created] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-03-23 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-14103: - Summary: Python DataFrame CSV load on large file is writing to console in Ipython Key: SPARK-14103 URL: https://issues.apache.org/jira/browse/SPARK-14103

[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-03-01 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174193#comment-15174193 ] Shubhanshu Mishra commented on SPARK-13525: --- I am running my code from the interactive R

[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-03-01 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174191#comment-15174191 ] Shubhanshu Mishra commented on SPARK-13525: --- Hi [~felixcheung], I used the steps from the link

[jira] [Updated] (SPARK-13570) pyspark save with partitionBy is very slow

2016-02-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra updated SPARK-13570: -- Description: Running the following code to store data from each year and pos in a

[jira] [Updated] (SPARK-13570) pyspark save with partitionBy is very slow

2016-02-29 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra updated SPARK-13570: -- Description: Running the following code to store data from each year and pos in a

[jira] [Created] (SPARK-13570) pyspark save with partitionBy is very slow

2016-02-29 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-13570: - Summary: pyspark save with partitionBy is very slow Key: SPARK-13570 URL: https://issues.apache.org/jira/browse/SPARK-13570 Project: Spark Issue

[jira] [Updated] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-02-26 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra updated SPARK-13525: -- Description: I am following the code steps from this example:

[jira] [Updated] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-02-26 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra updated SPARK-13525: -- Description: I am following the code steps from this example:

[jira] [Created] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-02-26 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-13525: - Summary: SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function Key: SPARK-13525 URL:

[jira] [Created] (SPARK-13517) Summary classes of scala not exposed in Pyspark

2016-02-26 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-13517: - Summary: Summary classes of scala not exposed in Pyspark Key: SPARK-13517 URL: https://issues.apache.org/jira/browse/SPARK-13517 Project: Spark

[jira] [Updated] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-02-22 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra updated SPARK-13430: -- Description: I think the model summary interface, which is available in Spark's Scala, Java

[jira] [Created] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-02-22 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-13430: - Summary: Expose ml summary function in PySpark for classification and regression models Key: SPARK-13430 URL: https://issues.apache.org/jira/browse/SPARK-13430

[jira] [Closed] (SPARK-12808) Formula based GLM in PySpark

2016-02-22 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra closed SPARK-12808. - Resolution: Duplicate > Formula based GLM in PySpark > > >

[jira] [Commented] (SPARK-12910) Support for specifying version of R to use while creating sparkR libraries

2016-01-19 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107370#comment-15107370 ] Shubhanshu Mishra commented on SPARK-12910: --- I have created a pull request at

[jira] [Created] (SPARK-12910) Support for specifying version of R to use while creating sparkR libraries

2016-01-19 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-12910: - Summary: Support for specifying version of R to use while creating sparkR libraries Key: SPARK-12910 URL: https://issues.apache.org/jira/browse/SPARK-12910

[jira] [Created] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark

2016-01-19 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-12916: - Summary: Support Row.fromSeq and Row.toSeq methods in pyspark Key: SPARK-12916 URL: https://issues.apache.org/jira/browse/SPARK-12916 Project: Spark

[jira] [Updated] (SPARK-12910) Support for specifying version of R to use while creating sparkR libraries

2016-01-19 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra updated SPARK-12910: -- Description: When we use `$SPARK_HOME/R/install-dev.sh` it uses the default system R.

[jira] [Created] (SPARK-12808) Formula based GLM in PySpark

2016-01-13 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-12808: - Summary: Formula based GLM in PySpark Key: SPARK-12808 URL: https://issues.apache.org/jira/browse/SPARK-12808 Project: Spark Issue Type: New

[jira] [Created] (SPARK-12240) FileNotFoundException: (Too many open files) when using multiple groupby on DataFrames

2015-12-09 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-12240: - Summary: FileNotFoundException: (Too many open files) when using multiple groupby on DataFrames Key: SPARK-12240 URL: https://issues.apache.org/jira/browse/SPARK-12240

[jira] [Commented] (SPARK-12240) FileNotFoundException: (Too many open files) when using multiple groupby on DataFrames

2015-12-09 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049242#comment-15049242 ] Shubhanshu Mishra commented on SPARK-12240: --- Here is the output from the various commands

[jira] [Reopened] (SPARK-11669) Python interface to SparkR GLM module

2015-11-18 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubhanshu Mishra reopened SPARK-11669: --- What I meant when I said a Python API to GLM was that the GLM module is something

[jira] [Commented] (SPARK-11668) R style summary stats in GLM package SparkR

2015-11-11 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001745#comment-15001745 ] Shubhanshu Mishra commented on SPARK-11668: --- Thanks, I just checked the package. The coefficient

[jira] [Created] (SPARK-11669) Python interface to SparkR GLM module

2015-11-11 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-11669: - Summary: Python interface to SparkR GLM module Key: SPARK-11669 URL: https://issues.apache.org/jira/browse/SPARK-11669 Project: Spark Issue Type:

[jira] [Created] (SPARK-11668) R style summary stats in GLM package SparkR

2015-11-11 Thread Shubhanshu Mishra (JIRA)
Shubhanshu Mishra created SPARK-11668: - Summary: R style summary stats in GLM package SparkR Key: SPARK-11668 URL: https://issues.apache.org/jira/browse/SPARK-11668 Project: Spark Issue