[jira] [Assigned] (SPARK-22152) Add Dataset flatten function

2017-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22152: Assignee: (was: Apache Spark) > Add Dataset flatten function >

[jira] [Commented] (SPARK-22152) Add Dataset flatten function

2017-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196053#comment-16196053 ] Apache Spark commented on SPARK-22152: -- User 'sohum2002' has created a pull request for this issue:

[jira] [Commented] (SPARK-18855) Add RDD flatten function

2017-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196054#comment-16196054 ] Apache Spark commented on SPARK-18855: -- User 'sohum2002' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22152) Add Dataset flatten function

2017-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22152: Assignee: Apache Spark > Add Dataset flatten function >

[jira] [Updated] (SPARK-22200) Kinesis Receivers stops if Kinesis stream was re-sharded

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-22200: - Priority: Major (was: Critical) > Kinesis Receivers stops if Kinesis stream was re-sharded >

[jira] [Assigned] (SPARK-22147) BlockId.hashCode allocates a StringBuilder/String on each call

2017-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-22147: - Assignee: Sergei Lebedev Issue Type: Improvement (was: Bug) > BlockId.hashCode allocates

[jira] [Updated] (SPARK-19299) Nulls in non nullable columns causes data corruption in parquet

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-19299: - Priority: Major (was: Critical) > Nulls in non nullable columns causes data corruption in

[jira] [Deleted] (SPARK-16309) SparkR csv source should have the same default na.string as R

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-16309: - > SparkR csv source should have the same default na.string as R >

[jira] [Updated] (SPARK-18859) Catalyst codegen does not mark column as nullable when it should. Causes NPE

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-18859: - Priority: Major (was: Critical) > Catalyst codegen does not mark column as nullable when it

[jira] [Commented] (SPARK-22115) Add operator for linalg Matrix and Vector

2017-10-08 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196115#comment-16196115 ] Peng Meng commented on SPARK-22115: --- Hi [~mlnick], I am just back from a national holiday. If only for

[jira] [Resolved] (SPARK-22147) BlockId.hashCode allocates a StringBuilder/String on each call

2017-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22147. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19369

[jira] [Updated] (SPARK-20859) SQL Loader does not recognize multidimensional columns in postgresql (like integer[]][])

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20859: - Priority: Major (was: Critical) > SQL Loader does not recognize multidimensional columns in

[jira] [Commented] (SPARK-16428) Spark file system watcher not working on Windows

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196153#comment-16196153 ] Hyukjin Kwon commented on SPARK-16428: -- Hi all, so, is this issue resolvable? > Spark file system

[jira] [Resolved] (SPARK-16681) Optimizer changes order of filter predicates involving UDFs, which changes semantics

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-16681. -- Resolution: Cannot Reproduce With the diff: {code} diff --git a/build.sbt b/build.sbt index

[jira] [Commented] (SPARK-16754) NPE when defining case class and searching Encoder in the same line

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196172#comment-16196172 ] Hyukjin Kwon commented on SPARK-16754: -- I tried again today in the current master and looks still

[jira] [Comment Edited] (SPARK-20696) tf-idf document clustering with K-means in Apache Spark putting points into one cluster

2017-10-08 Thread rajanimaski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196165#comment-16196165 ] rajanimaski edited comment on SPARK-20696 at 10/8/17 4:14 PM: -- Spark

[jira] [Resolved] (SPARK-17109) When we serialize UserDefinedGenerator to json, scala reflection throws an error

2017-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17109. --- Resolution: Not A Problem > When we serialize UserDefinedGenerator to json, scala reflection throws

[jira] [Commented] (SPARK-17118) Make examples Python3 compatible

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196196#comment-16196196 ] Hyukjin Kwon commented on SPARK-17118: -- Looks working fine now by: {code} PYSPARK_PYTHON=python3

[jira] [Commented] (SPARK-17169) To use scala macros to update code when SharedParamsCodeGen.scala changed

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196206#comment-16196206 ] Hyukjin Kwon commented on SPARK-17169: -- [~josephkb], should we maybe leave this as {{Won't Fix}} or

[jira] [Commented] (SPARK-17265) EdgeRDD Difference throws an exception

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196219#comment-16196219 ] Hyukjin Kwon commented on SPARK-17265: -- This seems still happening in the current master too {code}

[jira] [Updated] (SPARK-17265) EdgeRDD Difference throws an exception

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17265: - Affects Version/s: 2.3.0 > EdgeRDD Difference throws an exception >

[jira] [Resolved] (SPARK-17285) ZeroOutPaddingBytes Causing Fatal JVM Error

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17285. -- Resolution: Cannot Reproduce Looks almost impossible to reproduce. Please reopen this if

[jira] [Commented] (SPARK-17756) java.lang.ClassCastException when using cartesian with DStream.transform

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196248#comment-16196248 ] Hyukjin Kwon commented on SPARK-17756: -- Still happens in the current master. Will set the affected

[jira] [Updated] (SPARK-17756) java.lang.ClassCastException when using cartesian with DStream.transform

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17756: - Affects Version/s: 2.3.0 > java.lang.ClassCastException when using cartesian with

[jira] [Resolved] (SPARK-16704) Union does not work for column with array byte

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-16704. -- Resolution: Cannot Reproduce > Union does not work for column with array byte >

[jira] [Commented] (SPARK-17012) Reading data frames via CSV - Allow to specify default value for integers

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196190#comment-16196190 ] Hyukjin Kwon commented on SPARK-17012: -- Couldn't we just replace values after loading it into Spark?

[jira] [Commented] (SPARK-17109) When we serialize UserDefinedGenerator to json, scala reflection throws an error

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196192#comment-16196192 ] Hyukjin Kwon commented on SPARK-17109: -- Hi all, looks we dropped 2.10 now - SPARK-19810. Would this

[jira] [Comment Edited] (SPARK-22163) Design Issue of Spark Streaming that Causes Random Run-time Exception

2017-10-08 Thread Michael N (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195813#comment-16195813 ] Michael N edited comment on SPARK-22163 at 10/8/17 4:08 PM: Sean, my text

[jira] [Updated] (SPARK-22178) Refresh Table does not refresh the underlying tables of the persistent view

2017-10-08 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-22178: -- Fix Version/s: 2.3.0 2.2.1 > Refresh Table does not refresh the underlying

[jira] [Resolved] (SPARK-22178) Refresh Table does not refresh the underlying tables of the persistent view

2017-10-08 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-22178. --- Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/19405 >

[jira] [Commented] (SPARK-22169) support byte length literal as identifier

2017-10-08 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196293#comment-16196293 ] Dongjoon Hyun commented on SPARK-22169: --- Hi, @cloud-fan and [~smilegator]. I saw the discussion on

[jira] [Resolved] (SPARK-22169) support byte length literal as identifier

2017-10-08 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-22169. --- Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/19392 . >

[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196161#comment-16196161 ] Hyukjin Kwon commented on SPARK-16611: -- (Let me leave two JIRAs that I suspect are related -

[jira] [Resolved] (SPARK-16874) CSV Reader : Can't resolve column name with a point

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-16874. -- Resolution: Cannot Reproduce It looks safely backported into 2.0.2 at least. I am resolving

[jira] [Commented] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196236#comment-16196236 ] Hyukjin Kwon commented on SPARK-17538: -- I can't reproduce this. Would you be able to provide more

[jira] [Commented] (SPARK-22115) Add operator for linalg Matrix and Vector

2017-10-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196288#comment-16196288 ] Nick Pentreath commented on SPARK-22115: Best keep it private for now. There's been lot of

[jira] [Commented] (SPARK-20696) tf-idf document clustering with K-means in Apache Spark putting points into one cluster

2017-10-08 Thread rajanimaski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196165#comment-16196165 ] rajanimaski commented on SPARK-20696: - Spark k-means(scala mllib api) is consistently producing

[jira] [Updated] (SPARK-16754) NPE when defining case class and searching Encoder in the same line

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-16754: - Affects Version/s: 2.3.0 > NPE when defining case class and searching Encoder in the same line >

[jira] [Resolved] (SPARK-17118) Make examples Python3 compatible

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17118. -- Resolution: Fixed > Make examples Python3 compatible >

[jira] [Resolved] (SPARK-17012) Reading data frames via CSV - Allow to specify default value for integers

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17012. -- Resolution: Won't Fix Not sure if we need this for now. I am resolving this as {{Won't Fix}}

[jira] [Resolved] (SPARK-17227) Allow configuring record delimiter in csv

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17227. -- Resolution: Duplicate [~aash], let me leave this resolved as a duplicate. > Allow configuring

[jira] [Commented] (SPARK-17275) Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196222#comment-16196222 ] Hyukjin Kwon commented on SPARK-17275: -- I believe I am a regular reader of build logs but I think I

[jira] [Updated] (SPARK-22169) support byte length literal as identifier

2017-10-08 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-22169: -- Fix Version/s: 2.3.0 > support byte length literal as identifier >

[jira] [Commented] (SPARK-16882) Failures in JobGenerator Thread are Swallowed, Job Does Not Fail

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196187#comment-16196187 ] Hyukjin Kwon commented on SPARK-16882: -- Hi [~zsxwing], do you maybe think this JIRA is resolvable

[jira] [Comment Edited] (SPARK-20696) tf-idf document clustering with K-means in Apache Spark putting points into one cluster

2017-10-08 Thread rajanimaski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196165#comment-16196165 ] rajanimaski edited comment on SPARK-20696 at 10/8/17 4:13 PM: -- Spark

[jira] [Resolved] (SPARK-17145) Object with many fields causes Seq Serialization Bug

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17145. -- Resolution: Cannot Reproduce The test at least looks passing in the master. I am resolving

[jira] [Resolved] (SPARK-17726) Allow RDD.pipe to take script contents

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17726. -- Resolution: Won't Fix I think there are many workarounds for this, e.g, having a base script

[jira] [Commented] (SPARK-17804) Pandas dtypes are not correctly inferred by pyspark

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196254#comment-16196254 ] Hyukjin Kwon commented on SPARK-17804: -- I know the issue but I'd explain the expected input and

[jira] [Commented] (SPARK-21063) Spark return an empty result from remote hadoop cluster

2017-10-08 Thread Andreas Weise (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196354#comment-16196354 ] Andreas Weise commented on SPARK-21063: --- Seems like a bug IMHO. Same problem here with Spark 2.2.0,

[jira] [Comment Edited] (SPARK-21063) Spark return an empty result from remote hadoop cluster

2017-10-08 Thread Andreas Weise (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196354#comment-16196354 ] Andreas Weise edited comment on SPARK-21063 at 10/8/17 10:47 PM: -

[jira] [Commented] (SPARK-17275) Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning

2017-10-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196380#comment-16196380 ] Felix Cheung commented on SPARK-17275: -- perhaps we should close this? it's been a year... > Flaky

[jira] [Commented] (SPARK-22115) Add operator for linalg Matrix and Vector

2017-10-08 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196412#comment-16196412 ] Peng Meng commented on SPARK-22115: --- ok, thanks. > Add operator for linalg Matrix and Vector >

[jira] [Commented] (SPARK-13030) Change OneHotEncoder to Estimator

2017-10-08 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196478#comment-16196478 ] zhengruifeng commented on SPARK-13030: -- [~bago.amirbekian] Agree that it should support

[jira] [Created] (SPARK-22222) Fix the ARRAY_MAX in BufferHolder and add a test

2017-10-08 Thread Feng Liu (JIRA)
Feng Liu created SPARK-22222: Summary: Fix the ARRAY_MAX in BufferHolder and add a test Key: SPARK-22222 URL: https://issues.apache.org/jira/browse/SPARK-22222 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-22139) Remove the variable which is never used in SparkConf.scala

2017-10-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-22139. -- Resolution: Invalid > Remove the variable which is never used in SparkConf.scala >