[jira] [Commented] (SPARK-19803) Flaky BlockManagerProactiveReplicationSuite tests

2017-03-15 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927539#comment-15927539 ] Kay Ousterhout commented on SPARK-19803: Awesome thanks! > Flaky

[jira] [Updated] (SPARK-19968) Use a cached instance of KafkaProducer for writing to kafka via KafkaSink.

2017-03-15 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma updated SPARK-19968: Description: KafkaProducer is thread safe and an instance can be reused for writing every

[jira] [Assigned] (SPARK-19968) Use a cached instance of KafkaProducer for writing to kafka via KafkaSink.

2017-03-15 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma reassigned SPARK-19968: --- Assignee: Prashant Sharma > Use a cached instance of KafkaProducer for writing to

[jira] [Created] (SPARK-19968) Use a cached instance of KafkaProducer for writing to kafka via KafkaSink.

2017-03-15 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-19968: --- Summary: Use a cached instance of KafkaProducer for writing to kafka via KafkaSink. Key: SPARK-19968 URL: https://issues.apache.org/jira/browse/SPARK-19968

[jira] [Commented] (SPARK-19964) SparkSubmitSuite fails due to Timeout

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927510#comment-15927510 ] Sean Owen commented on SPARK-19964: --- It doesn't fail in master. This sounds like the kind of thing that

[jira] [Updated] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-19967: Description: The method "from_json" is a useful method in turning a string column into a nested

[jira] [Closed] (SPARK-19898) Add from_json in FunctionRegistry

2017-03-15 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-19898. --- Resolution: Duplicate > Add from_json in FunctionRegistry

[jira] [Commented] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927485#comment-15927485 ] Takeshi Yamamuro commented on SPARK-19967: -- oh, looks great! Thanks! > Add from_json APIs to

[jira] [Commented] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927478#comment-15927478 ] Xiao Li commented on SPARK-19967: - See the PR https://github.com/apache/spark/pull/17171 It is just

[jira] [Commented] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927476#comment-15927476 ] Takeshi Yamamuro commented on SPARK-19967: -- Finally, what's the DDL format for schema? In my

[jira] [Commented] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927466#comment-15927466 ] Takeshi Yamamuro commented on SPARK-19967: -- Yea, sure! I'll make a pr in a day! Thanks. BTW I've

[jira] [Commented] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927462#comment-15927462 ] Xiao Li commented on SPARK-19967: - cc [~maropu] > Add from_json APIs to SQL

[jira] [Updated] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-19967: Description: The method "from_json" is a useful method in turning a string column into a nested StructType

[jira] [Created] (SPARK-19967) Add from_json APIs to SQL

2017-03-15 Thread Xiao Li (JIRA)
Xiao Li created SPARK-19967: --- Summary: Add from_json APIs to SQL Key: SPARK-19967 URL: https://issues.apache.org/jira/browse/SPARK-19967 Project: Spark Issue Type: New Feature

[jira] [Resolved] (SPARK-19830) Add parseTableSchema API to ParserInterface

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19830. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17171

[jira] [Commented] (SPARK-19965) DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-03-15 Thread Liwei Lin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927431#comment-15927431 ] Liwei Lin commented on SPARK-19965: --- Hi [~zsxwing], are you working on a patch? Mind if I work on this

[jira] [Commented] (SPARK-18591) Replace hash-based aggregates with sort-based ones if inputs already sorted

2017-03-15 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927409#comment-15927409 ] Takeshi Yamamuro commented on SPARK-18591: -- yea, sure. Have we already filed a JIRA for the

[jira] [Commented] (SPARK-19803) Flaky BlockManagerProactiveReplicationSuite tests

2017-03-15 Thread Shubham Chopra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927396#comment-15927396 ] Shubham Chopra commented on SPARK-19803: I am looking into this and will try to submit a fix in a

[jira] [Commented] (SPARK-18591) Replace hash-based aggregates with sort-based ones if inputs already sorted

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927388#comment-15927388 ] Wenchen Fan commented on SPARK-18591: - Can we hold it for a while? I think we need to improve the

[jira] [Commented] (SPARK-7420) Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block data too soon"

2017-03-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927385#comment-15927385 ] Imran Rashid commented on SPARK-7420: - This just [failed

[jira] [Updated] (SPARK-7420) Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block data too soon"

2017-03-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-7420: Attachment: trimmed-unit-tests-74629.log > Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not

[jira] [Reopened] (SPARK-7420) Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not clear received block data too soon"

2017-03-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reopened SPARK-7420: - Assignee: (was: Tathagata Das) > Flaky test: o.a.s.streaming.JobGeneratorSuite "Do not

[jira] [Updated] (SPARK-19966) SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate

2017-03-15 Thread 吴志龙 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 吴志龙 updated SPARK-19966: Environment: redhot spark on yarn client was: redhot spark on yarn client > SparkListenerBus has already

[jira] [Updated] (SPARK-19966) SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate

2017-03-15 Thread 吴志龙 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 吴志龙 updated SPARK-19966: Environment: redhot spark on yarn client was: hadoop 2.6.0 jdk 1.7 spark 2.1 spark on yarn client >

[jira] [Updated] (SPARK-19966) SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate

2017-03-15 Thread 吴志龙 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 吴志龙 updated SPARK-19966: Description: 2017-03-16 10:17:47|GET_HIVE_DATA|delete table |drop table dp_tmp.tmp_20783_20170316;

[jira] [Created] (SPARK-19966) SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate

2017-03-15 Thread 吴志龙 (JIRA)
吴志龙 created SPARK-19966: --- Summary: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate Key: SPARK-19966 URL: https://issues.apache.org/jira/browse/SPARK-19966 Project: Spark

[jira] [Commented] (SPARK-19965) DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-03-15 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927333#comment-15927333 ] Shixiong Zhu commented on SPARK-19965: -- This is because inferring partitions doesn't ignore the

[jira] [Created] (SPARK-19965) DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-03-15 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-19965: Summary: DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output Key: SPARK-19965 URL: https://issues.apache.org/jira/browse/SPARK-19965

[jira] [Commented] (SPARK-19803) Flaky BlockManagerProactiveReplicationSuite tests

2017-03-15 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927263#comment-15927263 ] Kay Ousterhout commented on SPARK-19803: This failed again today:

[jira] [Resolved] (SPARK-19751) Create Data frame API fails with a self referencing bean

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19751. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17188

[jira] [Assigned] (SPARK-19751) Create Data frame API fails with a self referencing bean

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-19751: --- Assignee: Takeshi Yamamuro > Create Data frame API fails with a self referencing bean >

[jira] [Resolved] (SPARK-19961) unify a exception erro msg for dropdatabase

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19961. - Resolution: Fixed Assignee: Song Jun Fix Version/s: 2.2.0 > unify a exception

[jira] [Assigned] (SPARK-19948) Document that saveAsTable uses catalog as source of truth for table existence.

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-19948: --- Assignee: Juliusz Sompolski > Document that saveAsTable uses catalog as source of truth for

[jira] [Resolved] (SPARK-19948) Document that saveAsTable uses catalog as source of truth for table existence.

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19948. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17289

[jira] [Assigned] (SPARK-19931) InMemoryTableScanExec should rewrite output partitioning and ordering when aliasing output attributes

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-19931: --- Assignee: Liang-Chi Hsieh > InMemoryTableScanExec should rewrite output partitioning and

[jira] [Resolved] (SPARK-19931) InMemoryTableScanExec should rewrite output partitioning and ordering when aliasing output attributes

2017-03-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19931. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17175

[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)

2017-03-15 Thread Yash Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927240#comment-15927240 ] Yash Sharma commented on SPARK-7736: This does not seem Fixed. The application still completes with

[jira] [Resolved] (SPARK-18066) Add Pool usage policies test coverage for FIFO & FAIR Schedulers

2017-03-15 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-18066. Resolution: Fixed Assignee: Eren Avsarogullari Fix Version/s: 2.2.0 > Add

[jira] [Commented] (SPARK-19872) UnicodeDecodeError in Pyspark on sc.textFile read with repartition

2017-03-15 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927122#comment-15927122 ] Hyukjin Kwon commented on SPARK-19872: -- Yup, test was added. > UnicodeDecodeError in Pyspark on

[jira] [Commented] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927111#comment-15927111 ] Hyukjin Kwon commented on SPARK-19954: -- Honestly, we don't usually set {{Blocker}} {quote}

[jira] [Commented] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927103#comment-15927103 ] Hyukjin Kwon commented on SPARK-19954: -- I was following "Contributing to JIRA Maintenance"

[jira] [Reopened] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-19954: -- > Joining to a unioned DataFrame does not produce expected result. >

[jira] [Resolved] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-19954. -- Resolution: Duplicate > Joining to a unioned DataFrame does not produce expected result. >

[jira] [Commented] (SPARK-13369) Number of consecutive fetch failures for a stage before the job is aborted should be configurable

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927085#comment-15927085 ] Apache Spark commented on SPARK-13369: -- User 'sitalkedia' has created a pull request for this issue:

[jira] [Commented] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927071#comment-15927071 ] Herman van Hovell commented on SPARK-19954: --- This is reproducible on Spark 2.1. This is caused

[jira] [Resolved] (SPARK-19960) Move `SparkHadoopWriter` to `internal/io/`

2017-03-15 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-19960. - Resolution: Fixed Assignee: Jiang Xingbo Fix Version/s: 2.2.0 > Move

[jira] [Commented] (SPARK-19872) UnicodeDecodeError in Pyspark on sc.textFile read with repartition

2017-03-15 Thread Brian Bruggeman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926992#comment-15926992 ] Brian Bruggeman commented on SPARK-19872: - Wondering if a test will be added to prevent future

[jira] [Commented] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Arun Allamsetty (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926882#comment-15926882 ] Arun Allamsetty commented on SPARK-19954: - [~maropu] and [~hyukjin.kwon] were you able to

[jira] [Updated] (SPARK-19964) SparkSubmitSuite fails due to Timeout

2017-03-15 Thread Eren Avsarogullari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eren Avsarogullari updated SPARK-19964: --- Attachment: SparkSubmitSuite_Stacktrace > SparkSubmitSuite fails due to Timeout >

[jira] [Created] (SPARK-19964) SparkSubmitSuite fails due to Timeout

2017-03-15 Thread Eren Avsarogullari (JIRA)
Eren Avsarogullari created SPARK-19964: -- Summary: SparkSubmitSuite fails due to Timeout Key: SPARK-19964 URL: https://issues.apache.org/jira/browse/SPARK-19964 Project: Spark Issue

[jira] [Resolved] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2017-03-15 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-13450. --- Resolution: Fixed Assignee: Tejas Patil Fix Version/s: 2.2.0 >

[jira] [Created] (SPARK-19963) create view from select Fails when nullif() is used

2017-03-15 Thread Jay Danielsen (JIRA)
Jay Danielsen created SPARK-19963: - Summary: create view from select Fails when nullif() is used Key: SPARK-19963 URL: https://issues.apache.org/jira/browse/SPARK-19963 Project: Spark Issue

[jira] [Comment Edited] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926723#comment-15926723 ] yu peng edited comment on SPARK-19962 at 3/15/17 6:42 PM: -- yeah, with one

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926723#comment-15926723 ] yu peng commented on SPARK-19962: - yeah, with one indexer that performing like mutually exclusive n

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926680#comment-15926680 ] Sean Owen commented on SPARK-19962: --- You would have n indexers and n models for n different columns;

[jira] [Resolved] (SPARK-19872) UnicodeDecodeError in Pyspark on sc.textFile read with repartition

2017-03-15 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-19872. Resolution: Fixed Fix Version/s: 2.2.0 2.1.1 > UnicodeDecodeError in

[jira] [Assigned] (SPARK-19872) UnicodeDecodeError in Pyspark on sc.textFile read with repartition

2017-03-15 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-19872: -- Assignee: Hyukjin Kwon > UnicodeDecodeError in Pyspark on sc.textFile read with repartition >

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926559#comment-15926559 ] yu peng commented on SPARK-19962: - >Well, it's maintained by the StringIndexerModel for you. no i mean

[jira] [Updated] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yu peng updated SPARK-19962: Description: it's really useful to have something like sklearn.feature_extraction.DictVectorizor Since

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926543#comment-15926543 ] Sean Owen commented on SPARK-19962: --- Well, it's maintained by the StringIndexerModel for you. What's

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926537#comment-15926537 ] yu peng commented on SPARK-19962: - yeah.. StringIndexer does some job on single column and only string

[jira] [Updated] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yu peng updated SPARK-19962: Description: it's really useful to have something like sklearn.feature_extraction.DictVectorizor Since

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926518#comment-15926518 ] Sean Owen commented on SPARK-19962: --- That sounds like what

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926458#comment-15926458 ] yu peng commented on SPARK-19962: - ^ i have updated the description > add DictVectorizor for DataFrame >

[jira] [Updated] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yu peng updated SPARK-19962: Description: it's really useful to have something like sklearn.feature_extraction.DictVectorizor Since

[jira] [Comment Edited] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2017-03-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926403#comment-15926403 ] Imran Rashid edited comment on SPARK-12297 at 3/15/17 3:53 PM: --- To expand

[jira] [Closed] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib

2017-03-15 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhao yang closed SPARK-19957. -- Resolution: Not A Problem > Inconsist KMeans initialization mode behavior between ML and MLlib >

[jira] [Commented] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2017-03-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926403#comment-15926403 ] Imran Rashid commented on SPARK-12297: -- To expand on the original description-- Spark copied Hive's

[jira] [Commented] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib

2017-03-15 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926404#comment-15926404 ] yuhao yang commented on SPARK-19957: Thanks for the response. > Inconsist KMeans initialization mode

[jira] [Commented] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926393#comment-15926393 ] Sean Owen commented on SPARK-19962: --- Can you describe what this does, and give an example of

[jira] [Created] (SPARK-19962) add DictVectorizor for DataFrame

2017-03-15 Thread yu peng (JIRA)
yu peng created SPARK-19962: --- Summary: add DictVectorizor for DataFrame Key: SPARK-19962 URL: https://issues.apache.org/jira/browse/SPARK-19962 Project: Spark Issue Type: Wish

[jira] [Updated] (SPARK-19961) unify a exception erro msg for dropdatabase

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-19961: -- Affects Version/s: (was: 2.2.0) 2.1.0 Priority: Trivial (was:

[jira] [Assigned] (SPARK-19961) unify a exception erro msg for dropdatabase

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19961: Assignee: (was: Apache Spark) > unify a exception erro msg for dropdatabase >

[jira] [Commented] (SPARK-19961) unify a exception erro msg for dropdatabase

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926304#comment-15926304 ] Apache Spark commented on SPARK-19961: -- User 'windpiger' has created a pull request for this issue:

[jira] [Assigned] (SPARK-19961) unify a exception erro msg for dropdatabase

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19961: Assignee: Apache Spark > unify a exception erro msg for dropdatabase >

[jira] [Created] (SPARK-19961) unify a exception erro msg for dropdatabase

2017-03-15 Thread Song Jun (JIRA)
Song Jun created SPARK-19961: Summary: unify a exception erro msg for dropdatabase Key: SPARK-19961 URL: https://issues.apache.org/jira/browse/SPARK-19961 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-19944) Move SQLConf from sql/core to sql/catalyst

2017-03-15 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-19944: -- Fix Version/s: 2.1.1 > Move SQLConf from sql/core to sql/catalyst >

[jira] [Assigned] (SPARK-19960) Move `SparkHadoopWriter` to `internal/io/`

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19960: Assignee: (was: Apache Spark) > Move `SparkHadoopWriter` to `internal/io/` >

[jira] [Commented] (SPARK-19960) Move `SparkHadoopWriter` to `internal/io/`

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926014#comment-15926014 ] Apache Spark commented on SPARK-19960: -- User 'jiangxb1987' has created a pull request for this

[jira] [Assigned] (SPARK-19960) Move `SparkHadoopWriter` to `internal/io/`

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19960: Assignee: Apache Spark > Move `SparkHadoopWriter` to `internal/io/` >

[jira] [Created] (SPARK-19960) Move `SparkHadoopWriter` to `internal/io/`

2017-03-15 Thread Jiang Xingbo (JIRA)
Jiang Xingbo created SPARK-19960: Summary: Move `SparkHadoopWriter` to `internal/io/` Key: SPARK-19960 URL: https://issues.apache.org/jira/browse/SPARK-19960 Project: Spark Issue Type:

[jira] [Commented] (SPARK-19112) add codec for ZStandard

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925933#comment-15925933 ] Apache Spark commented on SPARK-19112: -- User 'dongjinleekr' has created a pull request for this

[jira] [Assigned] (SPARK-19112) add codec for ZStandard

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19112: Assignee: Apache Spark > add codec for ZStandard

[jira] [Assigned] (SPARK-19112) add codec for ZStandard

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19112: Assignee: (was: Apache Spark) > add codec for ZStandard

[jira] [Assigned] (SPARK-19959) df[java.lang.Long].collect throws NullPointerException if df includes null

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19959: Assignee: (was: Apache Spark) > df[java.lang.Long].collect throws

[jira] [Commented] (SPARK-19959) df[java.lang.Long].collect throws NullPointerException if df includes null

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925873#comment-15925873 ] Apache Spark commented on SPARK-19959: -- User 'kiszk' has created a pull request for this issue:

[jira] [Assigned] (SPARK-19959) df[java.lang.Long].collect throws NullPointerException if df includes null

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19959: Assignee: Apache Spark > df[java.lang.Long].collect throws NullPointerException if df

[jira] [Assigned] (SPARK-19946) DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19946: Assignee: (was: Apache Spark) > DebugFilesystem.assertNoOpenStreams should report the

[jira] [Assigned] (SPARK-19946) DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19946: Assignee: Apache Spark > DebugFilesystem.assertNoOpenStreams should report the open

[jira] [Commented] (SPARK-19946) DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging

2017-03-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925856#comment-15925856 ] Apache Spark commented on SPARK-19946: -- User 'bogdanrdc' has created a pull request for this issue:

[jira] [Commented] (SPARK-18591) Replace hash-based aggregates with sort-based ones if inputs already sorted

2017-03-15 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925841#comment-15925841 ] Herman van Hovell commented on SPARK-18591: --- [~maropu] I don't think the current planning (of

[jira] [Created] (SPARK-19959) df[java.lang.Long].collect throws NullPointerException if df includes null

2017-03-15 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-19959: Summary: df[java.lang.Long].collect throws NullPointerException if df includes null Key: SPARK-19959 URL: https://issues.apache.org/jira/browse/SPARK-19959

[jira] [Resolved] (SPARK-19889) Make TaskContext callbacks synchronized

2017-03-15 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-19889. --- Resolution: Fixed Assignee: Herman van Hovell Fix Version/s: 2.2.0 >

[jira] [Commented] (SPARK-19228) inferSchema function processed csv date column as string and "dateFormat" DataSource option is ignored

2017-03-15 Thread Sergey Rubtsov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925775#comment-15925775 ] Sergey Rubtsov commented on SPARK-19228: Hi [~hyukjin.kwon], Updated pull request:

[jira] [Resolved] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-19954. -- Resolution: Cannot Reproduce This seems fixed in the current master. That code produces the

[jira] [Commented] (SPARK-19954) Joining to a unioned DataFrame does not produce expected result.

2017-03-15 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925754#comment-15925754 ] Takeshi Yamamuro commented on SPARK-19954: -- It seems this issue has already been fixed in branch-2.1.

[jira] [Commented] (SPARK-19957) Inconsist KMeans initialization mode behavior between ML and MLlib

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925753#comment-15925753 ] Sean Owen commented on SPARK-19957: --- Yeah I think this might be "working as intended". > Inconsist

[jira] [Resolved] (SPARK-19958) Support ZStandard Compression

2017-03-15 Thread Lee Dongjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lee Dongjin resolved SPARK-19958. - Resolution: Duplicate See: https://issues.apache.org/jira/browse/SPARK-19112 > Support

[jira] [Resolved] (SPARK-19936) Page rank example takes long time to complete

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19936. --- Resolution: Not A Problem Set spark.blockManager.port ? > Page rank example takes long time to

[jira] [Resolved] (SPARK-19895) Spark SQL could not output a correct result

2017-03-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19895. --- Resolution: Not A Problem > Spark SQL could not output a correct result >
