[jira] [Commented] (SPARK-12278) Move the shuffle related test case from Yarn module to Core module

2016-11-17 Thread Ferdinand Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675998#comment-15675998 ] Ferdinand Xu commented on SPARK-12278: -- Thanks [~srowen] for pointing this out. The main

[jira] [Commented] (SPARK-17932) Failed to run SQL "show table extended like table_name" in Spark2.0.0

2016-11-17 Thread Jiang Xingbo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675985#comment-15675985 ] Jiang Xingbo commented on SPARK-17932: -- I’m working on this, thanks! > Failed to run SQL "show

[jira] [Assigned] (SPARK-18500) Make GenericStrategy be able to prune plans by itself after placeholders are replaced.

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18500: Assignee: Apache Spark > Make GenericStrategy be able to prune plans by itself after

[jira] [Commented] (SPARK-18500) Make GenericStrategy be able to prune plans by itself after placeholders are replaced.

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675978#comment-15675978 ] Apache Spark commented on SPARK-18500: -- User 'ueshin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18500) Make GenericStrategy be able to prune plans by itself after placeholders are replaced.

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18500: Assignee: (was: Apache Spark) > Make GenericStrategy be able to prune plans by itself

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Nathan Howell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675966#comment-15675966 ] Nathan Howell commented on SPARK-18352: --- Sounds good to me. I have an implementation that's passing

[jira] [Created] (SPARK-18500) Make GenericStrategy be able to prune plans by itself after placeholders are replaced.

2016-11-17 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-18500: - Summary: Make GenericStrategy be able to prune plans by itself after placeholders are replaced. Key: SPARK-18500 URL: https://issues.apache.org/jira/browse/SPARK-18500

[jira] [Commented] (SPARK-9478) Add sample weights to Random Forest

2016-11-17 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675770#comment-15675770 ] Seth Hendrickson commented on SPARK-9478: - I'm going to work on submitting a PR for adding sample

[jira] [Updated] (SPARK-9478) Add sample weights to Random Forest

2016-11-17 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seth Hendrickson updated SPARK-9478: Summary: Add sample weights to Random Forest (was: Add class weights to Random Forest) >

[jira] [Commented] (SPARK-18499) Add back support for custom Spark SQL dialects

2016-11-17 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675714#comment-15675714 ] Andrew Ash commented on SPARK-18499: Specifically what I'm most interested in is a strict ANSI SQL

[jira] [Created] (SPARK-18499) Add back support for custom Spark SQL dialects

2016-11-17 Thread Andrew Ash (JIRA)
Andrew Ash created SPARK-18499: -- Summary: Add back support for custom Spark SQL dialects Key: SPARK-18499 URL: https://issues.apache.org/jira/browse/SPARK-18499 Project: Spark Issue Type:

[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-11-17 Thread German Eduardo Melo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675699#comment-15675699 ] German Eduardo Melo commented on SPARK-9478: [~sethah] I was wondering if you are working on

[jira] [Commented] (SPARK-17450) spark sql rownumber OOM

2016-11-17 Thread cen yuhai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675639#comment-15675639 ] cen yuhai commented on SPARK-17450: --- I will upgrade to 2.x, please close this issue > spark sql

[jira] [Resolved] (SPARK-18462) SparkListenerDriverAccumUpdates event does not deserialize properly in history server

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-18462. - Resolution: Fixed Fix Version/s: 2.1.0 2.0.3 >

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675517#comment-15675517 ] Reynold Xin commented on SPARK-18352: - Actually just talked to [~marmbrus] and now I understand more

[jira] [Commented] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675507#comment-15675507 ] Apache Spark commented on SPARK-16803: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Updated] (SPARK-18487) Add task completion listener to HashAggregate to avoid memory leak

2016-11-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18487: Summary: Add task completion listener to HashAggregate to avoid memory leak (was: Consume

[jira] [Updated] (SPARK-18487) Consume all elements for Dataset.show/take to avoid memory leak

2016-11-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-18487: Description: The methods such as Dataset.show and take use Limit (CollectLimitExec) which

[jira] [Assigned] (SPARK-18436) isin causing SQL syntax error with JDBC

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18436: Assignee: Apache Spark > isin causing SQL syntax error with JDBC >

[jira] [Assigned] (SPARK-18436) isin causing SQL syntax error with JDBC

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18436: Assignee: (was: Apache Spark) > isin causing SQL syntax error with JDBC >

[jira] [Commented] (SPARK-18436) isin causing SQL syntax error with JDBC

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675473#comment-15675473 ] Apache Spark commented on SPARK-18436: -- User 'windpiger' has created a pull request for this issue:

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675466#comment-15675466 ] Hyukjin Kwon commented on SPARK-18352: -- Ah, you meant producing each row while parsing the whole

[jira] [Commented] (SPARK-18356) Issue + Resolution: Kmeans Spark Performances (ML package)

2016-11-17 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675460#comment-15675460 ] yuhao yang commented on SPARK-18356: I assume the performance improvement depends on the computation

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675446#comment-15675446 ] Reynold Xin commented on SPARK-18352: - No that's not sufficient. It doesn't do streaming. > Parse

[jira] [Comment Edited] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675441#comment-15675441 ] Hyukjin Kwon edited comment on SPARK-18352 at 11/18/16 1:53 AM: Hi

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675441#comment-15675441 ] Hyukjin Kwon commented on SPARK-18352: -- Hi [~rxin], I think it seems this can be simply done after

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675437#comment-15675437 ] Reynold Xin commented on SPARK-18352: - I guess maybe it should be a user-configurable option?

[jira] [Commented] (SPARK-13767) py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server

2016-11-17 Thread Narayanan Nachiappan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675434#comment-15675434 ] Narayanan Nachiappan commented on SPARK-13767: -- [~rahul.bhati...@gmail.com] Were you able to

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Nathan Howell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675421#comment-15675421 ] Nathan Howell commented on SPARK-18352: --- Do you have any ideas how to support this?

[jira] [Assigned] (SPARK-18498) Clean up HDFSMetadataLog API for better testing

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18498: Assignee: (was: Apache Spark) > Clean up HDFSMetadataLog API for better testing >

[jira] [Commented] (SPARK-18498) Clean up HDFSMetadataLog API for better testing

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675412#comment-15675412 ] Apache Spark commented on SPARK-18498: -- User 'tcondie' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18498) Clean up HDFSMetadataLog API for better testing

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18498: Assignee: Apache Spark > Clean up HDFSMetadataLog API for better testing >

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675405#comment-15675405 ] Reynold Xin commented on SPARK-18352: - Are these actually record delimiters? If the top level

[jira] [Resolved] (SPARK-18360) default table path of tables in default database should depend on the location of default database

2016-11-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-18360. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15812

[jira] [Updated] (SPARK-18360) default table path of tables in default database should depend on the location of default database

2016-11-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-18360: - Labels: release_notes releasenotes (was: ) > default table path of tables in default database should

[jira] [Commented] (SPARK-18483) spark on yarn always connect to yarn resourcemanager at 0.0.0.0:8032

2016-11-17 Thread inred (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675388#comment-15675388 ] inred commented on SPARK-18483: --- It failed even when I set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop, I

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-17 Thread Nathan Howell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675386#comment-15675386 ] Nathan Howell commented on SPARK-18352: --- Any opinions on configuring this with an option instead of

[jira] [Created] (SPARK-18498) Clean up HDFSMetadataLog API for better testing

2016-11-17 Thread Tyson Condie (JIRA)
Tyson Condie created SPARK-18498: Summary: Clean up HDFSMetadataLog API for better testing Key: SPARK-18498 URL: https://issues.apache.org/jira/browse/SPARK-18498 Project: Spark Issue Type:

[jira] [Updated] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18497: - Target Version/s: 2.1.0 > ForeachSink fails with "assertion failed: No plan for

[jira] [Updated] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18497: - Priority: Critical (was: Major) > ForeachSink fails with "assertion failed: No plan for

[jira] [Created] (SPARK-18497) ForeachSink fails with "assertion failed: No plan for EventTimeWatermark"

2016-11-17 Thread Aaron Davidson (JIRA)
Aaron Davidson created SPARK-18497: -- Summary: ForeachSink fails with "assertion failed: No plan for EventTimeWatermark" Key: SPARK-18497 URL: https://issues.apache.org/jira/browse/SPARK-18497

[jira] [Updated] (SPARK-18454) Changes to fix Nearest Neighbor Search for LSH

2016-11-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18454: --- Description: We all agree to do the following improvement to Multi-Probe NN Search: (1) Use approxQuantile

[jira] [Updated] (SPARK-18454) Changes to fix Nearest Neighbor Search for LSH

2016-11-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18454: --- Description: We all agree to do the following improvement to Multi-Probe NN Search: (1) Use approxQuantile

[jira] [Updated] (SPARK-18454) Changes to fix Nearest Neighbor Search for LSH

2016-11-17 Thread Yun Ni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Ni updated SPARK-18454: --- Summary: Changes to fix Nearest Neighbor Search for LSH (was: Changes to fix Multi-Probe Nearest Neighbor

[jira] [Commented] (SPARK-18321) ML 2.1 QA: API: Java compatibility, docs

2016-11-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675345#comment-15675345 ] Joseph K. Bradley commented on SPARK-18321: --- I just noticed there are also problems with having

[jira] [Commented] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675306#comment-15675306 ] Apache Spark commented on SPARK-4105: - User 'davies' has created a pull request for this issue:

[jira] [Commented] (SPARK-18321) ML 2.1 QA: API: Java compatibility, docs

2016-11-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675287#comment-15675287 ] Joseph K. Bradley commented on SPARK-18321: --- I'm guessing it's because it's a private class

[jira] [Assigned] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-11-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-4105: - Assignee: Davies Liu (was: Josh Rosen) > FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle

[jira] [Commented] (SPARK-18462) SparkListenerDriverAccumUpdates event does not deserialize properly in history server

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675234#comment-15675234 ] Apache Spark commented on SPARK-18462: -- User 'JoshRosen' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18462) SparkListenerDriverAccumUpdates event does not deserialize properly in history server

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18462: Assignee: Josh Rosen (was: Apache Spark) > SparkListenerDriverAccumUpdates event does

[jira] [Assigned] (SPARK-18462) SparkListenerDriverAccumUpdates event does not deserialize properly in history server

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18462: Assignee: Apache Spark (was: Josh Rosen) > SparkListenerDriverAccumUpdates event does

[jira] [Updated] (SPARK-18462) SparkListenerDriverAccumUpdates event does not deserialize properly in history server

2016-11-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-18462: --- Target Version/s: 2.0.3, 2.1.0 > SparkListenerDriverAccumUpdates event does not deserialize properly

[jira] [Updated] (SPARK-18496) java.lang.AssertionError: assertion failed

2016-11-17 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-18496: --- Affects Version/s: 2.0.2 > java.lang.AssertionError: assertion failed >

[jira] [Created] (SPARK-18496) java.lang.AssertionError: assertion failed

2016-11-17 Thread Harish (JIRA)
Harish created SPARK-18496: -- Summary: java.lang.AssertionError: assertion failed Key: SPARK-18496 URL: https://issues.apache.org/jira/browse/SPARK-18496 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-18495) Web UI should document meaning of green dot in DAG visualization

2016-11-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674980#comment-15674980 ] Nicholas Chammas commented on SPARK-18495: -- cc [~andrewor14] > Web UI should document meaning

[jira] [Created] (SPARK-18495) Web UI should document meaning of green dot in DAG visualization

2016-11-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18495: Summary: Web UI should document meaning of green dot in DAG visualization Key: SPARK-18495 URL: https://issues.apache.org/jira/browse/SPARK-18495 Project:

[jira] [Updated] (SPARK-18493) Add withWatermark and checkpoint to python dataframe

2016-11-17 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Burak Yavuz updated SPARK-18493: Component/s: PySpark > Add withWatermark and checkpoint to python dataframe >

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674801#comment-15674801 ] Reynold Xin commented on SPARK-18252: - Those two methods are pretty inefficient. When we use this in

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674821#comment-15674821 ] Aleksey Ponkin commented on SPARK-18252: I do not have any benchmarks, but I believe that

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674818#comment-15674818 ] Reynold Xin commented on SPARK-18252: - Regarding this - can you find some performance data on how the

[jira] [Created] (SPARK-18493) Add withWatermark and checkpoint to python dataframe

2016-11-17 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-18493: --- Summary: Add withWatermark and checkpoint to python dataframe Key: SPARK-18493 URL: https://issues.apache.org/jira/browse/SPARK-18493 Project: Spark Issue

[jira] [Commented] (SPARK-18493) Add withWatermark and checkpoint to python dataframe

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674806#comment-15674806 ] Apache Spark commented on SPARK-18493: -- User 'brkyvz' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18493) Add withWatermark and checkpoint to python dataframe

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18493: Assignee: Apache Spark > Add withWatermark and checkpoint to python dataframe >

[jira] [Assigned] (SPARK-18493) Add withWatermark and checkpoint to python dataframe

2016-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18493: Assignee: (was: Apache Spark) > Add withWatermark and checkpoint to python dataframe

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674791#comment-15674791 ] Aleksey Ponkin commented on SPARK-18252: well, I do not know anything about vectorized probing,

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674767#comment-15674767 ] Reynold Xin commented on SPARK-18252: - For 3, the sketch package has no external dependency, and was

[jira] [Commented] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-17 Thread Ofir Manor (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674628#comment-15674628 ] Ofir Manor commented on SPARK-18475: I was just wondering if it actually works, but it seems you

[jira] [Comment Edited] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674599#comment-15674599 ] Kazuaki Ishizaki edited comment on SPARK-18492 at 11/17/16 7:31 PM: I

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674599#comment-15674599 ] Kazuaki Ishizaki commented on SPARK-18492: -- Can you post a small program that can reproduce this

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674579#comment-15674579 ] Sean Owen commented on SPARK-9487: -- Agree, it seems like it should not be sensitive to ordering within

[jira] [Comment Edited] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674558#comment-15674558 ] Aleksey Ponkin edited comment on SPARK-18252 at 11/17/16 7:23 PM: -- Hi,

[jira] [Comment Edited] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674558#comment-15674558 ] Aleksey Ponkin edited comment on SPARK-18252 at 11/17/16 7:22 PM: -- Hi,

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674558#comment-15674558 ] Aleksey Ponkin commented on SPARK-18252: Hi, good points. 1. Problem with current implementation

[jira] [Issue Comment Deleted] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Ponkin updated SPARK-18252: --- Comment: was deleted (was: Hi, good points. 1. Problem with current implementation that

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Aleksey Ponkin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674559#comment-15674559 ] Aleksey Ponkin commented on SPARK-18252: Hi, good points. 1. Problem with current implementation

[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-11-17 Thread Saikat Kanjilal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674543#comment-15674543 ] Saikat Kanjilal commented on SPARK-9487: Sean, I took a look at the code and here it is:

[jira] [Commented] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-17 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674535#comment-15674535 ] Burak Yavuz commented on SPARK-18475: - [~c...@koeninger.org] I don't see where you may need strict

[jira] [Commented] (SPARK-18252) Improve serialized BloomFilter size

2016-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674532#comment-15674532 ] Reynold Xin commented on SPARK-18252: - I'm not sure if it is worth fixing this: 1. We already

[jira] [Created] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-11-17 Thread Norris Merritt (JIRA)
Norris Merritt created SPARK-18492: -- Summary: GeneratedIterator grows beyond 64 KB Key: SPARK-18492 URL: https://issues.apache.org/jira/browse/SPARK-18492 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674459#comment-15674459 ] Cody Koeninger commented on SPARK-18475: This has come up several times, and my answer is

[jira] [Updated] (SPARK-18317) ML, Graph 2.1 QA: API: Binary incompatible changes

2016-11-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-18317: -- Attachment: spark-graphx_2.11-2.0.2_to_2.11-2.1.0-SNAPSHOT.html

[jira] [Resolved] (SPARK-18317) ML, Graph 2.1 QA: API: Binary incompatible changes

2016-11-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-18317. --- Resolution: Done > ML, Graph 2.1 QA: API: Binary incompatible changes >

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2016-11-17 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674308#comment-15674308 ] holdenk commented on SPARK-2620: I don't think it's been resolved; does your code need to be in the repl or

[jira] [Commented] (SPARK-17436) dataframe.write sometimes does not keep sorting

2016-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674253#comment-15674253 ] Sean Owen commented on SPARK-17436: --- Not sure, I have no failures on OS X, on Ubuntu, and all of the

[jira] [Commented] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-17 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674254#comment-15674254 ] Burak Yavuz commented on SPARK-18475: - [~ofirm] Thanks for your comment. I've seen significant

[jira] [Updated] (SPARK-18490) duplicate nodename extrainfo of ShuffleExchange

2016-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18490: -- Assignee: Song Jun > duplicate nodename extrainfo of ShuffleExchange >

[jira] [Resolved] (SPARK-18490) duplicate nodename extrainfo of ShuffleExchange

2016-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18490. --- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15920

[jira] [Commented] (SPARK-18356) Issue + Resolution: Kmeans Spark Performances (ML package)

2016-11-17 Thread zakaria hili (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674161#comment-15674161 ] zakaria hili commented on SPARK-18356: -- Hi [~yuhaoyan], I tried to improve the Kmeans using the same

[jira] [Commented] (SPARK-10816) EventTime based sessionization

2016-11-17 Thread Amit Sela (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674147#comment-15674147 ] Amit Sela commented on SPARK-10816: --- [~rxin] it might be worth taking into account a more generic

[jira] [Commented] (SPARK-12965) Indexer setInputCol() doesn't resolve column names like DataFrame.col()

2016-11-17 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674129#comment-15674129 ] Barry Becker commented on SPARK-12965: -- This is a big issue for us because we don't control the

[jira] [Commented] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-17 Thread Ofir Manor (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674092#comment-15674092 ] Ofir Manor commented on SPARK-18475: Are you sure this is working? Having a visible perf effect? As

[jira] [Commented] (SPARK-17436) dataframe.write sometimes does not keep sorting

2016-11-17 Thread Ran Haim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674032#comment-15674032 ] Ran Haim commented on SPARK-17436: -- I have basically cloned the repository from

[jira] [Comment Edited] (SPARK-17436) dataframe.write sometimes does not keep sorting

2016-11-17 Thread Ran Haim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674032#comment-15674032 ] Ran Haim edited comment on SPARK-17436 at 11/17/16 3:48 PM: I have basically

[jira] [Commented] (SPARK-18172) AnalysisException in first/last during aggregation

2016-11-17 Thread Emlyn Corrin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673931#comment-15673931 ] Emlyn Corrin commented on SPARK-18172: -- I'm not sure I've got the time to build from source at the

[jira] [Commented] (SPARK-18490) duplicate nodename extrainfo of ShuffleExchange

2016-11-17 Thread Song Jun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673918#comment-15673918 ] Song Jun commented on SPARK-18490: -- it is not a bug, just simplify this code > duplicate nodename

[jira] [Commented] (SPARK-18004) DataFrame filter Predicate push-down fails for Oracle Timestamp type columns

2016-11-17 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673904#comment-15673904 ] Herman van Hovell commented on SPARK-18004: --- which format should be passed to oracle? >

[jira] [Updated] (SPARK-18444) SparkR running in yarn-cluster mode should not download Spark package

2016-11-17 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-18444: Target Version/s: 2.1.0 > SparkR running in yarn-cluster mode should not download Spark package >

[jira] [Updated] (SPARK-18444) SparkR running in yarn-cluster mode should not download Spark package

2016-11-17 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-18444: Priority: Critical (was: Major) > SparkR running in yarn-cluster mode should not download Spark

[jira] [Assigned] (SPARK-18444) SparkR running in yarn-cluster mode should not download Spark package

2016-11-17 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang reassigned SPARK-18444: --- Assignee: Yanbo Liang > SparkR running in yarn-cluster mode should not download Spark

[jira] [Commented] (SPARK-18004) DataFrame filter Predicate push-down fails for Oracle Timestamp type columns

2016-11-17 Thread Suhas Nalapure (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673885#comment-15673885 ] Suhas Nalapure commented on SPARK-18004: Date format as per the physical plan logged by Spark
