[jira] [Commented] (SPARK-14592) Create table like

2016-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239035#comment-15239035 ] Liang-Chi Hsieh commented on SPARK-14592: - Will submit PR soon. > Create table l

[jira] [Commented] (SPARK-14592) Create table like

2016-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239034#comment-15239034 ] Liang-Chi Hsieh commented on SPARK-14592: - I am working on this... > Create tabl

[jira] [Issue Comment Deleted] (SPARK-14592) Create table like

2016-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-14592: Comment: was deleted (was: Will submit PR soon.) > Create table like > - >

[jira] [Issue Comment Deleted] (SPARK-14592) Create table like

2016-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-14592: Comment: was deleted (was: I am working on this...) > Create table like >

[jira] [Created] (SPARK-14627) In TypedAggregateExpression update method we call encoder.shift many times

2016-04-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-14627: --- Summary: In TypedAggregateExpression update method we call encoder.shift many times Key: SPARK-14627 URL: https://issues.apache.org/jira/browse/SPARK-14627 Proj

[jira] [Closed] (SPARK-14627) In TypedAggregateExpression update method we call encoder.shift many times

2016-04-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-14627. --- Resolution: Won't Fix > In TypedAggregateExpression update method we call encoder.shift many

[jira] [Reopened] (SPARK-14627) Avoid shifting encoder when delta is zero

2016-04-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reopened SPARK-14627: - > Avoid shifting encoder when delta is zero > -- > >

[jira] [Updated] (SPARK-14627) Avoid shifting encoder when delta is zero

2016-04-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-14627: Summary: Avoid shifting encoder when delta is zero (was: In TypedAggregateExpression upda

[jira] [Updated] (SPARK-14627) Avoid shifting encoder when delta is zero

2016-04-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-14627: Description: We can also improve encoder's shift method to return itself when shift delta i

[jira] [Closed] (SPARK-14627) Avoid shifting encoder when delta is zero

2016-04-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-14627. --- Resolution: Won't Fix > Avoid shifting encoder when delta is zero >

[jira] [Closed] (SPARK-14432) Add API to calculate the approximate quantiles for multiple columns

2016-04-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-14432. --- Resolution: Duplicate > Add API to calculate the approximate quantiles for multiple columns >

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2016-04-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245427#comment-15245427 ] Liang-Chi Hsieh commented on SPARK-14083: - Based on [~joshrosen]'s code, I added

[jira] [Created] (SPARK-14838) Skip automatically broadcast a plan when it contains ObjectProducer

2016-04-21 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-14838: --- Summary: Skip automatically broadcast a plan when it contains ObjectProducer Key: SPARK-14838 URL: https://issues.apache.org/jira/browse/SPARK-14838 Project: Sp

[jira] [Updated] (SPARK-14838) Implement statistics in SerializeFromObject to avoid failure when estimating sizeInBytes for ObjectType

2016-04-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-14838: Summary: Implement statistics in SerializeFromObject to avoid failure when estimating sizeI

[jira] [Commented] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904729#comment-16904729 ] Liang-Chi Hsieh commented on SPARK-28652: - This looks interesting to me. I tried

[jira] [Commented] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904743#comment-16904743 ] Liang-Chi Hsieh commented on SPARK-28652: - As existing tests don't explicitly ch

[jira] [Updated] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28652: Priority: Minor (was: Major) > spark.kubernetes.pyspark.pythonVersion is never passed to

[jira] [Updated] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28652: Issue Type: Test (was: Bug) > spark.kubernetes.pyspark.pythonVersion is never passed to e

[jira] [Created] (SPARK-28722) Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28722: --- Summary: Change sequential label sorting in StringIndexer fit to parallel Key: SPARK-28722 URL: https://issues.apache.org/jira/browse/SPARK-28722 Project: Spark

[jira] [Comment Edited] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when st

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909409#comment-16909409 ] Liang-Chi Hsieh edited comment on SPARK-28732 at 8/16/19 9:19 PM:

[jira] [Commented] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when storing

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909409#comment-16909409 ] Liang-Chi Hsieh commented on SPARK-28732: - As {{count}} return type is LongType,

[jira] [Commented] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909420#comment-16909420 ] Liang-Chi Hsieh commented on SPARK-28761: - If you do it at SparkPlan.scala#L344,

[jira] [Commented] (SPARK-28672) [UDF] Duplicate function creation should not allow

2019-08-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911007#comment-16911007 ] Liang-Chi Hsieh commented on SPARK-28672: - Is there any rule in Hive regarding t

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-22 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913915#comment-16913915 ] Liang-Chi Hsieh commented on SPARK-23519: - Thanks for pinging me. I am going on

[jira] [Commented] (SPARK-24666) Word2Vec generate infinity vectors when numIterations are large

2019-08-24 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914950#comment-16914950 ] Liang-Chi Hsieh commented on SPARK-24666: - I tried to run word2vec with Quora Qu

[jira] [Created] (SPARK-28866) Persist item factors RDD when checkpointing in ALS

2019-08-25 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28866: --- Summary: Persist item factors RDD when checkpointing in ALS Key: SPARK-28866 URL: https://issues.apache.org/jira/browse/SPARK-28866 Project: Spark Issu

[jira] [Commented] (SPARK-25549) High level API to collect RDD statistics

2019-08-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915362#comment-16915362 ] Liang-Chi Hsieh commented on SPARK-25549: - Close this as it is not needed now.

[jira] [Resolved] (SPARK-25549) High level API to collect RDD statistics

2019-08-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-25549. - Resolution: Won't Fix > High level API to collect RDD statistics > -

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-26 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915809#comment-16915809 ] Liang-Chi Hsieh commented on SPARK-23519: - I test with Hive 2.1. It doesn't supp

[jira] [Created] (SPARK-28920) Set up java version for github workflow

2019-08-29 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28920: --- Summary: Set up java version for github workflow Key: SPARK-28920 URL: https://issues.apache.org/jira/browse/SPARK-28920 Project: Spark Issue Type: Imp

[jira] [Assigned] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-28933: --- Assignee: Liang-Chi Hsieh > Reduce unnecessary shuffle in ALS when initializing fac

[jira] [Created] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-08-30 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28933: --- Summary: Reduce unnecessary shuffle in ALS when initializing factors Key: SPARK-28933 URL: https://issues.apache.org/jira/browse/SPARK-28933 Project: Spark

[jira] [Resolved] (SPARK-28926) CLONE - ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-28926. - Resolution: Duplicate I think this is a duplicate of SPARK-28927. > CLONE - ArrayIndexOut

[jira] [Commented] (SPARK-28935) Document SQL metrics for Details for Query Plan

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919975#comment-16919975 ] Liang-Chi Hsieh commented on SPARK-28935: - Thanks for pinging me! I will look in

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920302#comment-16920302 ] Liang-Chi Hsieh commented on SPARK-23519: - This was closed and then reopened and

[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Labels: (was: bulk-closed) > Create View Commands Fails with The view output (col1,col1

[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Component/s: (was: Spark Core) > Create View Commands Fails with The view output (col

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920480#comment-16920480 ] Liang-Chi Hsieh commented on SPARK-28927: - Does this only happen on 2.2.1? How a

[jira] [Commented] (SPARK-28935) Document SQL metrics for Details for Query Plan

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920550#comment-16920550 ] Liang-Chi Hsieh commented on SPARK-28935: - Thanks! [~smilegator] It should be h

[jira] [Commented] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920584#comment-16920584 ] Liang-Chi Hsieh commented on SPARK-28933: - This issue was resolved by [https://g

[jira] [Resolved] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-28933. - Resolution: Resolved > Reduce unnecessary shuffle in ALS when initializing factors > ---

[jira] [Updated] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28933: Fix Version/s: 3.0.0 > Reduce unnecessary shuffle in ALS when initializing factors > -

[jira] [Created] (SPARK-29013) Structurally equivalent subexpression elimination

2019-09-06 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29013: --- Summary: Structurally equivalent subexpression elimination Key: SPARK-29013 URL: https://issues.apache.org/jira/browse/SPARK-29013 Project: Spark Issue

[jira] [Assigned] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2019-09-09 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-23265: --- Assignee: Huaxin Gao > Update multi-column error handling logic in QuantileDiscreti

[jira] [Resolved] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2019-09-09 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-23265. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 20442 [http

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-10 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926814#comment-16926814 ] Liang-Chi Hsieh commented on SPARK-28927: - Hi [~JerryHouse], do you use any non-

[jira] [Created] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-10 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29042: --- Summary: Sampling-based RDD with unordered input should be INDETERMINATE Key: SPARK-29042 URL: https://issues.apache.org/jira/browse/SPARK-29042 Project: Spark

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-12 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928649#comment-16928649 ] Liang-Chi Hsieh commented on SPARK-26205: - Yeah, I will look at it. > Optimize

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929334#comment-16929334 ] Liang-Chi Hsieh commented on SPARK-26205: - [~cloud_fan] I ran a simple test, see

[jira] [Resolved] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-29042. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25751 [http

[jira] [Assigned] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29042: --- Assignee: Liang-Chi Hsieh > Sampling-based RDD with unordered input should be INDET

[jira] [Assigned] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-14 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-28927: --- Assignee: Liang-Chi Hsieh > ArrayIndexOutOfBoundsException and Not-stable AUC metri

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930658#comment-16930658 ] Liang-Chi Hsieh commented on SPARK-28927: - Because you are using 2.2.1, spark.sq

[jira] [Comment Edited] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930658#comment-16930658 ] Liang-Chi Hsieh edited comment on SPARK-28927 at 9/16/19 3:35 PM:

[jira] [Comment Edited] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930658#comment-16930658 ] Liang-Chi Hsieh edited comment on SPARK-28927 at 9/16/19 3:36 PM:

[jira] [Created] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29100: --- Summary: Codegen with switch in InSet expression causes compilation error Key: SPARK-29100 URL: https://issues.apache.org/jira/browse/SPARK-29100 Project: Spark

[jira] [Updated] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-29100: Description: SPARK-26205 adds an optimization to InSet that generates Java switch conditio

[jira] [Assigned] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29100: --- Assignee: Liang-Chi Hsieh > Codegen with switch in InSet expression causes compilat

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930787#comment-16930787 ] Liang-Chi Hsieh commented on SPARK-26205: - [~cloud_fan]. I see now. Created SPAR

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930827#comment-16930827 ] Liang-Chi Hsieh commented on SPARK-28927: - Regarding the unstable AUC issue, the

[jira] [Assigned] (SPARK-22796) Add multiple column support to PySpark QuantileDiscretizer

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-22796: --- Assignee: Huaxin Gao > Add multiple column support to PySpark QuantileDiscretizer >

[jira] [Resolved] (SPARK-22796) Add multiple column support to PySpark QuantileDiscretizer

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-22796. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25812 [http

[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-29042: Fix Version/s: 2.4.5 > Sampling-based RDD with unordered input should be INDETERMINATE > -

[jira] [Commented] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932784#comment-16932784 ] Liang-Chi Hsieh commented on SPARK-29042: - [~hyukjin.kwon] Am I setting the fix

[jira] [Created] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29181: --- Summary: Cache preferred locations of checkpointed RDD Key: SPARK-29181 URL: https://issues.apache.org/jira/browse/SPARK-29181 Project: Spark Issue Typ

[jira] [Created] (SPARK-29182) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29182: --- Summary: Cache preferred locations of checkpointed RDD Key: SPARK-29182 URL: https://issues.apache.org/jira/browse/SPARK-29182 Project: Spark Issue Typ

[jira] [Commented] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933955#comment-16933955 ] Liang-Chi Hsieh commented on SPARK-29181: - [~dongjoon] Thanks. Not aware of crea

[jira] [Resolved] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-29181. - Resolution: Duplicate > Cache preferred locations of checkpointed RDD >

[jira] [Assigned] (SPARK-29182) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29182: --- Assignee: Liang-Chi Hsieh > Cache preferred locations of checkpointed RDD > ---

[jira] [Created] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29239: --- Summary: Subquery should not cause NPE when eliminating subexpression Key: SPARK-29239 URL: https://issues.apache.org/jira/browse/SPARK-29239 Project: Spark

[jira] [Commented] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937486#comment-16937486 ] Liang-Chi Hsieh commented on SPARK-29239: - Yes. > Subquery should not cause NPE

[jira] [Commented] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937495#comment-16937495 ] Liang-Chi Hsieh commented on SPARK-29239: - I added SPARK-29221 to the title of t

[jira] [Created] (SPARK-27832) Don't decompress and create column batch when the task is completed

2019-05-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27832: --- Summary: Don't decompress and create column batch when the task is completed Key: SPARK-27832 URL: https://issues.apache.org/jira/browse/SPARK-27832 Project: Sp

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848123#comment-16848123 ] Liang-Chi Hsieh commented on SPARK-27837: - The problem isn't that val1 isn't an

[jira] [Commented] (SPARK-27836) Issue with seeded rand() function in Spark SQL

2019-05-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848124#comment-16848124 ] Liang-Chi Hsieh commented on SPARK-27836: - rand function initializes only once w

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848125#comment-16848125 ] Liang-Chi Hsieh commented on SPARK-27837: - Please see the analysis exception: In

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848752#comment-16848752 ] Liang-Chi Hsieh commented on SPARK-27837: - I don't see it makes sense. I checked

[jira] [Commented] (SPARK-27855) Union failed between 2 datasets of the same type converted from different dataframes

2019-05-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849016#comment-16849016 ] Liang-Chi Hsieh commented on SPARK-27855: - If you notice, the printed schema of

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849836#comment-16849836 ] Liang-Chi Hsieh commented on SPARK-27837: - Ah, I see. MySQL disallows nonconstan

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849871#comment-16849871 ] Liang-Chi Hsieh commented on SPARK-27837: - Btw, I think this is not a bug but li

[jira] [Resolved] (SPARK-27832) Don't decompress and create column batch when the task is completed

2019-05-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-27832. - Resolution: Won't Fix > Don't decompress and create column batch when the task is comple

[jira] [Commented] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false

2019-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851948#comment-16851948 ] Liang-Chi Hsieh commented on SPARK-27873: - I guess what Marcin meant is: {code}

[jira] [Commented] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false

2019-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851952#comment-16851952 ] Liang-Chi Hsieh commented on SPARK-27873: - I can prepare a PR if Marcin or Hyukj

[jira] [Commented] (SPARK-27798) from_avro can modify variables in other rows in local mode

2019-06-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856362#comment-16856362 ] Liang-Chi Hsieh commented on SPARK-27798: - Is anyone working on this? If none,

[jira] [Commented] (SPARK-27913) Spark SQL's native ORC reader implements its own schema evolution

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859450#comment-16859450 ] Liang-Chi Hsieh commented on SPARK-27913: - But seems the above reproducible exam

[jira] [Commented] (SPARK-27966) input_file_name empty when listing files in parallel

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859451#comment-16859451 ] Liang-Chi Hsieh commented on SPARK-27966: - Can you show the output of explaining

[jira] [Created] (SPARK-27984) Jenkins job spark-master-package fails due to invalid gpg option

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27984: --- Summary: Jenkins job spark-master-package fails due to invalid gpg option Key: SPARK-27984 URL: https://issues.apache.org/jira/browse/SPARK-27984 Project: Spark

[jira] [Updated] (SPARK-27984) Jenkins job spark-master-package fails due to invalid gpg option

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27984: Description: I noticed the failures on Jenkins job {{spark-master-package}}: [https://amp

[jira] [Commented] (SPARK-28009) PipedRDD: Block not locked for reading failure

2019-06-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862180#comment-16862180 ] Liang-Chi Hsieh commented on SPARK-28009: - I think this looks like duplicate to

[jira] [Created] (SPARK-28031) Improve or remove doctest on over function of Column

2019-06-12 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28031: --- Summary: Improve or remove doctest on over function of Column Key: SPARK-28031 URL: https://issues.apache.org/jira/browse/SPARK-28031 Project: Spark Is

[jira] [Commented] (SPARK-27966) input_file_name empty when listing files in parallel

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863030#comment-16863030 ] Liang-Chi Hsieh commented on SPARK-27966: - I can't see where input_file_name is,

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863180#comment-16863180 ] Liang-Chi Hsieh commented on SPARK-28006: - I'm curious about two questions: Can

[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863680#comment-16863680 ] Liang-Chi Hsieh commented on SPARK-28043: - I tried to look around that, like ht

[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864107#comment-16864107 ] Liang-Chi Hsieh commented on SPARK-28043: - To make duplicate JSON keys work, I t

[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864189#comment-16864189 ] Liang-Chi Hsieh commented on SPARK-28054: - Is this query working on Hive? > Una

[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865006#comment-16865006 ] Liang-Chi Hsieh commented on SPARK-28054: - I tested on Hive, the query works. Bt

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865650#comment-16865650 ] Liang-Chi Hsieh commented on SPARK-28058: - This is due to CSV parser column prun

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865664#comment-16865664 ] Liang-Chi Hsieh commented on SPARK-28058: - Although this isn't a bug, I think it

[jira] [Created] (SPARK-28082) Add a note to DROPMALFORMED mode of CSV for column pruning

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28082: --- Summary: Add a note to DROPMALFORMED mode of CSV for column pruning Key: SPARK-28082 URL: https://issues.apache.org/jira/browse/SPARK-28082 Project: Spark

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865695#comment-16865695 ] Liang-Chi Hsieh commented on SPARK-28058: - [~stwhit] Thanks for letting us know
