[jira] [Commented] (SPARK-24884) Implement regexp_extract_all
[ https://issues.apache.org/jira/browse/SPARK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566713#comment-16566713 ]

xueyu commented on SPARK-24884:
-------------------------------

I'd like to work on this issue. Could you please assign it to me? Thanks, [~zsxwing] [~jiangxb]

> Implement regexp_extract_all
> ----------------------------
>
> Key: SPARK-24884
> URL: https://issues.apache.org/jira/browse/SPARK-24884
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Nick Nicolini
> Priority: Major
>
> I've recently hit many cases of regexp parsing where we need to match on
> something that is always arbitrary in length; for example, a text block that
> looks something like:
> {code:java}
> AAA:WORDS|
> BBB:TEXT|
> MSG:ASDF|
> MSG:QWER|
> ...
> MSG:ZXCV|{code}
> Where I need to pull out all values between "MSG:" and "|", which can occur
> in each instance between 1 and n times. I cannot reliably use the existing
> {{regexp_extract}} method since the number of occurrences is always
> arbitrary, and while I can write a UDF to handle this it'd be great if this
> was supported natively in Spark.
> Perhaps we can implement something like {{regexp_extract_all}} as
> [Presto|https://prestodb.io/docs/current/functions/regexp.html] and
> [Pig|https://pig.apache.org/docs/latest/api/org/apache/pig/builtin/REGEX_EXTRACT_ALL.html]
> have?

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
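For readers skimming the archive, the behaviour requested in SPARK-24884 can be sketched in plain Python. This is only an illustration of the "extract every match" semantics, not Spark code and not the eventual Spark implementation; the helper name `regexp_extract_all`, its `group` parameter, and the regular expression are assumptions chosen to fit the sample block in the issue.

```python
import re


def regexp_extract_all(text, pattern, group=1):
    """Return the chosen capture group from every non-overlapping match.

    Hypothetical helper for illustration; it only mirrors the idea of the
    requested function and is not an API of Spark, Presto, or Pig.
    """
    return [m.group(group) for m in re.finditer(pattern, text)]


# Sample input modelled on the block quoted in the issue description.
block = "AAA:WORDS|\nBBB:TEXT|\nMSG:ASDF|\nMSG:QWER|\nMSG:ZXCV|"

# Capture everything between "MSG:" and the next "|".
values = regexp_extract_all(block, r"MSG:([^|]*)\|")
print(values)  # ['ASDF', 'QWER', 'ZXCV']
```

A single-match function such as {{regexp_extract}} would return only one of these values, which is the gap the issue describes.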
[jira] [Updated] (SPARK-24639) Add three configs in the doc
[ https://issues.apache.org/jira/browse/SPARK-24639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu updated SPARK-24639:
--------------------------
    External issue URL: https://github.com/apache/spark/pull/21624

> Add three configs in the doc
> ----------------------------
>
> Key: SPARK-24639
> URL: https://issues.apache.org/jira/browse/SPARK-24639
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 2.3.1
> Reporter: xueyu
> Priority: Trivial
>
> Add some missing configs mentioned in SPARK-24560, which are
> spark.python.task.killTimeout, spark.worker.driverTerminateTimeout and
> spark.ui.consoleProgress.update.interval
[jira] [Created] (SPARK-24639) Add three configs in the doc
xueyu created SPARK-24639:
--------------------------

Summary: Add three configs in the doc
Key: SPARK-24639
URL: https://issues.apache.org/jira/browse/SPARK-24639
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 2.3.1
Reporter: xueyu

Add some missing configs mentioned in SPARK-24560, which are
spark.python.task.killTimeout, spark.worker.driverTerminateTimeout and
spark.ui.consoleProgress.update.interval
[jira] [Updated] (SPARK-24566) spark.storage.blockManagerSlaveTimeoutMs default config
[ https://issues.apache.org/jira/browse/SPARK-24566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu updated SPARK-24566:
--------------------------
    External issue URL: https://github.com/apache/spark/pull/21575

> spark.storage.blockManagerSlaveTimeoutMs default config
> -------------------------------------------------------
>
> Key: SPARK-24566
> URL: https://issues.apache.org/jira/browse/SPARK-24566
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: xueyu
> Priority: Major
>
> As the configuration doc says, fall back to "spark.network.timeout" when
> "spark.storage.blockManagerSlaveTimeoutMs" is not configured.
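The fallback described in SPARK-24566 can be sketched as a plain dictionary lookup. This is a minimal illustration, not the code from the linked pull request: the function name `block_manager_timeout_ms`, the plain-integer value format, and the `default_network_timeout_s` parameter are all assumptions made for the example.

```python
def block_manager_timeout_ms(conf, default_network_timeout_s=120):
    """Resolve the block manager timeout in milliseconds.

    Hypothetical helper: `conf` is a plain dict of string settings, and
    values are assumed to be bare integers (no unit suffix) for brevity.
    """
    explicit = conf.get("spark.storage.blockManagerSlaveTimeoutMs")
    if explicit is not None:
        # The specific setting wins when the user has configured it.
        return int(explicit)
    # Otherwise fall back to the general network timeout (in seconds).
    network_s = int(conf.get("spark.network.timeout", default_network_timeout_s))
    return network_s * 1000


print(block_manager_timeout_ms({"spark.network.timeout": "300"}))
print(block_manager_timeout_ms({"spark.storage.blockManagerSlaveTimeoutMs": "5000"}))
```

The point of the sketch is the lookup order: the specific key is consulted first, and the general network timeout is used only when the specific key is absent.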
[jira] [Created] (SPARK-24566) spark.storage.blockManagerSlaveTimeoutMs default config
xueyu created SPARK-24566:
--------------------------

Summary: spark.storage.blockManagerSlaveTimeoutMs default config
Key: SPARK-24566
URL: https://issues.apache.org/jira/browse/SPARK-24566
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.1
Reporter: xueyu

As the configuration doc says, fall back to "spark.network.timeout" when
"spark.storage.blockManagerSlaveTimeoutMs" is not configured.
[jira] [Updated] (SPARK-24560) Fix some getTimeAsMs as getTimeAsSeconds
[ https://issues.apache.org/jira/browse/SPARK-24560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu updated SPARK-24560:
--------------------------
    Docs Text: (was: There are some places using "getTimeAsMs" rather than "getTimeAsSeconds". This will return a wrong value when the user specifies a value without a time unit.)

> Fix some getTimeAsMs as getTimeAsSeconds
> ----------------------------------------
>
> Key: SPARK-24560
> URL: https://issues.apache.org/jira/browse/SPARK-24560
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Spark Core
> Affects Versions: 2.3.1
> Reporter: xueyu
> Priority: Major
>
> There are some places using "getTimeAsMs" rather than "getTimeAsSeconds".
> This will return a wrong value when the user specifies a value without a time
> unit.
[jira] [Updated] (SPARK-24560) Fix some getTimeAsMs as getTimeAsSeconds
[ https://issues.apache.org/jira/browse/SPARK-24560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu updated SPARK-24560:
--------------------------
    Description: There are some places using "getTimeAsMs" rather than "getTimeAsSeconds". This will return a wrong value when the user specifies a value without a time unit.

> Fix some getTimeAsMs as getTimeAsSeconds
> ----------------------------------------
>
> Key: SPARK-24560
> URL: https://issues.apache.org/jira/browse/SPARK-24560
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Spark Core
> Affects Versions: 2.3.1
> Reporter: xueyu
> Priority: Major
>
> There are some places using "getTimeAsMs" rather than "getTimeAsSeconds".
> This will return a wrong value when the user specifies a value without a time
> unit.
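To make the unit problem in SPARK-24560 concrete, here is a small Python sketch of two readers that differ only in the unit they assume for a bare number. The helpers `parse_time_ms`, `get_time_as_ms`, and `get_time_as_seconds` are hypothetical stand-ins written for this note; they imitate the idea that one accessor defaults to milliseconds and the other to seconds, and they are not Spark's implementation.

```python
import re

# Supported suffixes and their size in milliseconds (illustrative subset).
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def parse_time_ms(value, default_unit):
    """Parse strings like '120', '120s' or '500ms' into milliseconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value.strip().lower())
    if match is None:
        raise ValueError(f"cannot parse time value: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit  # a bare number takes the reader's default unit
    if unit not in _UNIT_TO_MS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return int(number) * _UNIT_TO_MS[unit]


def get_time_as_ms(value):
    """Reader that assumes milliseconds when no unit is given."""
    return parse_time_ms(value, "ms")


def get_time_as_seconds(value):
    """Reader that assumes seconds when no unit is given."""
    return parse_time_ms(value, "s") // 1000


# With an explicit unit both readers agree on the duration.
print(get_time_as_ms("120s"), get_time_as_seconds("120s"))  # 120000 120

# Without a unit, the same string means 120 ms to one and 120 s to the other.
print(get_time_as_ms("120"), get_time_as_seconds("120"))  # 120 120
```

If a setting is documented in seconds but read with the millisecond-default reader, a user who writes a bare `120` ends up with a timeout a thousand times shorter than intended, which is the wrong value the issue refers to.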
[jira] [Updated] (SPARK-24560) Fix some getTimeAsMs as getTimeAsSeconds
[ https://issues.apache.org/jira/browse/SPARK-24560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu updated SPARK-24560:
--------------------------
    External issue URL: https://github.com/apache/spark/pull/21567

> Fix some getTimeAsMs as getTimeAsSeconds
> ----------------------------------------
>
> Key: SPARK-24560
> URL: https://issues.apache.org/jira/browse/SPARK-24560
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Spark Core
> Affects Versions: 2.3.1
> Reporter: xueyu
> Priority: Major
>
[jira] [Created] (SPARK-24560) Fix some getTimeAsMs as getTimeAsSeconds
xueyu created SPARK-24560:
--------------------------

Summary: Fix some getTimeAsMs as getTimeAsSeconds
Key: SPARK-24560
URL: https://issues.apache.org/jira/browse/SPARK-24560
Project: Spark
Issue Type: Improvement
Components: Mesos, Spark Core
Affects Versions: 2.3.1
Reporter: xueyu
[jira] [Closed] (SPARK-24455) fix typo in TaskSchedulerImpl's comments
[ https://issues.apache.org/jira/browse/SPARK-24455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu closed SPARK-24455.
-------------------------

> fix typo in TaskSchedulerImpl's comments
> ----------------------------------------
>
> Key: SPARK-24455
> URL: https://issues.apache.org/jira/browse/SPARK-24455
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: xueyu
> Assignee: xueyu
> Priority: Trivial
> Fix For: 2.3.2, 2.4.0
>
>
> Fix the method name in the comments of TaskSchedulerImpl.scala.
[jira] [Updated] (SPARK-24455) fix typo in TaskSchedulerImpl's comments
[ https://issues.apache.org/jira/browse/SPARK-24455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xueyu updated SPARK-24455:
--------------------------
    External issue URL: https://github.com/apache/spark/pull/21485

> fix typo in TaskSchedulerImpl's comments
> ----------------------------------------
>
> Key: SPARK-24455
> URL: https://issues.apache.org/jira/browse/SPARK-24455
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: xueyu
> Priority: Trivial
>
> Fix the method name in the comments of TaskSchedulerImpl.scala.
[jira] [Created] (SPARK-24455) fix typo in TaskSchedulerImpl's comments
xueyu created SPARK-24455:
--------------------------

Summary: fix typo in TaskSchedulerImpl's comments
Key: SPARK-24455
URL: https://issues.apache.org/jira/browse/SPARK-24455
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.0
Reporter: xueyu

Fix the method name in the comments of TaskSchedulerImpl.scala.